pandas and ticdat

You can convert an object from normal TicDat dict-of-dict representations to a pandas.DataFrame and back again.

This notebook does a quick demo of both, with a bit more detail on the latter.

We assume diet.py and diet_sample_data.sql are in this directory. These files are easy to find if you want to copy them over and reproduce this notebook.


In [1]:
from diet import input_schema
dat = input_schema.sql.create_tic_dat_from_sql("diet_sample_data.sql")

In [2]:
dat.nutrition_quantities


Out[2]:
{(u'chicken', u'calories'): _td:{'Quantity': 420},
 (u'chicken', u'fat'): _td:{'Quantity': 10},
 (u'chicken', u'protein'): _td:{'Quantity': 32},
 (u'chicken', u'sodium'): _td:{'Quantity': 1190},
 (u'fries', u'calories'): _td:{'Quantity': 380},
 (u'fries', u'fat'): _td:{'Quantity': 19},
 (u'fries', u'protein'): _td:{'Quantity': 4},
 (u'fries', u'sodium'): _td:{'Quantity': 270},
 (u'hamburger', u'calories'): _td:{'Quantity': 410},
 (u'hamburger', u'fat'): _td:{'Quantity': 26},
 (u'hamburger', u'protein'): _td:{'Quantity': 24},
 (u'hamburger', u'sodium'): _td:{'Quantity': 730},
 (u'hot dog', u'calories'): _td:{'Quantity': 560},
 (u'hot dog', u'fat'): _td:{'Quantity': 32},
 (u'hot dog', u'protein'): _td:{'Quantity': 20},
 (u'hot dog', u'sodium'): _td:{'Quantity': 1800},
 (u'ice cream', u'calories'): _td:{'Quantity': 330},
 (u'ice cream', u'fat'): _td:{'Quantity': 10},
 (u'ice cream', u'protein'): _td:{'Quantity': 8},
 (u'ice cream', u'sodium'): _td:{'Quantity': 180},
 (u'macaroni', u'calories'): _td:{'Quantity': 320},
 (u'macaroni', u'fat'): _td:{'Quantity': 10},
 (u'macaroni', u'protein'): _td:{'Quantity': 12},
 (u'macaroni', u'sodium'): _td:{'Quantity': 930},
 (u'milk', u'calories'): _td:{'Quantity': 100},
 (u'milk', u'fat'): _td:{'Quantity': 2.5},
 (u'milk', u'protein'): _td:{'Quantity': 8},
 (u'milk', u'sodium'): _td:{'Quantity': 125},
 (u'pizza', u'calories'): _td:{'Quantity': 320},
 (u'pizza', u'fat'): _td:{'Quantity': 12},
 (u'pizza', u'protein'): _td:{'Quantity': 15},
 (u'pizza', u'sodium'): _td:{'Quantity': 820},
 (u'salad', u'calories'): _td:{'Quantity': 320},
 (u'salad', u'fat'): _td:{'Quantity': 12},
 (u'salad', u'protein'): _td:{'Quantity': 31},
 (u'salad', u'sodium'): _td:{'Quantity': 1230}}

In [3]:
pan_dat = input_schema.copy_to_pandas(dat)

In [4]:
pan_dat.nutrition_quantities


Out[4]:
Quantity
Food Category
chicken calories 420.0
fat 10.0
protein 32.0
sodium 1190.0
fries calories 380.0
fat 19.0
protein 4.0
sodium 270.0
hamburger calories 410.0
fat 26.0
protein 24.0
sodium 730.0
hot dog calories 560.0
fat 32.0
protein 20.0
sodium 1800.0
ice cream calories 330.0
fat 10.0
protein 8.0
sodium 180.0
macaroni calories 320.0
fat 10.0
protein 12.0
sodium 930.0
milk calories 100.0
fat 2.5
protein 8.0
sodium 125.0
pizza calories 320.0
fat 12.0
protein 15.0
sodium 820.0
salad calories 320.0
fat 12.0
protein 31.0
sodium 1230.0

In [5]:
{pan_dat.nutrition_quantities.__class__, pan_dat.categories.__class__, pan_dat.foods.__class__}


Out[5]:
{pandas.core.frame.DataFrame}

Now going the other way, convering DataFrame back into TicDat objects. Since pan_dat just has a collection of DataFrames attached to it, this is easy to show.


In [6]:
dat2 = input_schema.TicDat(foods = pan_dat.foods, categories = pan_dat.categories, 
                           nutrition_quantities = pan_dat.nutrition_quantities)

In [7]:
dat2.nutrition_quantities


Out[7]:
{(u'chicken', u'calories'): _td:{'Quantity': 420.0},
 (u'chicken', u'fat'): _td:{'Quantity': 10.0},
 (u'chicken', u'protein'): _td:{'Quantity': 32.0},
 (u'chicken', u'sodium'): _td:{'Quantity': 1190.0},
 (u'fries', u'calories'): _td:{'Quantity': 380.0},
 (u'fries', u'fat'): _td:{'Quantity': 19.0},
 (u'fries', u'protein'): _td:{'Quantity': 4.0},
 (u'fries', u'sodium'): _td:{'Quantity': 270.0},
 (u'hamburger', u'calories'): _td:{'Quantity': 410.0},
 (u'hamburger', u'fat'): _td:{'Quantity': 26.0},
 (u'hamburger', u'protein'): _td:{'Quantity': 24.0},
 (u'hamburger', u'sodium'): _td:{'Quantity': 730.0},
 (u'hot dog', u'calories'): _td:{'Quantity': 560.0},
 (u'hot dog', u'fat'): _td:{'Quantity': 32.0},
 (u'hot dog', u'protein'): _td:{'Quantity': 20.0},
 (u'hot dog', u'sodium'): _td:{'Quantity': 1800.0},
 (u'ice cream', u'calories'): _td:{'Quantity': 330.0},
 (u'ice cream', u'fat'): _td:{'Quantity': 10.0},
 (u'ice cream', u'protein'): _td:{'Quantity': 8.0},
 (u'ice cream', u'sodium'): _td:{'Quantity': 180.0},
 (u'macaroni', u'calories'): _td:{'Quantity': 320.0},
 (u'macaroni', u'fat'): _td:{'Quantity': 10.0},
 (u'macaroni', u'protein'): _td:{'Quantity': 12.0},
 (u'macaroni', u'sodium'): _td:{'Quantity': 930.0},
 (u'milk', u'calories'): _td:{'Quantity': 100.0},
 (u'milk', u'fat'): _td:{'Quantity': 2.5},
 (u'milk', u'protein'): _td:{'Quantity': 8.0},
 (u'milk', u'sodium'): _td:{'Quantity': 125.0},
 (u'pizza', u'calories'): _td:{'Quantity': 320.0},
 (u'pizza', u'fat'): _td:{'Quantity': 12.0},
 (u'pizza', u'protein'): _td:{'Quantity': 15.0},
 (u'pizza', u'sodium'): _td:{'Quantity': 820.0},
 (u'salad', u'calories'): _td:{'Quantity': 320.0},
 (u'salad', u'fat'): _td:{'Quantity': 12.0},
 (u'salad', u'protein'): _td:{'Quantity': 31.0},
 (u'salad', u'sodium'): _td:{'Quantity': 1230.0}}

In [8]:
input_schema._same_data(dat, dat2)


Out[8]:
True

That said, if you drop the indicies then things don't work, so be sure to set DataFrame indicies correctly.


In [9]:
df = pan_dat.nutrition_quantities.reset_index(drop=False)
df


Out[9]:
Food Category Quantity
0 chicken calories 420.0
1 chicken fat 10.0
2 chicken protein 32.0
3 chicken sodium 1190.0
4 fries calories 380.0
5 fries fat 19.0
6 fries protein 4.0
7 fries sodium 270.0
8 hamburger calories 410.0
9 hamburger fat 26.0
10 hamburger protein 24.0
11 hamburger sodium 730.0
12 hot dog calories 560.0
13 hot dog fat 32.0
14 hot dog protein 20.0
15 hot dog sodium 1800.0
16 ice cream calories 330.0
17 ice cream fat 10.0
18 ice cream protein 8.0
19 ice cream sodium 180.0
20 macaroni calories 320.0
21 macaroni fat 10.0
22 macaroni protein 12.0
23 macaroni sodium 930.0
24 milk calories 100.0
25 milk fat 2.5
26 milk protein 8.0
27 milk sodium 125.0
28 pizza calories 320.0
29 pizza fat 12.0
30 pizza protein 15.0
31 pizza sodium 820.0
32 salad calories 320.0
33 salad fat 12.0
34 salad protein 31.0
35 salad sodium 1230.0

So, well this might appear to be ok, it does need the index for the current ticdat implementation.


In [10]:
input_schema.TicDat(foods = pan_dat.foods, categories = pan_dat.categories, 
                           nutrition_quantities = df)


---------------------------------------------------------------------------
TicDatError                               Traceback (most recent call last)
<ipython-input-10-fc88d19fa5eb> in <module>()
      1 input_schema.TicDat(foods = pan_dat.foods, categories = pan_dat.categories, 
----> 2                            nutrition_quantities = df)

/Users/petercacioppi/src/opalytics-ticdat/ticdat/ticdatfactory.pyc in __init__(self, **init_tables)
    593                     if not (goodticdattable(v, t, lambda x : badticdattable.append(x))) :
    594                         raise utils.TicDatError(t + " cannot be treated as a ticDat table : " +
--> 595                                                 badticdattable[-1])
    596                     if pd and isinstance(v, pd.Series):
    597                         v = DataFrame(v)

TicDatError: nutrition_quantities cannot be treated as a ticDat table : Could not find a pandas index matching the primary key for nutrition_quantities