We are reading the USDA National Nutrient Database as created by this project (https://github.com/CraigKelly/nndb-import). The original data can be found at http://www.ars.usda.gov/Services/docs.htm?docid=8964
The US RDA columns are created using data from Dietary Reference Intakes: The Essential Guide to Nutrient Requirements available from the National Academies Press at http://www.nap.edu/catalog/11537.html
Note that not every RDA value in the text was used. For instance, the US RDA for Iron varies significantly by age and sex, so I deemed it easier to leave out RDA information for Iron.
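As an illustration of why a single Iron column would be misleading, here is a minimal sketch of a per-group lookup. The two values are the adult Iron RDAs for ages 19 to 50 from the Dietary Reference Intakes; the dictionary keys and the function name are hypothetical and are not used anywhere else in this notebook.

```python
# Iron RDA in mg/day for two adult groups (Dietary Reference Intakes).
# The keys and the function name are hypothetical, chosen for this sketch.
IRON_RDA_MG = {
    ('male', '19-50'): 8.0,
    ('female', '19-50'): 18.0,
}

def iron_rda_fraction(iron_mg, sex, age_band):
    """Return the fraction of the Iron RDA that iron_mg covers for one group."""
    return iron_mg / IRON_RDA_MG[(sex, age_band)]

# The same 4 mg of iron covers a very different share of the RDA per group.
print(iron_rda_fraction(4.0, 'male', '19-50'))
print(iron_rda_fraction(4.0, 'female', '19-50'))
```

A complete version would need one entry per age and sex band, which is the bookkeeping that was avoided here by leaving Iron out.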
Finally, every effort has been made to keep this data as clean and accurate as possible, but nothing here should be construed as medical advice or as a warranty of any kind. The data was collected by a computer scientist who might have made an error that is not immediately apparent. Caveat emptor!
Important! The MongoDB access below assumes that the data was loaded into a
Docker container named mainmongo, and that this notebook is running in a Jupyter
container that was linked to it (via --link on the command line). If you are
running MongoDB and Jupyter "natively", you should be able to change 'mainmongo'
to 'localhost' in the code below. Also note that you should have already
installed pandas.
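One way to avoid editing the code when switching between the linked container and a native install is to read the host name from the environment. This is only a sketch: the variable name NNDB_MONGO_HOST and both helper functions are assumptions made for illustration and are not part of this notebook or of nndb-import.

```python
import os

def mongo_host(default='mainmongo'):
    # NNDB_MONGO_HOST is a hypothetical variable name chosen for this sketch.
    return os.environ.get('NNDB_MONGO_HOST', default)

def mongo_uri(host=None, port=27017):
    """Build a standard mongodb:// connection URI for the chosen host."""
    return 'mongodb://{}:{}'.format(host or mongo_host(), port)

print(mongo_uri())             # host taken from the environment, else 'mainmongo'
print(mongo_uri('localhost'))  # explicit override for a native install
```

The resulting string can be passed to pymongo.MongoClient in place of the separate host and port arguments used below.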
In [1]:
!pip install pymongo
import pandas as pd
import pymongo
In [2]:
COLUMNS = [
    'ID', 'FoodGroup', 'ShortDescrip', 'Descrip', 'CommonName', 'MfgName', 'ScientificName',
    'Energy_kcal', 'Protein_g', 'Fat_g', 'Carb_g', 'Sugar_g', 'Fiber_g',
    'VitA_mcg', 'VitB6_mg', 'VitB12_mcg', 'VitC_mg', 'VitE_mg', 'Folate_mcg',
    'Niacin_mg', 'Riboflavin_mg', 'Thiamin_mg', 'Calcium_mg', 'Copper_mcg',
    'Iron_mg', 'Magnesium_mg', 'Manganese_mg', 'Phosphorus_mg', 'Selenium_mcg',
    'Zinc_mg',
]

def flat(r):
    # Map nutrient ID -> value for this food record
    nutr = dict([
        (int(n['nutrient_id']), n['nutrient_val'])
        for n in r.get('nutrients', [])
    ])

    def n(i):
        # Nutrients missing from the record default to 0.0
        return float(nutr.get(i, 0.0))

    return {
        'ID': r['ndb_num'],
        'FoodGroup': r['food_group_descrip'],
        'ShortDescrip': r['short_descrip'],
        'Descrip': r['descrip'],
        'CommonName': r['common_name'],
        'MfgName': r['mfg_name'],
        'ScientificName': r['scientific_name'],
        'Energy_kcal': n(208),
        'Protein_g': n(203),
        'Fat_g': n(204),
        'Carb_g': n(205),
        'Sugar_g': n(269),
        'Fiber_g': n(291),
        'VitA_mcg': n(320),
        'VitB6_mg': n(415),
        'VitB12_mcg': n(418),
        'VitC_mg': n(401),
        'VitE_mg': n(323),
        'Folate_mcg': n(435),
        'Niacin_mg': n(406),
        'Riboflavin_mg': n(405),
        'Thiamin_mg': n(404),
        'Calcium_mg': n(301),
        'Copper_mcg': n(312),
        'Iron_mg': n(303),
        'Magnesium_mg': n(304),
        'Manganese_mg': n(315),
        'Phosphorus_mg': n(305),
        'Selenium_mcg': n(317),
        'Zinc_mg': n(309),
    }

# Flatten every document in the nutrition.nndb collection into one row
data = pd.DataFrame.from_records(
    (flat(r) for r in pymongo.MongoClient('mainmongo', 27017).nutrition.nndb.find()),
    columns=COLUMNS
)
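To make the nutrient lookup inside flat() easier to follow, here is the same idea applied to a small hand-made record. The sample document and the name nutrient_lookup are invented for this sketch; real records come from the nndb collection and carry many more fields.

```python
def nutrient_lookup(record):
    """Return a function mapping a nutrient ID to its value as a float."""
    by_id = {
        int(n['nutrient_id']): n['nutrient_val']
        for n in record.get('nutrients', [])
    }

    def amount(nutrient_id):
        # Same default as flat(): a missing nutrient is reported as 0.0
        return float(by_id.get(nutrient_id, 0.0))

    return amount

# Hypothetical record: IDs and values may arrive as strings or as numbers.
sample = {
    'ndb_num': '00000',
    'nutrients': [
        {'nutrient_id': '208', 'nutrient_val': '717'},
        {'nutrient_id': 203, 'nutrient_val': 0.85},
    ],
}

n = nutrient_lookup(sample)
print(n(208), n(203), n(303))  # 717.0 0.85 0.0
```

Note that the 0.0 default means a nutrient that was never measured is indistinguishable from one that is truly absent, which is worth keeping in mind when reading the summary statistics.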
In [3]:
data['VitA_USRDA'] = data['VitA_mcg'] / 900.0
data['VitB6_USRDA'] = data['VitB6_mg'] / 1.7
data['VitB12_USRDA'] = data['VitB12_mcg'] / 2.4
data['VitC_USRDA'] = data['VitC_mg'] / 90.0
data['VitE_USRDA'] = data['VitE_mg'] / 15.0
data['Folate_USRDA'] = data['Folate_mcg'] / 400.0
data['Niacin_USRDA'] = data['Niacin_mg'] / 16.0
data['Riboflavin_USRDA'] = data['Riboflavin_mg'] / 1.3
data['Thiamin_USRDA'] = data['Thiamin_mg'] / 1.2
data['Calcium_USRDA'] = data['Calcium_mg'] / 1200.0
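# Caution: USDA SR lists nutrient 312 (copper) in mg, so confirm the unit of
# Copper_mcg first; values in mg would need to be multiplied by 1000 here.
data['Copper_USRDA'] = data['Copper_mcg'] / 900.0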
data['Magnesium_USRDA'] = data['Magnesium_mg'] / 420.0
data['Phosphorus_USRDA'] = data['Phosphorus_mg'] / 700.0
data['Selenium_USRDA'] = data['Selenium_mcg'] / 55.0
data['Zinc_USRDA'] = data['Zinc_mg'] / 11.0
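The divisions above all follow one pattern (amount divided by RDA, stored under the nutrient name plus _USRDA), so they could also be driven by a table. The sketch below shows the idea on a plain dictionary with three of the RDA values used above; the names RDA and rda_columns are hypothetical.

```python
# Subset of the RDA values used above, keyed by source column name.
RDA = {
    'VitA_mcg': 900.0,
    'VitC_mg': 90.0,
    'Zinc_mg': 11.0,
}

def rda_columns(table, rda=RDA):
    """Return {'<Nutrient>_USRDA': amount / RDA} for each column listed in rda."""
    return {
        col.rsplit('_', 1)[0] + '_USRDA': table[col] / amount
        for col, amount in rda.items()
    }

# Works on anything indexable by column name; a plain dict is used here.
row = {'VitA_mcg': 450.0, 'VitC_mg': 45.0, 'Zinc_mg': 5.5}
print(rda_columns(row))
# {'VitA_USRDA': 0.5, 'VitC_USRDA': 0.5, 'Zinc_USRDA': 0.5}
```

Because the function only indexes and divides, passing the DataFrame itself and calling data.assign(**rda_columns(data)) should produce the same columns as the explicit assignments, though that variant was not run as part of this notebook.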
In [4]:
data.describe()
Out[4]:
In [7]:
data.to_csv('./nndb_flat.csv', index=False)