Jennifer Mammon
May 2016
"Overweight" is defined as having excess body weight for a particular height from fat, muscle, bone, water, or a combination of these factors. "Obesity," on the other hand, is defined as having excess body fat. In the past 30 years, childhood obesity has more than doubled in the US. This is a major problem because children who are obese are more likely to be obese as adults as well. Obesity contributes to many major health problems, including heart disease, certain types of cancer, and diabetes.
Physical activity and healthy eating help prevent obesity. I would like to see whether there is a relationship between a family's income and how healthy that family's children are likely to be. Are families who make more money more likely to live better, healthier lives? I am using the Federal Poverty Level (FPL) to measure income because FPL scales a family's income by the number of people in the family. For example, 100% FPL indicates a family whose income is exactly at the Federal Poverty Level. For a family of four, 100% FPL would mean the family has an income of $24,250 per year.
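As a minimal sketch of how a percentage of FPL is computed, the snippet below divides a household's income by the poverty threshold for its size. Only the family-of-four threshold ($24,250) comes from the text above; the helper name `percent_of_fpl` and the sample incomes are illustrative assumptions.

```python
# Poverty threshold for a family of four, as quoted in the text above.
FPL_FAMILY_OF_FOUR = 24250


def percent_of_fpl(income, poverty_line=FPL_FAMILY_OF_FOUR):
    """Return income as a percentage of the poverty line (hypothetical helper)."""
    return 100.0 * income / poverty_line


# Sample incomes are made up for illustration.
at_poverty_line = percent_of_fpl(24250)   # a family exactly at the threshold
four_times_line = percent_of_fpl(97000)   # a family at four times the threshold
print(at_poverty_line, four_times_line)
```

A family of a different size would need its own threshold passed as `poverty_line`, since the threshold depends on household size.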
Importing Packages
In [1]:
%matplotlib inline
import pandas as pd
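# pandas.io.data was removed from pandas (its functionality moved to the separate pandas-datareader package) and nothing below uses it, so the import is dropped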
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
I obtained the data from ChildHealthData by running a survey search for children aged 10-17 in all fifty US states and Washington DC. The results are split into four columns: underweight, healthy weight, overweight, and obese. I downloaded the Excel files onto my computer and imported them into IPython Notebook.
An underweight child is any child below the 5th percentile. A healthy-weight child falls within the 5th to 84th percentile. An overweight child falls within the 85th to 94th percentile. An obese child is at or above the 95th percentile.
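The cutoffs above can be sketched as a small lookup. The function name `classify_weight` and the sample percentiles are assumptions made for illustration; the thresholds follow the ranges in the column headers of the dataset.

```python
def classify_weight(percentile):
    """Map a percentile (0-100) to one of the four categories used here."""
    if percentile < 5:
        return "Underweight"
    elif percentile < 85:
        return "Healthy weight"
    elif percentile < 95:
        return "Overweight"
    return "Obese"


# Sample percentiles are made up; 95 sits exactly on the obese cutoff.
labels = [classify_weight(p) for p in (3, 50, 90, 95)]
print(labels)
```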
In [2]:
file_path = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/basicstats.xls"
df = pd.read_excel(file_path)
df1 = df.set_index("Region")
df2 = df1.rename(columns={
    "Underweight (less than 5th percentile) %": "Under_Weight",
    "Healthy weight (5th to 84th percentile) %": "Healthy_Weight",
    "Overweight (85th to 94th percentile) %": "Over_weight",
    "Obese (95th percentile or above) %": "Obese",
})
df2.head(5)
Out[2]:
In [3]:
df2["Healthy_Weight"].mean()
Out[3]:
Averaged across the states (an unweighted mean of the state percentages), 63.86% of children are in the range considered to be a healthy weight.
Next, I would like to see how far children from low and high income families fall from this average.
The following table shows the percentage of children who are a healthy weight, broken down by household income.
In [4]:
file = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Incomestats.xls"
healthy = pd.read_excel(file)
healthy = healthy.rename(columns ={"State":"Region"})
healthy = healthy.set_index("Region")
healthy.head(5)
Out[4]:
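One way to measure how far each income bracket sits from the overall average is to subtract that average from each bracket's mean. The sketch below uses made-up placeholder numbers rather than the survey data, and it assumes the healthy-weight table uses the same bracket column names that appear later in the obesity table.

```python
import pandas as pd

# Placeholder values for illustration only, NOT the survey data.
healthy_demo = pd.DataFrame(
    {"0 - 99% FPL %": [55.0, 58.0], "400% FPL or higher %": [70.0, 72.0]},
    index=pd.Index(["State A", "State B"], name="Region"),
)
overall_mean = 63.86  # average healthy-weight share reported above

# Positive values mean the bracket sits above the overall average.
gap = healthy_demo.mean() - overall_mean
print(gap.round(2))
```

With the real data, `healthy_demo` would be replaced by the `healthy` DataFrame loaded above.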
The following table shows the percentage of children who are overweight, broken down by their family's income level.
In [5]:
newfile= "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Overweight.xls"
over= pd.read_excel(newfile)
over= over.set_index("Region")
over.head(5)
Out[5]:
The following table shows the percentage of children who are obese, broken down by their family's income level.
In [6]:
files= "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Obese.xls"
obese= pd.read_excel(files)
obese=obese.set_index("Region")
obese.head(5)
Out[6]:
The following graph shows children's weight status across all states. We can see that the majority of children in the US are of a healthy weight. Few children are underweight, and more are either overweight or obese.
In [7]:
fig, ax = plt.subplots(figsize=(10,5))
df2.plot(ax=ax)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')
Out[7]:
The following graph plots one line per state across the four weight categories. We can see that in every state the majority of children are of a healthy weight.
In [8]:
fig, ax = plt.subplots(figsize=(10,5))
df3 = df2.T
df3.plot(ax=ax, legend=False)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')
# Highlight the two states discussed below with thicker lines
df3["Mississippi"].plot(ax=ax, linewidth=7)
df3["Colorado"].plot(ax=ax, linewidth=7)
Out[8]:
Colorado, represented by the turquoise line, has one of the highest percentages of healthy-weight children. Mississippi, represented by the red line, has the lowest percentage of healthy-weight children and the highest percentage of obese children.
The following histogram shows the distribution of childhood obesity rates across states for each income bracket. The left side of the graph corresponds to lower percentages of obese children and the right side to higher percentages. The turquoise bars, representing the highest income bracket, sit entirely on the left side of the graph. Conversely, the dark blue bars, representing the lowest income bracket, occupy the right side. This points to an inverse relationship between income and obesity: the higher the income, the lower the prevalence of obesity.
In [9]:
fig, ax=plt.subplots()
obese.plot(ax=ax,kind='hist', bins=50, figsize=(10,10))
ax.set_title("Income:Obesity", fontsize=16)
ax.set_ylabel("Number of States", fontsize=13)
ax.set_xlabel("Percentage of Obese Population", fontsize=13)
Out[9]:
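The visual reading of the histogram can be checked numerically by summarising each bracket and testing whether the two distributions overlap at all. This is a sketch on made-up placeholder values, not the survey data; `obese_demo` stands in for the `obese` DataFrame, and only the two bracket column names are taken from the notebook.

```python
import pandas as pd

# Placeholder values for illustration only, NOT the survey data.
obese_demo = pd.DataFrame(
    {
        "0 - 99% FPL %": [22.0, 26.0, 30.0],
        "400% FPL or higher %": [8.0, 10.0, 12.0],
    },
    index=pd.Index(["State A", "State B", "State C"], name="Region"),
)

# Minimum, median, and maximum obesity rate per income bracket.
summary = obese_demo.agg(["min", "median", "max"])
print(summary)

# True when every high-income rate is below every low-income rate.
no_overlap = bool(
    obese_demo["400% FPL or higher %"].max() < obese_demo["0 - 99% FPL %"].min()
)
print(no_overlap)
```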
The following graph further supports the relationship between income and obesity. The middle line shows the overall percentage of obese children in each state across all income levels. The line representing low income lies completely above this line, and the line representing high income lies completely below it.
In [17]:
fig,ax=plt.subplots()
obese["0 - 99% FPL %"].plot()
obese["400% FPL or higher %"].plot()
df2["Obese"].plot()
ax.legend(["Low Income", "High Income", "All Incomes"])
ax.set_title("% Obese According to Income in each State", fontsize=15)
ax.set_ylabel("Percentage of Children Obese", fontsize=13)
ax.set_xlabel(" ")
Out[17]:
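The claim that one line lies completely above or below another can be verified with an element-wise comparison instead of by eye. The sketch below uses made-up placeholder series; with the real data, `low`, `high`, and `overall` would be the three columns plotted above.

```python
import pandas as pd

states = ["State A", "State B"]

# Placeholder values for illustration only, NOT the survey data.
low = pd.Series([25.0, 30.0], index=states)      # lowest income bracket
high = pd.Series([9.0, 11.0], index=states)      # highest income bracket
overall = pd.Series([15.0, 18.0], index=states)  # all incomes

# Each check is True only if the comparison holds in every state.
low_always_above = bool((low > overall).all())
high_always_below = bool((high < overall).all())
print(low_always_above, high_always_below)
```

The comparison aligns on the index, so the three series need the same state labels.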
One factor that I think contributes to the higher percentage of obesity among lower income families is the rising price of healthier foods. A University of Cambridge study compared food prices in 2002 and 2012 to see how much they had risen. I found the study at University of Cambridge Research.
In [11]:
foodfile= "/Users/Jennifer/Desktop/HealthyNot.xls"
food=pd.read_excel(foodfile)
food = food.rename(columns={"2002.1":"2002", "2012.1":"2012"})
food
Out[11]:
Looking at the dataset, we can immediately notice that the foods considered "more healthy" have risen substantially in price. Meanwhile, the prices of the less healthy foods have barely increased, and the price of frozen pizza actually decreased from 2002 to 2012.
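The size of each price change can be made explicit by computing the percentage change between the two years. The sketch below uses made-up placeholder prices and item names, not the Cambridge figures, and assumes the table has one row per food with `2002` and `2012` price columns.

```python
import pandas as pd

# Placeholder prices for illustration only, NOT the study's data.
food_demo = pd.DataFrame(
    {"2002": [2.00, 4.00], "2012": [3.00, 3.80]},
    index=["Hypothetical healthy item", "Hypothetical less healthy item"],
)

# Percentage change from 2002 to 2012; negative means the price fell.
food_demo["pct_change"] = (
    100 * (food_demo["2012"] - food_demo["2002"]) / food_demo["2002"]
)
print(food_demo.round(1))
```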
In conclusion, the data show a strong association between low income and high childhood obesity levels. Lower income families are at much higher risk of having obese children than high income families. One potential factor behind the high obesity levels could be the rising cost of healthy foods. Other factors could include the high cost of recreational activities and a lack of nutritional education.