Are Income and Childhood Obesity Correlated?

Jennifer Mammon

May 2016

Childhood Obesity in the USA

"Overweight" is defined as having excess body weight for a particular height from fat, muscle, bone, water, or a combination of these factors. "Obesity," on the other hand, is defined as having excess body fat. In the past 30 years, childhood obesity has more than doubled in the US. This is a major problem because children who are obese are more likely to be obese as adults as well. Obesity raises the risk of many major health problems, including heart disease, certain types of cancer, and diabetes.

Physical activity and healthy eating help prevent obesity. I would like to see whether there is a relationship between a family's income and how healthy that family's children are likely to be. Are families who make more money more likely to live healthier lives? I am using the Federal Poverty Level (FPL) to measure income because FPL expresses a family's income relative to the number of people in the family. For example, 100% FPL indicates a family whose income is exactly at the Federal Poverty Level. For a family of four, 100% FPL corresponds to an income of $24,250 per year.
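To make the FPL measure concrete, here is a minimal sketch of how a household income could be converted to a percentage of FPL and placed into the four income brackets used later in this notebook. The guideline figures ($11,770 for one person plus $4,160 for each additional person) are an assumption on my part: they are the 2015 HHS values for the 48 contiguous states and DC, which reproduce the $24,250 figure above. The function names are hypothetical and are not part of the ChildHealthData survey.

```python
# Assumed 2015 HHS poverty guideline (48 contiguous states and DC).
BASE_ONE_PERSON = 11770   # guideline for a one-person household, in dollars
PER_EXTRA_PERSON = 4160   # added for each additional household member


def poverty_guideline(household_size):
    """Return the annual income (in dollars) that equals 100% FPL."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return BASE_ONE_PERSON + PER_EXTRA_PERSON * (household_size - 1)


def percent_of_fpl(income, household_size):
    """Express a household income as a percentage of FPL."""
    return 100.0 * income / poverty_guideline(household_size)


def fpl_bracket(income, household_size):
    """Place a household into one of the four brackets used in the data set."""
    pct = percent_of_fpl(income, household_size)
    if pct < 100:
        return "0 - 99% FPL"
    if pct < 200:
        return "100 - 199% FPL"
    if pct < 400:
        return "200 - 399% FPL"
    return "400% FPL or higher"


print(poverty_guideline(4))
print(fpl_bracket(60000, 4))
```

The bracket labels mirror the column names in the income tables below, so a family of four earning $60,000 would be counted in the third column.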

Importing Packages


In [1]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt

The Data Set

I obtained the data from ChildHealthData by running a survey search for children aged 10-17 in all fifty US states and Washington DC. The results are separated into four columns: underweight, healthy weight, overweight, and obese. I downloaded the Excel files onto my computer and imported them into IPython Notebook.

An underweight child is one below the 5th percentile. A healthy-weight child falls between the 5th and 84th percentiles. An overweight child falls between the 85th and 94th percentiles. An obese child is at or above the 95th percentile.
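The cut-offs above can be sketched as a small classifier. This is only an illustration: the function name is hypothetical, and I am assuming the percentile refers to a BMI-for-age percentile, which the text above does not state explicitly.

```python
def weight_category(percentile):
    """Map a percentile (0-100) to one of the four weight categories.

    Assumption: `percentile` is a BMI-for-age percentile.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 5:
        return "Underweight"
    if percentile < 85:
        return "Healthy weight"
    if percentile < 95:
        return "Overweight"
    return "Obese"  # 95th percentile or above


for p in (3, 50, 90, 95):
    print(p, weight_category(p))
```

Note that the boundaries are half-open: a child exactly at the 85th percentile counts as overweight, and one exactly at the 95th counts as obese.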


In [2]:
file_path = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/basicstats.xls"
df = pd.read_excel(file_path)
df1 = df.set_index("Region")
df2 = df1.rename(columns={
    "Underweight (less than 5th percentile) %": "Under_Weight",
    "Healthy weight (5th to 84th percentile) %": "Healthy_Weight",
    "Overweight (85th to 94th percentile) %": "Over_weight",
    "Obese (95th percentile or above) %": "Obese",
})
df2.head(5)


Out[2]:
Region      Under_Weight  Healthy_Weight  Over_weight  Obese
Alabama              4.0            59.9         18.2   17.9
Alaska               4.9            61.2         19.8   14.1
Arizona              7.7            61.7         12.7   17.8
Arkansas             4.3            58.2         17.1   20.4
California           5.4            64.1         15.5   15.0
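As a quick sanity check on the imported data, the four categories in each row should add up to roughly 100%. The sketch below uses only the five rows displayed above, typed in by hand; the variable names are hypothetical, and the 0.5-point tolerance for rounding is my own assumption.

```python
import pandas as pd

# The five rows shown in Out[2], entered by hand for illustration only.
sample = pd.DataFrame(
    {
        "Under_Weight": [4.0, 4.9, 7.7, 4.3, 5.4],
        "Healthy_Weight": [59.9, 61.2, 61.7, 58.2, 64.1],
        "Over_weight": [18.2, 19.8, 12.7, 17.1, 15.5],
        "Obese": [17.9, 14.1, 17.8, 20.4, 15.0],
    },
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        name="Region",
    ),
)

# Each row should total about 100%, allowing for rounding in the source.
row_totals = sample.sum(axis=1)
rows_ok = (row_totals - 100).abs() <= 0.5

# Share of children who are overweight or obese in each state.
above_healthy = sample["Over_weight"] + sample["Obese"]

print(row_totals.round(1))
print(rows_ok.all())
print(above_healthy.round(1))
```

Running the same check on the full `df2` would flag any state whose percentages were mistyped or misaligned during import.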

In [3]:
df2["Healthy_Weight"].mean()


Out[3]:
63.86078431372548

Averaged across the states and Washington DC, with each weighted equally rather than by population, 63.86% of children are in the healthy weight range.

Now I would like to see how far children from low- and high-income families fall from this average.

The following table shows the percentage of children who are a healthy weight, broken down by household income.


In [4]:
file = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Incomestats.xls"
healthy = pd.read_excel(file)
healthy = healthy.rename(columns ={"State":"Region"})
healthy = healthy.set_index("Region")
healthy.head(5)


Out[4]:
Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              60.0              51.3              61.0                  65.1
Alaska               59.2              55.4              61.3                  68.3
Arizona              34.5              53.4              65.9                  80.0
Arkansas             47.3              57.2              60.7                  65.6
California           55.0              61.8              60.1                  72.6
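To see how far each income bracket sits from a state's overall healthy-weight share, one option is to subtract the all-incomes figure from each bracket. The sketch below does this for the five displayed states only, with values copied by hand from Out[2] and Out[4]; the variable names are hypothetical, and five states are not enough to draw conclusions about the whole country.

```python
import pandas as pd

regions = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]

# Healthy-weight share by income bracket (the five rows shown in Out[4]).
healthy_sample = pd.DataFrame(
    {
        "0 - 99% FPL %": [60.0, 59.2, 34.5, 47.3, 55.0],
        "100 - 199% FPL %": [51.3, 55.4, 53.4, 57.2, 61.8],
        "200 - 399% FPL %": [61.0, 61.3, 65.9, 60.7, 60.1],
        "400% FPL or higher %": [65.1, 68.3, 80.0, 65.6, 72.6],
    },
    index=regions,
)

# Healthy-weight share across all incomes (Healthy_Weight column in Out[2]).
overall_healthy = pd.Series([59.9, 61.2, 61.7, 58.2, 64.1], index=regions)

# Percentage-point gap between each bracket and the state's overall share.
gaps = healthy_sample.sub(overall_healthy, axis=0)

# Average gap per bracket across the five sample states.
mean_gap = gaps.mean()

print(gaps.round(1))
print(mean_gap.round(2))
```

A negative gap means the bracket has a smaller share of healthy-weight children than the state as a whole, and a positive gap means a larger share.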

The following table shows the percentage of children who are overweight, broken down by their family's income level.


In [5]:
newfile= "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Overweight.xls"
over= pd.read_excel(newfile)
over= over.set_index("Region")
over.head(5)


Out[5]:
Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              15.5              24.0              20.2                  13.6
Alaska               17.9              21.0              20.0                  19.5
Arizona              12.8              16.1              13.9                   8.4
Arkansas             19.5              17.9              14.6                  17.7
California           26.2              13.3              19.4                   9.2

The following table shows the percentage of children who are obese, broken down by their family's income level.


In [6]:
files= "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Obese.xls"
obese= pd.read_excel(files)
obese=obese.set_index("Region")
obese.head(5)


Out[6]:
Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              21.2              22.3              15.5                  14.9
Alaska               14.2              19.7              14.4                   8.0
Arizona              40.5              23.5              12.7                   5.7
Arkansas             31.1              22.1              18.7                  11.2
California           18.2              19.6              14.1                  12.0

The following graph shows children's weight status across all states. We can see that the majority of children in the US are of a healthy weight. Few children are underweight, while considerably more are either overweight or obese.


In [7]:
fig, ax = plt.subplots(figsize=(10,5))
df2.plot(ax=ax)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')


Out[7]:
<matplotlib.text.Text at 0x1083bee80>

The following graph shows the same data transposed: each line is one state, and the horizontal axis runs through the four weight categories. Again, we can see that the majority of children nationwide are of a healthy weight.


In [8]:
fig, ax = plt.subplots(figsize=(10,5))
df3= df2.T
df3.plot(ax=ax,
        legend=False)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')
# Redraw two states with thick lines so they stand out from the rest
df3["Mississippi"].plot(ax=ax, linewidth=7)
df3["Colorado"].plot(ax=ax, linewidth=7)


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x108429860>

Colorado, represented by the turquoise line, has one of the highest percentages of healthy-weight children. Mississippi, represented by the red line, has the lowest percentage of healthy-weight children and the highest percentage of obese children.

The following histogram shows the distribution of state obesity rates for each income bracket. The left side of the graph corresponds to lower percentages of obese children, and the right side to higher percentages. The turquoise bars, representing the highest income bracket, sit on the left side of the graph. Conversely, the dark blue bars, representing the lowest income bracket, take up the right side. This points to an inverse relationship between income and obesity: the higher the income, the lower the prevalence of obesity.


In [9]:
fig, ax=plt.subplots()
obese.plot(ax=ax,kind='hist', bins=50, figsize=(10,10))
ax.set_title("Income:Obesity", fontsize=16)
ax.set_ylabel("Number of States", fontsize=13)
ax.set_xlabel("Percentage of Obese Population", fontsize=13)


Out[9]:
<matplotlib.text.Text at 0x108473470>
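The pattern in the histogram can also be summarized numerically. The sketch below averages the obesity rate within each income bracket and computes a Pearson correlation between bracket order and that average. It uses only the five rows displayed in Out[6], typed in by hand, and it treats the four brackets as equally spaced steps (0 to 3), which is a simplifying assumption of mine. With four points it is a rough indication, not a significance test.

```python
import numpy as np
import pandas as pd

# Obesity rate by income bracket (the five rows shown in Out[6]).
obese_sample = pd.DataFrame(
    {
        "0 - 99% FPL %": [21.2, 14.2, 40.5, 31.1, 18.2],
        "100 - 199% FPL %": [22.3, 19.7, 23.5, 22.1, 19.6],
        "200 - 399% FPL %": [15.5, 14.4, 12.7, 18.7, 14.1],
        "400% FPL or higher %": [14.9, 8.0, 5.7, 11.2, 12.0],
    },
    index=["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
)

# Average obesity rate within each bracket, lowest income first.
bracket_means = obese_sample.mean()

# Does the average fall with every step up in income?
decreasing = bracket_means.is_monotonic_decreasing

# Pearson correlation between bracket order (0..3) and the average rate.
bracket_order = np.arange(len(bracket_means))
r = np.corrcoef(bracket_order, bracket_means.to_numpy())[0, 1]

print(bracket_means.round(2))
print(decreasing)
print(round(r, 3))
```

A correlation close to -1 here would be consistent with the inverse relationship described above; applying the same steps to the full `obese` table would give a more representative figure.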

The following graph further supports the relationship between income and obesity. The middle line shows the overall obesity rate in each state across all income levels. The line representing low income lies completely above this line, and the line representing high income lies completely below it.


In [17]:
fig,ax=plt.subplots()
obese["0 - 99% FPL %"].plot()
obese["400% FPL or higher %"].plot()
df2["Obese"].plot()
ax.legend(["Low Income", "High Income", "All Incomes"])
ax.set_title("% Obese According to Income in each State", fontsize=15)
ax.set_ylabel("Percentage of Children Obese", fontsize=13)
ax.set_xlabel(" ")


Out[17]:
<matplotlib.text.Text at 0x109c9ea20>
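The claim that the low-income line sits above the overall line and the high-income line sits below it can be checked state by state. This sketch does so for the five displayed states only, with values copied by hand from Out[2] and Out[6]; the variable names are hypothetical.

```python
import pandas as pd

regions = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]

# Obesity rates (%) from the displayed rows of Out[6] and Out[2].
low_income = pd.Series([21.2, 14.2, 40.5, 31.1, 18.2], index=regions)
high_income = pd.Series([14.9, 8.0, 5.7, 11.2, 12.0], index=regions)
all_incomes = pd.Series([17.9, 14.1, 17.8, 20.4, 15.0], index=regions)

# True where the low-income rate exceeds the state's overall rate.
low_above = low_income > all_incomes

# True where the high-income rate is below the state's overall rate.
high_below = high_income < all_incomes

# Percentage-point gap between the lowest and highest income brackets.
gap = low_income - high_income

print(low_above.all(), high_below.all())
print(gap.round(1).sort_values(ascending=False))
```

Replacing the hand-typed series with `obese["0 - 99% FPL %"]`, `obese["400% FPL or higher %"]`, and `df2["Obese"]` would run the same comparison across every state.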

One factor that I think contributes to the higher rate of obesity among lower-income families is the rising price of healthier foods. A study at the University of Cambridge compared food prices in 2002 and 2012 to see how much they had risen. I found the study at University of Cambridge Research.


In [11]:
foodfile= "/Users/Jennifer/Desktop/HealthyNot.xls"
food=pd.read_excel(foodfile)
food = food.rename(columns={"2002.1":"2002", "2012.1":"2012"})
food


Out[11]:
   More healthy       2002   2012   Less healthy   2002   2012
0  Tinned tomatoes    £4.71  £9.60  Frozen pizza   £2.10  £1.58
1  Baked beans        £1.05  £2.05  Ice cream      £1.50  £1.57
2  Semi-skimmed milk  £1.07  £1.73  Pork sausages  £1.90  £2.69
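Looking at the dataset, we can immediately see that the foods considered "more healthy" rose sharply in price: tinned tomatoes and baked beans roughly doubled, and semi-skimmed milk rose by about 62%. The less healthy foods rose less overall: ice cream by about 5% and pork sausages by about 42%, while the price of frozen pizza actually fell by about 25% from 2002 to 2012.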
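The size of each price change can be worked out directly from the table. The sketch below re-enters the six items by hand in a long format (one row per item), strips the pound sign, and computes the percent change from 2002 to 2012. The layout and column names are my own choice for illustration and differ from the `food` DataFrame above.

```python
import pandas as pd

# Prices copied by hand from Out[11], one row per food item.
prices = pd.DataFrame(
    {
        "item": [
            "Tinned tomatoes", "Baked beans", "Semi-skimmed milk",
            "Frozen pizza", "Ice cream", "Pork sausages",
        ],
        "group": ["More healthy"] * 3 + ["Less healthy"] * 3,
        "2002": ["£4.71", "£1.05", "£1.07", "£2.10", "£1.50", "£1.90"],
        "2012": ["£9.60", "£2.05", "£1.73", "£1.58", "£1.57", "£2.69"],
    }
)

# Convert strings such as "£4.71" into floats.
for year in ["2002", "2012"]:
    prices[year] = prices[year].str.replace("£", "", regex=False).astype(float)

# Percent change in price between 2002 and 2012.
prices["pct_change"] = (prices["2012"] - prices["2002"]) / prices["2002"] * 100

# Average percent change within each group of foods.
group_change = prices.groupby("group")["pct_change"].mean()

print(prices[["item", "group", "pct_change"]].round(1))
print(group_change.round(1))
```

With only three items per group, the group averages are a rough comparison rather than a measure of food prices in general.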

Conclusion

In conclusion, the data show a clear negative association between income and childhood obesity: obesity rates are higher among low-income families. Children in lower-income families are at much higher risk of being obese than children in high-income families. One potential factor behind the high obesity levels could be the rising cost of healthy foods. Other factors could include the high cost of recreational activities and a lack of nutritional education.

