Are Income and Childhood Obesity Correlated?

Jennifer Mammon

May 2016

Childhood Obesity in the USA

"Overweight" is defined as having excess body weight for a particular height from fat, muscle, bone, water, or a combination of these factors. "Obesity," on the other hand, is defined as having excess body fat. In the past 30 years, childhood obesity has more than doubled in the US. This is a major problem because children who are obese are more likely to be obese as adults as well. Obesity raises the risk of many major health problems, including heart disease, certain types of cancer, and diabetes.

Physical activity and healthy eating help prevent obesity. I would like to see whether there is a relationship between a family's income and how healthy that family's children are likely to be. Are families who make more money more likely to live better, healthier lives? I am using the Federal Poverty Level (FPL) to measure income because FPL scales a family's income by the number of people in the family. For example, 100% FPL indicates a family whose income is exactly at the Federal Poverty Level. For a family of four, 100% FPL means the family has an income of $24,250 per year.
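As a rough illustration of how an income maps onto the FPL brackets used later in this notebook, here is a minimal sketch. It assumes only the $24,250 threshold for a family of four quoted above; the function names `fpl_percent` and `fpl_bracket` are hypothetical helpers and are not part of the original analysis.

```python
# Threshold for a family of four, taken from the text above.
FPL_FAMILY_OF_FOUR = 24250


def fpl_percent(income, threshold=FPL_FAMILY_OF_FOUR):
    """Return household income as a percentage of the poverty threshold."""
    return 100.0 * income / threshold


def fpl_bracket(percent):
    """Map a %FPL value to the four bracket labels used in the data set."""
    if percent < 100:
        return "0 - 99% FPL"
    if percent < 200:
        return "100 - 199% FPL"
    if percent < 400:
        return "200 - 399% FPL"
    return "400% FPL or higher"


# Example incomes for a family of four (illustrative values only).
for income in (20000, 24250, 60000, 100000):
    pct = fpl_percent(income)
    print(income, round(pct, 1), fpl_bracket(pct))
```

Families of other sizes would need their own threshold passed in as `threshold`; those values are not given in this notebook.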

Importing Packages



In [1]:

    
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
# Note: pandas.io.data is not imported. It was unused in this notebook and
# has moved to the separate pandas-datareader package.

The Data Set

I obtained the data from ChildHealthData by running a survey search for children aged 10-17 in all fifty US states and Washington, DC. The results are split into four columns: underweight, healthy weight, overweight, and obese. I downloaded the Excel files to my computer and imported them into an IPython Notebook.

An underweight child is any child below the 5th percentile. A healthy-weight child falls within the 5th to 84th percentile. An overweight child falls within the 85th to 94th percentile, and an obese child is at or above the 95th percentile.
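The cutoffs above can be sketched as a small classifier. This is only an illustration of the category boundaries: the function name `classify_weight` is hypothetical, and the 95th percentile itself is treated as obese to match the "95th percentile or above" column header in the data set.

```python
def classify_weight(percentile):
    """Classify a child by percentile using the cutoffs described above."""
    if percentile < 5:
        return "Underweight"
    if percentile < 85:
        return "Healthy weight"
    if percentile < 95:
        return "Overweight"
    return "Obese"


# Example percentiles, one from each category (illustrative values only).
for p in (3, 50, 90, 97):
    print(p, classify_weight(p))
```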



In [2]:

    
file_path = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/basicstats.xls"
df = pd.read_excel(file_path)
df1 = df.set_index("Region")
df2 = df1.rename(columns={
    "Underweight (less than 5th percentile) %": "Under_Weight",
    "Healthy weight (5th to 84th percentile) %": "Healthy_Weight",
    "Overweight (85th to 94th percentile) %": "Over_weight",
    "Obese (95th percentile or above) %": "Obese",
})
df2.head(5)









    Out[2]:

Region      Under_Weight  Healthy_Weight  Over_weight  Obese
Alabama              4.0            59.9         18.2   17.9
Alaska               4.9            61.2         19.8   14.1
Arizona              7.7            61.7         12.7   17.8
Arkansas             4.3            58.2         17.1   20.4
California           5.4            64.1         15.5   15.0



In [3]:

    
df2["Healthy_Weight"].mean()









    Out[3]:





63.86078431372548

Averaged across the states and Washington, DC (an unweighted mean of the regional percentages), 63.86% of children are in the range considered to be a healthy weight.

Now I would like to see how far children from low-income and high-income families fall from this average.

The following table shows the percentage of children who are a healthy weight according to their household income.



In [4]:

    
file = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Incomestats.xls"
healthy = pd.read_excel(file)
healthy = healthy.rename(columns ={"State":"Region"})
healthy = healthy.set_index("Region")
healthy.head(5)









    Out[4]:

Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              60.0              51.3              61.0                  65.1
Alaska               59.2              55.4              61.3                  68.3
Arizona              34.5              53.4              65.9                  80.0
Arkansas             47.3              57.2              60.7                  65.6
California           55.0              61.8              60.1                  72.6
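One way to quantify the gap from the 63.86% average is to take the mean of each income bracket and subtract the overall mean. The sketch below does this for only the five rows displayed above, so its numbers are illustrative and will differ from the full data set; the names `healthy_sample` and `gap_from_average` are hypothetical.

```python
import pandas as pd

# First five rows of the healthy-weight table shown above (not the full data).
healthy_sample = pd.DataFrame(
    {
        "0 - 99% FPL %": [60.0, 59.2, 34.5, 47.3, 55.0],
        "100 - 199% FPL %": [51.3, 55.4, 53.4, 57.2, 61.8],
        "200 - 399% FPL %": [61.0, 61.3, 65.9, 60.7, 60.1],
        "400% FPL or higher %": [65.1, 68.3, 80.0, 65.6, 72.6],
    },
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        name="Region",
    ),
)

# Mean healthy-weight share across all regions, as reported in Out[3].
overall_mean = 63.86

# Unweighted mean of each bracket, then its distance from the overall mean.
bracket_means = healthy_sample.mean()
gap_from_average = bracket_means - overall_mean
print(gap_from_average.round(2))
```

To run this on the full data, `healthy_sample` could be replaced by the `healthy` DataFrame and `overall_mean` by `df2["Healthy_Weight"].mean()`.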

The following table shows the percentage of children who are overweight according to their family's income level.



In [5]:

    
newfile = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Overweight.xls"
over = pd.read_excel(newfile)
over = over.set_index("Region")
over.head(5)









    Out[5]:

Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              15.5              24.0              20.2                  13.6
Alaska               17.9              21.0              20.0                  19.5
Arizona              12.8              16.1              13.9                   8.4
Arkansas             19.5              17.9              14.6                  17.7
California           26.2              13.3              19.4                   9.2

The following table shows the percentage of children who are obese according to their family's income level.



In [6]:

    
files = "/Users/Jennifer/Desktop/Freshmen Spring Semester/Data_Bootcamp/PROJECT/Obesity Project/Obese.xls"
obese = pd.read_excel(files)
obese = obese.set_index("Region")
obese.head(5)









    Out[6]:

Region      0 - 99% FPL %  100 - 199% FPL %  200 - 399% FPL %  400% FPL or higher %
Alabama              21.2              22.3              15.5                  14.9
Alaska               14.2              19.7              14.4                   8.0
Arizona              40.5              23.5              12.7                   5.7
Arkansas             31.1              22.1              18.7                  11.2
California           18.2              19.6              14.1                  12.0

The following graph shows children's weight status across all states. We can see that the majority of children in the US are of a healthy weight. Few children are underweight, while more are either overweight or obese.



In [7]:

    
fig, ax = plt.subplots(figsize=(10, 5))
# One line per weight category, with the states along the x-axis
df2.plot(ax=ax)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')










The following graph shows, for each state, the percentage of children in each weight category. We can see that a majority of children are of a healthy weight nationwide.



In [8]:

    
fig, ax = plt.subplots(figsize=(10, 5))
df3 = df2.T  # transpose so that each state becomes its own line
df3.plot(ax=ax, legend=False)
ax.set_ylabel('Percent of Children', fontsize=12)
ax.set_xlabel('')
ax.set_title("Children's Weight Status", fontsize=16, loc='center')
# Highlight the two states discussed below with thicker lines
df3["Mississippi"].plot(ax=ax, linewidth=7)
df3["Colorado"].plot(ax=ax, linewidth=7)










Colorado, represented by the turquoise line, has one of the highest percentages of healthy-weight children. On the other hand, Mississippi, represented by the red line, has the lowest percentage of healthy-weight children and the highest percentage of obese children.
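Rankings like these can also be checked numerically rather than read off the plot. The sketch below uses `idxmax` and `idxmin` on only the first five rows of the weight-status table (Out[2]), so it cannot reproduce the Colorado and Mississippi result; the name `sample` is hypothetical, and on the full data the same calls could be made on `df2`.

```python
import pandas as pd

# First five rows of the weight-status table (Out[2]); not the full data set.
sample = pd.DataFrame(
    {
        "Healthy_Weight": [59.9, 61.2, 61.7, 58.2, 64.1],
        "Obese": [17.9, 14.1, 17.8, 20.4, 15.0],
    },
    index=["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
)

# idxmax / idxmin return the index label (the state) of the extreme value.
print("Highest healthy-weight share:", sample["Healthy_Weight"].idxmax())
print("Lowest healthy-weight share:", sample["Healthy_Weight"].idxmin())
print("Highest obesity share:", sample["Obese"].idxmax())
```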

The following histogram shows the distribution of state-level childhood obesity rates for each income bracket. The left side of the graph indicates a lower percentage of obese children, whereas the right side represents the highest percentages. We see that the turquoise, representing the highest income bracket, is completely on the left side of the graph. Conversely, the dark blue, representing the lowest income bracket, takes up the right side of the graph. This suggests an inverse relationship between income and obesity: the higher the income, the lower the prevalence of obesity.



In [9]:

    
fig, ax = plt.subplots(figsize=(10, 10))
# One histogram per income bracket, all drawn on the same axes
obese.plot(ax=ax, kind='hist', bins=50)
ax.set_title("Income:Obesity", fontsize=16)
ax.set_ylabel("Number of States", fontsize=13)
ax.set_xlabel("Percentage of Obese Population", fontsize=13)









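The notebook reads the relationship off the histogram; a rank correlation is one way to put a number on it. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses only the first five rows of the obesity table (Out[6]), encodes the four brackets as ranks 0 to 3, and computes a Spearman-style correlation as the Pearson correlation of ranks. The names `obese_sample`, `long_form`, and `rho` are hypothetical.

```python
import pandas as pd

# First five rows of the obesity-by-income table (Out[6]); not the full data.
obese_sample = pd.DataFrame(
    {
        "0 - 99% FPL %": [21.2, 14.2, 40.5, 31.1, 18.2],
        "100 - 199% FPL %": [22.3, 19.7, 23.5, 22.1, 19.6],
        "200 - 399% FPL %": [15.5, 14.4, 12.7, 18.7, 14.1],
        "400% FPL or higher %": [14.9, 8.0, 5.7, 11.2, 12.0],
    },
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        name="Region",
    ),
)

# Long format: one row per (state, income bracket) pair.
long_form = obese_sample.reset_index().melt(
    id_vars="Region", var_name="Bracket", value_name="Obese_pct"
)

# Encode the brackets as ordinal ranks 0..3 (lowest to highest income).
rank_of = {name: i for i, name in enumerate(obese_sample.columns)}
long_form["Bracket_rank"] = long_form["Bracket"].map(rank_of)

# Spearman-style correlation: Pearson correlation of the ranked values.
rho = long_form["Bracket_rank"].rank().corr(long_form["Obese_pct"].rank())
print(round(rho, 2))
```

A negative value of `rho` means obesity rates tend to fall as the income bracket rises. On the full data, `obese_sample` could be replaced by the `obese` DataFrame.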

The following graph further supports the relationship between income and obesity. The middle line shows the percentage of obese children in each state across all income levels. The line representing low income lies completely above this line, and the line representing high income lies completely below it.



In [17]:

    
fig, ax = plt.subplots()
# Draw all three series on the same axes so the legend order matches
obese["0 - 99% FPL %"].plot(ax=ax)
obese["400% FPL or higher %"].plot(ax=ax)
df2["Obese"].plot(ax=ax)
ax.legend(["Low Income", "High Income", "All Incomes"])
ax.set_title("% Obese According to Income in each State", fontsize=15)
ax.set_ylabel("Percentage of Children Obese", fontsize=13)
ax.set_xlabel("")









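The claim that the low-income line sits above the all-incomes line, and the high-income line below it, can be tested with an element-wise comparison. The sketch below checks only the five rows displayed in Out[2] and Out[6], so it says nothing about the remaining states; the variable names are hypothetical.

```python
import pandas as pd

states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]

# Obesity percentages for the first five rows of Out[6] and Out[2].
low_income = pd.Series([21.2, 14.2, 40.5, 31.1, 18.2], index=states)
high_income = pd.Series([14.9, 8.0, 5.7, 11.2, 12.0], index=states)
all_incomes = pd.Series([17.9, 14.1, 17.8, 20.4, 15.0], index=states)

# True only if the comparison holds for every state in the sample.
low_always_above = bool((low_income > all_incomes).all())
high_always_below = bool((high_income < all_incomes).all())
print(low_always_above, high_always_below)
```

On the full data, the same comparison could be written as `(obese["0 - 99% FPL %"] > df2["Obese"]).all()`, assuming both tables share the same `Region` index.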

One factor that I think contributes to the higher percentage of obesity among lower-income families is the rising price of healthier foods. A University of Cambridge study compared food prices in 2002 and 2012 to see how much they had risen. I found the study at University of Cambridge Research.



In [11]:

    
foodfile = "/Users/Jennifer/Desktop/HealthyNot.xls"
food = pd.read_excel(foodfile)
food = food.rename(columns={"2002.1": "2002", "2012.1": "2012"})
food









    Out[11]:

   More healthy       2002   2012   Less healthy   2002   2012
0  Tinned tomatoes    £4.71  £9.60  Frozen pizza   £2.10  £1.58
1  Baked beans        £1.05  £2.05  Ice cream      £1.50  £1.57
2  Semi-skimmed milk  £1.07  £1.73  Pork sausages  £1.90  £2.69

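Looking at the dataset, we can immediately notice that the foods considered "more healthy" have risen sharply in price: tinned tomatoes and baked beans roughly doubled, and semi-skimmed milk rose by about 62%. Meanwhile, the less healthy foods changed less on the whole: ice cream rose by only about 5% and frozen pizza actually fell by about 25% from 2002 to 2012, although pork sausages rose by about 42%.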
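The size of each price change can be computed directly from the table. The sketch below re-enters the six rows by hand in a long layout (one row per item), strips the pound sign, and computes the percentage change from 2002 to 2012; the column names `Item`, `Category`, and `Change_pct` are hypothetical and differ from the layout of the original spreadsheet.

```python
import pandas as pd

# Prices copied from the table above, reshaped to one row per item.
food_long = pd.DataFrame(
    {
        "Item": [
            "Tinned tomatoes", "Baked beans", "Semi-skimmed milk",
            "Frozen pizza", "Ice cream", "Pork sausages",
        ],
        "Category": ["More healthy"] * 3 + ["Less healthy"] * 3,
        "2002": ["£4.71", "£1.05", "£1.07", "£2.10", "£1.50", "£1.90"],
        "2012": ["£9.60", "£2.05", "£1.73", "£1.58", "£1.57", "£2.69"],
    }
)

# Strip the currency symbol so the prices can be treated as numbers.
for year in ("2002", "2012"):
    food_long[year] = (
        food_long[year].str.replace("£", "", regex=False).astype(float)
    )

# Percentage change in price between 2002 and 2012.
food_long["Change_pct"] = (food_long["2012"] / food_long["2002"] - 1) * 100
print(food_long[["Item", "Category", "Change_pct"]].round(1))
```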

Conclusion

In conclusion, the data suggest a strong association between low income and high obesity levels. In these data, children in lower-income families are more likely to be obese than children in high-income families. One potential contributing factor could be the rising cost of healthy foods. Other factors could include the high cost of recreational activities and a lack of nutritional education.


