Author: Chisheng Li

1) US Obesity Rate vs Median Household Income

Compare the obesity rate and median household income of every state and output the sorted result to obesityIncome.txt.

First, open ObesityByState.txt to retrieve every state's obesity rate.


In [1]:
from operator import itemgetter

obesityData = open('ObesityByState.txt', 'r').readlines()

# Create a dictionary (hash) to store the data
stateObesity = dict()

# Iterate through obesity data, omit non-data lines, store data in dictionary
for line in obesityData:
    # split the contents on each tab
    # assign the first entry before a tab to the variable 'state', the second to 'obesityPercentage'
    (state, obesityPercentage) = line.split('\t')[0:2]
    # skip the header
    if 'state' in state:
        continue
    # check if the state is DC, and change how we refer to it
    if state == 'DC':
        state = 'District of Columbia'

    # Make sure to accurately represent non-numeric values.
    # The string 'N/A' shouldn't be used in comparisons,
    # so anything that cannot be converted to a float is set to None
    try:
        float(obesityPercentage)
    except ValueError:
        obesityPercentage = None
    # store the value in the dictionary
    stateObesity[state] = obesityPercentage
    # print each entry to check the parsing
    print "%s\t%s" % (state, stateObesity[state])


Alabama	28.9
Alaska	23.7
Arizona	21.2
Arkansas	26.1
California	22.2
Colorado	16.8
Connecticut	19.7
Delaware	21.1
District of Columbia	22.5
Florida	22.9
Georgia	24.7
Hawaii	20.1
Idaho	20.8
Illinois	23
Indiana	25.5
Iowa	23.5
Kansas	23.2
Kentucky	25.8
Louisiana	27
Maine	23.4
Maryland	23.9
Massachusetts	18.4
Michigan	25.4
Minnesota	22.6
Mississippi	29.5
Missouri	24.9
Montana	19.7
Nebraska	23.2
Nevada	21.1
New Hampshire	21.6
New Jersey	21.9
New Mexico	21.5
New York	22.1
North Carolina	24.2
North Dakota	24.6
Ohio	25.3
Oklahoma	24.9
Oregon	21.2
Pennsylvania	24.3
Rhode Island	19
South Carolina	25.1
South Dakota	23.8
Tennessee	27.2
Texas	25.8
Utah	20.4
Vermont	18.7
Virginia	23.1
Washington	22.2
West Virginia	27.6
Wisconsin	23.2
Wyoming	20.8
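The cell above keeps each obesity rate as the original string and only calls `float()` to detect non-numeric entries. As a hedged sketch (written in Python 3 syntax, unlike the Python 2 cells in this notebook), the same parsing can be wrapped in a reusable function that stores real floats, so later sorts and comparisons are numeric. The function name `parse_state_values`, its `rename` parameter, the assumed header text and the sample lines are illustrative assumptions, not part of the original notebook; the `Example Territory` row is made up to show the `N/A` case.

```python
def parse_state_values(lines, rename=None):
    """Parse tab-separated 'state<TAB>value' lines into {state: float or None}.

    The header line (assumed to have 'state' as its first field) is skipped,
    and non-numeric values such as 'N/A' are stored as None.
    """
    rename = rename or {}
    values = {}
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        state, raw = fields[0].strip(), fields[1].strip()
        if state.lower() == 'state':
            continue  # header line
        state = rename.get(state, state)  # e.g. 'DC' -> 'District of Columbia'
        try:
            values[state] = float(raw)
        except ValueError:
            values[state] = None
    return values


# Hypothetical sample lines in the assumed file layout
sample = [
    "state\tobesity %\n",
    "Alabama\t28.9\n",
    "DC\t22.5\n",
    "Example Territory\tN/A\n",  # made-up row
]
obesity = parse_state_values(sample, rename={'DC': 'District of Columbia'})
print(obesity)  # {'Alabama': 28.9, 'District of Columbia': 22.5, 'Example Territory': None}
```

On the real file, the call would be `parse_state_values(open('ObesityByState.txt'), rename={'DC': 'District of Columbia'})`, since iterating over a file object yields its lines.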

Next, open MedianIncomeByState.txt to retrieve every state's median household income.


In [2]:
incomeData = open('MedianIncomeByState.txt', 'r').readlines()

# Use a dictionary called medianIncome to associate state names with median income values
medianIncome = dict()
for line in incomeData:
    # strip() returns a new string, so reassign it to drop the trailing newline
    line = line.strip()
    print line
    (statename, income) = line.split('\t')[0:2]
    # skip the header line
    if statename == 'state':
        continue
    # the income field ends with a space, so strip it as well
    medianIncome[statename] = income.strip()


state	Median Income (dollars)	Standard Error (dollars)
New Jersey	59989 	1112
Maryland	58347 	1071
New Hampshire	58223 	1113
Hawaii	57572 	1040
Connecticut	57369 	1166
Minnesota	56084 	846
Alaska	55935 	1105
Massachusetts	54617 	1075
Virginia	54301 	930
Utah	53226 	749
Colorado	52011 	987
California	51647 	479
Delaware	50970 	946
Washington	50885 	847
Rhode Island	48823 	1155
Vermont	48508 	861
Nevada	48314 	1101
Illinois	47978 	688
Wisconsin	47004 	851
Nebraska	46613 	980
New York	46242 	598
District of Columbia	45900 	1318
Pennsylvania	45814 	630
Michigan	45793 	653
Wyoming	45598 	878
Iowa	45086 	922
Idaho	44994 	883
Ohio	44961 	709
Arizona	44748 	918
Georgia	44439 	580
Missouri	44324 	712
Kansas	43802 	1013
Indiana	43735 	776
Oregon	43570 	816
South Dakota	42525 	804
Florida	42079 	524
Maine	42006 	870
Texas	41959 	427
North Dakota	41869 	810
North Carolina	41067 	648
South Carolina	40350 	799
Tennessee	39524 	820
New Mexico	39029 	1074
Oklahoma	38895 	815
Alabama	38180 	990
Kentucky	37566 	731
Louisiana	36814 	877
Montana	36200 	695
Arkansas	35591 	744
West Virginia	35234 	785
Mississippi	34508 	847
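The next step looks up `medianIncome[state]` for every state in `stateObesity`, so it only works because both files spell every state name the same way (once DC is renamed); a mismatch would raise a `KeyError`. A hedged sketch (Python 3 syntax) of an explicit join on state name that also reports the names found in only one input, which makes such mismatches visible. `join_on_state` and the two small sample dictionaries are illustrative, not part of the original notebook.

```python
def join_on_state(left, right):
    """Inner-join two {state: value} dicts on their keys.

    Returns the joined rows plus the state names that appear in only one
    input, so spelling mismatches (e.g. 'DC' vs 'District of Columbia')
    show up instead of causing a KeyError later.
    """
    common = sorted(set(left) & set(right))
    rows = [(state, left[state], right[state]) for state in common]
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    return rows, only_left, only_right


obesity = {'Michigan': 25.4, 'DC': 22.5}
income = {'Michigan': 45793, 'District of Columbia': 45900}
rows, only_obesity, only_income = join_on_state(obesity, income)
print(rows)          # [('Michigan', 25.4, 45793)]
print(only_obesity)  # ['DC']
print(only_income)   # ['District of Columbia']
```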

Sort the states from the most obese to least obese, and print the states, obesity rates and median incomes to obesityIncome.txt.


In [3]:
OUT = open('obesityIncome.txt', 'w')
OUT.write("State"+"\t"+"Obesity rate"+"\t"+"Median household income")

print "%s\t%s\t%s" % ("State", "Obesity rate", "Median household income")
for state, obesityPercentage in sorted(stateObesity.iteritems(), key=lambda (k,v): (v,k), reverse=True):
    print "%s\t%s\t%s" % (state,stateObesity[state],medianIncome[state] )
    OUT.write("\n%s\t%s\t%s" % (state,stateObesity[state],medianIncome[state]))

OUT.close()


State	Obesity rate	Median household income
Mississippi	29.5	34508 
Alabama	28.9	38180 
West Virginia	27.6	35234 
Tennessee	27.2	39524 
Louisiana	27	36814 
Arkansas	26.1	35591 
Texas	25.8	41959 
Kentucky	25.8	37566 
Indiana	25.5	43735 
Michigan	25.4	45793 
Ohio	25.3	44961 
South Carolina	25.1	40350 
Oklahoma	24.9	38895 
Missouri	24.9	44324 
Georgia	24.7	44439 
North Dakota	24.6	41869 
Pennsylvania	24.3	45814 
North Carolina	24.2	41067 
Maryland	23.9	58347 
South Dakota	23.8	42525 
Alaska	23.7	55935 
Iowa	23.5	45086 
Maine	23.4	42006 
Wisconsin	23.2	47004 
Nebraska	23.2	46613 
Kansas	23.2	43802 
Virginia	23.1	54301 
Illinois	23	47978 
Florida	22.9	42079 
Minnesota	22.6	56084 
District of Columbia	22.5	45900 
Washington	22.2	50885 
California	22.2	51647 
New York	22.1	46242 
New Jersey	21.9	59989 
New Hampshire	21.6	58223 
New Mexico	21.5	39029 
Oregon	21.2	43570 
Arizona	21.2	44748 
Nevada	21.1	48314 
Delaware	21.1	50970 
Wyoming	20.8	45598 
Idaho	20.8	44994 
Utah	20.4	53226 
Hawaii	20.1	57572 
Montana	19.7	36200 
Connecticut	19.7	57369 
Rhode Island	19	48823 
Vermont	18.7	48508 
Massachusetts	18.4	54617 
Colorado	16.8	52011 
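Because the dictionaries store the values as strings, `key=lambda (k,v): (v,k)` sorts them lexicographically, character by character. That gives the right order here only because every obesity rate has exactly two digits before the decimal point. A minimal sketch (Python 3 syntax, where the tuple-parameter form `lambda (k,v)` is no longer valid) comparing the two sort keys on made-up values:

```python
values = {'A': '9.5', 'B': '10.2', 'C': '27'}  # hypothetical string values

# Sorting the raw strings compares them character by character
by_string = sorted(values.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
# Converting to float first compares them as numbers
by_number = sorted(values.items(), key=lambda kv: (float(kv[1]), kv[0]), reverse=True)

print([state for state, _ in by_string])  # ['A', 'C', 'B']
print([state for state, _ in by_number])  # ['C', 'B', 'A']
```

With the string key, `'9.5'` ranks above `'27'` because `'9'` > `'2'`; the float key avoids this regardless of how many digits the values have.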

Count the number of states that have a higher rate of obesity than Michigan:


In [4]:
count=0
for state, obesityPercentage in sorted(stateObesity.iteritems(), key=lambda (k,v): (v,k), reverse=True):
    if state=="Michigan":
        break
    count=count+1
print "The number of states that have a higher obesity rate than Michigan:",count


The number of states that have a higher obesity rate than Michigan: 9

Count the number of states that have a lower median income than Michigan:


In [6]:
count1=0
for state, income in sorted(medianIncome.iteritems(), key=lambda (k,v): (v,k)):
    if state=="Michigan":
        break
    count1=count1+1
print "the number of states that have a lower median income than Michigan:",count1


the number of states that have a lower median income than Michigan: 27
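The two counting cells above count how many states come before Michigan in a sorted list. That equals the number of states with a strictly higher obesity rate (or strictly lower income) only when no other state ties Michigan's value, which happens to hold for this data. A hedged sketch (Python 3 syntax) that compares values directly instead, so no sort is needed and ties are never counted; `count_relative` and the sample values are hypothetical.

```python
def count_relative(values, reference, higher=True):
    """Count entries strictly above (higher=True) or below values[reference].

    Values may be numeric strings or numbers; None entries are ignored,
    and entries that tie with the reference are not counted.
    """
    ref = float(values[reference])
    count = 0
    for state, value in values.items():
        if state == reference or value is None:
            continue
        value = float(value)
        if higher and value > ref:
            count += 1
        elif not higher and value < ref:
            count += 1
    return count


sample = {'Ref': '25.4', 'A': '25.5', 'B': '25.3', 'C': '25.4'}  # made-up values
print(count_relative(sample, 'Ref', higher=True))   # 1 (only A; C ties)
print(count_relative(sample, 'Ref', higher=False))  # 1 (only B)
```

With this notebook's dictionaries the calls would be `count_relative(stateObesity, 'Michigan')` and `count_relative(medianIncome, 'Michigan', higher=False)`, provided `medianIncome` does not contain the non-numeric header row.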

US obesity rate map


In [7]:
from IPython.display import Image
Image(filename='states obesity rate 2005.png')


Out[7]:

US median household income map


In [9]:
Image(filename='states median income 2005.png')


Out[9]:

2) US Obesity Rate vs Alcohol/Wine Consumption

After completing the first part, you start to wonder whether there are other factors besides median household income that are linked to obesity. Beer has been blamed for beer guts, so why not obesity? You then also wonder about the French, and how they maintain a normal body weight while eating what they please. They claim it's the wine. So, you decide to investigate how beer and wine consumption are related to obesity. The source for the beer data (2003-2006) is the Beer Institute. The source for the 2004 wine consumption data is the Adams 2005 Wine Handbook.

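Unfortunately, this data is in a different format from the other files: its fields are separated by commas rather than tabs, and its state names are not capitalized the same way as in ObesityByState.txt. Start with BeerAndWinePerCapita.txt and clean it up.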


In [10]:
# Read the alcohol consumption data
# Change the state names to capitalize the first letter of each word
# Read the obesity data
# Output the data with the state name, obesity %, per capita beer consumption, 
# and per capita wine consumption

alcoholConsumption = open('BeerAndWinePerCapita.txt', 'r').readlines()
stateBeer = dict()
stateWine = dict()

In [11]:
# Convert each state name to title case and store the beer and wine values
for line in alcoholConsumption:
    (statename, beer, wine) = line.split(',')
    state = str.title(statename)
    try:
        float(beer)
    except ValueError:
        beer = None
    stateBeer[state] = beer
    try:
        float(wine)
    except ValueError:
        wine = None
    stateWine[state] = wine
    print state, stateBeer[state], stateWine[state]


State None None
Nevada 44 5.75

New Hampshire 43.4 6.26

North Dakota 41.7 1.56

Montana 41.5 3.06

South Dakota 39 1.5

Wisconsin 38.2 2.63

New Mexico 37.8 2.25

Texas 37.4 2.13

Louisiana 37.1 2.35

South Carolina 37 2.06

Kentucky 36.8 1.39

Nebraska 36.6 1.71

Arizona 36.4 3.29

Wyoming 36.4 2

Delaware 35.4 5.14

Mississippi 35.1 1.06

Iowa 34.4 1.33

Florida 33.8 4.09

Ohio 33.5 2.01

Colorado 33.4 3.66

Missouri 33.4 2.36

Hawaii 32.7 4.27

Alaska 32.4 3.83

Vermont 31.4 4.75

West Virginia 31.3 0.88

Illinois 31.3 3.1

Minnesota 31.3 2.66

Maine 31.2 3.57

Oregon 30.6 4.27

Alabama 30.6 1.72

Idaho 30.4 3.06

North Carolina 30.2 2.43

Tennessee 30.2 1.6

Kansas 30 1.34

Pennsylvania 29.6 2.09

Georgia 29.5 2.38

Virginia 29.3 3.09

Michigan 29.3 2.47

Indiana 28.3 2

Rhode Island 28 4.36

Washington 27.9 4.1

Massachusetts 27.8 4.89

Arkansas 27.6 1.34

Oklahoma 27 1.36

Maryland 26.1 2.95

California 26 4.53

New Jersey 24.1 4.46

Connecticut 23.2 4.62

New York 23 3.61

Utah 19.5 1.37

District Of Columbia None 7.65
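In the cell above, the header row ends up in both dictionaries as a `State` entry, and each wine value keeps the line's trailing newline, which is why the printout is double-spaced. A hedged sketch (Python 3 syntax) using the standard `csv` module that skips the header, strips whitespace and stores floats. The helper names and the sample input are assumptions: the real file's exact capitalization and missing-value marker are not shown in the notebook.

```python
import csv
import io


def to_float(raw):
    """Return raw as a float, or None if it is not numeric (e.g. 'N/A')."""
    try:
        return float(raw)
    except ValueError:
        return None


def parse_alcohol(lines):
    """Parse 'state,beer,wine' rows into two {State Name: float or None} dicts."""
    beer, wine = {}, {}
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or short rows
        state = row[0].strip().title()  # 'NEW YORK' -> 'New York'
        beer[state] = to_float(row[1].strip())
        wine[state] = to_float(row[2].strip())
    return beer, wine


# Hypothetical sample input in an assumed all-caps layout
sample = io.StringIO("STATE,BEER,WINE\nNEVADA,44,5.75\nDISTRICT OF COLUMBIA,N/A,7.65\n")
beer, wine = parse_alcohol(sample)
print(beer)  # {'Nevada': 44.0, 'District Of Columbia': None}
print(wine)  # {'Nevada': 5.75, 'District Of Columbia': 7.65}
```

Note that `str.title()` capitalizes every word, including "of", which is why the notebook has to refer to DC as "District Of Columbia" from here on. For the real file, pass an open file object (`open('BeerAndWinePerCapita.txt')`) instead of the `StringIO` sample.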


In [12]:
# Re-read the obesity data, renaming DC to 'District Of Columbia' to match the title-cased alcohol data
obesityData = open('ObesityByState.txt', 'r').readlines()
stateObesity = dict()
for line in obesityData:
    (state, obesityPercentage) = line.split('\t')[0:2]
    if 'state' in state:
        continue
    if state == 'DC':
        state = 'District Of Columbia'
    try:
        float(obesityPercentage)
    except ValueError:
        obesityPercentage = None
    stateObesity[state] = obesityPercentage
    print state, stateObesity[state]


Alabama 28.9
Alaska 23.7
Arizona 21.2
Arkansas 26.1
California 22.2
Colorado 16.8
Connecticut 19.7
Delaware 21.1
District Of Columbia 22.5
Florida 22.9
Georgia 24.7
Hawaii 20.1
Idaho 20.8
Illinois 23
Indiana 25.5
Iowa 23.5
Kansas 23.2
Kentucky 25.8
Louisiana 27
Maine 23.4
Maryland 23.9
Massachusetts 18.4
Michigan 25.4
Minnesota 22.6
Mississippi 29.5
Missouri 24.9
Montana 19.7
Nebraska 23.2
Nevada 21.1
New Hampshire 21.6
New Jersey 21.9
New Mexico 21.5
New York 22.1
North Carolina 24.2
North Dakota 24.6
Ohio 25.3
Oklahoma 24.9
Oregon 21.2
Pennsylvania 24.3
Rhode Island 19
South Carolina 25.1
South Dakota 23.8
Tennessee 27.2
Texas 25.8
Utah 20.4
Vermont 18.7
Virginia 23.1
Washington 22.2
West Virginia 27.6
Wisconsin 23.2
Wyoming 20.8
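The cell above re-reads the whole obesity file only so that DC is named "District Of Columbia", matching the title-cased alcohol data. An alternative, sketched here under the assumption that spelling differences are limited to capitalization, spacing and the "DC" abbreviation, is to normalize names to a common key when looking them up. `state_key` is a hypothetical helper (Python 3 syntax).

```python
def state_key(name):
    """Normalize a state name so differently written spellings match.

    'District Of Columbia', 'DISTRICT OF COLUMBIA' and 'DC' all map to
    'district of columbia'. 'DC' is the only abbreviation handled here.
    """
    key = ' '.join(name.split()).lower()
    return 'district of columbia' if key == 'dc' else key


obesity = {'District of Columbia': 22.5, 'Utah': 20.4}
beer = {'District Of Columbia': None, 'Utah': 19.5}

# Index one dictionary by normalized key, then look up the other's names
beer_by_key = {state_key(s): v for s, v in beer.items()}
for state, rate in obesity.items():
    print(state, rate, beer_by_key.get(state_key(state)))
```

Keeping the original display names in one dictionary and using normalized keys only for lookups leaves the printed output unchanged.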

In [13]:
# Sort the states by beer consumption, in increasing order
print "%s\t%s\t%s" % ("State", "per capita beer consumption", "per capita wine consumption")
for state, beer in sorted(stateBeer.iteritems(), key=lambda (k,v):(v,k)):
    print "%s\t%s\t%s" % (state,stateBeer[state],stateWine[state])


State	per capita beer consumption	per capita wine consumption
District Of Columbia	None	7.65

State	None	None
Utah	19.5	1.37

New York	23	3.61

Connecticut	23.2	4.62

New Jersey	24.1	4.46

California	26	4.53

Maryland	26.1	2.95

Oklahoma	27	1.36

Arkansas	27.6	1.34

Massachusetts	27.8	4.89

Washington	27.9	4.1

Rhode Island	28	4.36

Indiana	28.3	2

Michigan	29.3	2.47

Virginia	29.3	3.09

Georgia	29.5	2.38

Pennsylvania	29.6	2.09

Kansas	30	1.34

North Carolina	30.2	2.43

Tennessee	30.2	1.6

Idaho	30.4	3.06

Alabama	30.6	1.72

Oregon	30.6	4.27

Maine	31.2	3.57

Illinois	31.3	3.1

Minnesota	31.3	2.66

West Virginia	31.3	0.88

Vermont	31.4	4.75

Alaska	32.4	3.83

Hawaii	32.7	4.27

Colorado	33.4	3.66

Missouri	33.4	2.36

Ohio	33.5	2.01

Florida	33.8	4.09

Iowa	34.4	1.33

Mississippi	35.1	1.06

Delaware	35.4	5.14

Arizona	36.4	3.29

Wyoming	36.4	2

Nebraska	36.6	1.71

Kentucky	36.8	1.39

South Carolina	37	2.06

Louisiana	37.1	2.35

Texas	37.4	2.13

New Mexico	37.8	2.25

Wisconsin	38.2	2.63

South Dakota	39	1.5

Montana	41.5	3.06

North Dakota	41.7	1.56

New Hampshire	43.4	6.26

Nevada	44	5.75
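In Python 2, `None` compares less than any number or string, which is why District Of Columbia and the stray `State` header row appear at the top of the listing above. In Python 3, comparing `None` with a `str` or `float` raises `TypeError`, so missing values need an explicit place in the sort key. A minimal sketch (Python 3 syntax) that sorts numbers in increasing order and puts missing values last; `missing_last` is a hypothetical helper and the values are assumed to be floats.

```python
beer = {'Utah': 19.5, 'Nevada': 44.0, 'District Of Columbia': None}


def missing_last(item):
    """Sort key: numeric values ascending, then None values, ties broken by name."""
    state, value = item
    return (value is None, value if value is not None else 0.0, state)


for state, value in sorted(beer.items(), key=missing_last):
    print(state, value)
```

The first tuple element is `False` for real numbers and `True` for `None`, so missing values sort after every number without ever being compared to one.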


In [14]:
# Sort the states by wine consumption, in increasing order
print "%s\t%s\t%s" % ("State", "per capita beer consumption", "per capita wine consumption")
for state, wine in sorted(stateWine.iteritems(), key=lambda (k,v):(v,k)):
    print "%s\t%s\t%s" % (state,stateBeer[state],stateWine[state] )


State	per capita beer consumption	per capita wine consumption
State	None	None
West Virginia	31.3	0.88

Mississippi	35.1	1.06

Iowa	34.4	1.33

Arkansas	27.6	1.34

Kansas	30	1.34

Oklahoma	27	1.36

Utah	19.5	1.37

Kentucky	36.8	1.39

South Dakota	39	1.5

North Dakota	41.7	1.56

Tennessee	30.2	1.6

Nebraska	36.6	1.71

Alabama	30.6	1.72

Indiana	28.3	2

Wyoming	36.4	2

Ohio	33.5	2.01

South Carolina	37	2.06

Pennsylvania	29.6	2.09

Texas	37.4	2.13

New Mexico	37.8	2.25

Louisiana	37.1	2.35

Missouri	33.4	2.36

Georgia	29.5	2.38

North Carolina	30.2	2.43

Michigan	29.3	2.47

Wisconsin	38.2	2.63

Minnesota	31.3	2.66

Maryland	26.1	2.95

Idaho	30.4	3.06

Montana	41.5	3.06

Virginia	29.3	3.09

Illinois	31.3	3.1

Arizona	36.4	3.29

Maine	31.2	3.57

New York	23	3.61

Colorado	33.4	3.66

Alaska	32.4	3.83

Florida	33.8	4.09

Washington	27.9	4.1

Hawaii	32.7	4.27

Oregon	30.6	4.27

Rhode Island	28	4.36

New Jersey	24.1	4.46

California	26	4.53

Connecticut	23.2	4.62

Vermont	31.4	4.75

Massachusetts	27.8	4.89

Delaware	35.4	5.14

Nevada	44	5.75

New Hampshire	43.4	6.26

District Of Columbia	None	7.65


In [15]:
# Sort the states from most to least obese and write the stateObesity,
# stateBeer and stateWine dictionaries to obesityAlcohol.txt
OUT = open('obesityAlcohol.txt', 'w')
OUT.write("state"+"\t"+"obesity %"+"\t"+"per capita beer consumption"+"\t"
          +"per capita wine consumption")

print "%s\t%s\t%s\t%s" % ("State", "Obesity rate", 
                          "Per capita beer consumption", 
                          "Per capita wine consumption")
for state, obesityPercentage in sorted(stateObesity.iteritems(),
    key=lambda (k,v): (v,k), reverse=True):
    print "%s\t%s\t%s\t%s" % (state,stateObesity[state],stateBeer[state],stateWine[state] ) 
    OUT.write("\n%s\t%s\t%s\t%s" % (state,stateObesity[state],stateBeer[state],stateWine[state] ))
OUT.close()


State	Obesity rate	Per capita beer consumption	Per capita wine consumption
Mississippi	29.5	35.1	1.06

Alabama	28.9	30.6	1.72

West Virginia	27.6	31.3	0.88

Tennessee	27.2	30.2	1.6

Louisiana	27	37.1	2.35

Arkansas	26.1	27.6	1.34

Texas	25.8	37.4	2.13

Kentucky	25.8	36.8	1.39

Indiana	25.5	28.3	2

Michigan	25.4	29.3	2.47

Ohio	25.3	33.5	2.01

South Carolina	25.1	37	2.06

Oklahoma	24.9	27	1.36

Missouri	24.9	33.4	2.36

Georgia	24.7	29.5	2.38

North Dakota	24.6	41.7	1.56

Pennsylvania	24.3	29.6	2.09

North Carolina	24.2	30.2	2.43

Maryland	23.9	26.1	2.95

South Dakota	23.8	39	1.5

Alaska	23.7	32.4	3.83

Iowa	23.5	34.4	1.33

Maine	23.4	31.2	3.57

Wisconsin	23.2	38.2	2.63

Nebraska	23.2	36.6	1.71

Kansas	23.2	30	1.34

Virginia	23.1	29.3	3.09

Illinois	23	31.3	3.1

Florida	22.9	33.8	4.09

Minnesota	22.6	31.3	2.66

District Of Columbia	22.5	None	7.65

Washington	22.2	27.9	4.1

California	22.2	26	4.53

New York	22.1	23	3.61

New Jersey	21.9	24.1	4.46

New Hampshire	21.6	43.4	6.26

New Mexico	21.5	37.8	2.25

Oregon	21.2	30.6	4.27

Arizona	21.2	36.4	3.29

Nevada	21.1	44	5.75

Delaware	21.1	35.4	5.14

Wyoming	20.8	36.4	2

Idaho	20.8	30.4	3.06

Utah	20.4	19.5	1.37

Hawaii	20.1	32.7	4.27

Montana	19.7	41.5	3.06

Connecticut	19.7	23.2	4.62

Rhode Island	19	28	4.36

Vermont	18.7	31.4	4.75

Massachusetts	18.4	27.8	4.89

Colorado	16.8	33.4	3.66
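Because the wine values still end in the line's newline character, the cell above writes an empty line after each data row of obesityAlcohol.txt, just as the printout is double-spaced. A hedged sketch (Python 3 syntax) of writing the same table with the standard `csv` module and a `with` block, which closes the file even if an error occurs. `write_table` and the choice to write missing values as `NA` are assumptions, not the original notebook's format.

```python
import csv


def write_table(path, header, rows):
    """Write rows as tab-separated text, one row per line, None written as 'NA'."""
    with open(path, 'w', newline='') as out:
        # csv's default line terminator is '\r\n'; use '\n' for a plain text file
        writer = csv.writer(out, delimiter='\t', lineterminator='\n')
        writer.writerow(header)
        for row in rows:
            writer.writerow(['NA' if v is None else v for v in row])


header = ['state', 'obesity %', 'per capita beer consumption', 'per capita wine consumption']
rows = [('Mississippi', 29.5, 35.1, 1.06), ('District Of Columbia', 22.5, None, 7.65)]
write_table('obesityAlcohol_sketch.txt', header, rows)
```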

US per capita beer consumption map


In [18]:
Image(filename='states per capita beer consumption 2005.png')


Out[18]:

US per capita wine consumption map


In [19]:
Image(filename='states per capita wine consumption 2005.png')


Out[19]:

Obesity rate vs. per capita beer consumption


In [20]:
Image(filename='scatterplot beer capita.png')


Out[20]:

Obesity rate vs. per capita wine consumption


In [23]:
Image(filename='scatterplot wine capita.png')


Out[23]:

Observation

Based on the scatterplot of obesity rate vs. per capita beer consumption, obesity rate appears to be positively correlated with beer consumption: states with higher beer consumption tend to have higher obesity rates. Beer consumption alone does not explain obesity, however. There are outliers such as Colorado, which has the lowest obesity rate of all the states even though its per capita beer consumption is higher than that of 29 states.

Conversely, the scatterplot of obesity rate vs. per capita wine consumption shows that higher wine consumption is strongly correlated with lower obesity rates. In other words, a state with higher wine consumption tends to have a lower obesity rate.

However, it would be a mistake to conclude that drinking wine decreases obesity, because correlation does not imply causation. Since wine is often perceived as expensive, states with higher wine consumption may also have higher median incomes, and states with higher median incomes also tend to have lower obesity rates. Further analysis of the association between state median income and per capita wine consumption is therefore needed.
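The scatterplots show these associations visually; Pearson's correlation coefficient r puts a number on a linear association, from -1 (perfectly negative) to +1 (perfectly positive). A hedged sketch (Python 3 syntax) of computing r for any two state-level variables, such as median income and wine consumption. `pearson_r` and `paired` are hypothetical helpers, and the checks below use synthetic numbers, not the notebook's data. A correlation coefficient, like the scatterplots, says nothing about causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def paired(a, b):
    """Aligned value lists for states present in both dicts with no missing values."""
    states = sorted(s for s in a if s in b and a[s] is not None and b[s] is not None)
    return [a[s] for s in states], [b[s] for s in states]


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))  # -1.0
```

Assuming the income and wine dictionaries hold floats keyed by exactly the same state names, `pearson_r(*paired(medianIncome, stateWine))` would give the income/wine correlation discussed above; states whose names differ between the two dictionaries are silently left out by `paired`.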


In [ ]: