Assignment #2: Using the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, calculate the correlation between the recycling rate and the median income. Discuss your findings in your PR.


In [10]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_excel('2013_NYC_CD_MedianIncome_Recycle.xlsx')

In [3]:
df.head(3)


Out[3]:
CD_Name MdHHIncE RecycleRate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485

In [4]:
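# Pearson correlation of the two numeric columns (CD_Name is text, so it is left out)
df[['MdHHIncE', 'RecycleRate']].corr()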


Out[4]:
MdHHIncE RecycleRate
MdHHIncE 1.000000 0.884783
RecycleRate 0.884783 1.000000
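
For reference, the coefficient reported above is Pearson's r, which can also be computed by hand. The sketch below is only an illustration: `pearson_r` is a helper name made up here, and the inputs are just the three rows printed by `df.head(3)`, so the value it prints will not match the 0.884783 obtained from the full file.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # covariance term divided by the product of the two spreads
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Only the three rows shown by df.head(3); not the full dataset
income = [119596, 119596, 40919]
rate = [0.286771, 0.264074, 0.156485]

r_sample = pearson_r(income, rate)
print(r_sample)
```

With the full DataFrame loaded, `pearson_r(df['MdHHIncE'], df['RecycleRate'])` should agree with the off-diagonal entry of the correlation matrix, assuming neither column has missing values.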

In [16]:
df.plot(kind='scatter', x='MdHHIncE', y='RecycleRate')
plt.title('Correlation between Median Income and Recycle Rate')
plt.xlabel('Median Income')
plt.ylabel('Recycle Rate')


Out[16]:
<matplotlib.text.Text at 0x114686940>