Using the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, calculate the correlation between the recycling rate and the median income. Discuss your findings in your PR.


In [1]:
import pandas as pd
import matplotlib
%matplotlib inline

In [2]:
df = pd.read_excel('data/2013_NYC_CD_MedianIncome_Recycle.xlsx')

In [3]:
df.head()


Out[3]:
                                       CD_Name  MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2                  Chinatown & Lower East Side     40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725

In [9]:
df.plot(kind = 'scatter', x = 'MdHHIncE', y = 'RecycleRate')


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ceeb668>

In [11]:
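# Restrict to the two numeric columns: CD_Name is text, and recent pandas
# versions raise an error if a non-numeric column is passed to corr().
df[['MdHHIncE', 'RecycleRate']].corr()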


Out[11]:
             MdHHIncE  RecycleRate
MdHHIncE     1.000000     0.884783
RecycleRate  0.884783     1.000000
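
To make the number in Out[11] less of a black box, here is a minimal sketch of the Pearson correlation computed by hand and compared against pandas. It assumes only the five rows printed by df.head() above, not the full spreadsheet, so its result will differ from the 0.884783 reported for the whole dataset. The names `sample` and `pearson_r` are hypothetical helpers introduced for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: only the five rows shown by df.head(), NOT the full dataset.
sample = pd.DataFrame({
    'MdHHIncE': [119596, 119596, 40919, 92583, 92583],
    'RecycleRate': [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})


def pearson_r(x, y):
    """Pearson r: co-deviation of x and y, scaled by the size of each one's deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean income
    dy = y - y.mean()  # deviations from the mean recycling rate
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson_r(sample['MdHHIncE'], sample['RecycleRate'])
r_pandas = sample['MdHHIncE'].corr(sample['RecycleRate'])  # Pearson by default
print(r_manual, r_pandas)
```

The two values should agree up to floating-point error. For the full dataset, the coefficient of 0.884783 in Out[11] indicates a strong positive linear association: districts with higher median household income tend to have higher recycling rates. A correlation describes association only and does not by itself show that income causes the difference.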
