Assignment 2

  • Using the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, calculate the correlation between the recycling rate and the median income. Discuss your findings in your PR.

In [1]:
import pandas as pd
%matplotlib inline



In [3]:
df = pd.read_excel("data/2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [4]:
df.head()


Out[4]:
   CD_Name                                      MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2  Chinatown & Lower East Side                     40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725

In [5]:
df.plot(kind='scatter',x='MdHHIncE',y='RecycleRate')


Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x11116aac8>

In [6]:
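# Restrict to the numeric columns; CD_Name is text and recent pandas versions
# raise an error if it is passed to corr().
df[['MdHHIncE', 'RecycleRate']].corr()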


Out[6]:
             MdHHIncE  RecycleRate
MdHHIncE     1.000000     0.884783
RecycleRate  0.884783     1.000000

Conclusion

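  • The Pearson correlation coefficient between median household income (MdHHIncE) and recycling rate (RecycleRate) is 0.88. This is close to 1, so the two variables have a strong positive linear relationship: districts with higher median income tend to have higher recycling rates.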
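
As a sanity check on what the coefficient measures, the sketch below computes Pearson's r directly from its definition (covariance divided by the product of the standard deviations) and compares it with the pandas result. It uses only the five rows shown by df.head() above as an illustrative sample, so its value will differ from the 0.88 obtained on the full file. The names sample and pearson_r are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative sample: the five rows printed by df.head(), NOT the full dataset.
sample = pd.DataFrame({
    "MdHHIncE": [119596, 119596, 40919, 92583, 92583],
    "RecycleRate": [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})


def pearson_r(x, y):
    """Pearson correlation computed from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Sum of cross-products over the square root of the product of the sums of squares.
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson_r(sample["MdHHIncE"], sample["RecycleRate"])
r_pandas = sample["MdHHIncE"].corr(sample["RecycleRate"])  # Pearson by default

print("manual:", r_manual)
print("pandas:", r_pandas)
```

The two printed values should agree to floating-point precision, which confirms that the default method in corr() is the Pearson coefficient discussed in the conclusion.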
