Using the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, calculate the correlation between the recycling rate and the median income. Discuss your findings in your PR.


In [10]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

%matplotlib inline
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42

In [2]:
df = pd.read_excel('2013_NYC_CD_MedianIncome_Recycle.xlsx')

In [3]:
df.head()


Out[3]:
CD_Name MdHHIncE RecycleRate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485
3 Chelsea, Clinton & Midtown Business Distric 92583 0.235125
4 Chelsea, Clinton & Midtown Business Distric 92583 0.246725

In [4]:
df.columns = ['name', 'median_household_income', 'recycle_rate']

In [5]:
df.head()


Out[5]:
name median_household_income recycle_rate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485
3 Chelsea, Clinton & Midtown Business Distric 92583 0.235125
4 Chelsea, Clinton & Midtown Business Distric 92583 0.246725

In [7]:
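# Select the numeric columns explicitly; pandas 2.0+ raises an error if the text column 'name' is included
df[['median_household_income', 'recycle_rate']].corr()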


Out[7]:
median_household_income recycle_rate
median_household_income 1.000000 0.884783
recycle_rate 0.884783 1.000000
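
The matrix above repeats the same coefficient twice. As a rough sketch of what that number is, the cell below computes Pearson's r for a single pair of columns with `Series.corr` and checks it against the textbook formula. It uses only the five rows printed by `df.head()` (the `sample` frame is a stand-in, not the full spreadsheet), so the value it prints will differ from the 0.884783 reported for the full data.

```python
import numpy as np
import pandas as pd

# Five rows copied from the df.head() output; NOT the full dataset.
sample = pd.DataFrame({
    "median_household_income": [119596, 119596, 40919, 92583, 92583],
    "recycle_rate": [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})

# Pearson correlation for one pair of columns (a single number, not a matrix).
r_pandas = sample["median_household_income"].corr(sample["recycle_rate"])

# The same quantity from its definition:
# sum of products of deviations / sqrt(product of summed squared deviations).
x = sample["median_household_income"].to_numpy(dtype=float)
y = sample["recycle_rate"].to_numpy(dtype=float)
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

print(r_pandas, r_manual)
```

On the real data the same call would be `df['median_household_income'].corr(df['recycle_rate'])`.
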
  • The corr() output would be much easier to interpret alongside a graph. How would I plot that?
  • My biggest question is how to explain the relationship between household income and recycle rate. Could neighborhoods with higher recycle rates have more educational materials, such as posters? Could it be levels of education? Could it be more bins? I'd look into those before jumping to conclusions.
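
On the graph question: one common option is a scatter plot of the two columns with a least-squares line drawn through it. The cell below is a minimal sketch of that idea, again using only the five rows shown by `df.head()` as a stand-in (`sample` is a hypothetical name, and the fitted line is for illustration only, not a result from the full data).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Five rows copied from the df.head() output; NOT the full dataset.
sample = pd.DataFrame({
    "median_household_income": [119596, 119596, 40919, 92583, 92583],
    "recycle_rate": [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})

fig, ax = plt.subplots()

# One point per row: income on the x-axis, recycle rate on the y-axis.
ax.scatter(sample["median_household_income"], sample["recycle_rate"])

# Degree-1 polynomial fit returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(
    sample["median_household_income"], sample["recycle_rate"], deg=1
)
xs = np.linspace(
    sample["median_household_income"].min(),
    sample["median_household_income"].max(),
    50,
)
ax.plot(xs, slope * xs + intercept)

ax.set_xlabel("Median household income")
ax.set_ylabel("Recycle rate")
ax.set_title("Recycle rate vs. median household income (sample rows)")
```

With `%matplotlib inline` the figure renders under the cell; in a plain script, add `plt.show()`. Swapping `sample` for `df` plots the full dataset. A tight upward cloud of points is what a coefficient near 0.88 looks like, but the plot says nothing about why the two move together.
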

In [19]:
# corr is a method, so call it first, then plot the resulting matrix
plt.matshow(df[['median_household_income', 'recycle_rate']].corr())
