notebook.community

Edit and run



In [9]:

    
import pandas as pd
import matplotlib
import numpy
%matplotlib inline



In [10]:

    
df = pd.read_excel("2013_NYC_CD_MedianIncome_Recycle.xlsx")



In [11]:

    
df.head()









    Out[11]:






  
    
      
      CD_Name
      MdHHIncE
      RecycleRate
    
  
  
    
      0
      Battery Park City, Greenwich Village & Soho
      119596
      0.286771
    
    
      1
      Battery Park City, Greenwich Village & Soho
      119596
      0.264074
    
    
      2
      Chinatown & Lower East Side
      40919
      0.156485
    
    
      3
      Chelsea, Clinton & Midtown Business Distric
      92583
      0.235125
    
    
      4
      Chelsea, Clinton & Midtown Business Distric
      92583
      0.246725



In [12]:

    
df.corr()









    Out[12]:






  
    
      
      MdHHIncE
      RecycleRate
    
  
  
    
      MdHHIncE
      1.000000
      0.884783
    
    
      RecycleRate
      0.884783
      1.000000



In [13]:

    
df.plot(kind='scatter', x='MdHHIncE', y='RecycleRate')









    Out[13]:





<matplotlib.axes._subplots.AxesSubplot at 0x10be2da58>

The visualisation of the data shows a clear positive correlation between the household income and the recycling rate, which is confirmed by r = 0.884783.

	CD_Name	MdHHIncE	RecycleRate
0	Battery Park City, Greenwich Village & Soho	119596	0.286771
1	Battery Park City, Greenwich Village & Soho	119596	0.264074
2	Chinatown & Lower East Side	40919	0.156485
3	Chelsea, Clinton & Midtown Business Distric	92583	0.235125
4	Chelsea, Clinton & Midtown Business Distric	92583	0.246725