Assignment 3

  • Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor using the weights from the model. This time, use the built-in attributes of your model rather than hard-coding them into your algorithm.

In [47]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

In [48]:
df = pd.read_excel("data/2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [49]:
df.head()


Out[49]:
                                       CD_Name  MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2                  Chinatown & Lower East Side     40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725

In [51]:
# Fit an ordinary least squares model: RecycleRate as a linear function of MdHHIncE
lm = smf.ols(formula="RecycleRate ~ MdHHIncE", data=df).fit()

In [52]:
lm.params


Out[52]:
Intercept    0.074804
MdHHIncE     0.000002
dtype: float64
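
The 0.000002 shown for MdHHIncE is rounded for display, which is one reason to read the weight from the model instead of typing it in. As a rough check, the sketch below uses only numbers printed in this notebook (the intercept from Out[52] and the row 0 prediction from Out[59]) to back out the slope those outputs imply. The variable names are illustrative and not part of the original notebook.

```python
# Values copied from the notebook outputs (Out[52] and Out[59])
intercept_shown = 0.074804
slope_shown = 0.000002        # as displayed, i.e. rounded to six decimals
income = 119596               # MdHHIncE for row 0
predicted_shown = 0.298402    # Predicted RecycleRate for row 0

# Back out the slope that the displayed prediction implies
slope_implied = (predicted_shown - intercept_shown) / income

# What a hard-coded, rounded slope would predict for the same row
hardcoded_prediction = slope_shown * income + intercept_shown

print(f"implied slope:         {slope_implied:.3e}")
print(f"hard-coded prediction: {hardcoded_prediction:.6f}")
```

The implied slope is close to 1.87e-06, and the hard-coded version lands roughly 0.016 above the model's own prediction for this row.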

In [53]:
# Unpack the fitted weights from the model; Median_Income holds the slope on MdHHIncE
Intercept, Median_Income = lm.params

In [58]:
# Linear prediction: slope * income + intercept, using the weights read from the model
df['Predicted RecycleRate'] = Median_Income * df['MdHHIncE'] + Intercept

In [59]:
df.head()


Out[59]:
                                       CD_Name  MdHHIncE  RecycleRate  Predicted RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771               0.298402
1  Battery Park City, Greenwich Village & Soho    119596     0.264074               0.298402
2                  Chinatown & Lower East Side     40919     0.156485               0.151307
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125               0.247898
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725               0.247898
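
A fitted statsmodels result also exposes predict(), which applies the formula to a DataFrame directly, so the slope and intercept never need to be handled by hand. The sketch below compares the two approaches. It assumes a small stand-in DataFrame built from the five rows displayed above rather than the full Excel file, so its fitted weights will differ from those in Out[52]; the names sample and lm_sample are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Stand-in data: only the five rows displayed in df.head(), not the full file,
# so the fitted weights here will not match the notebook's Out[52]
sample = pd.DataFrame({
    "MdHHIncE": [119596, 119596, 40919, 92583, 92583],
    "RecycleRate": [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})

lm_sample = smf.ols(formula="RecycleRate ~ MdHHIncE", data=sample).fit()

# Manual prediction from the fitted weights, looked up by label
manual = (lm_sample.params["MdHHIncE"] * sample["MdHHIncE"]
          + lm_sample.params["Intercept"])

# Built-in prediction: predict() evaluates the formula on the given columns
sample["Predicted RecycleRate"] = lm_sample.predict(sample)

# The two should agree to floating-point precision
print(np.allclose(manual, sample["Predicted RecycleRate"]))
```

For the rows the model was fitted on, lm_sample.fittedvalues holds the same numbers; predict() is the one to use for new income values.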

In [56]:
intercept, slope = lm.params

In [57]:
# Scatter the observed data, then overlay the fitted regression line in red
df.plot(kind="scatter", x="MdHHIncE", y="RecycleRate")
plt.plot(df["MdHHIncE"], slope * df["MdHHIncE"] + intercept, "-", color="red")


Out[57]:
[<matplotlib.lines.Line2D at 0x1111b3710>]
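
Since the fit is a straight line, it can be drawn from just its two endpoints, and the axes returned by df.plot can carry labels and a legend. The following is a minimal sketch of that variation, again assuming a stand-in DataFrame built from the five displayed rows instead of the full file; sample, lm_sample, x_line and y_line are hypothetical names.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Stand-in data: the five rows shown in df.head(), not the full file
sample = pd.DataFrame({
    "MdHHIncE": [119596, 119596, 40919, 92583, 92583],
    "RecycleRate": [0.286771, 0.264074, 0.156485, 0.235125, 0.246725],
})
lm_sample = smf.ols(formula="RecycleRate ~ MdHHIncE", data=sample).fit()

# A straight line needs only its two endpoints
x_line = pd.DataFrame({
    "MdHHIncE": [sample["MdHHIncE"].min(), sample["MdHHIncE"].max()]
})
y_line = lm_sample.predict(x_line)

# DataFrame.plot returns the matplotlib Axes, so the line goes on the same plot
ax = sample.plot(kind="scatter", x="MdHHIncE", y="RecycleRate")
ax.plot(x_line["MdHHIncE"], y_line, "-", color="red", label="OLS fit")
ax.set_xlabel("Median income (MdHHIncE)")
ax.set_ylabel("Recycle rate")
ax.legend()
```

Drawing on the returned Axes rather than through plt.plot keeps the scatter and the line tied to the same figure even if other figures are open.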
