Using the data in the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, build a predictor from the weights of the fitted model. This time, read the weights from the model's built-in attributes rather than hard-coding them into your function.


In [1]:
import pandas as pd
import statsmodels.formula.api as smf

In [2]:
df = pd.read_excel("2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [3]:
df.head()


Out[3]:
CD_Name MdHHIncE RecycleRate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485
3 Chelsea, Clinton & Midtown Business Distric 92583 0.235125
4 Chelsea, Clinton & Midtown Business Distric 92583 0.246725

In [8]:
df.columns = ['Neighborhood', 'Median_Income', 'Recycle_Rate']

In [9]:
df.head()


Out[9]:
Neighborhood Median_Income Recycle_Rate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485
3 Chelsea, Clinton & Midtown Business Distric 92583 0.235125
4 Chelsea, Clinton & Midtown Business Distric 92583 0.246725

In [10]:
lm = smf.ols(formula="Recycle_Rate ~ Median_Income", data=df).fit()

In [12]:
lm.params  # fitted parameters: intercept and Median_Income slope


Out[12]:
Intercept        0.074804
Median_Income    0.000002
dtype: float64
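
The slope above is printed as 0.000002 only because the display rounds to six decimal places. As a rough check, the slope can be backed out from the two other numbers this notebook prints: the displayed intercept and the prediction for an income of 92000. This is a minimal sketch; the variable names are illustrative, and because the displayed intercept is itself rounded, the result is only approximate.

```python
# Values copied from the notebook output (the intercept is as displayed, i.e. rounded).
intercept_shown = 0.074804
prediction_at_92000 = 0.24680850157721865
income = 92000

# Rearranging prediction = intercept + slope * income gives the slope.
implied_slope = (prediction_at_92000 - intercept_shown) / income

print(f"{implied_slope:.4e}")   # full-precision view of the slope
print(round(implied_slope, 6))  # rounded to six decimals, matching the params display
```

The implied slope comes out near 1.87e-06, which is why the prediction function should use the value stored in lm.params rather than the rounded figure shown on screen.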

In [11]:
intercept = lm.params["Intercept"]  # look up by label so the assignment does not depend on order
slope = lm.params["Median_Income"]

In [13]:
def pre_recycle(median_income):
    """Predict the recycle rate for a given median household income."""
    # Linear model: intercept + slope * predictor
    return intercept + slope * median_income

In [17]:
pre_recycle(92000)


Out[17]:
0.24680850157721865
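
A fitted statsmodels result also has a predict method, which may be a convenient cross-check on the hand-written function. The sketch below uses a small made-up DataFrame (not the NYC file) so that it runs on its own; the names toy, toy_lm, manual, and built_in are illustrative assumptions, not part of the exercise.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in data (NOT the NYC file), using the renamed column names.
toy = pd.DataFrame({
    "Median_Income": [40000, 60000, 80000, 100000, 120000],
    "Recycle_Rate": [0.15, 0.19, 0.22, 0.26, 0.30],
})
toy_lm = smf.ols(formula="Recycle_Rate ~ Median_Income", data=toy).fit()

# Manual prediction from the fitted parameters, looked up by label.
manual = toy_lm.params["Intercept"] + toy_lm.params["Median_Income"] * 92000

# Built-in prediction: pass a DataFrame that has the predictor column.
new_data = pd.DataFrame({"Median_Income": [92000]})
built_in = toy_lm.predict(new_data)

print(manual, built_in.iloc[0])
```

With the real model, the equivalent call would be lm.predict(pd.DataFrame({"Median_Income": [92000]})), which should agree with pre_recycle(92000).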
