Assignment 3

Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor that uses the weights from the model. This time, use the model's built-in attributes rather than hard-coding the weights into your algorithm.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.formula.api as smf

In [3]:
df = pd.read_excel("/home/sean/git/algorithms/class4/homework/data/2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [4]:
df.head()


Out[4]:
   CD_Name                                       MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2  Chinatown & Lower East Side                      40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric      92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric      92583     0.246725

In [46]:
lm = smf.ols(formula="RecycleRate~MdHHIncE",data=df).fit()
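The formula string "RecycleRate~MdHHIncE" uses Patsy-style notation, and statsmodels adds an intercept term by default. The fitted model is therefore a simple linear regression, and its two estimated coefficients are what `lm.params` reports as `Intercept` (beta_0) and `MdHHIncE` (beta_1):

$$\text{RecycleRate}_i = \beta_0 + \beta_1\,\text{MdHHIncE}_i + \varepsilon_i$$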

In [47]:
lm.params


Out[47]:
Intercept    0.074804
MdHHIncE     0.000002
dtype: float64
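For a single predictor, the two numbers in `lm.params` are the ordinary least-squares solution, which has a closed form: slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and intercept = y_mean - slope * x_mean. The sketch below is a pure-Python illustration of that formula only. The function name `simple_ols` is hypothetical and not part of statsmodels, and the toy data are made up so the expected answer is exact. It returns the pair in the same (intercept, slope) order that `lm.params` lists them.

```python
def simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x, so the fit should recover (1, 2)
intercept, slope = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```

On the real data, `lm.params` is the authoritative result; this sketch just makes visible what those two numbers are.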

In [65]:
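# Look the coefficients up by name so the order of lm.params doesn't matter
intercept, slope = lm.params['Intercept'], lm.params['MdHHIncE']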

In [57]:
intercept


Out[57]:
0.074804136152441802

In [58]:
slope


Out[58]:
1.8696126676606164e-06

In [67]:
df.plot(kind='scatter', x='MdHHIncE', y='RecycleRate')
plt.plot(df["MdHHIncE"],slope*df["MdHHIncE"]+intercept,"-", c='red')


Out[67]:
[Figure: scatter plot of RecycleRate vs. MdHHIncE with the fitted regression line in red]

In [85]:
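def income_to_rate(income_str):
    # Accept a number or a numeric string (input() returns a string)
    income = float(income_str)
    # Predicted rate as a percentage, rounded to 3 significant figures
    rate_pct = (slope * income + intercept) * 100
    return '%s' % float('%.3g' % rate_pct)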

In [86]:
income_to_rate(50000)


Out[86]:
'16.8'
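As a check on this output, plugging the fitted coefficients from `lm.params` (intercept about 0.074804, `MdHHIncE` coefficient about 1.8696 x 10^-6) into the model for an income of 50,000 dollars gives:

$$0.074804 + (1.8696 \times 10^{-6})(50000) = 0.074804 + 0.093481 = 0.168285 \approx 16.8\%$$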

In [90]:
income=input('Enter median neighborhood income: $')
print('Predicted recycle rate for this neighborhood is {}%'.format(income_to_rate(income)))


Enter median neighborhood income: $1000000
Predicted recycle rate for this neighborhood is 194.0%
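A recycling rate of 194% is impossible: a straight-line fit has no upper or lower bound, so extrapolating it to an income of $1,000,000 produces a meaningless value. One simple guard, sketched below on the assumption that clamping is acceptable for this assignment, is to clip predictions to the 0 to 100% range. The function `income_to_rate_clipped` is hypothetical. The coefficient literals are copied from `lm.params` only so the snippet runs on its own; in the notebook you would pass `lm.params['Intercept']` and `lm.params['MdHHIncE']` instead.

```python
def income_to_rate_clipped(income, intercept, slope):
    """Predicted recycling rate in percent, clamped to [0, 100] (hypothetical helper)."""
    rate_pct = (intercept + slope * float(income)) * 100
    # A linear fit is unbounded, so clamp extrapolated values to a valid percentage
    return min(max(rate_pct, 0.0), 100.0)


# Coefficients as reported by lm.params in this notebook (copied here for illustration)
INTERCEPT = 0.074804136152441802
SLOPE = 1.8696126676606164e-06

print(income_to_rate_clipped(50000, INTERCEPT, SLOPE))
print(income_to_rate_clipped(1000000, INTERCEPT, SLOPE))
```

Clipping only hides the symptom: predictions at the boundary are still not trustworthy, since the linear relationship has simply not been observed at such extreme incomes.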
