Assignment 3

  • Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor using the weights from the model. This time, use the model's built-in attributes rather than hard-coding the weights into your algorithm.

In [7]:
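# Older xlrd (1.x) lets pandas read .xlsx files. xlrd 2.0 and later read only
# legacy .xls files, so with current versions install openpyxl instead.
!pip3 install xlrd
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
%matplotlib inline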


Collecting xlrd
  Using cached xlrd-1.0.0-py3-none-any.whl
Installing collected packages: xlrd
Successfully installed xlrd-1.0.0

In [10]:
df = pd.read_excel("data/2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [11]:
df.head()


Out[11]:
                                       CD_Name  MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2                  Chinatown & Lower East Side     40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725

In [20]:
# Ordinary least squares: RecycleRate as a linear function of median household income
lm = smf.ols(formula="RecycleRate ~ MdHHIncE", data=df).fit()

In [21]:
lm.params


Out[21]:
Intercept    0.074804
MdHHIncE     0.000002
dtype: float64
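For a single predictor, the two numbers in lm.params have a closed form: the slope is the covariance of income and recycle rate divided by the variance of income, and the intercept makes the line pass through the point of means. The sketch below is a minimal pure-Python illustration of that formula, not a description of how statsmodels computes it internally. It runs on only the five rows printed by df.head(), so its coefficients will not match the full-data fit above. The names simple_ols, toy_intercept and toy_slope are hypothetical.

```python
# Closed-form least squares for one predictor:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept = y_mean - slope * x_mean
# Toy data: only the five rows shown by df.head(), not the full file.

def simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

toy_income = [119596, 119596, 40919, 92583, 92583]
toy_rate = [0.286771, 0.264074, 0.156485, 0.235125, 0.246725]

# Distinct names so this cell does not overwrite the notebook's intercept/slope
toy_intercept, toy_slope = simple_ols(toy_income, toy_rate)
residuals = [y - (toy_intercept + toy_slope * x) for x, y in zip(toy_income, toy_rate)]
print(toy_intercept, toy_slope)
```

Two properties of any least-squares fit with an intercept make good sanity checks: the residuals sum to zero, and they are orthogonal to the predictor.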

In [22]:
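# Look up coefficients by name so the result does not depend on their order
intercept = lm.params["Intercept"]
slope = lm.params["MdHHIncE"]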

In [27]:
# Scatter the observed data, then overlay the fitted regression line
df.plot(kind='scatter', x='MdHHIncE', y='RecycleRate', color='gray', alpha=0.8, linewidth=0)
plt.plot(df["MdHHIncE"], slope * df["MdHHIncE"] + intercept, "-", color="red", alpha=0.5)


Out[27]:
[<matplotlib.lines.Line2D at 0x10b883828>]

In [28]:
print("The model is: Recycle rate =", slope, "* median income +", intercept)


The model is: Recycle rate = 1.86961266766e-06 * median income + 0.0748041361524
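As a quick worked check of this equation, plugging in the median income of Battery Park City, Greenwich Village & Soho (119596, from the first rows of df) gives:

```latex
\widehat{\text{RecycleRate}} = 1.8696\times10^{-6}\times 119596 + 0.0748
  \approx 0.2236 + 0.0748 \approx 0.2984
```

The two observed rates for that district in df.head() are 0.287 and 0.264, so the fitted value sits somewhat above both.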

In [31]:
def get_rrate(income):
    """Predict the recycling rate from median household income using the fitted model."""
    recycle_rate = income * slope + intercept
    return recycle_rate
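The same idea can be packaged so the predictor takes its weights from whatever coefficient table it is given, rather than from global variables. In this sketch, make_predictor is a hypothetical helper; it only assumes its argument can be indexed by coefficient name, which holds for the pandas Series in lm.params as well as for a plain dict. The dict below is a stand-in that copies the coefficients printed above, so the example runs without the Excel file.

```python
def make_predictor(params):
    """Build a recycle-rate predictor from a fitted model's coefficients.

    `params` is anything indexable by coefficient name, e.g. the pandas
    Series in `lm.params` or a plain dict.
    """
    intercept = params["Intercept"]
    slope = params["MdHHIncE"]

    def predict(income):
        # Linear model: intercept + slope * median household income
        return intercept + slope * income

    return predict

# Hypothetical stand-in for lm.params, copied from the printed model
example_params = {"Intercept": 0.0748041361524, "MdHHIncE": 1.86961266766e-06}
predict_rrate = make_predictor(example_params)
print(round(predict_rrate(119596), 4))  # → 0.2984
```

In the notebook you would call make_predictor(lm.params) instead. statsmodels results objects also provide a built-in predict method; for a formula-fitted model such as this one, lm.predict can be given a DataFrame containing the MdHHIncE column.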
