Assignment 3

Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor using the weights from the model. This time, use the built-in attributes of your model rather than hard-coding the weights into your algorithm.
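
Assuming the model is a simple linear regression of recycling rate on median household income (which is what the formula string fitted in this notebook specifies), the predictor takes roughly the form below. The symbols \beta_0 and \beta_1 are notation introduced here for the intercept and the slope that the fitted model stores in its params attribute.

```latex
\widehat{\text{RecycleRate}} = \beta_0 + \beta_1 \cdot \text{MdHHIncE}
```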


In [1]:
import pandas as pd
import matplotlib.pyplot as plt # package for doing plotting (necessary for adding the line)
import statsmodels.formula.api as smf # package we'll be using for linear regression
%matplotlib inline

In [3]:
df = pd.read_excel('../../class4/homework/data/2013_NYC_CD_MedianIncome_Recycle.xlsx')

In [5]:
df.head()


Out[5]:
                                       CD_Name  MdHHIncE  RecycleRate
0  Battery Park City, Greenwich Village & Soho    119596     0.286771
1  Battery Park City, Greenwich Village & Soho    119596     0.264074
2                  Chinatown & Lower East Side     40919     0.156485
3  Chelsea, Clinton & Midtown Business Distric     92583     0.235125
4  Chelsea, Clinton & Midtown Business Distric     92583     0.246725

In [6]:
lm = smf.ols(formula="RecycleRate~MdHHIncE",data=df).fit()
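# lm.params is a pandas Series indexed by term name, so look the weights up
# by label instead of relying on their order.
intercept = lm.params['Intercept']
height = lm.params['MdHHIncE']  # slope on median household income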

In [14]:
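# Predict the recycling rate from the fitted line: height * income + intercept.
# Here `height` is the slope on income and `intercept` is the constant term.
def simplest_predictor(income, height, intercept):
    # input() returns a string, so convert everything to float first
    height = float(height)
    intercept = float(intercept)
    income = float(income)
    return height*income+intercept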

In [15]:
income = input("How high is the household income? ")
print("We predict a recycling rate of", simplest_predictor(income,height,intercept))


How high is the household income? 50000
We predict a recycling rate of 0.16828476953547256
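
The prediction above multiplies the weights by hand. As a minimal sketch of an alternative, a fitted statsmodels results object can also make the prediction itself through its predict method. The data frame `toy`, the model `toy_lm`, and the helper `predict_with_model` below are hypothetical names, and the toy values are made up for illustration; they are not taken from the spreadsheet.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data standing in for the spreadsheet; these values are
# made up for illustration and are not taken from the real file.
toy = pd.DataFrame({
    "MdHHIncE": [40000, 60000, 80000, 100000, 120000],
    "RecycleRate": [0.15, 0.18, 0.21, 0.23, 0.28],
})

# Same formula as in the notebook, fitted on the toy data.
toy_lm = smf.ols(formula="RecycleRate ~ MdHHIncE", data=toy).fit()


def predict_with_model(model, income):
    """Return the model's predicted recycling rate for one income value."""
    # predict() expects the predictor columns named as in the formula.
    new_data = pd.DataFrame({"MdHHIncE": [float(income)]})
    return float(model.predict(new_data).iloc[0])


# The hand-written formula and model.predict should give the same number.
manual = toy_lm.params["MdHHIncE"] * 50000 + toy_lm.params["Intercept"]
predicted = predict_with_model(toy_lm, 50000)
print(predicted, manual)
```

With the real data, the same helper could be called as predict_with_model(lm, income), which keeps the weights inside the model rather than passing them around as separate variables.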