Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor using the weights from the model. This time, use the built in attributes in your model rather than hard-coding them into your algorithm


In [1]:
import pandas as pd


/usr/local/lib/python3.5/site-packages/matplotlib/__init__.py:1035: UserWarning: Duplicate key in file "/Users/mercybenzaquen/.matplotlib/matplotlibrc", line #2
  (fname, cnt))

In [2]:
import statsmodels.formula.api as smf

In [3]:
df = pd.read_excel("2013_NYC_CD_MedianIncome_Recycle.xlsx")

In [4]:
df.head()


Out[4]:
CD_Name MdHHIncE RecycleRate
0 Battery Park City, Greenwich Village & Soho 119596 0.286771
1 Battery Park City, Greenwich Village & Soho 119596 0.264074
2 Chinatown & Lower East Side 40919 0.156485
3 Chelsea, Clinton & Midtown Business Distric 92583 0.235125
4 Chelsea, Clinton & Midtown Business Distric 92583 0.246725

In [22]:
lm = smf.ols(formula="RecycleRate~MdHHIncE",data=df).fit()

In [23]:
lm.params


Out[23]:
Intercept    0.074804
MdHHIncE     0.000002
dtype: float64

y = ß0 + ß1x

Here:

y: is the variable that we want to predict

ß0: is intercept of the regression line i.e. value of y when x is 0

ß1: is coefficient of x i.e. variation in y with change in value of x

x: Variables that affects value of y i.e. already know variable whose effect we want to se on values of y


In [26]:
intercept, slope = lm.params
def RecycleRate_calculator(median_income): 
    return (0.074804 + (0.000002* float(median_income)))

In [27]:
RecycleRate_calculator(119596)


Out[27]:
0.313996

In [ ]: