Assignment 3

Using the data from the 2013_NYC_CD_MedianIncome_Recycle.xlsx file, create a predictor using the weights from the model. This time, use the built in attributes in your model rather than hard-coding them into your algorithm


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
import statsmodels.formula.api as smf

df = pd.read_excel('2013_NYC_CD_MedianIncome_Recycle.xlsx')

In [2]:
df.columns = ['Neighborhood', 'Median_Income', 'Recycle_Rate']
df.plot(kind='scatter',x='Median_Income',y='Recycle_Rate')


Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x10a240d68>

In [3]:
df.corr()['Median_Income'].sort_values(ascending=False)


Out[3]:
Median_Income    1.000000
Recycle_Rate     0.884783
Name: Median_Income, dtype: float64

In [29]:
lm = smf.ols(formula="Recycle_Rate~Median_Income",data=df).fit()
lm.params


Out[29]:
Intercept        0.074804
Median_Income    0.000002
dtype: float64

In [30]:
intercept, slope = lm.params

In [31]:
df.plot(kind="scatter",x="Median_Income",y="Recycle_Rate")
plt.plot(df["Median_Income"],slope*df["Median_Income"]+intercept,"-",color="darkgrey") 

plt.title('Correlation between income and recycling rate')
plt.xlabel('Median Income ($)')
plt.ylabel('Recycle Rate')


Out[31]:
<matplotlib.text.Text at 0x10ade0278>

Function:


In [36]:
median_income = int(input('What is the median income of the neighborhood? '))
recycle_rate = slope * median_income + intercept
print('If the neighborhood\'s median income is $' + str(median_income) + ' its recycle rate is probably around ' + str(round(recycle_rate, 2)) + ' percent.')


What is the median income of the neighborhood? 100000
If the neighborhood's median income is $100000 its recycle rate is probably around 0.26 percent.