Assignment 1

Use the data from heights_weights_genders.csv to create a simple predictor that takes a person's height and estimates their weight, based on a model fitted to all the data regardless of gender. To do this, fit the model once, read off its parameters (lm.params), and use those in your function (i.e., don't fit a new model on each call).


In [26]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
import statsmodels.formula.api as smf

df = pd.read_csv('heights_weights_genders.csv')

In [27]:
df.describe()


Out[27]:
Height Weight
count 10000.000000 10000.000000
mean 66.367560 161.440357
std 3.847528 32.108439
min 54.263133 64.700127
25% 63.505620 135.818051
50% 66.318070 161.212928
75% 69.174262 187.169525
max 78.998742 269.989699

In [28]:
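# Select the numeric columns explicitly; recent pandas versions raise an error
# if corr() is called on a frame that still contains the text Gender column.
df[['Height', 'Weight']].corr()['Height'].sort_values(ascending=False)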


Out[28]:
Height    1.000000
Weight    0.924756
Name: Height, dtype: float64

In [29]:
lm = smf.ols(formula="Weight~Height",data=df).fit()
lm.params


Out[29]:
Intercept   -350.737192
Height         7.717288
dtype: float64
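
As a sanity check on these parameters, the sketch below recomputes them from the summary statistics already printed in Out[27] and Out[28]. It relies on two standard properties of simple least-squares regression: the slope equals the correlation times the ratio of standard deviations, and the fitted line passes through the point of means. The variable names are illustrative and are not part of the notebook; the numbers are copied from the outputs above.

```python
# Values copied from Out[27], Out[28] and Out[29] above (names are illustrative).
mean_height, mean_weight = 66.367560, 161.440357
std_height, std_weight = 3.847528, 32.108439
r = 0.924756
fitted_intercept, fitted_slope = -350.737192, 7.717288

# Check 1: slope = r * (std of Weight / std of Height).
slope_from_summary = r * std_weight / std_height

# Check 2: the fitted line passes through (mean height, mean weight).
weight_at_mean_height = fitted_slope * mean_height + fitted_intercept

print(round(slope_from_summary, 3))     # 7.717
print(round(weight_at_mean_height, 2))  # 161.44
```

Both values agree with lm.params and the mean weight up to the rounding of the printed figures.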

In [30]:
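# Look the parameters up by name so the assignment does not depend on their order.
intercept, slope = lm.params['Intercept'], lm.params['Height']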

In [32]:
df.plot(kind="scatter",x="Height",y="Weight")
plt.plot(df["Height"],slope*df["Height"]+intercept,"-",color="darkgrey") 

plt.title('Correlation between height and weight')
plt.xlabel('Height (inches)')
plt.ylabel('Weight (lbs)')


Out[32]:
<matplotlib.text.Text at 0x10840f3c8>

Function:


In [35]:
# float() accepts fractional heights such as 66.5; int() would raise a ValueError.
height = float(input('Height (in inches): '))
weight = slope * height + intercept
print(f'If a person is {height:g} inches tall, they probably weigh {weight:.2f} pounds.')


Height (in inches): 57
If a person is 57 inches tall, they probably weigh 89.15 pounds.

In [36]:
# Same prediction with the parameters from Out[29] written out, so no model is needed.
height = float(input('Height (in inches): '))
weight = 7.717288 * height - 350.737192
print(f'If a person is {height:g} inches tall, they probably weigh {weight:.2f} pounds.')


Height (in inches): 57
If a person is 57 inches tall, they probably weigh 89.15 pounds.
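
The two cells above compute the prediction inline. Since the assignment asks for a function, one possible way to wrap the same arithmetic is sketched below. The name predict_weight and the constant names are assumptions chosen for illustration; the parameter values are copied from Out[29], so the model is not refit on each call.

```python
# Parameters copied from Out[29]; the model is not refit here.
INTERCEPT = -350.737192
SLOPE = 7.717288


def predict_weight(height_in, intercept=INTERCEPT, slope=SLOPE):
    """Return the predicted weight in pounds for a height in inches."""
    return slope * height_in + intercept


print(round(predict_weight(57), 2))  # 89.15
```

The data only covers heights from about 54 to 79 inches (see Out[27]), so predictions outside that range are extrapolations of the straight line and should be read with caution.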