Create two models for the relationship between height and weight, one for each gender. Modify the code in Assignment 1 to ask for a person's gender as well as their height, and produce an estimate of the person's weight using the models you created. Find the weights (the fitted coefficients) and use those in your function (i.e. don't generate a model each time).


In [1]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt # package for doing plotting (necessary for adding the line)
import statsmodels.formula.api as smf # package we'll be using for linear regression

In [2]:
df = pd.read_csv("data/heights_weights_genders.csv")

# Male
lm_male = smf.ols(formula="Weight~Height",data=df[df['Gender']=='Male']).fit()
intercept_male, slope_male = lm_male.params

# Female
lm_female = smf.ols(formula="Weight~Height",data=df[df['Gender']=='Female']).fit()
intercept_female, slope_female = lm_female.params

In [3]:
df.head()


Out[3]:
   Gender     Height      Weight
0    Male  73.847017  241.893563
1    Male  68.781904  162.310473
2    Male  74.110105  212.740856
3    Male  71.730978  220.042470
4    Male  69.881796  206.349801
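
To make explicit what a one-predictor fit such as "Weight~Height" estimates, here is a minimal pure-Python sketch of the closed-form least-squares line. The function name fit_line is hypothetical and not part of the notebook. The sample points are only the five rows shown by df.head(), so the resulting numbers are illustrative and will not match the models fitted on the full data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope

# The five rows from df.head() (all Male); far too few for a real model.
heights = [73.847017, 68.781904, 74.110105, 71.730978, 69.881796]
weights = [241.893563, 162.310473, 212.740856, 220.042470, 206.349801]
intercept, slope = fit_line(heights, weights)
print(intercept, slope)
```

With a single predictor, ordinary least squares should reduce to these two formulas, which is why each fitted model is fully described by one intercept and one slope.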

In [38]:
intercept_male


Out[38]:
-224.49884070545858

In [39]:
intercept_female


Out[39]:
-246.01326574667277

In [34]:
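def predicting_weight(gender, height):
    """Estimate weight from height using the pre-fitted model for the given gender."""
    gender = gender.strip().capitalize()  # accept 'male', ' Female ', etc.
    if gender == 'Male':
        return intercept_male + float(height) * slope_male
    elif gender == 'Female':
        return intercept_female + float(height) * slope_female
    # Fail loudly instead of silently returning None for unrecognised input
    raise ValueError("gender must be 'Male' or 'Female'")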

In [37]:
height = input("Please insert the height ")
gender = input("Male or Female? ")
predicting_weight(gender, height)


Please insert the height 73.84
Male or Female? Male
Out[37]:
215.71853757466636

Additional Analysis


In [36]:
lm_male.params #get the parameters from the model fit


Out[36]:
Intercept   -224.498841
Height         5.961774
dtype: float64

In [41]:
lm_female.params


Out[41]:
Intercept   -246.013266
Height         5.994047
dtype: float64

In [31]:
lm_male.summary()


Out[31]:
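                            OLS Regression Results
==============================================================================
Dep. Variable:                  Weight   R-squared:                      0.745
Model:                             OLS   Adj. R-squared:                 0.745
Method:                  Least Squares   F-statistic:                1.458e+04
Date:                 Wed, 27 Jul 2016   Prob (F-statistic):              0.00
Time:                         11:38:14   Log-Likelihood:               -18604.
No. Observations:                 5000   AIC:                        3.721e+04
Df Residuals:                     4998   BIC:                        3.723e+04
Df Model:                            1
Covariance Type:             nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   -224.4988      3.411    -65.819      0.000    -231.186    -217.812
Height         5.9618      0.049    120.754      0.000       5.865       6.059
==============================================================================
Omnibus:                         1.444   Durbin-Watson:                  2.011
Prob(Omnibus):                   0.486   Jarque-Bera (JB):               1.478
Skew:                            0.039   Prob(JB):                       0.478
Kurtosis:                        2.970   Cond. No.                    1.67e+03
==============================================================================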

In [40]:
lm_female.summary()


Out[40]:
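                            OLS Regression Results
==============================================================================
Dep. Variable:                  Weight   R-squared:                      0.722
Model:                             OLS   Adj. R-squared:                 0.722
Method:                  Least Squares   F-statistic:                1.297e+04
Date:                 Wed, 27 Jul 2016   Prob (F-statistic):              0.00
Time:                         11:40:08   Log-Likelihood:               -18623.
No. Observations:                 5000   AIC:                        3.725e+04
Df Residuals:                     4998   BIC:                        3.726e+04
Df Model:                            1
Covariance Type:             nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   -246.0133      3.356    -73.302      0.000    -252.593    -239.434
Height         5.9940      0.053    113.885      0.000       5.891       6.097
==============================================================================
Omnibus:                         0.679   Durbin-Watson:                  2.020
Prob(Omnibus):                   0.712   Jarque-Bera (JB):               0.626
Skew:                           -0.008   Prob(JB):                       0.731
Kurtosis:                        3.052   Cond. No.                    1.51e+03
==============================================================================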
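
As a rough sanity check on the two summaries, the 95% intervals for the Height coefficient can be approximated as coef ± 1.96 × std err. This is a sketch using the normal approximation rather than the exact t quantile statsmodels uses, and it starts from the rounded coef and std err values printed in the tables, so small differences in the third decimal are expected. The name approx_ci is hypothetical.

```python
Z_95 = 1.96  # normal approximation to the 95% two-sided critical value

def approx_ci(coef, std_err, z=Z_95):
    """Approximate confidence interval: coef +/- z * std_err."""
    half_width = z * std_err
    return coef - half_width, coef + half_width

# Height coefficient and standard error as printed in the two summaries
male_height_ci = approx_ci(5.9618, 0.049)
female_height_ci = approx_ci(5.9940, 0.053)
print(male_height_ci)
print(female_height_ci)
```

The reported intervals for the two Height slopes (5.865 to 6.059 and 5.891 to 6.097) overlap, which informally suggests the slopes are similar and that the models differ mainly in their intercepts. Overlapping intervals are only a heuristic here, not a formal test of equal slopes.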
