Create two models of the relationship between height and weight, one for each gender. Modify the code from Assignment 1 to ask for a person's gender as well as their height, and use the models you created to estimate that person's weight. Find the model weights (the intercept and slope of each fitted line) once and use those in your function, i.e. don't fit a new model on every call.


In [6]:
import pandas as pd
import statsmodels.formula.api as smf

In [2]:
df=pd.read_csv("heights_weights_genders.csv")

In [14]:
df.head()


Out[14]:
Gender Height Weight
0 Male 73.847017 241.893563
1 Male 68.781904 162.310473
2 Male 74.110105 212.740856
3 Male 71.730978 220.042470
4 Male 69.881796 206.349801

Coefficients are estimated using the least squares criterion, which means we find the line that minimizes the sum of squared residuals (or "sum of squared errors").
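To make the least squares criterion concrete, here is a minimal sketch that computes the intercept and slope from the closed-form formulas and checks that moving the slope away from that value increases the sum of squared residuals. The five data points and the helper names `least_squares_line` and `sum_squared_residuals` are made up for illustration; they are not taken from heights_weights_genders.csv or from statsmodels.

```python
import numpy as np

# Made-up illustrative data, NOT rows from heights_weights_genders.csv.
x = np.array([60.0, 62.0, 64.0, 66.0, 68.0])
y = np.array([110.0, 120.0, 135.0, 140.0, 155.0])

def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(intercept), float(slope)

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed y and the line's predictions."""
    return float(np.sum((y - (intercept + slope * x)) ** 2))

intercept, slope = least_squares_line(x, y)
best = sum_squared_residuals(x, y, intercept, slope)

# Nudging the slope away from the least squares value makes the fit worse.
worse = sum_squared_residuals(x, y, intercept, slope + 0.1)
print(intercept, slope, best, worse)
```

Fitting the same points with `smf.ols` should give the same intercept and slope, since it minimizes the same quantity.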


In [3]:
women = df[df['Gender']=='Female']

In [15]:
women.head()


Out[15]:
Gender Height Weight
5000 Female 58.910732 102.088326
5001 Female 65.230013 141.305823
5002 Female 63.369004 131.041403
5003 Female 64.479997 128.171511
5004 Female 61.793096 129.781407

In [4]:
men = df[df['Gender']=='Male']

In [11]:
# create a fitted model in one line
lm_female = smf.ols(formula="Weight~Height",data=women).fit()
lm_female.params


Out[11]:
Intercept   -246.013266
Height         5.994047
dtype: float64

In [12]:
lm_male = smf.ols(formula="Weight~Height",data=men).fit()
# show the fitted coefficients (intercept and slope)
lm_male.params


Out[12]:
Intercept   -224.498841
Height         5.961774
dtype: float64

In [25]:
intercept_f, slope_f=lm_female.params 
intercept_m, slope_m=lm_male.params

In [26]:
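def weight(height, gender):
    """Print the predicted weight for a height, using the model for the given gender."""
    gender = gender.strip().lower()  # accept "Female", " male", etc.
    if gender == "female":
        predicted = slope_f * height + intercept_f
    elif gender == "male":
        predicted = slope_m * height + intercept_m
    else:
        raise ValueError("gender must be 'female' or 'male'")
    print("Your height is", height, "Your predicted weight is", predicted)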

In [28]:
weight(65,"female")


Your height is 65 Your predicted weight is 143.599764219
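The assignment asks for the weights to be found once and reused rather than refit. One way to take that literally is to copy the coefficients shown in Out[11] and Out[12] into constants, so the function works without the CSV file or statsmodels. This is only a sketch under that assumption: `COEFFICIENTS` and `predict_weight` are hypothetical names, and the values are rounded as displayed above, so results can differ from the fitted model in the last few decimal places.

```python
# Coefficients copied from Out[11] and Out[12] (rounded as displayed).
# COEFFICIENTS and predict_weight are illustrative names, not part of the notebook above.
COEFFICIENTS = {
    "female": {"intercept": -246.013266, "slope": 5.994047},
    "male": {"intercept": -224.498841, "slope": 5.961774},
}

def predict_weight(height, gender):
    """Return the predicted weight for a height using the stored per-gender line."""
    key = gender.strip().lower()  # tolerate "Female", " male", etc.
    if key not in COEFFICIENTS:
        raise ValueError("gender must be 'female' or 'male'")
    c = COEFFICIENTS[key]
    return c["slope"] * height + c["intercept"]

print(round(predict_weight(65, "Female"), 2))
```

Returning the value instead of printing it lets the caller format the message or reuse the number; for a height of 65 and gender "female" it gives about 143.6, in line with the output above.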
