Create two models, one per gender, for the relationship between height and weight. Modify the code from Assignment 1 so it asks for a person's gender as well as their height, and produces an estimate of that person's weight using the models you created. Find the model coefficients (the weights) and use those in your function (i.e., don't fit a new model each time).
In [6]:
import pandas as pd
import statsmodels.formula.api as smf
In [2]:
df=pd.read_csv("heights_weights_genders.csv")
In [14]:
df.head()
The coefficients are estimated using the least squares criterion: we find the line that minimizes the sum of squared residuals (the "sum of squared errors").
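To make the least squares criterion concrete, here is a minimal sketch of the closed-form solution for a single predictor. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept forces the line through the point of means. The helper name `ols_line` and the synthetic data are illustrative assumptions, not part of the assignment. `np.polyfit` with degree 1 is used only as a cross-check, since it solves the same least-squares problem.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y ~ intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # This slope minimizes the sum of squared residuals for a straight line
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Synthetic example (made-up numbers, not the heights/weights data)
rng = np.random.default_rng(0)
x = rng.uniform(60, 75, size=200)
y = -100 + 4 * x + rng.normal(0, 5, size=200)

print(ols_line(x, y))
# np.polyfit returns coefficients highest degree first: [slope, intercept]
print(np.polyfit(x, y, 1))
```

Both printed results should agree up to floating-point rounding; statsmodels' `ols(...).fit()` solves the same minimization for the models below.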
In [3]:
women = df[df['Gender']=='Female']
In [15]:
women.head()
In [4]:
men = df[df['Gender']=='Male']
In [11]:
# create a fitted model in one line
lm_female = smf.ols(formula="Weight~Height",data=women).fit()
lm_female.params
In [12]:
lm_male = smf.ols(formula="Weight~Height",data=men).fit()
# display the fitted coefficients (Intercept and the Height slope)
lm_male.params
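As an alternative to two separate fits, a single model with a Height-by-Gender interaction yields the same point estimates for each gender's intercept and slope (standard errors differ, because the single model pools the residual variance). The sketch below uses synthetic data as a stand-in for the CSV; the coefficient names follow patsy's treatment coding, where `Female` is the baseline level because it sorts first alphabetically.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the heights/weights data (made-up numbers)
rng = np.random.default_rng(1)
n = 100
data = pd.DataFrame({
    "Gender": ["Female"] * n + ["Male"] * n,
    "Height": np.concatenate([rng.normal(64, 3, n), rng.normal(69, 3, n)]),
})
base = np.where(data["Gender"] == "Female",
                -100 + 3.8 * data["Height"], -220 + 5.9 * data["Height"])
data["Weight"] = base + rng.normal(0, 10, 2 * n)

# One model: Height * Gender expands to Height + Gender + Height:Gender
lm_both = smf.ols(formula="Weight ~ Height * Gender", data=data).fit()
p = lm_both.params

# Female is the baseline, so its line is just Intercept and Height
female_line = (p["Intercept"], p["Height"])
# Male adds the Gender[T.Male] offsets to both the intercept and the slope
male_line = (p["Intercept"] + p["Gender[T.Male]"],
             p["Height"] + p["Height:Gender[T.Male]"])
print(female_line, male_line)
```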
In [25]:
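# extract each model's coefficients once so the function below never refits
intercept_f, slope_f = lm_female.params["Intercept"], lm_female.params["Height"]
intercept_m, slope_m = lm_male.params["Intercept"], lm_male.params["Height"]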
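One way to honor "don't generate a model each time" is to store each gender's coefficients in a lookup table and build the predictor from it once. This is a minimal sketch under that assumption; `make_weight_predictor` is a hypothetical helper, and the coefficient values in the example are made up purely for illustration, not fitted results.

```python
def make_weight_predictor(coefs):
    """coefs maps a gender label to an (intercept, slope) pair that was fitted once."""
    # Normalize labels to lowercase so "Female" and "female" both match
    table = {k.lower(): tuple(v) for k, v in coefs.items()}

    def predict(height, gender):
        try:
            intercept, slope = table[gender.lower()]
        except KeyError:
            raise ValueError(
                f"no model for gender {gender!r}; expected one of {sorted(table)}"
            ) from None
        return intercept + slope * height

    return predict

# Made-up coefficients for illustration only
predict = make_weight_predictor({"female": (-100.0, 4.0), "male": (-200.0, 6.0)})
print(predict(65, "Female"))  # -100 + 4 * 65 = 160.0
```

In the notebook, the table would be built from the fitted values, e.g. `make_weight_predictor({"female": (intercept_f, slope_f), "male": (intercept_m, slope_m)})`. Adding another group then only means adding a dictionary entry rather than another `if` branch.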
In [26]:
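def weight(height, gender):
    # choose the pre-fitted coefficients that match the given gender
    gender = gender.lower()
    if gender == "female":
        predicted = slope_f * height + intercept_f
    elif gender == "male":
        predicted = slope_m * height + intercept_m
    else:
        raise ValueError('gender must be "female" or "male"')
    print("Your height is", height, "and your predicted weight is", predicted)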
In [28]:
weight(65,"female")
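A quick sanity check is to confirm that the hand-computed prediction (slope times height plus intercept) matches the fitted model's own `predict` method. The sketch below assumes synthetic data in place of the CSV; with the formula API, `predict` accepts a DataFrame that carries the original column name (`Height`).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the women subset (made-up numbers)
rng = np.random.default_rng(2)
women_demo = pd.DataFrame({"Height": rng.normal(64, 3, 100)})
women_demo["Weight"] = -100 + 4 * women_demo["Height"] + rng.normal(0, 10, 100)

lm = smf.ols(formula="Weight~Height", data=women_demo).fit()
intercept, slope = lm.params["Intercept"], lm.params["Height"]

# Prediction computed from the stored coefficients
by_hand = slope * 65 + intercept
# Prediction from the fitted model; np.asarray handles Series or ndarray output
by_model = np.asarray(lm.predict(pd.DataFrame({"Height": [65]})))[0]
print(by_hand, by_model)
```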