Assignment #3: Using heights_weights_genders.csv, analyze the difference between the height-weight correlation in women and in men.


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('heights_weights_genders.csv')

In [3]:
df.head(3)


Out[3]:
  Gender     Height      Weight
0   Male  73.847017  241.893563
1   Male  68.781904  162.310473
2   Male  74.110105  212.740856

In [45]:
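# Correlate the two numeric columns directly; calling corr() on the whole
# frame would also have to deal with the non-numeric Gender column.
male = df[df['Gender'] == 'Male']
print("Male correlation: " + str(male['Height'].corr(male['Weight'])))

female = df[df['Gender'] == 'Female']
print("Female correlation: " + str(female['Height'].corr(female['Weight'])))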


Male correlation: 0.862978848616
Female correlation: 0.849608591419
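
For reference, pandas' corr() uses the Pearson coefficient by default. The following is a minimal pure-Python sketch of that formula; the function name pearson_r and the sample values are illustrative assumptions and are not taken from the CSV.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values, only to exercise the function.
example_heights = [60.0, 65.0, 70.0, 75.0]
example_weights = [110.0, 140.0, 150.0, 200.0]
print(pearson_r(example_heights, example_weights))
```

On the real data, pearson_r(list(male['Height']), list(male['Weight'])) should agree with the pandas result up to floating-point rounding.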

Height and weight are strongly positively correlated for both genders; the correlation is slightly higher for males (about 0.863) than for females (about 0.850).
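
Whether a gap this small is meaningful depends on the sample sizes. One common way to check is a Fisher z-test for two independent correlations. The sketch below is an assumption about how that check could be done, not part of the original assignment; the function name fisher_z_difference is hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent Pearson correlations.

    r1, r2: correlation coefficients; n1, n2: sample sizes (each must exceed 3).
    Returns (z statistic, two-sided p-value).
    """
    # Fisher transformation makes the sampling distribution approximately normal.
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between the two transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

In this notebook it could be called with the two correlations computed above and with len(male) and len(female) as the sample sizes.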


In [71]:
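fig, ax = plt.subplots(figsize=(5,5))

# The colour cycle has to be set after the axes exist. With a single-colour
# cycle, both gender groups are drawn in grey.
ax.set_prop_cycle('color', ['grey'])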
for category, group in df.groupby('Gender'):
    ax.plot(group['Height'], group['Weight'], marker='o', linestyle='', label=category, markeredgewidth=0)
    
ax.set_ylabel('Weight')
ax.set_xlabel('Height')

# Dotted major and minor grid lines, drawn beneath the data points.
ax.set_axisbelow(True)
ax.minorticks_on()
ax.grid(which='major', linestyle=':', linewidth=0.5, color='darkgrey')
ax.grid(which='minor', linestyle=':', linewidth=0.5, color='darkgrey')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Hide tick marks on all four sides; current matplotlib expects booleans here.
ax.tick_params(which='both',
               top=False,
               left=False,
               right=False,
               bottom=False)

ax.set_xlim(45,90)
ax.set_ylim(45,300)
ax.legend(loc='lower right')


Out[71]:
<matplotlib.legend.Legend at 0x110828240>