Assignment 3

  • Using heights_weights_genders.csv, analyze the difference between the height-weight correlation in women and in men.

In [8]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib
plt.style.use('ggplot')

In [2]:
df = pd.read_csv("data/heights_weights_genders.csv")

In [3]:
df.head(2)


Out[3]:
Gender Height Weight
0 Male 73.847017 241.893563
1 Male 68.781904 162.310473

In [13]:
fig, ax = plt.subplots(figsize=[10,6])
for category, group in df.groupby('Gender'):
    ax.plot(group['Height'], group['Weight'], marker='o', linestyle='', label=category, markeredgewidth=0,alpha=0.2)
ax.legend()


Out[13]:
<matplotlib.legend.Legend at 0x109afc550>

In [19]:
women = df[df['Gender']=='Female']
men = df[df['Gender']=='Male']

In [21]:
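women.plot(kind='scatter',x='Height',y='Weight')
# Select the numeric columns explicitly: recent pandas versions raise an error
# when corr() meets the non-numeric Gender column.
women[['Height','Weight']].corr()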


Out[21]:
Height Weight
Height 1.000000 0.849609
Weight 0.849609 1.000000

In [22]:
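men.plot(kind='scatter',x='Height',y='Weight')
# Select the numeric columns explicitly: recent pandas versions raise an error
# when corr() meets the non-numeric Gender column.
men[['Height','Weight']].corr()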


Out[22]:
Height Weight
Height 1.000000 0.862979
Weight 0.862979 1.000000

In [23]:
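# Correlation over the whole dataset (both genders pooled), numeric columns only
df[['Height','Weight']].corr()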


Out[23]:
Height Weight
Height 1.000000 0.924756
Weight 0.924756 1.000000

Conclusion

  • For males, the correlation coefficient between height and weight is 0.86.
  • For females, it is 0.85.
  • Height and weight are therefore slightly more strongly correlated in men than in women, though the difference (about 0.01) is small.
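
The per-gender comparison above can also be computed in one step instead of filtering each group by hand. The following is a minimal sketch, not part of the original notebook: the helper name corr_by_gender is hypothetical, and it assumes the DataFrame has the Gender, Height and Weight columns shown in Out[3].

```python
import pandas as pd

def corr_by_gender(df):
    """Pearson correlation between Height and Weight, one value per Gender.

    Hypothetical helper: assumes the columns 'Gender', 'Height' and 'Weight'.
    """
    return pd.Series({
        gender: group['Height'].corr(group['Weight'])  # Series.corr defaults to Pearson
        for gender, group in df.groupby('Gender')
    })
```

Calling corr_by_gender(df) on the DataFrame loaded in In [2] should give the same two coefficients as the off-diagonal entries of Out[21] and Out[22], since both use the Pearson method.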
