Using heights_weights_genders.csv, analyze the difference between the height-weight correlation in women and in men.


In [2]:
!pip install xlrd
!pip install matplotlib


Requirement already satisfied (use --upgrade to upgrade): xlrd in /usr/local/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/local/lib/python3.5/site-packages
Requirement already satisfied (use --upgrade to upgrade): pyparsing!=2.0.0,!=2.0.4,>=1.5.6 in /usr/local/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): pytz in /usr/local/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.6 in /usr/local/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): cycler in /usr/local/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/local/lib/python3.5/site-packages (from python-dateutil->matplotlib)

In [3]:
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline



In [4]:
df = pd.read_csv("heights_weights_genders.csv")

In [5]:
df.tail()


Out[5]:
      Gender     Height      Weight
9995  Female  66.172652  136.777454
9996  Female  67.067155  170.867906
9997  Female  63.867992  128.475319
9998  Female  69.034243  163.852461
9999  Female  61.944246  113.649103

In [6]:
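# Keep only the numeric columns: recent pandas versions raise an error when
# median(), mean(), quantile() or corr() meet the string 'Gender' column
df_males = df.loc[df['Gender'] == 'Male', ['Height', 'Weight']]

In [7]:
df_females = df.loc[df['Gender'] == 'Female', ['Height', 'Weight']]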

MEDIAN


In [8]:
plt.style.use('fivethirtyeight')

In [9]:
df_females.median()


Out[9]:
Height     63.730924
Weight    136.117583
dtype: float64

In [10]:
df_males.median()


Out[10]:
Height     69.027709
Weight    187.033546
dtype: float64

MEAN


In [11]:
df_females.mean()


Out[11]:
Height     63.708774
Weight    135.860093
dtype: float64

In [13]:
df_males.mean()


Out[13]:
Height     69.026346
Weight    187.020621
dtype: float64

RANGE


In [16]:
df_females['Height'].max() - df_females['Height'].min()


Out[16]:
19.126452540972608

In [17]:
df_females['Weight'].max() - df_females['Weight'].min()


Out[17]:
137.53708702680598

In [18]:
df_males['Height'].max() - df_males['Height'].min()


Out[18]:
20.591837414639791

In [19]:
df_males['Weight'].max() - df_males['Weight'].min()


Out[19]:
157.08675905728799
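The median, mean and range cells above compute each statistic one subset at a time. As a hedged sketch, pandas `groupby` can produce all of them for both genders in a single table. The toy rows below are hypothetical stand-ins for the CSV, and `value_range` is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy rows standing in for heights_weights_genders.csv
df = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female'],
    'Height': [70.0, 68.0, 72.0, 64.0, 62.0, 66.0],
    'Weight': [190.0, 170.0, 210.0, 140.0, 120.0, 150.0],
})


def value_range(s):
    """Hypothetical helper: spread between the largest and smallest value."""
    return s.max() - s.min()


# One row per gender; columns are (variable, statistic) pairs
summary = df.groupby('Gender')[['Height', 'Weight']].agg(['median', 'mean', value_range])
print(summary)
```

On the real file, replacing the toy `df` with `pd.read_csv("heights_weights_genders.csv")` should give the per-gender medians, means and ranges from the cells above in one table.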

IQR


In [55]:
iqr_f = df_females.quantile(q=0.75) - df_females.quantile(q=0.25)
iqr_f


Out[55]:
Height     3.669124
Weight    25.876830
dtype: float64

In [56]:
iqr_m = df_males.quantile(q=0.75) - df_males.quantile(q=0.25)
iqr_m


Out[56]:
Height     3.814065
Weight    26.470034
dtype: float64

UAL AND LAL
(outlier cutoffs: UAL = Q3 + 1.5 * IQR, LAL = Q1 - 1.5 * IQR)


In [64]:
UAL_f = df_females.quantile(q=0.75) + 1.5 * iqr_f
UAL_f


Out[64]:
Height     71.067251
Weight    187.626171
dtype: float64

In [65]:
LAL_f = df_females.quantile(q=0.25) - 1.5 * iqr_f
LAL_f


Out[65]:
Height    56.390756
Weight    84.118851
dtype: float64

In [66]:
UAL_m = df_males.quantile(q=0.75) + 1.5 * iqr_m
UAL_m


Out[66]:
Height     76.709840
Weight    240.062854
dtype: float64

In [67]:
LAL_m = df_males.quantile(q=0.25) - 1.5 * iqr_m
LAL_m


Out[67]:
Height     61.453582
Weight    134.182716
dtype: float64
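The four UAL/LAL cells above apply the same rule (Tukey's 1.5 * IQR fences) to each subset. A minimal sketch that wraps the rule in one reusable function; `tukey_fences` and the toy data are hypothetical, and `k=1.5` matches the multiplier used above.

```python
import pandas as pd


def tukey_fences(frame, k=1.5):
    """Return (lower, upper) cutoffs per column: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical numeric-only data; on the notebook's data, pass e.g.
# df_females[['Height', 'Weight']] or df_males[['Height', 'Weight']]
toy = pd.DataFrame({'Height': [62.0, 64.0, 66.0], 'Weight': [120.0, 140.0, 150.0]})
lower, upper = tukey_fences(toy)
print(lower)
print(upper)
```

Returning both fences from one call keeps the lower and upper cutoffs tied to the same quartiles, so they cannot get out of sync.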

CORRELATION COEFFICIENT


In [16]:
df_females.corr()


Out[16]:
          Height    Weight
Height  1.000000  0.849609
Weight  0.849609  1.000000

In [10]:
df_males.corr()


Out[10]:
          Height    Weight
Height  1.000000  0.862979
Weight  0.862979  1.000000
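`DataFrame.corr()` defaults to Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A short NumPy sketch of that formula, useful for seeing what the 0.85 and 0.86 figures measure; `pearson_r` is a hypothetical helper, not a pandas function, and the sample values are made up.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: centered cross-products over the root of the centered sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Hypothetical heights and weights
heights = [60.0, 62.0, 65.0, 67.0, 70.0]
weights = [115.0, 130.0, 128.0, 160.0, 175.0]
print(pearson_r(heights, weights))
print(np.corrcoef(heights, weights)[0, 1])  # NumPy's built-in, for comparison
```

On the notebook's data, `pearson_r(df_females['Height'], df_females['Weight'])` should match the off-diagonal entry of `df_females.corr()`.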

OUTLIERS


In [50]:
len(df_males)


Out[50]:
5000

In [34]:
# Note: UAL_f['Height'] above is 71.067251; this hardcoded cutoff differs
len(df_females[df_females['Height'] > 71.284662])


Out[34]:
7

In [35]:
# Note: LAL_f['Height'] above is 56.390756; this hardcoded cutoff differs
len(df_females[df_females['Height'] < 56.173345])  # 24 height outliers in total (7 + 17)


Out[35]:
17

In [69]:
# Note: UAL_f['Weight'] above is 187.626171; this hardcoded cutoff differs
len(df_females[df_females['Weight'] > 188.515978])


Out[69]:
14

In [70]:
# Note: LAL_f['Weight'] above is 84.118851; this hardcoded cutoff differs
len(df_females[df_females['Weight'] < 83.229044])  # 25 weight outliers in total (14 + 11)


Out[70]:
11

In [40]:
len(df_males[df_males['Height'] > 76.709840])


Out[40]:
18

In [44]:
len(df_males[df_males['Height'] < 61.453582])  # 46 height outliers in total (18 + 28)


Out[44]:
28

In [45]:
len(df_males[df_males['Weight'] > 240.062854])


Out[45]:
26

In [71]:
len(df_males[df_males['Weight'] < 134.182716])  # 46 weight outliers in total (26 + 20)


Out[71]:
20
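The outlier cells above type each cutoff in by hand, and the four female thresholds there do not match the UAL_f/LAL_f outputs. A hedged sketch that derives the cutoffs and counts the outliers in one pass; `count_outliers` is a hypothetical helper using the same 1.5 * IQR rule, and the toy frame is made up.

```python
import pandas as pd


def count_outliers(frame, k=1.5):
    """Count values below Q1 - k*IQR and above Q3 + k*IQR in each column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    # DataFrame-vs-Series comparisons align the Series index with the columns
    below = (frame < q1 - k * iqr).sum()
    above = (frame > q3 + k * iqr).sum()
    counts = pd.DataFrame({'below': below, 'above': above})
    counts['total'] = counts['below'] + counts['above']
    return counts


# Hypothetical data with one high Height value and one low Weight value
toy = pd.DataFrame({
    'Height': [60.0, 61.0, 62.0, 63.0, 64.0, 90.0],
    'Weight': [10.0, 100.0, 101.0, 102.0, 103.0, 104.0],
})
print(count_outliers(toy))
```

For the notebook's data, `count_outliers(df_females[['Height', 'Weight']])` would report the female counts using cutoffs computed from the same quartiles as UAL_f and LAL_f.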

GRAPHS


In [17]:
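# Draw both groups on one shared set of axes so the two clouds can be compared directly
fig, ax = plt.subplots(figsize=(7, 5))
df_males.plot(kind='scatter', x='Weight', y='Height', color='darkblue', label='Male', ax=ax)
df_females.plot(kind='scatter', x='Weight', y='Height', color='orange', label='Female', ax=ax)
# Axis limits wide enough to cover both groups
ax.set_ylim([50, 80])
ax.set_xlim([60, 280])


Out[17]:
(60, 280)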
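A correlation coefficient says how tightly the points in the scatter plot follow a line, not how steep that line is. As a complementary sketch (not part of the original analysis), a least-squares line per group via `numpy.polyfit` gives the slope of weight on height; `fit_line` and the toy values are hypothetical.

```python
import numpy as np


def fit_line(height, weight):
    """Least-squares fit weight = slope * height + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(height, dtype=float),
                                  np.asarray(weight, dtype=float), 1)
    return slope, intercept


# Hypothetical points lying exactly on weight = 10 * height - 500
slope, intercept = fit_line([60.0, 65.0, 70.0], [100.0, 150.0, 200.0])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Calling `fit_line(df_males['Height'], df_males['Weight'])` and the same for `df_females` would let the two slopes be compared alongside the two correlation coefficients.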

The height-weight correlation coefficients for women and men are very similar: r ≈ 0.85 for women and r ≈ 0.86 for men, so the relationship is only slightly stronger in men.
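Whether 0.850 and 0.863 count as "very similar" can be checked with Fisher's z-transformation, a standard test for the difference between two independent Pearson correlations. A minimal sketch, assuming the groups are independent samples of 5,000 each (`len(df_males)` is 5000 and the frame's last row index is 9999) that are roughly bivariate normal; `compare_correlations` is a hypothetical helper.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for H0: two independent Pearson correlations are equal.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transform of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))          # equals 2 * (1 - Phi(|z|))
    return z, p


# Coefficients from the corr() outputs above: women first, men second
z, p = compare_correlations(0.849609, 5000, 0.862979, 5000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these inputs z comes out at roughly -2.5 (p around 0.01): at this sample size the men's correlation is detectably higher, even though the gap of about 0.013 is small in practical terms.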

