Complete the following tasks by whatever means necessary, based on the material we've covered in class. Save the notebook as <lastname>_DoNow_2-2.ipynb, where <lastname> is your last (family) name, and submit it via Slack.


In [1]:
# magic command to render plots inline in the notebook
# https://ipython.org/ipython-doc/dev/interactive/tutorial.html#magic-functions
%matplotlib inline

1. Import the pandas package using its common alias (pd).


In [2]:
import pandas as pd

2. Read the file "height_weight.xlsx" in the data folder into a pandas DataFrame.


In [3]:
# reading .xlsx files requires an Excel engine such as openpyxl to be installed
df = pd.read_excel("data/height_weight.xlsx")

3. Plot a histogram for both height and weight. Describe the data distribution in comments.


In [4]:
df.hist()


Out[4]:
array([[<matplotlib.axes.AxesSubplot object at 0x107359d10>,
        <matplotlib.axes.AxesSubplot object at 0x1074131d0>]], dtype=object)
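Each histogram is a set of equal-width bins with a count per bin. A minimal sketch of putting that shape into words, assuming a small hypothetical `toy` DataFrame that reuses the notebook's column names (the values are made up, not read from the Excel file): `pd.cut` splits a column into equal-width bins, `value_counts(sort=False)` counts them in bin order, and `Series.skew()` summarizes asymmetry. Swapping `toy` for `df` applies the same steps to the real data.

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the notebook's column names
toy = pd.DataFrame({
    "height": [55.0, 58.0, 60.0, 61.0, 62.0, 63.0, 64.0, 66.0, 69.0, 71.0],
    "weight": [70.0, 82.0, 88.0, 95.0, 98.0, 101.0, 104.0, 115.0, 128.0, 145.0],
})

for col in ["height", "weight"]:
    # Cut the column into 4 equal-width bins and count the values in each,
    # keeping the bins in ascending order: a text version of a histogram
    counts = pd.cut(toy[col], bins=4).value_counts(sort=False)
    print(col)
    print(counts)
    # Skewness near 0 suggests a roughly symmetric shape; positive values
    # mean a longer right tail, negative values a longer left tail
    print("skew:", round(toy[col].skew(), 3))
```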

4. Calculate the mean height and mean weight for the dataframe.


In [5]:
df.mean()


Out[5]:
height     62.336842
weight    100.026316
dtype: float64
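`df.mean()` returns one arithmetic mean per numeric column: the column's sum divided by its count of non-missing values. A quick check on hypothetical toy values (not the notebook's data):

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the notebook's column names
toy = pd.DataFrame({"height": [60.0, 62.0, 64.0], "weight": [90.0, 100.0, 110.0]})

means = toy.mean()  # one mean per column, returned as a Series

# The same numbers by hand: column sum / number of non-missing values
manual = toy.sum() / toy.count()

print(means)
```

If the sheet also contained non-numeric columns, pandas 2.0 and later may raise an error here unless you call `df.mean(numeric_only=True)`.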

5. Calculate the other key descriptive statistics for the two variables:

  • Standard deviation
  • Range
  • Interquartile range

In [6]:
df.describe()


Out[6]:
          height      weight
count  19.000000   19.000000
mean   62.336842  100.026316
std     5.127075   22.773933
min    51.300000   50.500000
25%    58.250000   84.250000
50%    62.800000   99.500000
75%    65.900000  112.250000
max    72.000000  150.000000
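The `std` row of `describe()` is the sample standard deviation: pandas' `std()` defaults to `ddof=1`, dividing the sum of squared deviations by n - 1 rather than n. A sketch on a hypothetical eight-value Series (made-up numbers), cross-checked against Python's `statistics` module:

```python
import statistics

import pandas as pd

# Hypothetical values, chosen so the population standard deviation is exactly 2
toy = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_sd = toy.std()              # default ddof=1: divides by n - 1, as describe() does
population_sd = toy.std(ddof=0)    # divides by n instead

# Cross-check the sample version with the standard library
by_hand = statistics.stdev(toy.tolist())

print(sample_sd, population_sd, by_hand)
```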

In [7]:
df['height'].max() - df['height'].min()


Out[7]:
20.700000000000003

In [9]:
df['height'].quantile(q=0.75) - df['height'].quantile(q=0.25)


Out[9]:
7.6500000000000057
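The two cells above compute the range and interquartile range for height only. Because `max`, `min`, and `quantile` all work column-wise on a DataFrame, the same expressions cover both variables at once. A sketch on hypothetical toy data (with the notebook's data, use `df` in place of `toy`); `quantile` interpolates linearly between data points by default:

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the notebook's column names
toy = pd.DataFrame({
    "height": [58.0, 60.0, 61.0, 63.0, 68.0],
    "weight": [85.0, 90.0, 100.0, 105.0, 120.0],
})

# Range: maximum minus minimum, one value per column
value_range = toy.max() - toy.min()

# Interquartile range: 75th percentile minus 25th percentile, per column
iqr = toy.quantile(0.75) - toy.quantile(0.25)

print(value_range)
print(iqr)
```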

6. Calculate the correlation coefficient for these variables. Do they appear correlated? (Put your answer in comments.)


In [10]:
df.corr()


Out[10]:
          height    weight
height  1.000000  0.877785
weight  0.877785  1.000000
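`df.corr()` defaults to the Pearson correlation coefficient. A minimal sketch, on hypothetical toy values, that computes the same coefficient from its definition (the sum of co-deviations from the means, divided by the square root of the product of the summed squared deviations) and compares it with pandas:

```python
import math

import pandas as pd

# Hypothetical toy data (made-up values) with the notebook's column names
toy = pd.DataFrame({
    "height": [60.0, 62.0, 64.0, 66.0],
    "weight": [95.0, 100.0, 110.0, 115.0],
})

# pandas: Pearson correlation between the two columns (method="pearson" is the default)
r_pandas = toy["height"].corr(toy["weight"])

# By hand: deviations from each column's mean
x, y = toy["height"], toy["weight"]
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / math.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r_pandas, 6), round(r_manual, 6))
```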

Extra Credit: Create a scatter plot of height versus weight.
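A minimal scatter-plot sketch using pandas' built-in plotting (`DataFrame.plot.scatter`, which draws with matplotlib). The `toy` values are hypothetical; with the notebook's data the call would be `df.plot.scatter(x="height", y="weight")`.

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the notebook's column names
toy = pd.DataFrame({
    "height": [57.0, 59.5, 61.0, 63.5, 65.0, 68.0],
    "weight": [80.0, 88.0, 95.0, 104.0, 112.0, 130.0],
})

# One point per row: height on the x-axis, weight on the y-axis
ax = toy.plot.scatter(x="height", y="weight")
ax.set_title("Height vs. weight (toy data)")
```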


In [ ]: