Mean, Median, Mode, and introducing NumPy

Mean vs. Median

Let's create some fake income data: 10,000 data points drawn from a normal distribution centered on 27,000, with a standard deviation of 15,000. (We'll discuss those terms more later, if you're not familiar with them.)

Then, compute the mean (average) - it should be close to 27,000:


In [48]:
import numpy as np

incomes = np.random.normal(27000, 15000, 10000)
np.mean(incomes)


Out[48]:
27320.73747742573
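To see what np.mean is doing, here's a minimal sketch that computes the mean by hand as the sum divided by the count. It uses NumPy's newer Generator API with a fixed seed. The seed is our own assumption so the numbers repeat between runs; the cell above uses no seed.

```python
import numpy as np

# Fixed seed (an assumption, not in the original cell) for reproducibility
rng = np.random.default_rng(0)
incomes = rng.normal(27000, 15000, 10000)

# The mean is the sum of all values divided by how many values there are
manual_mean = incomes.sum() / len(incomes)
print(manual_mean, np.mean(incomes))  # the two should match
```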

We can segment the income data into 50 buckets, and plot it as a histogram:


In [50]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(incomes, 50)
plt.show()
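If you want the bucket counts themselves rather than a picture, np.histogram performs the same 50-bucket split and returns the counts along with the bucket edges. A short sketch, again with an assumed fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed for reproducibility
incomes = rng.normal(27000, 15000, 10000)

# Split the data into 50 equal-width buckets; counts[i] is how many
# values fall between edges[i] and edges[i + 1]
counts, edges = np.histogram(incomes, bins=50)

print(len(counts), len(edges))  # 50 buckets need 51 edges
print(counts.sum())             # every data point lands in exactly one bucket

# The tallest bucket should sit near the center of the distribution
peak = counts.argmax()
print(edges[peak], edges[peak + 1])
```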


Now compute the median. Since the normal distribution is symmetric, the median should also be close to 27,000:


In [27]:
np.median(incomes)


Out[27]:
27428.566762737137
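The median is just the middle value once the data is sorted, or the average of the two middle values when there's an even number of points. Here is a minimal hand-rolled version (`manual_median` is a hypothetical helper name, not part of NumPy) that should agree with np.median:

```python
import numpy as np

def manual_median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    s = np.sort(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(manual_median([3, 1, 2]))     # odd count: 2
print(manual_median([4, 1, 3, 2]))  # even count: (2 + 3) / 2 = 2.5

# Compare against NumPy on random data (seed is an assumption)
rng = np.random.default_rng(0)
data = rng.normal(27000, 15000, 10000)
print(manual_median(data), np.median(data))
```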

Now we'll add Donald Trump into the mix, as a single income of one billion dollars. Darn income inequality!


In [28]:
incomes = np.append(incomes, [1000000000])

The median barely moves, but the mean shoots up by about 100,000 (one billion spread across roughly 10,000 people):


In [29]:
np.median(incomes)


Out[29]:
27429.036458484166

In [30]:
np.mean(incomes)


Out[30]:
127237.31216460519
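The same effect is easier to see on a tiny, made-up dataset (the five incomes below are hypothetical). One outlier drags the mean far away, while the median only shifts slightly, to the average of the two middle values. Note also that np.append returns a new array rather than modifying the original, which is why the cell above reassigns `incomes`:

```python
import numpy as np

# Hypothetical toy incomes: five ordinary earners
incomes = np.array([20000, 25000, 27000, 30000, 35000])
print(np.mean(incomes), np.median(incomes))  # 27400.0 27000.0

# np.append returns a new array; `incomes` itself is left unchanged
with_billionaire = np.append(incomes, [1000000000])
print(np.mean(with_billionaire))    # 166689500.0, pulled up by one value
print(np.median(with_billionaire))  # 28500.0, barely moved
print(len(incomes))                 # still 5
```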

Mode

Next, let's generate some fake age data for 500 people:


In [33]:
ages = np.random.randint(18, high=90, size=500)
ages


Out[33]:
array([21, 44, 52, 20, 68, 43, 65, 78, 69, 30, 77, 69, 24, 63, 65, 79, 47,
       54, 33, 49, 53, 18, 65, 64, 65, 44, 86, 47, 87, 18, 31, 32, 27, 35,
       88, 28, 79, 48, 45, 84, 88, 48, 73, 79, 28, 64, 52, 30, 76, 74, 24,
       60, 82, 37, 53, 26, 19, 89, 28, 26, 82, 55, 72, 56, 22, 36, 30, 39,
       53, 38, 29, 51, 59, 75, 41, 35, 75, 42, 53, 29, 26, 26, 54, 68, 38,
       86, 86, 68, 72, 23, 19, 82, 35, 55, 78, 28, 18, 21, 83, 72, 89, 46,
       66, 75, 34, 34, 74, 53, 26, 60, 43, 47, 73, 30, 75, 44, 70, 35, 76,
       46, 73, 67, 72, 39, 48, 43, 34, 42, 43, 63, 80, 30, 28, 28, 40, 64,
       53, 88, 77, 82, 72, 27, 45, 43, 67, 83, 26, 58, 81, 41, 70, 86, 89,
       28, 88, 32, 57, 87, 63, 57, 41, 24, 20, 40, 42, 66, 76, 46, 37, 62,
       61, 20, 48, 36, 49, 39, 68, 79, 69, 83, 62, 69, 62, 52, 58, 23, 71,
       34, 27, 22, 27, 78, 51, 76, 56, 43, 66, 62, 47, 73, 45, 66, 82, 88,
       48, 72, 24, 45, 42, 59, 35, 37, 60, 70, 37, 65, 37, 88, 28, 28, 39,
       66, 52, 64, 50, 56, 55, 52, 30, 61, 36, 45, 61, 51, 77, 24, 44, 51,
       42, 27, 77, 51, 27, 34, 65, 21, 72, 45, 61, 40, 46, 45, 77, 75, 24,
       65, 87, 69, 63, 31, 25, 44, 45, 79, 82, 25, 89, 47, 63, 67, 60, 47,
       82, 81, 27, 48, 35, 68, 38, 32, 40, 40, 59, 22, 34, 32, 88, 77, 42,
       81, 20, 77, 51, 89, 34, 63, 20, 38, 55, 51, 41, 70, 60, 39, 62, 43,
       34, 77, 83, 45, 81, 78, 66, 76, 71, 34, 65, 31, 27, 68, 43, 37, 71,
       22, 27, 73, 61, 64, 58, 62, 24, 64, 60, 28, 60, 35, 78, 31, 35, 61,
       85, 78, 68, 84, 83, 65, 79, 66, 70, 22, 73, 80, 61, 27, 57, 73, 33,
       41, 61, 29, 65, 70, 80, 52, 46, 33, 19, 18, 58, 59, 72, 79, 82, 80,
       65, 20, 71, 51, 51, 30, 27, 54, 87, 43, 35, 43, 53, 67, 51, 19, 49,
       51, 32, 83, 65, 45, 61, 43, 70, 69, 18, 25, 78, 28, 44, 33, 46, 80,
       86, 87, 69, 32, 82, 84, 39, 54, 40, 55, 29, 83, 67, 83, 35, 87, 77,
       21, 52, 81, 70, 27, 71, 49, 83, 79, 50, 74, 86, 84, 58, 81, 78, 35,
       59, 34, 54, 63, 61, 88, 41, 88, 60, 63, 30, 76, 35, 80, 76, 23, 18,
       83, 53, 69, 61, 76, 88, 56, 59, 46, 77, 74, 89, 83, 68, 51, 62, 20,
       87, 66, 29, 28, 70, 50, 43, 23, 42, 76, 20, 27, 73, 21, 26, 29, 48,
       57, 34, 80, 71, 71, 51, 76])
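One detail worth knowing: randint's `high` argument is exclusive, so these ages run from 18 to 89, never 90. A quick check, with a fixed seed added as our own assumption:

```python
import numpy as np

np.random.seed(0)  # assumed seed so the check is reproducible
# randint draws from the half-open range [low, high): 90 is never produced
ages = np.random.randint(18, high=90, size=500)
print(ages.min(), ages.max())
```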

In [34]:
from scipy import stats
stats.mode(ages)


Out[34]:
ModeResult(mode=array([27]), count=array([13]))
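So 27 is the most common age, appearing 13 times. Depending on your SciPy version, stats.mode may print plain scalars instead of one-element arrays. The mode is also easy to compute with NumPy alone by counting each distinct value. In the sketch below, `manual_mode` is a hypothetical helper; ties go to the smallest value, which is also how stats.mode breaks ties:

```python
import numpy as np

def manual_mode(values):
    """Return (most common value, its count); ties go to the smallest value."""
    # np.unique returns the distinct values in sorted order, plus how often each occurs
    vals, counts = np.unique(values, return_counts=True)
    # argmax returns the first maximum, i.e. the smallest of any tied values
    i = counts.argmax()
    return vals[i], counts[i]

print(manual_mode([3, 1, 3, 2, 1, 3]))  # value 3, seen 3 times
print(manual_mode([2, 2, 1, 1]))        # tie between 1 and 2: returns 1
```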
