Let's create some fake income data: 10,000 data points drawn from a normal distribution centered on 27,000, with a standard deviation of 15,000. (We'll discuss those terms more later, if you're not familiar with them.)
Then compute the mean (average); it should be close to 27,000:
In [4]:
import numpy as np
incomes = np.random.normal(27000, 15000, 10000)
np.mean(incomes)
Out[4]:
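Because the data is random, the exact mean changes on every run. As a minimal sketch, assuming NumPy's `default_rng` generator and an arbitrary seed of 42 (neither is used in the original cells), the same draw can be made repeatable:

```python
import numpy as np

# Seed 42 is an arbitrary choice for illustration; any fixed seed works.
rng = np.random.default_rng(42)

# Same parameters as above: mean 27,000, standard deviation 15,000, 10,000 points.
seeded_incomes = rng.normal(27000, 15000, 10000)

sample_mean = np.mean(seeded_incomes)
print(sample_mean)  # close to 27,000, and identical on every run
```

With 10,000 points the standard error of the mean is about 150 (15,000 divided by the square root of 10,000), so the sample mean should usually land within a few hundred of 27,000.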
We can segment the income data into 50 buckets, and plot it as a histogram:
In [5]:
%matplotlib inline
%config InlineBackend.figure_format='retina'
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Apply the theme in one call; a bare sns.set() afterwards would reset context and style to the defaults.
sns.set(context="paper", style="white")
# ----------------------------------------------
incomes = np.random.normal(27000, 15000, 10000)
print(np.mean(incomes))
plt.hist(incomes, 50)
plt.show()
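To look at the buckets as numbers rather than as a picture, `np.histogram` can be used. This is a sketch on a separately generated sample (the names `sample`, `counts`, and `edges` are illustrative, not from the original notebook):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
sample = rng.normal(27000, 15000, 10000)

# Split the range from min(sample) to max(sample) into 50 equal-width buckets.
counts, edges = np.histogram(sample, bins=50)

print(len(counts))   # 50 buckets
print(len(edges))    # 51 bucket edges
print(counts.sum())  # every one of the 10,000 points falls in some bucket
```

Each entry of `counts` is the height of one bar in the plot, and consecutive entries of `edges` are that bar's left and right boundaries.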
Now compute the median. Since the distribution is symmetric, it too should be close to 27,000:
In [27]:
np.median(incomes)
Out[27]:
Now we'll add Donald Trump into the mix. Darn income inequality!
In [28]:
incomes = np.append(incomes, [1000000000])
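The median won't change much, but the mean will. A single value of 1,000,000,000 spread across 10,001 data points raises the average by roughly 100,000: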
In [29]:
np.median(incomes)
Out[29]:
In [30]:
np.mean(incomes)
Out[30]:
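The same effect is easy to verify by hand on a tiny data set. The five incomes below are made up for illustration:

```python
import numpy as np

# Five hypothetical incomes, chosen so the arithmetic is easy to check.
small = np.array([20000, 25000, 27000, 30000, 35000], dtype=float)
with_outlier = np.append(small, 1_000_000_000)

print(np.median(small), np.mean(small))
# 27000.0 27400.0

print(np.median(with_outlier), np.mean(with_outlier))
# 28500.0 166689500.0
```

The median only moves from 27,000 to 28,500, because it depends on the middle of the sorted values. The mean jumps to over 166 million, because every value, including the outlier, contributes to the sum.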
Next, let's generate some fake age data for 500 people, with integer ages from 18 to 89 (the `high` bound of `randint` is exclusive):
In [33]:
ages = np.random.randint(18, high=90, size=500)
ages
Out[33]:
Then find the mode, the most common age in the sample:
In [34]:
from scipy import stats
stats.mode(ages)
Out[34]:
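To see what the mode computation does, here is a sketch that finds it with plain NumPy on a small hand-made array (the values in `sample_ages` are illustrative, not taken from the data above):

```python
import numpy as np

sample_ages = np.array([18, 25, 25, 40, 40, 40, 67])

# np.unique returns the sorted distinct values and how often each occurs.
values, counts = np.unique(sample_ages, return_counts=True)

# The mode is the value with the highest count; on a tie, argmax picks
# the first one, which is the smallest value because `values` is sorted.
mode_value = values[np.argmax(counts)]
mode_count = counts.max()

print(mode_value, mode_count)
# 40 3
```

With 500 ages spread over only 72 possible values, several ages will often share the top count, so the reported mode can change from run to run.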