Statistics - Basic Theorems



In [31]:
%matplotlib notebook

import numpy as np
import seaborn as sns

sns.set_context("paper")

Law Of Large Numbers

  • The sample mean converges to the population mean as the number of samples tends to infinity.
  • The average of the results from a large number of trials is approximately equal to the expected value (mean), and gets closer as more trials are run.
  • "As more observations are collected, the proportion $p_n$ of occurrences with a particular outcome converges to the probability $p$ of that outcome." (OpenIntro Statistics)
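
Stated a little more formally, here is a sketch of the weak form of the law. It assumes the observations $X_1, X_2, \dots$ are independent and identically distributed with finite mean $\mu$; this notation is introduced here for illustration and is not used elsewhere in the notebook.

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0
\quad \text{for every } \varepsilon > 0.
$$

The quoted statement about proportions is the special case in which $X_i$ is 1 when the outcome occurs and 0 otherwise, so that $\bar{X}_n = p_n$ and $\mu = p$.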

In [33]:
# Define the die population: faces 1..6, each with probability 1/6
die = np.arange(6)+1
die_dist = np.array([1/len(die)]*len(die))

In [22]:
# Expected value
sum([v*p for v, p in zip(die, die_dist)])


Out[22]:
3.5
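
The same expected value can be written as a dot product, and the variance follows from the same weights. This is a minimal sketch assuming a fair six-sided die; the names `faces`, `probs`, `mean` and `variance` are illustrative and do not appear in the cells above.

```python
import numpy as np

# Fair six-sided die (assumption: all faces equally likely)
faces = np.arange(1, 7)
probs = np.full(faces.size, 1 / faces.size)

# E[X] = sum over k of k * P(X = k)
mean = np.dot(faces, probs)

# Var[X] = E[(X - E[X])^2]
variance = np.dot((faces - mean) ** 2, probs)

print(mean, variance)
```

The variance works out to 35/12, a value that is useful later when comparing sums of dice with a normal distribution.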

In [11]:
# Simulate N die rolls
num_rolls = 10000
rolls_res = np.random.choice(die, num_rolls, p=die_dist)

In [32]:
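# Running mean after each roll (cumulative sum divided by the roll count)
roll_counts = np.arange(1, num_rolls + 1)
running_mean = np.cumsum(rolls_res) / roll_counts
sns.plt.plot(roll_counts, running_mean)
sns.plt.xscale('log')
sns.plt.show()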
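
The OpenIntro statement about proportions can be checked in the same way. The sketch below tracks the proportion $p_n$ of sixes among the first $n$ rolls and prints its distance from $p = 1/6$. The seed, the choice of the face 6 and the names `rng`, `is_six` and `p_n` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

num_rolls = 100_000
rolls = rng.integers(1, 7, size=num_rolls)  # fair die, faces 1..6

# p_n: proportion of sixes among the first n rolls, for n = 1..num_rolls
is_six = (rolls == 6)
p_n = np.cumsum(is_six) / np.arange(1, num_rolls + 1)

# Print n, p_n and the absolute gap to the true probability 1/6
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, p_n[n - 1], abs(p_n[n - 1] - 1 / 6))
```

The gap need not shrink monotonically from one checkpoint to the next; the law only says it becomes small with high probability as $n$ grows.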


Central Limit Theorem

  • The sum of values drawn independently from a distribution with finite variance approximates a normal distribution more closely as the number of values grows, regardless of the shape of the underlying distribution.
  • "if we collect a large enough sample from a population, the sample mean should be equal to, more or less, the population mean"
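
In symbols, a sketch of the classical statement, assuming independent and identically distributed $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2 > 0$ (notation introduced here for illustration):

$$
S_n = \sum_{i=1}^{n} X_i,
\qquad
\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty.
$$

For a fair die, $\mu = 3.5$ and $\sigma^2 = 35/12$, so the sum of $n$ dice is approximately $\mathcal{N}(3.5\,n,\ 35\,n/12)$ when $n$ is large.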

In [35]:
sns.barplot(x=die, y=die_dist)
sns.plt.show()



In [70]:
np.random.choice(die, (5, 3), p=die_dist)


Out[70]:
array([[2, 3, 1],
       [1, 1, 4],
       [6, 2, 5],
       [6, 5, 1],
       [2, 1, 3]])
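
In an array like the one above, each row can be read as one trial and each column as one die. Reducing along `axis=1` then gives one value per trial. A small sketch, with the seed and the names `trials`, `sums` and `means` chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 trials (rows), 3 dice per trial (columns)
trials = rng.integers(1, 7, size=(5, 3))

sums = trials.sum(axis=1)    # one total per trial, shape (5,)
means = trials.mean(axis=1)  # one sample mean per trial, shape (5,)

print(trials)
print(sums)
print(means)
```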

In [76]:
num_rolls = 1000
num_dice = [1, 2, 3, 4, 5]

fig, axes = sns.plt.subplots(len(num_dice))
for i, num in enumerate(num_dice):
    # Each row is one trial of `num` dice; sum across the row
    rolls_res = np.random.choice(die, (num_rolls, num), p=die_dist).sum(axis=1)
    sns.distplot(rolls_res, ax=axes[i])
    axes[i].set_title("Number of dice: {}".format(num))
fig.tight_layout()
sns.plt.show()
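
The visual impression from the histograms can be backed by a rough numeric check. The sketch below standardizes the simulated sums using the single-die mean 3.5 and variance 35/12, then reports the mean, the standard deviation and the fraction of values within one standard deviation (about 68.3% for a standard normal). The seed, the dice counts and the names `results` and `within_one_sd` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 3.5            # mean of one fair die
sigma2 = 35 / 12    # variance of one fair die
num_trials = 100_000

results = {}
for n in (1, 2, 5, 30):
    sums = rng.integers(1, 7, size=(num_trials, n)).sum(axis=1)
    # Standardize: subtract n*mu, divide by sqrt(n*sigma2)
    z = (sums - n * mu) / np.sqrt(n * sigma2)
    # Fraction of standardized sums within one standard deviation of 0
    within_one_sd = np.mean(np.abs(z) <= 1)
    results[n] = (z.mean(), z.std(), within_one_sd)
    print(n, results[n])
```

Because the sums are integers, the fraction within one standard deviation will not match the normal value exactly, even for large `n`.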


Experiment: Sum Of N Dice


In [77]:
# Define the die population: faces 1..6, each with probability 1/6
die = np.arange(6)+1
die_dist = np.array([1/len(die)]*len(die))

In [83]:
rolls_res = np.random.choice(die, (10000, 5), p=die_dist).sum(axis=1)
rolls_res


Out[83]:
array([17, 25, 17, ..., 15, 21, 10])

In [86]:
vals, counts = np.unique(rolls_res, return_counts=True)
num_vals = len(vals)
total_count = counts.sum()

In [88]:
sns.barplot(x=vals, y=counts/total_count)
sns.plt.show()



In [92]:
sns.plt.plot(vals, counts/total_count)
sns.plt.show()
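
The relative frequencies plotted above estimate the probability of each sum. For comparison, the exact distribution of the sum of five fair dice can be obtained by convolving the single-die distribution with itself. A minimal sketch, with the names `single`, `pmf` and `support` introduced here for illustration:

```python
import numpy as np

num_dice = 5
single = np.full(6, 1 / 6)  # pmf of one fair die over faces 1..6

# Convolving pmfs gives the pmf of a sum of independent variables
pmf = single.copy()
for _ in range(num_dice - 1):
    pmf = np.convolve(pmf, single)

# Index 0 of pmf corresponds to the smallest possible sum (all ones)
support = np.arange(num_dice, 6 * num_dice + 1)

for s, p in zip(support, pmf):
    print(s, round(p, 5))
```

The exact distribution is symmetric around 17.5, which is the expected value of the sum of five dice.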



In [103]:
# Empirical CDF: cumulative sum of the relative frequencies
data = counts/total_count
sns.plt.plot(vals, np.cumsum(data))
sns.plt.show()
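
The cumulative curve can be compared with the CDF of the normal distribution suggested by the Central Limit Theorem. The sketch below rebuilds the empirical CDF for the sum of five dice and measures the largest gap to a normal CDF with mean 5 × 3.5 and variance 5 × 35/12. The seed, the continuity correction of 0.5 and the helper `normal_cdf` are choices made here for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

num_dice, num_trials = 5, 100_000
sums = rng.integers(1, 7, size=(num_trials, num_dice)).sum(axis=1)

# Empirical CDF on the observed values
vals, counts = np.unique(sums, return_counts=True)
ecdf = np.cumsum(counts) / counts.sum()

# Normal approximation N(n*mu, n*sigma^2) with mu = 3.5, sigma^2 = 35/12
mean = num_dice * 3.5
sd = math.sqrt(num_dice * 35 / 12)

def normal_cdf(x):
    """CDF of N(mean, sd^2) evaluated at x, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Continuity correction: evaluate at v + 0.5 because the sums are integers
approx = np.array([normal_cdf(v + 0.5) for v in vals])
max_gap = np.max(np.abs(ecdf - approx))
print(max_gap)
```

A small `max_gap` indicates that the normal curve already tracks the distribution of the sum closely with only five dice.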


