In [1]:
from scipy.stats import bernoulli, binom, uniform
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)

%matplotlib inline

Problem Setup

We have a coin, and we don't know whether it is biased (i.e. whether one side comes up with probability greater than 0.5 and the other with probability less than 0.5). How do we figure this out?

We can flip the coin many times and count how many of the flips come up heads.

This is a power calculation: How many times would we need to flip the coin in order to conclude that it is indeed biased?

We can compare our measured proportion of heads to a known model of randomness.

This is hypothesis testing: How probable is it that we would have observed this many heads, assuming that the coin was unbiased?

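As a concrete preview, with hypothetical numbers: if we flipped a coin 100 times and saw 60 heads, the hypothesis-testing question is how probable 60 or more heads would be if the coin were fair.


In [ ]:
# P(X >= 60) where X ~ Binomial(100, 0.5): the probability that a fair
# coin turns up at least 60 heads in 100 flips (hypothetical numbers).
binom.sf(59, 100, 0.5)
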
A Biased Coin

The coin's bias is hidden from us. We are going to use statistical tests to estimate its probability of coming up heads. Evaluate the cell, but don't peek at the value of p_hidden!


In [2]:
p_hidden = 1 / np.log10(np.log10(1 - (1 - np.exp(135))))

Hypotheses

  • H_null: The coin is unbiased. p = 0.5
  • H_altr: The coin is biased towards heads. p > 0.5

In this setup, we either reject or fail to reject the null hypothesis; we never "accept" a hypothesis.

Bernoulli Trials

If we were to flip the coin 1000 times, what would we get?


In [3]:
n_flips = 1000
p = 0.5
bernoulli.rvs(p=p, size=n_flips)


Out[3]:
array([0, 1, 1, ..., 0, 1, 0])

Binomial Experiment

Each coin flip is a Bernoulli trial. If we perform 1000 Bernoulli trials, the total number of heads follows a Binomial(1000, p) distribution.

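As a quick sanity check of that claim, the sum of 1000 Bernoulli draws is itself one draw from Binomial(1000, p):


In [ ]:
# Summing 1000 Bernoulli(p) draws gives one sample from Binomial(1000, p).
bernoulli.rvs(p=0.5, size=1000).sum()
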
What does one Binomial experiment look like? Run the following cell a few times.


In [10]:
n_flips = 1000
p = 0.5
size = 1
binom.rvs(n_flips, p, size=size)


Out[10]:
array([507])

Binomial Distribution

Each time we run the previous cell, we draw one instance of 1000 Bernoulli trials, each with probability p of coming up heads. Those 1000 Bernoulli trials together make up one Binomial trial.

On each run, the number of heads from the 1000 Bernoulli trials (i.e. from 1 Binomial trial) changes: each Binomial trial is one sample from Binomial(n=1000, p=0.5). What would the distribution of head counts look like across 1000 Binomial trials?


In [11]:
n_flips = 1000
p = 0.5
size = 1000

# Draw `size` Binomial trials; each entry is the number of heads in n_flips flips.
head_counts = binom.rvs(n_flips, p, size=size)

plt.hist(head_counts)


Out[11]:
(array([   9.,   26.,   97.,  183.,  239.,  255.,  121.,   51.,   17.,    2.]),
 array([ 452. ,  462.2,  472.4,  482.6,  492.8,  503. ,  513.2,  523.4,
         533.6,  543.8,  554. ]),
 <a list of 10 Patch objects>)

Pilot Experiment

Let's run a pilot experiment. Flipping coins isn't expensive, so let's flip the coin on hand 50 times and see whether it appears to be biased.

Question: What should we call this pilot experiment: n=50 Bernoulli trials, or size=50 Binomial trials?


In [26]:
# Flip the coin 50 times
n_flips = 50
pilot_results = bernoulli.rvs(p=p_hidden, size=n_flips)

# Compute an estimated value of p_hidden
p_hat = np.sum(pilot_results) / len(pilot_results)
p_hat


Out[26]:
0.64000000000000001

Hypothesis Test


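One way to carry out the test is a one-sided exact Binomial test (a minimal sketch): under H_null, the number of heads in the 50 pilot flips follows Binomial(n=50, p=0.5), so the p-value is the probability of seeing at least as many heads as we actually observed.


In [ ]:
# One-sided exact test: p-value = P(X >= observed heads) under H_null,
# where X ~ Binomial(n_flips, 0.5).
n_heads = np.sum(pilot_results)                # 32 heads in the pilot above
p_value = binom.sf(n_heads - 1, n_flips, 0.5)  # sf(k) = P(X > k)
p_value

If p_value falls below the alpha we pick in the next section, we reject H_null; otherwise we fail to reject it.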

Power Calculation

We have estimated p_hidden by running a pilot experiment: one Binomial(n=50, p=p_hidden) trial. The resulting estimate of p_hidden is p_hat. Let's assume for a moment that p_hat is correct.

What sample size do we need so that we can be confident we won't:

  1. Falsely reject the null hypothesis (Type I error: false positive)? In other words, the null hypothesis should not have been rejected, and we were wrong about our alternative hypothesis.
  2. Falsely fail to reject the null hypothesis (Type II error: false negative)? In other words, our alternative hypothesis was correct, but our data were not good enough to reject the null hypothesis.

This is about where most of our intuition fails. We have to be comfortable with being wrong simply by chance; the only question is how often we are willing to be wrong.

Alphabet(a)

  • alpha: the probability of falsely rejecting the null hypothesis (the Type I error rate).
  • beta: the probability of falsely failing to reject the null hypothesis (the Type II error rate).
  • power: 1 - beta, the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true.

We have to choose these values a priori, before collecting the data.


In [7]:
alpha = 0.05
beta = 0.05
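
With alpha and beta fixed, we can sketch the power calculation itself: find the smallest n at which a one-sided level-alpha test rejects H_null with probability at least 1 - beta. This is a minimal sketch that assumes the pilot's p_hat is the coin's true bias and searches n in steps of 10.


In [ ]:
# For each candidate n, find the rejection threshold k_crit such that
# P(X >= k_crit | p = 0.5) <= alpha, then compute the power
# P(X >= k_crit | p = p_hat). Stop at the first n whose power reaches 1 - beta.
for n in range(10, 2001, 10):
    k_crit = binom.isf(alpha, n, 0.5) + 1     # reject H_null when X >= k_crit
    power = binom.sf(k_crit - 1, n, p_hat)    # P(reject H_null | p = p_hat)
    if power >= 1 - beta:
        print(n, power)
        break

Under these assumptions, the required sample size should land in the low hundreds of flips, noticeably more than the 50-flip pilot.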

In [8]:
# Head counts from 50 Binomial(n=1000, p=p_hidden) trials, overlaid on the
# head counts drawn earlier under the null hypothesis (p = 0.5).
biased_counts = binom.rvs(n=1000, p=p_hidden, size=50)
plt.hist(biased_counts)
plt.hist(head_counts)


Out[8]:
(array([   4.,   13.,   54.,  113.,  251.,  255.,  171.,  103.,   27.,    9.]),
 array([ 443. ,  453.7,  464.4,  475.1,  485.8,  496.5,  507.2,  517.9,
         528.6,  539.3,  550. ]),
 <a list of 10 Patch objects>)
