In [1]:
import math
import random
import matplotlib.pyplot as plt
In [2]:
%%capture
%run "6 - Probability.ipynb"
In [5]:
def normal_approximation_to_binomial(n, p):
    """return mu and sigma corresponding to Binomial(n, p)"""
    mu = p * n
    sigma = math.sqrt(p * (1 - p) * n)
    return mu, sigma
In [3]:
normal_probability_below = normal_cdf

def normal_probability_above(lo, mu=0, sigma=1):
    return 1 - normal_cdf(lo, mu, sigma)

def normal_probability_between(lo, hi, mu=0, sigma=1):
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

def normal_probability_outside(lo, hi, mu=0, sigma=1):
    return 1 - normal_probability_between(lo, hi, mu, sigma)

def normal_upper_bound(probability, mu=0, sigma=1):
    """returns the z for which P(Z <= z) = probability"""
    return inverse_normal_cdf(probability, mu, sigma)

def normal_lower_bound(probability, mu=0, sigma=1):
    """returns the z for which P(Z >= z) = probability"""
    return inverse_normal_cdf(1 - probability, mu, sigma)

def normal_two_sided_bounds(probability, mu=0, sigma=1):
    """returns the symmetric (about the mean) bounds
    that contain the specified probability"""
    tail_probability = (1 - probability) / 2
    # upper bound should have tail_probability above it
    upper_bound = normal_lower_bound(tail_probability, mu, sigma)
    # lower bound should have tail_probability below it
    lower_bound = normal_upper_bound(tail_probability, mu, sigma)
    return lower_bound, upper_bound
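As a quick sanity check (not part of the original text), the symmetric 95% bounds for the standard normal should come out near the familiar $\pm 1.96$:
In [ ]:
# illustrative sanity check: for the standard normal, the symmetric bounds that
# contain 95% of the probability should be approximately (-1.96, 1.96)
normal_two_sided_bounds(0.95)  # roughly (-1.96, 1.96)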
Now we flip the coin 1,000 times and test our null hypothesis that it is fair. If the hypothesis is true, $X$ will be approximately normally distributed with a mean of 500 and a standard deviation of 15.8:
In [10]:
mu_0, sigma_0 = normal_approximation_to_binomial(1000, 0.5)
mu_0, sigma_0
Out[10]:
We must also decide on a significance level, i.e., how willing we are to accept "false positives" (type 1 errors): rejecting $H_0$ even though it is true. This threshold is often set to 5% or 1%; we will use 5%:
In [11]:
normal_two_sided_bounds(0.95, mu_0, sigma_0)
Out[11]:
If $H_0$ is true, $p$ equals 0.5 and $X$ should fall inside the interval above about 95% of the time; a value outside it would occur only 5% of the time. Next we can determine the power of the test, i.e., the probability of not making a type 2 error ("false negative"), which happens when we fail to reject $H_0$ even though it is false.
In [12]:
# 95% bounds based on assumption p is 0.5
lo, hi = normal_two_sided_bounds(0.95, mu_0, sigma_0)
# actual mu and sigma based on p = 0.55
mu_1, sigma_1 = normal_approximation_to_binomial(1000, 0.55)
# a type 2 error means we fail to reject the null hypothesis
# which will happen when X is still in our original interval
type_2_probability = normal_probability_between(lo, hi, mu_1, sigma_1)
power = 1 - type_2_probability # 0.887
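To see how the power depends on how biased the coin actually is, we can repeat the same calculation for a few alternative values of $p$ (a sketch; the specific values below are illustrative, not from the text):
In [ ]:
# illustrative sweep: power of the two-sided 5% test against several true biases,
# reusing the (lo, hi) interval computed under the fair-coin assumption
for true_p in [0.51, 0.53, 0.55, 0.60]:
    mu_alt, sigma_alt = normal_approximation_to_binomial(1000, true_p)
    type_2 = normal_probability_between(lo, hi, mu_alt, sigma_alt)
    print(true_p, 1 - type_2)  # power increases as true_p moves away from 0.5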
For a one-sided test of whether the coin is biased toward heads, we instead run a 5% significance test whose cutoff is the value below which 95% of the probability lies:
In [13]:
hi = normal_upper_bound(0.95, mu_0, sigma_0)
# is 526 (< 531, since we need more probability in the upper tail)
type_2_probability = normal_probability_below(hi, mu_1, sigma_1)
power = 1 - type_2_probability # 0.936
In [15]:
def two_sided_p_value(x, mu=0, sigma=1):
    if x >= mu:
        # if x is greater than the mean, the tail is what's greater than x
        return 2 * normal_probability_above(x, mu, sigma)
    else:
        # if x is less than the mean, the tail is what's less than x
        return 2 * normal_probability_below(x, mu, sigma)

# 529.5 rather than 530 because of the continuity correction
two_sided_p_value(529.5, mu_0, sigma_0)  # 0.062
Out[15]:
In [17]:
extreme_value_count = 0
for _ in range(100000):
    num_heads = sum(1 if random.random() < 0.5 else 0  # count # of heads
                    for _ in range(1000))               # in 1000 flips
    if num_heads >= 530 or num_heads <= 470:            # and count how often
        extreme_value_count += 1                        # the # is 'extreme'

print(extreme_value_count / 100000)  # 0.062
In [18]:
two_sided_p_value(531.5, mu_0, sigma_0) # 0.0463
Out[18]:
Make sure your data is roughly normally distributed before using normal_probability_above to compute p-values. The annals of bad data science are filled with examples of people opining that the chance of some observed event occurring at random is one in a million, when what they really mean is “the chance, assuming the data is distributed normally,” which is pretty meaningless if the data isn’t.
There are various statistical tests for normality, but even plotting the data is a good start.
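For instance, one quick visual check (a sketch, not part of the original text) is to simulate the head count under the null hypothesis and compare its histogram to the normal approximation from above:
In [ ]:
# illustrative normality check: histogram of simulated head counts for a fair coin
# compared against the normal density with mu_0 and sigma_0
simulated = [sum(1 if random.random() < 0.5 else 0 for _ in range(1000))
             for _ in range(10000)]
xs = range(440, 561)
normal_ys = [math.exp(-(x - mu_0) ** 2 / (2 * sigma_0 ** 2)) /
             (sigma_0 * math.sqrt(2 * math.pi))
             for x in xs]

plt.hist(simulated, bins=50, density=True, alpha=0.5, label='simulated head counts')
plt.plot(xs, normal_ys, label='normal approximation')
plt.legend()
plt.show()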
Alternatively, we can construct a confidence interval around the observed value of the parameter. If we knew the true $p$ exactly, the central limit theorem tells us that the fraction of heads in 1,000 flips is approximately normal, with mean $p$ and standard deviation:
In [19]:
math.sqrt(p * (1 - p) / 1000)
Not knowing the true $p$, we use our estimate instead:
In [21]:
p_hat = 525 / 1000
mu = p_hat
sigma = math.sqrt(p_hat * (1 - p_hat) / 1000)
sigma
Out[21]:
Assuming a normal distribution, we conclude that we are 95% confident that the interval below includes the true $p$:
In [22]:
normal_two_sided_bounds(0.95, mu, sigma)
Out[22]:
0.5 falls within our interval so we do not conclude that the coin is unfair.
What if we observe 540 heads though?
In [23]:
p_hat = 540 / 1000
mu = p_hat
sigma = math.sqrt(p_hat * (1 - p_hat) / 1000) # 0.0158
normal_two_sided_bounds(0.95, mu, sigma) # [0.5091, 0.5709]
Out[23]:
In this scenario, 0.5 falls outside of our interval, so we would reject the "fair coin" hypothesis.
P-hacking involves using various "hacks" to push a $p$-value below 0.05: testing a large number of hypotheses, selectively removing outliers, and so on.
In [24]:
def run_experiment():
    """flip a fair coin 1000 times, True = heads, False = tails"""
    return [random.random() < 0.5 for _ in range(1000)]

def reject_fairness(experiment):
    """using the 5% significance levels"""
    num_heads = len([flip for flip in experiment if flip])
    return num_heads < 469 or num_heads > 531

random.seed(0)
experiments = [run_experiment() for _ in range(1000)]
num_rejections = len([experiment
                      for experiment in experiments
                      if reject_fairness(experiment)])
num_rejections  # 46
Out[24]:
Valid inferences come from a priori hypotheses (hypotheses created before collecting any data) and data cleansing without reference to the hypotheses.
In [26]:
def estimated_parameters(N, n):
    """given N trials with n successes, return the estimated p and its standard error"""
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma

def a_b_test_statistic(N_A, n_A, N_B, n_B):
    """z-statistic for the difference between the two estimated probabilities"""
    p_A, sigma_A = estimated_parameters(N_A, n_A)
    p_B, sigma_B = estimated_parameters(N_B, n_B)
    return (p_B - p_A) / math.sqrt(sigma_A ** 2 + sigma_B ** 2)

# e.g., 200 successes out of 1000 trials for A, 180 out of 1000 for B
z = a_b_test_statistic(1000, 200, 1000, 180)
z
Out[26]:
In [27]:
two_sided_p_value(z)
Out[27]:
In [28]:
z = a_b_test_statistic(1000, 200, 1000, 150)
two_sided_p_value(z)
Out[28]:
In Bayesian inference we start with a prior distribution for the parameters and then use the observed data to obtain a posterior distribution for those same parameters. So instead of making probability judgments about the hypotheses, we make probability judgments about the parameters.
Since the unknown parameter here is itself a probability, we will use a prior from the Beta distribution, which puts all of its weight between 0 and 1.
In [29]:
def B(alpha, beta):
    """a normalizing constant so that the total probability is 1"""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    if x < 0 or x > 1:  # no weight outside of [0, 1]
        return 0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)
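For example (a sketch with illustrative numbers, not from the original text), the Beta distribution is the conjugate prior for the binomial: starting from a Beta(alpha, beta) prior and observing $h$ heads and $t$ tails gives a Beta(alpha + h, beta + t) posterior. The counts are kept small here because this beta_pdf relies on math.gamma, which overflows for large arguments:
In [ ]:
# illustrative Bayesian update: a Beta(20, 20) prior (coin is probably close to fair),
# updated with 53 heads and 47 tails, yields a Beta(73, 67) posterior
prior_alpha, prior_beta = 20, 20
heads, tails = 53, 47
posterior_alpha = prior_alpha + heads  # 73
posterior_beta = prior_beta + tails    # 67

xs = [i / 1000 for i in range(1, 1000)]
plt.plot(xs, [beta_pdf(x, prior_alpha, prior_beta) for x in xs],
         label='prior Beta(20, 20)')
plt.plot(xs, [beta_pdf(x, posterior_alpha, posterior_beta) for x in xs],
         label='posterior Beta(73, 67)')
plt.legend()
plt.show()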