Think Bayes

Second Edition

Copyright 2020 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

In [1]:
# If we're running on Colab, install empiricaldist

import sys
IN_COLAB = 'google.colab' in sys.modules

    !pip install empiricaldist

In [2]:
# Get and create directories

import os

if not os.path.exists(''):
if not os.path.exists('figs'):
    !mkdir figs
if not os.path.exists('tables'):
    !mkdir tables

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from empiricaldist import Pmf
from utils import decorate, savefig

The Euro Problem

In Information Theory, Inference, and Learning Algorithms, David MacKay poses this problem:

"A statistical statement appeared in The Guardian on Friday January 4, 2002:

When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. 'It looks very suspicious to me,' said Barry Blight, a statistics lecturer at the London School of Economics. 'If the coin were unbiased, the chance of getting a result as extreme as that would be less than 7\%.'

"But [MacKay asks] do these data give evidence that the coin is biased rather than fair?"

To answer that question, we'll proceed in two steps. First we'll use the binomial distribution to see where that 7% came from; then we'll use Bayes's Theorem to estimate the probability that this coin comes up heads.

The binomial distribution

Suppose I tell you that a coin is "fair", that is, the probability of heads is 50%. If you spin it twice, there are four outcomes: HH, HT, TH, and TT. All four outcomes have the same probability, 25%.

If we add up the total number of heads, there are three possible outcomes: 0, 1, or 2. The probability of 0 and 2 is 25%, and the probability of 1 is 50%.

More generally, suppose the probability of heads is p and we spin the coin n times. What is the probability that we get a total of k heads?

The answer is given by the binomial distribution:

$P(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$

where $\binom{n}{k}$ is the binomial coefficient, usually pronounced "n choose k".

We can compute the binomial distribution ourselves, but we can also use the SciPy function binom.pmf:

In [4]:
from scipy.stats import binom

n = 2
p = 0.5
ks = np.arange(n+1)

a = binom.pmf(ks, n, p)

array([0.25, 0.5 , 0.25])

If we put this array in a Pmf, the result is the distribution of k for the given values of n and p.

In [5]:
pmf_k = Pmf(a, ks)

0 0.25
1 0.50
2 0.25

In [6]:
from utils import write_pmf
write_pmf(pmf_k, 'table03-01')

The following function computes the binomial distribution for given values of n and p:

In [7]:
def make_binomial(n, p):
    """Make a binomial PMF.
    n: number of spins
    p: probability of heads
    returns: Pmf representing the distribution
    ks = np.arange(n+1)
    a = binom.pmf(ks, n, p)
    return Pmf(a, ks)

And here's what it looks like with n=250 and p=0.5:

In [8]:
pmf_k = make_binomial(n=250, p=0.5)
pmf_k.plot(label='n=250, p=0.5')

decorate(xlabel='Number of heads (k)',
         title='Binomial distribution')


The most likely value in this distribution is 125:

In [9]:


But even though it is the most likely value, the probability that we get exactly 125 heads is only about 5%.

In [10]:


In MacKay's example, we got 140 heads, which is less likely than 125:

In [11]:


In the article MacKay quotes, the statistician says, ‘If the coin were unbiased the chance of getting a result as extreme as that would be less than 7%’.

We can use the binomial distribution to check his math. The following function takes a PMF and computes the total probability of values greater than or equal to threshold.

In [12]:
def ge_dist(pmf, threshold):
    """Probability of values greater than a threshold.
    pmf: Series representing a PMF
    threshold: value to compare to
    returns: probability
    ge = (pmf.index >= threshold)
    total = pmf[ge].sum()
    return total

Here's the probability of getting 140 heads or more:

In [13]:
ge_dist(pmf_k, 140)


Pmf provides a method that does the same computation.

In [14]:


The result is about 3.3%, which is less than 7%. The reason is that the statistician includes all values "as extreme as" 140, which includes values less than or equal to 110, because 140 exceeds the expected value by 15 and 110 falls short by 15.

In [15]:


The probability of values less than or equal to 110 is also 3.3%, so the total probability of values "as extreme" as 140 is 6.6%.

The point of this calculation is that these extreme values are unlikely if the coin is fair.

That's interesting, but it doesn't answer MacKay's question. Let's see if we can.

The Euro problem

Any given coin has some probability of landing heads up when spun on edge; I'll call this probability x.

It seems reasonable to believe that x depends on physical characteristics of the coin, like the distribution of weight.

If a coin is perfectly balanced, we expect x to be close to 50%, but for a lopsided coin, x might be substantially different. We can use Bayes's theorem and the observed data to estimate x.

For simplicity, I'll start with a uniform prior, which assume that all values of x are equally likely. That might not be a reasonable assumption, so we'll come back and consider other priors later.

We can make a uniform prior like this:

In [16]:
hypos = np.linspace(0, 1, 101)
prior = Pmf(1, hypos)

I'll use a dictionary to store the likelihoods for H and T:

In [17]:
likelihood = {
    'H': hypos,
    'T': 1 - hypos

I'll use a string to represent the dataset:

In [18]:
dataset = 'H' * 140 + 'T' * 110

The following function does the update.

In [19]:
def update_euro(pmf, dataset):
    """Updates the Suite with the given number of heads and tails.

    pmf: Pmf representing the prior
    data: tuple of heads and tails
    for data in dataset:
        pmf *= likelihood[data]


And here's how we use it.

In [20]:
posterior = prior.copy()
update_euro(posterior, dataset)

Here's what the posterior looks like.

In [21]:
def decorate_euro(title):
    decorate(xlabel='Proportion of heads (x)',

In [22]:
posterior.plot(label='140 heads out of 250')
decorate_euro(title='Posterior distribution of x')

The peak of the posterior is at 56%, which is the proportion of heads in the dataset.

In [23]:


Different priors

Let's see how that looks with different priors. Here's the uniform prior again.

In [24]:
uniform = Pmf(1, hypos, name='uniform')


And here's a triangle-shaped prior.

In [25]:
ramp_up = np.arange(50)
ramp_down = np.arange(50, -1, -1)

a = np.append(ramp_up, ramp_down)

triangle = Pmf(a, hypos, name='triangle')


Here's what they look like:

In [26]:
decorate_euro(title='Uniform and triangle prior distributions')


If we update them both with the same data:

In [27]:
update_euro(uniform, dataset)
update_euro(triangle, dataset)

Here are the posteriors.

In [28]:
decorate_euro(title='Posterior distributions')

The results are almost identical; the remaining difference is unlikely to matter in practice.

The binomial likelihood function

We can make the Euro class more efficient by computing the likelihood of the entire dataset at once, rather than one coin toss at a time.

If the probability of heads is p, we can compute the probability of k=140 heads in n=250 tosses using the binomial PMF.

In [29]:
from scipy.stats import binom

def update_binomial(pmf, data):
    """Update the PMF using the binomial distribution.
    pmf: Pmf representing the prior
    data: tuple of integers k and n
    k, n = data
    xs = pmf.qs
    likelihood = binom.pmf(k, n, xs)
    pmf *= likelihood

The data are represented with a tuple of values for k and n, rather than a long string of outcomes.

Here's the update.

In [30]:
uniform2 = Pmf(1, hypos, name='uniform2')
data = 140, 250
update_binomial(uniform2, data)

Here's what the posterior looks like.

In [31]:
decorate_euro(title='Posterior distributions computed two ways')

The results are the same, within floating-point error.

In [32]:



Exercise: In Major League Baseball, most players have a batting average between 200 and 330, which means that the probability of getting a hit is between 0.2 and 0.33.

Suppose a new player appearing in his first game gets 3 hits out of 3 attempts. What is the posterior distribution for his probability of getting a hit?

For this exercise, I will construct the prior distribution by starting with a uniform distribution and updating it with imaginary data until it has a shape that reflects my background knowledge of batting averages.

In [33]:
hypos = np.linspace(0.1, 0.4, 101)
prior = Pmf(1, hypos)

In [34]:
likelihood = {
    'Y': hypos,
    'N': 1-hypos

In [35]:
dataset = 'Y' * 25 + 'N' * 75

In [36]:
for data in dataset:
    prior *= likelihood[data]



In [37]:
decorate(xlabel='Probability of getting a hit',

This distribution indicates that most players have a batting average near 250, with only a few players below 175 or above 350. I'm not sure how accurately this prior reflects the distribution of batting averages in Major League Baseball, but it is good enough for this exercise.

Now update this distribution with the data and plot the posterior. What is the most likely value in the posterior distribution?

In [38]:
# Solution

posterior = prior.copy()

for data in 'YYY':
    posterior *= likelihood[data]



In [39]:
# Solution

posterior.plot(label='posterior ')
decorate(xlabel='Probability of getting a hit',

In [40]:
# Solution



In [41]:
# Solution



Exercise: Whenever you survey people about sensitive issues, you have to deal with social desirability bias, which is the tendency of people to shade their answers to show themselves in the most positive light.

One of the ways to improve the accuracy of the results is randomized response.

As an example, suppose you ask 100 people to flip a coin and:

  • If they get heads, they report YES.

  • If they get tails, they honestly answer the question "Do you cheat on your taxes?"

And suppose you get 80 YESes and 20 NOs. Based on this data, what is the posterior distribution for the fraction of people who cheat on their taxes? What is the most likely value in the posterior distribution?

In [42]:
# Solution

hypos = np.linspace(0, 1, 101)
prior = Pmf(1, hypos)

In [43]:
# Solution

likelihood = {
    'Y': 0.5 + hypos/2,
    'N': (1-hypos)/2

In [44]:
# Solution

dataset = 'Y' * 80 + 'N' * 20

posterior = prior.copy()

for data in dataset:
    posterior *= likelihood[data]



In [45]:
# Solution

posterior.plot(label='80 YES, 20 NO')
decorate(xlabel='Proportion of cheaters',

In [46]:
# Solution



Exercise: Suppose that instead of observing coin spins directly, you measure the outcome using an instrument that is not always correct. Specifically, suppose the probability is y=0.2 that an actual heads is reported as tails, or actual tails reported as heads.

If we spin a coin 250 times and the instrument reports 140 heads, what is the posterior distribution of x?

What happens as you vary the value of y?

In [47]:
# Solution

def update_unreliable(pmf, dataset, y):
    likelihood = {
        'H': (1-y) * hypos + y * (1-hypos),
        'T': y * hypos + (1-y) * (1-hypos)
    for data in dataset:
        pmf *= likelihood[data]


In [48]:
# Solution

hypos = np.linspace(0, 1, 101)
prior = Pmf(1, hypos)
dataset = 'H' * 140 + 'T' * 110

posterior00 = prior.copy()
update_unreliable(posterior00, dataset, 0.0)

posterior02 = prior.copy()
update_unreliable(posterior02, dataset, 0.2)

posterior04 = prior.copy()
update_unreliable(posterior04, dataset, 0.4)

In [49]:
# Solution

posterior00.plot(label='y = 0.0')
posterior02.plot(label='y = 0.2')
posterior04.plot(label='y = 0.4')
decorate(xlabel='Proportion of heads',

In [50]:
# Solution

posterior00.idxmax(), posterior02.idxmax(), posterior04.idxmax()

(0.56, 0.6, 0.8)

Exercise: In preparation for an alien invasion, the Earth Defense League (EDL) has been working on new missiles to shoot down space invaders. Of course, some missile designs are better than others; let's assume that each design has some probability of hitting an alien ship, x.

Based on previous tests, the distribution of x in the population of designs is approximately uniform between 0.1 and 0.4.

Now suppose the new ultra-secret Alien Blaster 9000 is being tested. In a press conference, an EDL general reports that the new design has been tested twice, taking two shots during each test. The results of the test are confidential, so the general won't say how many targets were hit, but they report: "The same number of targets were hit in the two tests, so we have reason to think this new design is consistent."

Is this data good or bad; that is, does it increase or decrease your estimate of x for the Alien Blaster 9000?

Hint: If the probability of hitting each target is $x$, the probability of hitting one target in both tests is $[2x(1-x)]^2$.

In [51]:
# Solution

hypos = np.linspace(0.1, 0.4, 101)
prior = Pmf(1, hypos)

In [52]:
# Solution

# specific version for n=2 shots
x = hypos
likes = [(1-x)**4, (2*x*(1-x))**2, x**4]
likelihood = np.sum(likes, axis=0)

In [53]:
# Solution

# general version for any n shots per test

from scipy.stats import binom

n = 2
likes2 = [binom.pmf(k, n, x)**2 for k in range(n+1)]
likelihood2 = np.sum(likes2, axis=0)

In [54]:
# Solution

plt.plot(x, likelihood, label='special case')
plt.plot(x, likelihood2, label='general formula')
decorate(xlabel='Probability of hitting the target',
         title='Likelihood of getting the same result')

In [55]:
# Solution

posterior = prior * likelihood


In [56]:
# Solution

posterior.plot(label='Two tests, two shots, same outcome')
decorate(xlabel='Probability of hitting the target',
         title='Posterior distribution',
         ylim=[0, 0.015])

In [57]:
# Solution

# Getting the same result in both tests is more likely for 
# extreme values of `x` and least likely when `x=0.5`.

# In this example, the prior suggests that `x` is less than 0.5,
# and the update gives more weight to extreme values.

# So the data makes lower values of `x` more likely.