Think Bayes

Second Edition

Copyright 2020 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

In [1]:
import numpy as np
import pandas as pd

Bayes's Theorem

Bayes's Theorem states:

$P(H|D) = P(H) ~ P(D|H) ~/~ P(D)$

where

  • $H$ stands for "hypothesis", and

  • $D$ stands for "data".

Each term in this equation has a name:

  • $P(H)$ is the "prior probability" of the hypothesis, which represents how confident you are that $H$ is true prior to seeing the data,

  • $P(D|H)$ is the "likelihood" of the data, which is the probability of seeing $D$ if the hypothesis is true,

  • $P(D)$ is the "total probability of the data", that is, the chance of seeing $D$ regardless of whether $H$ is true or not.

  • $P(H|D)$ is the "posterior probability" of the hypothesis, which indicates how confident you should be that $H$ is true after taking the data into account.

Here's a problem I got from Wikipedia a long time ago:

Suppose you have two bowls of cookies. Bowl 1 contains 30 vanilla and 10 chocolate cookies. Bowl 2 contains 20 of each kind.

You choose one of the bowls at random and, without looking into the bowl, choose one of the cookies at random. It turns out to be a vanilla cookie.

What is the chance that you chose Bowl 1?

We'll assume that there was an equal chance of choosing either bowl and an equal chance of choosing any cookie in the bowl.

We can solve this problem using Bayes's Theorem. First, I'll define $H$ and $D$:

  • $H$ is the hypothesis that the bowl you chose is Bowl 1.

  • $D$ is the datum that the cookie is vanilla ("datum" is the rarely-used singular form of "data").

What we want is the posterior probability of $H$, which is $P(H|D)$. It is not obvious how to compute it directly, but if we can figure out the terms on the right-hand side of Bayes's Theorem, we can get to it indirectly.

  1. $P(H)$ is the prior probability of $H$, which is the probability of choosing Bowl 1 before we see the data. If there was an equal chance of choosing either bowl, $P(H)$ is $1/2$.

  2. $P(D|H)$ is the likelihood of the data, which is the chance of getting a vanilla cookie if $H$ is true, in other words, the chance of getting a vanilla cookie from Bowl 1, which is $30/40$ or $3/4$.

  3. $P(D)$ is the total probability of the data, which is the chance of getting a vanilla cookie whether $H$ is true or not. In this example, we can figure out $P(D)$ directly: because the bowls are equally likely, and they contain the same number of cookies, you were equally likely to choose any cookie. Combining the two bowls, there are 50 vanilla and 30 chocolate cookies, so the probability of choosing a vanilla cookie is $50/80$ or $5/8$.
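As a cross-check on the third term, the sketch below computes $P(D)$ two ways: by pooling the cookies from both bowls, and by weighting each bowl's chance of vanilla by the chance of choosing that bowl. The variable names are illustrative and not part of the original notebook, and it uses `Fraction` so the results are exact.

```python
from fractions import Fraction

# Cookie counts from the problem statement
bowl1 = {'vanilla': 30, 'chocolate': 10}
bowl2 = {'vanilla': 20, 'chocolate': 20}

# Way 1: pool the cookies (valid here because the bowls are equally likely
# and hold the same number of cookies)
total_vanilla = bowl1['vanilla'] + bowl2['vanilla']
total_cookies = sum(bowl1.values()) + sum(bowl2.values())
p_vanilla_pooled = Fraction(total_vanilla, total_cookies)

# Way 2: weight each bowl's chance of vanilla by the chance of choosing that bowl
p_bowl = Fraction(1, 2)
p_vanilla_weighted = (
    p_bowl * Fraction(bowl1['vanilla'], sum(bowl1.values()))
    + p_bowl * Fraction(bowl2['vanilla'], sum(bowl2.values()))
)

print(p_vanilla_pooled, p_vanilla_weighted)  # 5/8 5/8
```

The two results agree only because the bowls are equally likely and equally full. If either condition failed, pooling the cookies would give the wrong answer, but the weighted sum would still apply.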

Now that we have the terms on the right-hand side, we can use Bayes's Theorem to combine them.

In [2]:
prior = 1/2

In [3]:
likelihood = 3/4

In [4]:
prob_data = 5/8

In [5]:
posterior = prior * likelihood / prob_data

The posterior probability is $0.6$, a little higher than the prior, which was $0.5$.

So the vanilla cookie makes us a little more certain that we chose Bowl 1.
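To confirm the arithmetic without floating-point rounding, here is a minimal sketch that wraps the formula in a function and evaluates it with exact fractions. The function name `bayes_posterior` is an assumption made for this illustration; it does not appear in the original notebook.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, prob_data):
    """Return P(H|D) = P(H) * P(D|H) / P(D)."""
    return prior * likelihood / prob_data

# Terms for the cookie problem: P(H) = 1/2, P(D|H) = 3/4, P(D) = 5/8
posterior_exact = bayes_posterior(Fraction(1, 2), Fraction(3, 4), Fraction(5, 8))

print(posterior_exact)         # 3/5
print(float(posterior_exact))  # 0.6
```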

Exercise: What if we had chosen a chocolate cookie instead; what would be the posterior probability of Bowl 1?

In [6]:
# Solution goes here

The Bayes table

In the cookie problem we were able to compute the probability of the data directly, but that's not always the case. In fact, computing the total probability of the data is often the hardest part of the problem.

Fortunately, there is another way to solve problems like this that makes it easier: the Bayes table.

You can write a Bayes table on paper or use a spreadsheet, but in this notebook I'll use a Pandas DataFrame.

Here's an empty DataFrame with one row for each hypothesis:

In [7]:
import pandas as pd

table = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])

Now I'll add a column to represent the priors:

In [8]:
table['prior'] = 1/2, 1/2

And a column for the likelihoods:

In [9]:
table['likelihood'] = 3/4, 1/2

Here we see a difference from the previous method: we compute likelihoods for both hypotheses, not just Bowl 1:

  • The chance of getting a vanilla cookie from Bowl 1 is 3/4.

  • The chance of getting a vanilla cookie from Bowl 2 is 1/2.

The following cells write the Bayes table to a file.

In [10]:
# Create the directory for the tables if it does not already exist

import os

os.makedirs('tables', exist_ok=True)

In [11]:
from utils import write_table

write_table(table, 'table01-01')

The next step is similar to what we did with Bayes's Theorem; we multiply the priors by the likelihoods:

In [12]:
table['unnorm'] = table['prior'] * table['likelihood']

I called the result unnorm because it is an "unnormalized posterior". To see what that means, let's compare the right-hand side of Bayes's Theorem:

$P(H) P(D|H)~/~P(D)$

to what we have computed so far:

$P(H) P(D|H)$

The difference is that we have not divided through by $P(D)$, the total probability of the data. So let's do that.

There are two ways to compute $P(D)$:

  1. Sometimes we can figure it out directly.

  2. Otherwise, we can compute it by adding up the unnormalized posteriors.
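The second option works because the hypotheses in the table exclude each other and together cover every possibility, so the law of total probability applies. Writing the two hypotheses as $H_1$ and $H_2$ (labels introduced here only for illustration), the total probability of the data is:

$P(D) = P(H_1) ~ P(D|H_1) + P(H_2) ~ P(D|H_2)$

Each term on the right-hand side is a prior multiplied by a likelihood, which is exactly one entry in the unnorm column.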

Here's the total of the unnormalized posteriors:

In [13]:
prob_data = table['unnorm'].sum()

Notice that we get 5/8, which is what we got by computing $P(D)$ directly.

Now we divide by $P(D)$ to get the posteriors:

In [14]:
table['posterior'] = table['unnorm'] / prob_data

The posterior probability for Bowl 1 is 0.6, which is what we got using Bayes's Theorem explicitly.

As a bonus, we also get the posterior probability of Bowl 2, which is 0.4.

The posterior probabilities add up to 1, which they should, because the hypotheses are "complementary"; that is, either one of them is true or the other, but not both. So their probabilities have to add up to 1.

When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1. This process is called "normalization", which is why the total probability of the data is also called the "normalizing constant".

In [15]:
write_table(table, 'table01-02')
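Because the steps after the priors and likelihoods do not depend on the problem, they can be collected into one function. The following is a minimal sketch, not part of the original notebook; the name `update_table` and the rebuilt `cookies` table are assumptions made for the example.

```python
import pandas as pd

def update_table(table):
    """Add 'unnorm' and 'posterior' columns in place and return P(D).

    Assumes `table` already has 'prior' and 'likelihood' columns.
    """
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

# Rebuild the cookie table from scratch to show the helper on its own
cookies = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])
cookies['prior'] = 1/2, 1/2
cookies['likelihood'] = 3/4, 1/2

prob_vanilla = update_table(cookies)
print(prob_vanilla)
print(cookies)
```

Returning the normalizing constant keeps $P(D)$ available for inspection, while the posteriors land in the table alongside the columns they were computed from.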

The dice problem

Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided die. I choose one of the dice at random, roll it, and report that the outcome is a 1. What is the probability that I chose the 6-sided die?

Here's a solution using a Bayes table:

In [16]:
table2 = pd.DataFrame(index=[6, 8, 12])

I'll use fractions to represent the prior probabilities and the likelihoods. That way they don't get rounded off to floating-point numbers.

In [17]:
from fractions import Fraction

table2['prior'] = Fraction(1, 3)
table2['likelihood'] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)
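To see what the fractions buy us, compare the same kind of sum done with floats and with `Fraction`. This is a small standalone illustration, not part of the original notebook.

```python
from fractions import Fraction

# Floats are binary approximations, so small errors can creep in
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Fractions store an exact numerator and denominator
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))  # True

# The product of the prior and the likelihood for the 6-sided die stays exact
product_6 = Fraction(1, 3) * Fraction(1, 6)
print(product_6)  # 1/18
```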

Once you have priors and likelihoods, the remaining steps are always the same.

In [18]:
table2['unnorm'] = table2['prior'] * table2['likelihood']
prob_data2 = table2['unnorm'].sum()
table2['posterior'] = table2['unnorm'] / prob_data2

The posterior probability of the 6-sided die is 4/9.

In [19]:
write_table(table2, 'table01-03')
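One way to build confidence in the result is to simulate the experiment many times and look only at the trials where the outcome was 1. The sketch below is an illustration rather than part of the original notebook; the seed and the number of trials are arbitrary choices.

```python
import random

random.seed(17)  # fixed seed so the run is repeatable

dice = [6, 8, 12]
counts = {sides: 0 for sides in dice}  # how often each die produced a 1

for _ in range(200_000):
    sides = random.choice(dice)         # choose one of the dice at random
    outcome = random.randint(1, sides)  # roll it
    if outcome == 1:
        counts[sides] += 1

# Among the rolls that came up 1, what fraction used the 6-sided die?
total_ones = sum(counts.values())
estimate = counts[6] / total_ones
print(estimate)  # should land near 4/9, which is about 0.444
```

The estimate varies from run to run if the seed is changed, but with this many trials it should stay close to the exact posterior.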

The Monty Hall problem

The Monty Hall problem is based on one of the regular games on a television show called "Let's Make a Deal".
If you are a contestant on the show, here's how the game works:

  • Monty shows you three closed doors numbered 1, 2, and 3. He tells you that there is a prize behind each door.

  • One prize is valuable (traditionally a car), the other two are less valuable (traditionally goats).

  • The object of the game is to guess which door has the car. If you guess right, you get to keep the car.

Suppose you pick Door 1. Before opening the door you chose, Monty opens Door 3 and reveals a goat. Then Monty offers you the option to stick with your original choice or switch to the remaining unopened door.

To maximize your chance of winning the car, should you stick with Door 1 or switch to Door 2?

To answer this question, we have to make some assumptions about the behavior of the host:

  • Monty always opens a door and offers you the option to switch.

  • He never opens the door you picked or the door with the car.

  • If you choose the door with the car, he chooses one of the other doors at random.

Here's a Bayes table that represents the hypotheses.

In [20]:
table3 = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])

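And here are the priors and likelihoods. The data is that Monty opened Door 3 and revealed a goat, so each likelihood is the probability of that event under one hypothesis:

  • If the car is behind Door 1, Monty chooses Door 2 or Door 3 at random, so the probability that he opens Door 3 is 1/2.

  • If the car is behind Door 2, Monty has to open Door 3, so the probability is 1.

  • If the car is behind Door 3, Monty does not open it, so the probability is 0.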

In [21]:
table3['prior'] = Fraction(1, 3)
table3['likelihood'] = Fraction(1, 2), 1, 0

The next step is always the same.

In [22]:
table3['unnorm'] = table3['prior'] * table3['likelihood']
prob_data3 = table3['unnorm'].sum()
table3['posterior'] = table3['unnorm'] / prob_data3

The posterior probability for Door 2 is 2/3, so you are better off switching.

In [23]:
write_table(table3, 'table01-04')
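The likelihoods in this table follow from the three assumptions about the host. As a sketch of that reasoning, the function below encodes the assumptions and derives the likelihoods and posteriors from them. The name `prob_monty_opens` and its parameters are hypothetical, introduced only for this example.

```python
from fractions import Fraction

def prob_monty_opens(opened, car, picked=1, doors=(1, 2, 3)):
    """Probability that Monty opens door `opened`, given where the car is.

    Encodes the assumptions about the host: he always opens a door,
    never the contestant's pick and never the door with the car, and
    he chooses at random when more than one door is allowed.
    """
    allowed = [d for d in doors if d != picked and d != car]
    if opened not in allowed:
        return Fraction(0)
    return Fraction(1, len(allowed))

# Likelihood of "Monty opens Door 3" under each hypothesis about the car
likelihoods = {car: prob_monty_opens(opened=3, car=car) for car in (1, 2, 3)}

# Same steps as the Bayes table: multiply by the prior, then normalize
prior = Fraction(1, 3)
unnorm = {car: prior * like for car, like in likelihoods.items()}
prob_data = sum(unnorm.values())
posterior = {car: u / prob_data for car, u in unnorm.items()}

for car, p in posterior.items():
    print(f'Door {car}: {p}')
```

Deriving the likelihoods from explicit rules makes it easier to see how the answer depends on the host's behavior: changing the rules inside the function changes the likelihoods, and the rest of the computation stays the same.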


Exercise: Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides. You choose a coin at random and see that one of the sides is heads. What is the probability that you chose the trick coin?

In [24]:
# Solution goes here

Exercise: Suppose you meet someone and learn that they have two children. You ask if either child is a girl and they say yes. What is the probability that both children are girls?

Hint: Start with four equally likely hypotheses.

In [25]:
# Solution goes here

Exercise: There are many variations of the Monty Hall problem.
For example, suppose Monty always chooses Door 2 if he can and only chooses Door 3 if he has to (because the car is behind Door 2).

If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?

If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?

In [26]:
# Solution goes here

In [27]:
# Solution goes here

Exercise: M&M's are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&M's, changes the mixture of colors from time to time. In 1995, they introduced blue M&M's.

  • In 1994, the color mix in a bag of plain M&M's was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.

  • In 1996, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&M's, and he tells me that one is from 1994 and one from 1996. He won't tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

Hint: The trick to this question is to define the hypotheses and the data carefully.

In [28]:
# Solution goes here
