Code and exercises from my workshop on Bayesian statistics in Python.
Copyright 2016 Allen Downey
MIT License: https://opensource.org/licenses/MIT
In [ ]:
from __future__ import print_function, division
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from thinkbayes2 import Pmf, Suite
import thinkplot
In [ ]:
d6 = Pmf()
A Pmf is a map from possible outcomes to their probabilities.
In [ ]:
for x in [1, 2, 3, 4, 5, 6]:
    d6[x] = 1
Initially the probabilities don't add up to 1.
In [ ]:
d6.Print()
Normalize adds up the probabilities and divides through. The return value is the total probability before normalizing.
In [ ]:
d6.Normalize()
Now the Pmf is normalized.
In [ ]:
d6.Print()
And we can compute its mean (which only works if it's normalized).
In [ ]:
d6.Mean()
Random chooses a random value from the Pmf.
In [ ]:
d6.Random()
thinkplot provides methods for plotting Pmfs in a few different styles.
In [ ]:
thinkplot.Hist(d6)
Exercise 1: The Pmf object provides __add__, so you can use the + operator to compute the Pmf of the sum of two dice.
Compute and plot the Pmf of the sum of two 6-sided dice.
In [ ]:
# Solution goes here
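One possible sketch (not the official solution), assuming Pmf.__add__ behaves as described above:
In [ ]:
# sum of two fair 6-sided dice, using the + operator
twice = d6 + d6
twice.Print()
thinkplot.Hist(twice)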
Exercise 2: Suppose I roll two dice and tell you the result is greater than 3.
Plot the Pmf of the remaining possible outcomes and compute its mean.
In [ ]:
# Solution goes here
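One possible sketch, assuming the sum Pmf from the previous sketch is named twice: keep only the outcomes greater than 3, then renormalize.
In [ ]:
# condition on the result being greater than 3
conditioned = Pmf()
for outcome, prob in twice.Items():
    if outcome > 3:
        conditioned[outcome] = prob
conditioned.Normalize()
thinkplot.Hist(conditioned)
conditioned.Mean()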
The cookie problem: a cookie is drawn from one of two bowls. Start with a Pmf that represents the two bowls as equally likely hypotheses.
In [ ]:
cookie = Pmf(['Bowl 1', 'Bowl 2'])
cookie.Print()
Update each hypothesis with the likelihood of the data (a vanilla cookie).
In [ ]:
cookie['Bowl 1'] *= 0.75
cookie['Bowl 2'] *= 0.5
cookie.Normalize()
Print the posterior probabilities.
In [ ]:
cookie.Print()
Exercise 3: Suppose we put the first cookie back, stir, choose again from the same bowl, and get a chocolate cookie. Compute the posterior probabilities after this second draw.
Hint: The posterior (after the first cookie) becomes the prior (before the second cookie).
In [ ]:
# Solution goes here
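One possible sketch, assuming the chocolate likelihoods are the complements of the vanilla likelihoods used above (0.25 for Bowl 1, 0.5 for Bowl 2):
In [ ]:
# the posterior after the vanilla cookie is the prior for the chocolate cookie
cookie['Bowl 1'] *= 0.25
cookie['Bowl 2'] *= 0.5
cookie.Normalize()
cookie.Print()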
Exercise 4: Instead of doing two updates, what if we collapse the two pieces of data into one update?
Re-initialize the Pmf with two equally likely hypotheses and perform one update based on two pieces of data: a vanilla cookie and a chocolate cookie.
The result should be the same regardless of how many updates you do (or the order of updates).
In [ ]:
# Solution goes here
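One possible sketch: because the draws are independent, the likelihood of the combined data is the product of the individual likelihoods, so a single update with the products gives the same posterior.
In [ ]:
# one update with both cookies (vanilla then chocolate)
cookie = Pmf(['Bowl 1', 'Bowl 2'])
cookie['Bowl 1'] *= 0.75 * 0.25
cookie['Bowl 2'] *= 0.5 * 0.5
cookie.Normalize()
cookie.Print()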
The dice problem: the hypotheses are dice with 4, 6, 8, or 12 sides, all equally likely a priori.
In [ ]:
pmf = Pmf([4, 6, 8, 12])
pmf.Print()
Exercise 5: We'll solve this problem two ways. First we'll do it "by hand", as we did with the cookie problem; that is, we'll multiply each hypothesis by the likelihood of the data, and then renormalize.
In the space below, update pmf based on the likelihood of the data (rolling a 6), then normalize and print the results.
In [ ]:
# Solution goes here
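One possible sketch: the likelihood of rolling a 6 is 1/hypo if the die has at least 6 sides, and 0 otherwise.
In [ ]:
# update each hypothesis by hand, then renormalize
for hypo in [4, 6, 8, 12]:
    if hypo < 6:
        pmf[hypo] *= 0
    else:
        pmf[hypo] *= 1 / hypo
pmf.Normalize()
pmf.Print()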
Exercise 6: Now let's do the same calculation using Suite.Update.
Write a definition for a new class called Dice that extends Suite. Then define a method called Likelihood that takes data and hypo and returns the probability of the data (the outcome of rolling the die) for a given hypothesis (number of sides on the die).
Hint: What should you do if the outcome exceeds the hypothetical number of sides on the die?
Here's an outline to get you started:
In [ ]:
class Dice(Suite):
    # hypo is the number of sides on the die
    # data is the outcome
    def Likelihood(self, data, hypo):
        return 1
In [ ]:
# Solution goes here
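One possible sketch of the Likelihood method: outcomes larger than the number of sides are impossible; otherwise each side is equally likely.
In [ ]:
class Dice(Suite):
    # hypo is the number of sides on the die
    # data is the outcome
    def Likelihood(self, data, hypo):
        if data > hypo:
            return 0
        return 1 / hypo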
Now we can create a Dice object and update it.
In [ ]:
dice = Dice([4, 6, 8, 12])
dice.Update(6)
dice.Print()
If we get more data, we can perform more updates.
In [ ]:
for roll in [8, 7, 7, 5, 4]:
    dice.Update(roll)
Here are the results.
In [ ]:
dice.Print()
The German tank problem: the hypotheses are possible numbers of tanks, and the data is an observed serial number.
In [ ]:
class Tank(Suite):
    # hypo is the number of tanks
    # data is an observed serial number
    def Likelihood(self, data, hypo):
        if data > hypo:
            return 0
        else:
            return 1 / hypo
Here are the posterior probabilities after seeing Tank #37.
In [ ]:
tank = Tank(range(100))
tank.Update(37)
thinkplot.Pdf(tank)
tank.Mean()
Exercise 7: Suppose we see another tank with serial number 17. What effect does this have on the posterior probabilities?
Update the suite again with the new data and plot the results.
In [ ]:
# Solution goes here
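One possible sketch: a second update with the new serial number. A smaller serial number rules out nothing, but the 1/hypo likelihood still shifts weight toward smaller fleets.
In [ ]:
tank.Update(17)
thinkplot.Pdf(tank)
tank.Mean()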
Exercise 8: Write a class definition for Euro, which extends Suite and defines a likelihood function that computes the probability of the data (heads or tails) for a given value of x (the probability of heads).
Note that hypo is in the range 0 to 100. Here's an outline to get you started.
In [ ]:
class Euro(Suite):
    def Likelihood(self, data, hypo):
        """
        hypo is the prob of heads (0-100)
        data is a string, either 'H' or 'T'
        """
        return 1
In [ ]:
# Solution goes here
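One possible sketch: convert hypo from a percentage to a probability of heads, then return that probability for 'H' and its complement for 'T'.
In [ ]:
class Euro(Suite):
    def Likelihood(self, data, hypo):
        """
        hypo is the prob of heads (0-100)
        data is a string, either 'H' or 'T'
        """
        x = hypo / 100
        if data == 'H':
            return x
        return 1 - x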
We'll start with a uniform distribution from 0 to 100.
In [ ]:
euro = Euro(range(101))
thinkplot.Pdf(euro)
Now we can update with a single heads:
In [ ]:
euro.Update('H')
thinkplot.Pdf(euro)
Another heads:
In [ ]:
euro.Update('H')
thinkplot.Pdf(euro)
And a tails:
In [ ]:
euro.Update('T')
thinkplot.Pdf(euro)
Starting over, here's what it looks like after 7 heads and 3 tails.
In [ ]:
euro = Euro(range(101))
for outcome in 'HHHHHHHTTT':
    euro.Update(outcome)
thinkplot.Pdf(euro)
euro.MaximumLikelihood()
The value with the maximum posterior probability is 70%, which is the observed proportion of heads.
Here are the posterior probabilities after 140 heads and 110 tails.
In [ ]:
euro = Euro(range(101))
evidence = 'H' * 140 + 'T' * 110
for outcome in evidence:
    euro.Update(outcome)
thinkplot.Pdf(euro)
The posterior mean is about 56%.
In [ ]:
euro.Mean()
So is the value with maximum a posteriori probability (MAP).
In [ ]:
euro.MAP()
The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).
In [ ]:
euro.CredibleInterval(90)
To see how much the choice of prior matters, here's a function that constructs a triangle-shaped prior over the same range.
In [ ]:
def TrianglePrior():
    """Makes a Suite with a triangular prior."""
    suite = Euro(label='triangle')
    for x in range(0, 51):
        suite[x] = x
    for x in range(51, 101):
        suite[x] = 100 - x
    suite.Normalize()
    return suite
And here's what it looks like:
In [ ]:
euro1 = Euro(range(101), label='uniform')
euro2 = TrianglePrior()
thinkplot.Pdfs([euro1, euro2])
thinkplot.Config(title='Priors')
Exercise 9: Update euro1 and euro2 with the same data we used before (140 heads and 110 tails) and plot the posteriors. How big is the difference in the means?
In [ ]:
# Solution goes here
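One possible sketch: update both suites with the same sequence of outcomes, plot the posteriors, and compare the means.
In [ ]:
# run 140 heads and 110 tails through both priors
evidence = 'H' * 140 + 'T' * 110
for outcome in evidence:
    euro1.Update(outcome)
    euro2.Update(outcome)
thinkplot.Pdfs([euro1, euro2])
thinkplot.Config(title='Posteriors')
print(euro1.Mean(), euro2.Mean())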
In [ ]: