Code and exercises from my workshop on Bayesian statistics in Python.

``````

In [ ]:

from __future__ import print_function, division

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

from thinkbayes2 import Pmf, Suite
import thinkplot

``````

## Working with Pmfs

Create a Pmf object to represent a six-sided die.

``````

In [ ]:

d6 = Pmf()

``````

A Pmf is a map from possible outcomes to their probabilities.

``````

In [ ]:

for x in [1,2,3,4,5,6]:
d6[x] = 1

``````

Initially the probabilities don't add up to 1.

``````

In [ ]:

d6.Print()

``````

`Normalize` adds up the probabilities and divides through. The return value is the total probability before normalizing.

``````

In [ ]:

d6.Normalize()

``````

Now the Pmf is normalized.

``````

In [ ]:

d6.Print()

``````

And we can compute its mean (which only works if it's normalized).

``````

In [ ]:

d6.Mean()

``````

`Random` chooses a random value from the Pmf.

``````

In [ ]:

d6.Random()

``````

`thinkplot` provides methods for plotting Pmfs in a few different styles.

``````

In [ ]:

thinkplot.Hist(d6)

``````

Exercise 1: The Pmf object provides `__add__`, so you can use the `+` operator to compute the Pmf of the sum of two dice.

Compute and plot the Pmf of the sum of two 6-sided dice.

``````

In [ ]:

# Solution goes here

``````

Exercise 2: Suppose I roll two dice and tell you the result is greater than 3.

Plot the Pmf of the remaining possible outcomes and compute its mean.

``````

In [ ]:

# Solution goes here

``````

Create a Pmf with two equally likely hypotheses.

``````

In [ ]:

cookie = Pmf(['Bowl 1', 'Bowl 2'])

``````

Update each hypothesis with the likelihood of the data (a vanilla cookie).

``````

In [ ]:

``````

Print the posterior probabilities.

``````

In [ ]:

``````

Exercise 3: Suppose we put the first cookie back, stir, choose again from the same bowl, and get a chocolate cookie.

Hint: The posterior (after the first cookie) becomes the prior (before the second cookie).

``````

In [ ]:

# Solution goes here

``````

Exercise 4: Instead of doing two updates, what if we collapse the two pieces of data into one update?

Re-initialize `Pmf` with two equally likely hypotheses and perform one update based on two pieces of data, a vanilla cookie and a chocolate cookie.

The result should be the same regardless of how many updates you do (or the order of updates).

``````

In [ ]:

# Solution goes here

``````

## The dice problem

Create a Suite to represent dice with different numbers of sides.

``````

In [ ]:

pmf = Pmf([4, 6, 8, 12])
pmf.Print()

``````

Exercise 5: We'll solve this problem two ways. First we'll do it "by hand", as we did with the cookie problem; that is, we'll multiply each hypothesis by the likelihood of the data, and then renormalize.

In the space below, update `suite` based on the likelihood of the data (rolling a 6), then normalize and print the results.

``````

In [ ]:

# Solution goes here

``````

Exercise 6: Now let's do the same calculation using `Suite.Update`.

Write a definition for a new class called `Dice` that extends `Suite`. Then define a method called `Likelihood` that takes `data` and `hypo` and returns the probability of the data (the outcome of rolling the die) for a given hypothesis (number of sides on the die).

Hint: What should you do if the outcome exceeds the hypothetical number of sides on the die?

Here's an outline to get you started:

``````

In [ ]:

class Dice(Suite):
# hypo is the number of sides on the die
# data is the outcome
def Likelihood(self, data, hypo):
return 1

``````
``````

In [ ]:

# Solution goes here

``````

Now we can create a `Dice` object and update it.

``````

In [ ]:

dice = Dice([4, 6, 8, 12])
dice.Update(6)
dice.Print()

``````

If we get more data, we can perform more updates.

``````

In [ ]:

for roll in [8, 7, 7, 5, 4]:
dice.Update(roll)

``````

Here are the results.

``````

In [ ]:

dice.Print()

``````

## The German tank problem

The German tank problem is actually identical to the dice problem.

``````

In [ ]:

class Tank(Suite):
# hypo is the number of tanks
# data is an observed serial number
def Likelihood(self, data, hypo):
if data > hypo:
return 0
else:
return 1 / hypo

``````

Here are the posterior probabilities after seeing Tank #37.

``````

In [ ]:

tank = Tank(range(100))
tank.Update(37)
thinkplot.Pdf(tank)
tank.Mean()

``````

Exercise 7: Suppose we see another tank with serial number 17. What effect does this have on the posterior probabilities?

Update the suite again with the new data and plot the results.

``````

In [ ]:

# Solution goes here

``````

## The Euro problem

Exercise 8: Write a class definition for `Euro`, which extends `Suite` and defines a likelihood function that computes the probability of the data (heads or tails) for a given value of `x` (the probability of heads).

Note that `hypo` is in the range 0 to 100. Here's an outline to get you started.

``````

In [ ]:

class Euro(Suite):

def Likelihood(self, data, hypo):
"""
hypo is the prob of heads (0-100)
data is a string, either 'H' or 'T'
"""
return 1

``````
``````

In [ ]:

# Solution goes here

``````

``````

In [ ]:

euro = Euro(range(101))
thinkplot.Pdf(euro)

``````

Now we can update with a single heads:

``````

In [ ]:

euro.Update('H')
thinkplot.Pdf(euro)

``````

``````

In [ ]:

euro.Update('H')
thinkplot.Pdf(euro)

``````

And a tails:

``````

In [ ]:

euro.Update('T')
thinkplot.Pdf(euro)

``````

Starting over, here's what it looks like after 7 heads and 3 tails.

``````

In [ ]:

euro = Euro(range(101))

for outcome in 'HHHHHHHTTT':
euro.Update(outcome)

thinkplot.Pdf(euro)
euro.MaximumLikelihood()

``````

The maximum posterior probability is 70%, which is the observed proportion.

Here are the posterior probabilities after 140 heads and 110 tails.

``````

In [ ]:

euro = Euro(range(101))

evidence = 'H' * 140 + 'T' * 110
for outcome in evidence:
euro.Update(outcome)

thinkplot.Pdf(euro)

``````

The posterior mean s about 56%

``````

In [ ]:

euro.Mean()

``````

So is the value with Maximum Aposteriori Probability (MAP).

``````

In [ ]:

euro.MAP()

``````

The posterior credible interval has a 90% chance of containing the true value (provided that the prior distribution truly represents our background knowledge).

``````

In [ ]:

euro.CredibleInterval(90)

``````

## Swamping the prior

The following function makes a Euro object with a triangle prior.

``````

In [ ]:

def TrianglePrior():
"""Makes a Suite with a triangular prior."""
suite = Euro(label='triangle')
for x in range(0, 51):
suite[x] = x
for x in range(51, 101):
suite[x] = 100-x
suite.Normalize()
return suite

``````

And here's what it looks like:

``````

In [ ]:

euro1 = Euro(range(101), label='uniform')
euro2 = TrianglePrior()
thinkplot.Pdfs([euro1, euro2])
thinkplot.Config(title='Priors')

``````

Exercise 9: Update euro1 and euro2 with the same data we used before (140 heads and 110 tails) and plot the posteriors. How big is the difference in the means?

``````

In [ ]:

# Solution goes here

``````
``````

In [ ]:

``````