# Think Bayes

This notebook presents example code and exercise solutions for Think Bayes.

``````

In :

# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import classes from thinkbayes2
from thinkbayes2 import Pmf, Suite

import thinkbayes2
import thinkplot

``````

### The Dungeons and Dragons club

Suppose there are 10 people in my Dungeons and Dragons club; on any game day, each of them has a 70% chance of showing up.

Each player has one character and each character has 6 attributes, each of which is generated by rolling and adding up 3 6-sided dice.

At the beginning of the game, I ask whose character has the lowest attribute. The wizard says, "My constitution is 5; does anyone have a lower attribute?", and no one does.

The warrior says "My strength is 16; does anyone have a higher attribute?", and no one does.

How many characters are in the party?

### The prior

There are three ways to compute the prior distribution:

• Simulation

• Convolution

• Analytic distribution

First, simulation. Here's a function that flips a coin with probability `p`:

``````

In :

from random import random

def flip(p):
return random() < p

``````

We can use it to flip a coin for each member of the club.

``````

In :

flips = [flip(0.7) for i in range(10)]

``````
``````

Out:

[False, True, False, True, False, False, True, False, False, True]

``````

And count the number that show up on game day.

``````

In :

sum(flips)

``````
``````

Out:

4

``````

Let's encapsulate that in a function that simulates a game day.

``````

In :

def game_day(n, p):
flips = [flip(p) for i in range(n)]
return sum(flips)

``````
``````

In :

game_day(10, 0.7)

``````
``````

Out:

8

``````

If we run that function many times, we get a sample from the distribution of the number of players.

``````

In :

sample = [game_day(10, 0.7) for i in range(1000)]
pmf_sample = Pmf(sample)
thinkplot.Hist(pmf_sample)

``````
``````

``````

The second method is convolution. Instead of flipping a coin, we can create a `Pmf` object that represents the distribution of outcomes from a single flip.

``````

In :

def coin(p):
return Pmf({1:p, 0:1-p})

``````

Here's what it looks like.

``````

In :

player = coin(0.7)
player.Print()

``````
``````

0 0.30000000000000004
1 0.7

``````

If we have two players, there are three possible outcomes:

``````

In :

(player + player).Print()

``````
``````

0 0.09000000000000002
1 0.42000000000000004
2 0.48999999999999994

``````

If we have 10 players, we can get the prior distribution like this:

``````

In :

prior = sum([player]*10)
prior.Print()

``````
``````

0 5.9049000000000085e-06
1 0.00013778100000000018
2 0.0014467005000000017
3 0.009001692000000009
4 0.036756909000000025
5 0.10291934520000004
6 0.20012094900000005
7 0.26682793200000005
8 0.2334744405
9 0.12106082099999994
10 0.028247524899999984

``````

Now we can compare the results of simulation and convolution:

``````

In :

thinkplot.Hist(pmf_sample, color='C0')
thinkplot.Pmf(prior, color='C1')

thinkplot.decorate(xlabel='Number of players',
ylabel='PMF')

``````
``````

``````

Finally, we can use an analytic distribution. The distribution we just computed is the binomial distribution, which has the following PMF:

\$ PMF(k; n, p) = P(k ~|~ n, p) = {n \choose k}\,p^{k}(1-p)^{n-k}\$

We could evalate the right hand side in Python, or use `MakeBinomialPmf`:

``````

In :

help(thinkbayes2.MakeBinomialPmf)

``````
``````

Help on function MakeBinomialPmf in module thinkbayes2.thinkbayes2:

MakeBinomialPmf(n, p)
Evaluates the binomial PMF.

n: number of trials
p: probability of success on each trial

Returns: Pmf of number of successes

``````

And we can confirm that the analytic result matches what we computed by convolution.

``````

In :

binomial = thinkbayes2.MakeBinomialPmf(10, 0.7)
thinkplot.Pmf(prior, color='C1')
thinkplot.Pmf(binomial, color='C2', linestyle='dotted')

thinkplot.decorate(xlabel='Number of players',
ylabel='PMF')

``````
``````

``````

Since two players spoke, we can eliminate the possibility of 0 or 1 players:

``````

In :

thinkplot.Pmf(prior, color='gray')
del prior
del prior
prior.Normalize()
thinkplot.Pmf(prior, color='C1')

thinkplot.decorate(xlabel='Number of players',
ylabel='PMF')

``````
``````

``````

### Likelihood

There are three components of the likelihood function:

• The probability that the highest attribute is 16.

• The probability that the lowest attribute is 5.

• The probability that the lowest and highest attributes are held by different players.

To compute the first component, we have to compute the distribution of the maximum of \$6n\$ attributes, where \$n\$ is the number of players.

Here is the distribution for a single die.

``````

In :

d6 = Pmf([1,2,3,4,5,6])
d6.Print()

``````
``````

1 0.16666666666666666
2 0.16666666666666666
3 0.16666666666666666
4 0.16666666666666666
5 0.16666666666666666
6 0.16666666666666666

``````

And here's the distribution for the sum of three dice.

``````

In :

thrice = sum([d6] * 3)
thinkplot.Pdf(thrice)
thinkplot.decorate(xlabel='Attribute',
ylabel='PMF')

``````
``````

``````

Here's the CDF for the sum of three dice.

``````

In :

cdf_thrice = thrice.MakeCdf()
thinkplot.Cdf(cdf_thrice)
thinkplot.decorate(xlabel='Attribute',
ylabel='CDF')

``````
``````

``````

The `Max` method raises the CDF to a power. So here's the CDF for the maximum of six attributes.

``````

In :

cdf_max_6 = cdf_thrice.Max(6)
thinkplot.Cdf(cdf_max_6)
thinkplot.decorate(xlabel='Attribute',
ylabel='CDF',
title='Maximum of 6 attributes')

``````
``````

``````

If there are `n` players, there are `6*n` attributes. Here are the distributions for the maximum attribute of `n` players, for a few values of `n`.

``````

In :

for n in range(2, 10, 2):
cdf_max = cdf_thrice.Max(n*6)
thinkplot.Cdf(cdf_max, label='n=%s'%n)

thinkplot.decorate(xlabel='Attribute',
ylabel='CDF',
title='Maximum of 6*n attributes')

``````
``````

``````

To check that, I'll compute the CDF for 7 players, and estimate it by simulation.

``````

In :

n = 7
cdf = cdf_thrice.Max(n*6)
thinkplot.Cdf(cdf, label='n=%s'%n)

sample_max = [max(cdf_thrice.Sample(42)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_max), label='sample')

thinkplot.decorate(xlabel='Attribute',
ylabel='CDF',
title='Maximum of 6*n attributes')

``````
``````

``````

Looks good.

Now, to compute the minimum, I have to write my own function, because `Cdf` doesn't provide a `Min` function.

``````

In :

def compute_cdf_min(cdf, k):
"""CDF of the min of k samples from cdf.

cdf: Cdf object
k: number of samples

returns: new Cdf object
"""
cdf_min = cdf.Copy()
cdf_min.ps = 1 - (1 - cdf_min.ps)**k
return cdf_min

``````

Now we can compute the CDF of the minimum attribute for `n` players, for several values of `n`.

``````

In :

for n in range(2, 10, 2):
cdf_min = compute_cdf_min(cdf_thrice, n*6)
thinkplot.Cdf(cdf_min, label='n=%s'%n)

thinkplot.decorate(xlabel='Attribute',
ylabel='CDF',
title='Minimum of 6*n attributes')

``````
``````

``````

And again we can check it by comparing to simulation results.

``````

In :

n = 7
cdf = compute_cdf_min(cdf_thrice, n*6)
thinkplot.Cdf(cdf, label='n=%s'%n)

sample_min = [min(cdf_thrice.Sample(42)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_min), label='sample')

thinkplot.decorate(xlabel='Attribute',
ylabel='CDF',
title='Minimum of 6*n attributes')

``````
``````

``````

For efficiency and conciseness, it is helpful to precompute the distributions for the relevant values of `n`, and store them in dictionaries.

``````

In :

like_min = {}
like_max = {}

for n in range(2, 11):
cdf_min = compute_cdf_min(cdf_thrice, n*6)
like_min[n] = cdf_min.MakePmf()
cdf_max = cdf_thrice.Max(n*6)
like_max[n] = cdf_max.MakePmf()
print(like_min[n], like_max[n])

``````
``````

0.23288163935889017 0.23288163935889017
0.28826338107405935 0.2882633810740594
0.31794402625472684 0.3179440262547268
0.32955796250238156 0.32955796250238156
0.32871475364520075 0.3287147536452008
0.3195146933518256 0.3195146933518255
0.3049352170780888 0.30493521707808885
0.2871203018328896 0.28712030183288956
0.2675970126720095 0.2675970126720096

``````

The output shows that the particular data we saw is symmetric: the chance that 16 is the maximum is the same as the chance that 5 is the minimum.

Finally, we need the probability that the minimum and maximum are held by the same person. If there are `n` players, there are `6*n` attributes.

Let's call the player with the highest attribute Max. What is the chance that Max also has the lowest attribute? Well Max has 5 more attributes, out of a total of `6*n-1` remaining attributes.

So here's `prob_same` as a function of `n`.

``````

In :

def prob_same(n):
return 5 / (6*n-1)

for n in range(2, 11):
print(n, prob_same(n))

``````
``````

2 0.45454545454545453
3 0.29411764705882354
4 0.21739130434782608
5 0.1724137931034483
6 0.14285714285714285
7 0.12195121951219512
8 0.10638297872340426
9 0.09433962264150944
10 0.0847457627118644

``````

### The update

Here's a class that implements this likelihood function.

``````

In :

class Dungeons(Suite):

def Likelihood(self, data, hypo):
"""Probability of the data given the hypothesis.

data: lowest attribute, highest attribute, boolean
(whether the same person has both)
hypo: number of players

returns: probability
"""
lowest, highest, same = data
n = hypo

p = prob_same(n)
like = p if same else 1-p

like *= like_min[n][lowest]
like *= like_max[n][highest]

return like

``````

Here's the prior we computed above.

``````

In :

suite = Dungeons(prior)
thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
ylabel='PMF')
suite.Mean()

``````
``````

Out:

7.000868145040201

``````

And here's the update based on the data in the problem statement.

``````

In :

suite.Update((5, 16, False))

``````
``````

Out:

0.08548474490284354

``````

Here's the posterior.

``````

In :

thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
ylabel='PMF')
suite.Mean()

``````
``````

Out:

6.940862784521086

``````
``````

In :

suite.Print()

``````
``````

2 0.0005007044860801902
3 0.006177449626400222
4 0.03402191676923563
5 0.10823000113979393
6 0.21684925990462955
7 0.279837163496623
8 0.22697589141459018
9 0.10574761295351938
10 0.02166000020912791

``````

Based on the data, I am 94% sure there are between 5 and 9 players.

``````

In :

suite.CredibleInterval()

``````
``````

Out:

(5, 9)

``````
``````

In :

sum(suite[n] for n in [5,6,7,8,9])

``````
``````

Out:

0.9376399289091562

``````
``````

In [ ]:

``````