This notebook presents example code and exercise solutions for Think Bayes.
Copyright 2018 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline
# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# import classes from thinkbayes2
from thinkbayes2 import Pmf, Suite
import thinkbayes2
import thinkplot
Suppose there are 10 people in my Dungeons and Dragons club; on any game day, each of them has a 70% chance of showing up.
Each player has one character and each character has 6 attributes, each of which is generated by rolling and adding up 3 6-sided dice.
At the beginning of the game, I ask whose character has the lowest attribute. The wizard says, "My constitution is 5; does anyone have a lower attribute?", and no one does.
The warrior says "My strength is 16; does anyone have a higher attribute?", and no one does.
How many characters are in the party?
The first method is simulation. Here's a function that returns True with probability p:
In [2]:
from random import random

def flip(p):
    return random() < p
We can use it to flip a coin for each member of the club.
In [3]:
flips = [flip(0.7) for i in range(10)]
Out[3]:
And count the number that show up on game day.
In [4]:
sum(flips)
Out[4]:
Let's encapsulate that in a function that simulates a game day.
In [5]:
def game_day(n, p):
    flips = [flip(p) for i in range(n)]
    return sum(flips)
In [6]:
game_day(10, 0.7)
Out[6]:
If we run that function many times, we get a sample from the distribution of the number of players who show up.
In [7]:
sample = [game_day(10, 0.7) for i in range(1000)]
pmf_sample = Pmf(sample)
thinkplot.Hist(pmf_sample)
The second method is convolution. Instead of flipping a coin, we can create a Pmf
object that represents the distribution of outcomes from a single flip.
In [8]:
def coin(p):
    return Pmf({1: p, 0: 1-p})
Here's what it looks like.
In [9]:
player = coin(0.7)
player.Print()
If we have two players, there are three possible outcomes:
In [10]:
(player + player).Print()
If we have 10 players, we can get the prior distribution like this:
In [11]:
prior = sum([player]*10)
prior.Print()
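Under the hood, adding Pmf objects performs a convolution: each pair of outcomes contributes the product of their probabilities to their sum. Here's a minimal sketch with plain dictionaries (the function name convolve is mine, not part of thinkbayes2):

```python
def convolve(pmf1, pmf2):
    """Distribution of the sum of two independent random quantities.

    pmf1, pmf2: dicts mapping outcome -> probability
    returns: dict mapping sum -> probability
    """
    result = {}
    for x, px in pmf1.items():
        for y, py in pmf2.items():
            result[x + y] = result.get(x + y, 0) + px * py
    return result

# one player shows up with probability 0.7
player = {1: 0.7, 0: 0.3}

# convolve 10 copies to get the distribution of the number of players
prior = {0: 1.0}
for _ in range(10):
    prior = convolve(prior, player)
```

Repeated convolution of the single-flip distribution yields the same binomial prior as `sum([player]*10)`.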
Now we can compare the results of simulation and convolution:
In [12]:
thinkplot.Hist(pmf_sample, color='C0')
thinkplot.Pmf(prior, color='C1')
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
Finally, we can use an analytic distribution. The distribution we just computed is the binomial distribution, which has the following PMF:
$ PMF(k; n, p) = P(k ~|~ n, p) = {n \choose k}\,p^{k}(1-p)^{n-k}$
We could evaluate the right-hand side in Python, or use MakeBinomialPmf:
In [13]:
help(thinkbayes2.MakeBinomialPmf)
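As a check, we can also evaluate the right-hand side directly with the standard library (math.comb requires Python 3.8 or later; the function name binomial_pmf is mine):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Binomial PMF: probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# probability that exactly 7 of 10 players show up
binomial_pmf(7, 10, 0.7)
```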
And we can confirm that the analytic result matches what we computed by convolution.
In [14]:
binomial = thinkbayes2.MakeBinomialPmf(10, 0.7)
thinkplot.Pmf(prior, color='C1')
thinkplot.Pmf(binomial, color='C2', linestyle='dotted')
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
Since two players spoke, we can eliminate the possibility of 0 or 1 players:
In [15]:
thinkplot.Pmf(prior, color='gray')
del prior[0]
del prior[1]
prior.Normalize()
thinkplot.Pmf(prior, color='C1')
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
There are three components of the likelihood function:

1. The probability that the highest attribute is 16.
2. The probability that the lowest attribute is 5.
3. The probability that the lowest and highest attributes are held by different players.
To compute the first component, we have to compute the distribution of the maximum of $6n$ attributes, where $n$ is the number of players.
Here is the distribution for a single die.
In [16]:
d6 = Pmf([1,2,3,4,5,6])
d6.Print()
And here's the distribution for the sum of three dice.
In [17]:
thrice = sum([d6] * 3)
thinkplot.Pdf(thrice)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='PMF')
Here's the CDF for the sum of three dice.
In [18]:
cdf_thrice = thrice.MakeCdf()
thinkplot.Cdf(cdf_thrice)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF')
The Max method raises the CDF to a power. So here's the CDF for the maximum of six attributes.
In [19]:
cdf_max_6 = cdf_thrice.Max(6)
thinkplot.Cdf(cdf_max_6)
thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6 attributes')
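To see why raising the CDF to a power works: the maximum of k independent draws is at most x exactly when all k draws are at most x, so the CDF of the maximum is CDF(x)**k. Here's a self-contained check against direct enumeration for k=2, using plain dictionaries rather than thinkbayes2:

```python
from itertools import product

# exact distribution of 3d6 by enumerating all 6**3 rolls
counts = {}
for dice in product(range(1, 7), repeat=3):
    s = sum(dice)
    counts[s] = counts.get(s, 0) + 1
n_total = 6**3

# cumulative distribution of 3d6
cdf = {}
cum = 0
for x in sorted(counts):
    cum += counts[x]
    cdf[x] = cum / n_total

# formula: CDF of the max of k samples is CDF(x)**k
k = 2
cdf_max_formula = {x: p**k for x, p in cdf.items()}

# direct enumeration of the max of k=2 samples
pair_counts = {}
for x, cx in counts.items():
    for y, cy in counts.items():
        m = max(x, y)
        pair_counts[m] = pair_counts.get(m, 0) + cx * cy
cdf_max_direct = {}
cum = 0
for x in sorted(pair_counts):
    cum += pair_counts[x]
    cdf_max_direct[x] = cum / n_total**2
```

The two CDFs agree exactly, which is what the Max method exploits.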
If there are n players, there are 6*n attributes. Here are the distributions of the maximum attribute for a few values of n.
In [20]:
for n in range(2, 10, 2):
    cdf_max = cdf_thrice.Max(n*6)
    thinkplot.Cdf(cdf_max, label='n=%s' % n)

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6*n attributes')
To check that, I'll compute the CDF for 7 players, and estimate it by simulation.
In [21]:
n = 7
cdf = cdf_thrice.Max(n*6)
thinkplot.Cdf(cdf, label='n=%s' % n)

# draw n*6 attributes and take the max, 1000 times
sample_max = [max(cdf_thrice.Sample(n*6)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_max), label='sample')

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Maximum of 6*n attributes')
Looks good.
Now, to compute the minimum, I have to write my own function, because Cdf doesn't provide a Min method.
In [22]:
def compute_cdf_min(cdf, k):
    """CDF of the min of k samples from cdf.

    cdf: Cdf object
    k: number of samples

    returns: new Cdf object
    """
    cdf_min = cdf.Copy()
    cdf_min.ps = 1 - (1 - cdf_min.ps)**k
    return cdf_min
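The reasoning mirrors Max: the minimum of k draws exceeds x exactly when all k draws do, so 1 - CDFmin(x) = (1 - CDF(x))**k, which rearranges to the formula above. A self-contained check against direct enumeration for k=2 (plain dictionaries, no thinkbayes2):

```python
from itertools import product

# exact distribution and CDF of 3d6 by enumeration
counts = {}
for dice in product(range(1, 7), repeat=3):
    s = sum(dice)
    counts[s] = counts.get(s, 0) + 1
n_total = 6**3

cdf = {}
cum = 0
for x in sorted(counts):
    cum += counts[x]
    cdf[x] = cum / n_total

# formula: CDF of the min of k samples is 1 - (1 - CDF(x))**k
k = 2
cdf_min_formula = {x: 1 - (1 - p)**k for x, p in cdf.items()}

# direct enumeration of the min of k=2 samples
pair_counts = {}
for x, cx in counts.items():
    for y, cy in counts.items():
        m = min(x, y)
        pair_counts[m] = pair_counts.get(m, 0) + cx * cy
cdf_min_direct = {}
cum = 0
for x in sorted(pair_counts):
    cum += pair_counts[x]
    cdf_min_direct[x] = cum / n_total**2
```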
Now we can compute the CDF of the minimum attribute for n players, for several values of n.
In [23]:
for n in range(2, 10, 2):
    cdf_min = compute_cdf_min(cdf_thrice, n*6)
    thinkplot.Cdf(cdf_min, label='n=%s' % n)

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Minimum of 6*n attributes')
And again we can check it by comparing to simulation results.
In [24]:
n = 7
cdf = compute_cdf_min(cdf_thrice, n*6)
thinkplot.Cdf(cdf, label='n=%s' % n)

# draw n*6 attributes and take the min, 1000 times
sample_min = [min(cdf_thrice.Sample(n*6)) for i in range(1000)]
thinkplot.Cdf(thinkbayes2.Cdf(sample_min), label='sample')

thinkplot.decorate(xlabel='Attribute',
                   ylabel='CDF',
                   title='Minimum of 6*n attributes')
For efficiency and conciseness, it is helpful to precompute the distributions for the relevant values of n and store them in dictionaries.
In [25]:
like_min = {}
like_max = {}

for n in range(2, 11):
    cdf_min = compute_cdf_min(cdf_thrice, n*6)
    like_min[n] = cdf_min.MakePmf()
    cdf_max = cdf_thrice.Max(n*6)
    like_max[n] = cdf_max.MakePmf()
    print(like_min[n][5], like_max[n][16])
The output shows that the particular data we saw is symmetric: because the distribution of 3d6 is symmetric around 10.5, and 5 and 16 are equally far from the center, the chance that 16 is the maximum is the same as the chance that 5 is the minimum.
Finally, we need the probability that the minimum and maximum are held by the same person. If there are n players, there are 6*n attributes.

Let's call the player with the highest attribute Max. What is the chance that Max also has the lowest attribute? Well, Max has 5 more attributes, out of a total of 6*n-1 remaining attributes.

So here's prob_same as a function of n:
In [26]:
def prob_same(n):
    return 5 / (6*n - 1)

for n in range(2, 11):
    print(n, prob_same(n))
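We can verify this counting argument by enumerating ordered pairs of distinct attribute slots, where slot s belongs to player s // 6 and one slot holds the (unique) maximum while another holds the (unique) minimum. The function name prob_same_exact is mine:

```python
def prob_same_exact(n):
    """Fraction of ordered (max_slot, min_slot) pairs held by one player.

    There are 6*n attribute slots; slot s belongs to player s // 6.
    """
    same = total = 0
    for max_slot in range(6 * n):
        for min_slot in range(6 * n):
            if max_slot == min_slot:
                continue
            total += 1
            if max_slot // 6 == min_slot // 6:
                same += 1
    return same / total

# agrees with 5 / (6*n - 1) for every n
```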
Here's a class that implements this likelihood function.
In [27]:
class Dungeons(Suite):

    def Likelihood(self, data, hypo):
        """Probability of the data given the hypothesis.

        data: lowest attribute, highest attribute, boolean
              (whether the same person has both)
        hypo: number of players

        returns: probability
        """
        lowest, highest, same = data
        n = hypo

        p = prob_same(n)
        like = p if same else 1-p
        like *= like_min[n][lowest]
        like *= like_max[n][highest]
        return like
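When we call Update, Suite multiplies each hypothesis's prior probability by this likelihood and renormalizes. That mechanism is easy to sketch with plain dictionaries (the function name bayes_update is mine, not part of thinkbayes2):

```python
def bayes_update(prior, likelihood):
    """Bayesian update: posterior is proportional to prior times likelihood.

    prior: dict mapping hypothesis -> probability
    likelihood: function mapping hypothesis -> P(data | hypothesis)
    returns: dict mapping hypothesis -> posterior probability
    """
    posterior = {h: p * likelihood(h) for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# toy example: two coin-bias hypotheses, then we observe one head
prior = {0.5: 0.5, 0.7: 0.5}
posterior = bayes_update(prior, lambda p: p)
```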
Here's the prior we computed above.
In [28]:
suite = Dungeons(prior)
thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
suite.Mean()
Out[28]:
And here's the update based on the data in the problem statement.
In [29]:
suite.Update((5, 16, False))
Out[29]:
Here's the posterior.
In [30]:
thinkplot.Hist(suite)
thinkplot.decorate(xlabel='Number of players',
                   ylabel='PMF')
suite.Mean()
Out[30]:
In [31]:
suite.Print()
Based on the data, I am 94% sure there are between 5 and 9 players.
In [32]:
suite.CredibleInterval()
Out[32]:
In [33]:
sum(suite[n] for n in [5,6,7,8,9])
Out[33]: