Allen Downey
This notebook describes the ThinkBayes2 module, which I wrote to demonstrate a simple approach to Bayesian statistics. The documentation for the module is here.
My book, Think Bayes, is based on the previous version of this module, thinkbayes.py. There are only a few changes changes that are not backward-compatible, so if the read the book, you can use this module.
thinkbayes2.py and supporting code are in this GitHub repository.
thinkbayes2 is a Python module that provides a framework for Bayesian statistics. The most important class is Suite
, which represents a set of hypotheses that are mutually exclusive and collectively exhaustive; in other words, exactly one of the hypotheses is true. Suite
provides a method, Update
, that performs a Bayesian update.
Suite
is based on Pmf
, which represents a probability mass function. A Pmf is a map from possible values to their probabilities. Pmf
is implemented using a Python dictionary, so the values can be any hashable type. The probabilities are normally floating-point.
This UML diagram shows the relationships among these classes:
Suite
provides Update
, which performs a Bayesian update. It calls Likelihood
, which is provided by a child class, Coin
in this example. It also uses Normalize
, which is provided by Pmf
.
To use this framework, you normally define a new class that inherits from Suite
, then provide Likelihood
.
Suppose you are given a strange new coin and you notice that it is unbalanced. When spun on edge, it seems to land with the "heads" side up more often than the "tails" side. You might want to estimate the probability of landing "heads" up, which I'll call $x$.
In [1]:
from __future__ import print_function, division
import thinkbayes2
class Coin(thinkbayes2.Suite):
def Likelihood(self, data, hypo):
x = hypo / 100
if data == 'H':
return x
else:
return 1-x
I'm importing print_function
and division
so this code should be compatible with Python 2 and Python 3.
The Likelihood
function computes the likelihood of the data under a given hypothesis. In this example, data
is a string, either "H"
or "T"
; hypo
is the hypothetical value of $x$ in the range 0 to 100.
The next step is to instantiate a Coin suite that represents the prior distribution of $x$. To start simple, I'll use a uniform distribution on the integers from 0 to 100:
In [2]:
suite = Coin(range(0, 101))
thinkplot
is a module that comes with ThinkBayes2. It provides wrappers for some of the functions in Matplotlib, and includes functions like Pdf
that known how to plot Suites
and other objects from ThinkBayes2.
In [3]:
import thinkplot
thinkplot.Pdf(suite)
As expected, the prior is uniform.
The next step is to update the prior with data. Since Update modifies the Suite, I'll make a copy before updating.
In [4]:
posterior = suite.Copy()
posterior.Update('H')
Out[4]:
The return value from Update is the normalizing constant, which is the average likelihood of the data across all hypotheses.
After the update, large values of $x$ are more likely than small ones, and the hypothesis that $x=0$ has been eliminated.
In [5]:
thinkplot.Pdf(posterior)
With additional data we can do a sequence of updates. The result shows the cumulative effect of all updates.
In [6]:
results = 'HTHHTHHHTTHHHTH'
for data in results:
posterior.Update(data)
thinkplot.Pdf(posterior)
In [7]:
class Pmf(object):
def Normalize(self, fraction=1.0):
"""Normalizes this PMF so the sum of all probs is fraction.
Args:
fraction: what the total should be after normalization
Returns: the total probability before normalizing
"""
if self.log:
raise ValueError("Normalize: Pmf is under a log transform")
total = self.Total()
if total == 0.0:
raise ValueError('Normalize: total probability is zero.')
#logging.warning('Normalize: total probability is zero.')
#return total
factor = fraction / total
for x in self.d:
self.d[x] *= factor
return total
And the only substantial method in Suite
is Update
:
In [8]:
class Suite(Pmf):
"""Represents a suite of hypotheses and their probabilities."""
def Update(self, data):
"""Updates each hypothesis based on the data.
data: any representation of the data
returns: the normalizing constant
"""
for hypo in self.Values():
like = self.Likelihood(data, hypo)
self.Mult(hypo, like)
return self.Normalize()
In [8]: