This notebook presents example code and exercise solutions for Think Bayes.
Copyright 2016 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
In [1]:
from __future__ import print_function, division
% matplotlib inline
from thinkbayes2 import Hist, Pmf, Suite
In [2]:
pmf = Pmf()
for x in [1,2,3,4,5,6]:
pmf[x] = 1
pmf.Print()
To be true probabilities, they have to add up to 1. So we can normalize the Pmf:
In [3]:
pmf.Normalize()
Out[3]:
The return value from Normalize
is the sum of the probabilities before normalizing.
In [4]:
pmf.Print()
A faster way to make a Pmf is to provide a sequence of values. The constructor adds the values to the Pmf and then normalizes:
In [5]:
pmf = Pmf([1,2,3,4,5,6])
pmf.Print()
To extract a value from a Pmf, you can use Prob
In [6]:
pmf.Prob(1)
Out[6]:
Or you can use the bracket operator. Either way, if you ask for the probability of something that's not in the Pmf, the result is 0.
In [7]:
pmf[1]
Out[7]:
In [8]:
pmf = Pmf()
pmf['Bowl 1'] = 0.5
pmf['Bowl 2'] = 0.5
pmf.Print()
And we can update it using Mult
In [9]:
pmf.Mult('Bowl 1', 0.75)
pmf.Mult('Bowl 2', 0.5)
pmf.Print()
Or here's the shorter way to construct the prior.
In [10]:
pmf = Pmf(['Bowl 1', 'Bowl 2'])
pmf.Print()
And we can use *=
for the update.
In [11]:
pmf['Bowl 1'] *= 0.75
pmf['Bowl 2'] *= 0.5
pmf.Print()
Either way, we have to normalize the posterior distribution.
In [12]:
pmf.Normalize()
pmf.Print()
In [13]:
class Cookie(Pmf):
"""A map from string bowl ID to probablity."""
def __init__(self, hypos):
"""Initialize self.
hypos: sequence of string bowl IDs
"""
Pmf.__init__(self)
for hypo in hypos:
self.Set(hypo, 1)
self.Normalize()
def Update(self, data):
"""Updates the PMF with new data.
data: string cookie type
"""
for hypo in self.Values():
like = self.Likelihood(data, hypo)
self.Mult(hypo, like)
self.Normalize()
mixes = {
'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
}
def Likelihood(self, data, hypo):
"""The likelihood of the data under the hypothesis.
data: string cookie type
hypo: string bowl ID
"""
mix = self.mixes[hypo]
like = mix[data]
return like
We can confirm that we get the same result.
In [14]:
pmf = Cookie(['Bowl 1', 'Bowl 2'])
pmf.Update('vanilla')
pmf.Print()
But this implementation is more general; it can handle any sequence of data.
In [15]:
dataset = ['vanilla', 'chocolate', 'vanilla']
for data in dataset:
pmf.Update(data)
pmf.Print()
The Monty Hall problem might be the most contentious question in the history of probability. The scenario is simple, but the correct answer is so counterintuitive that many people just can't accept it, and many smart people have embarrassed themselves not just by getting it wrong but by arguing the wrong side, aggressively, in public.
Monty Hall was the original host of the game show Let's Make a Deal. The Monty Hall problem is based on one of the regular games on the show. If you are on the show, here's what happens:
Monty shows you three closed doors and tells you that there is a prize behind each door: one prize is a car, the other two are less valuable prizes like peanut butter and fake finger nails. The prizes are arranged at random.
The object of the game is to guess which door has the car. If you guess right, you get to keep the car.
You pick a door, which we will call Door A. We'll call the other doors B and C.
Before opening the door you chose, Monty increases the suspense by opening either Door B or C, whichever does not have the car. (If the car is actually behind Door A, Monty can safely open B or C, so he chooses one at random.)
Then Monty offers you the option to stick with your original choice or switch to the one remaining unopened door.
The question is, should you "stick" or "switch" or does it make no difference?
Most people have the strong intuition that it makes no difference. There are two doors left, they reason, so the chance that the car is behind Door A is 50%.
But that is wrong. In fact, the chance of winning if you stick with Door A is only 1/3; if you switch, your chances are 2/3.
Here's a class that solves the Monty Hall problem.
In [16]:
class Monty(Pmf):
"""Map from string location of car to probability"""
def __init__(self, hypos):
"""Initialize the distribution.
hypos: sequence of hypotheses
"""
Pmf.__init__(self)
for hypo in hypos:
self.Set(hypo, 1)
self.Normalize()
def Update(self, data):
"""Updates each hypothesis based on the data.
data: any representation of the data
"""
for hypo in self.Values():
like = self.Likelihood(data, hypo)
self.Mult(hypo, like)
self.Normalize()
def Likelihood(self, data, hypo):
"""Compute the likelihood of the data under the hypothesis.
hypo: string name of the door where the prize is
data: string name of the door Monty opened
"""
if hypo == data:
return 0
elif hypo == 'A':
return 0.5
else:
return 1
And here's how we use it.
In [17]:
pmf = Monty('ABC')
pmf.Update('B')
pmf.Print()
In [18]:
class Monty(Suite):
def Likelihood(self, data, hypo):
if hypo == data:
return 0
elif hypo == 'A':
return 0.5
else:
return 1
And it works.
In [19]:
pmf = Monty('ABC')
pmf.Update('B')
pmf.Print()
M&Ms are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&Ms, changes the mixture of colors from time to time.
In 1995, they introduced blue M&Ms. Before then, the color mix in a bag of plain M&Ms was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. Afterward it was 24% Blue , 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.
Suppose a friend of mine has two bags of M&Ms, and he tells me that one is from 1994 and one from 1996. He won't tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?
Here's a solution:
In [20]:
class M_and_M(Suite):
"""Map from hypothesis (A or B) to probability."""
mix94 = dict(brown=30,
yellow=20,
red=20,
green=10,
orange=10,
tan=10,
blue=0)
mix96 = dict(blue=24,
green=20,
orange=16,
yellow=14,
red=13,
brown=13,
tan=0)
hypoA = dict(bag1=mix94, bag2=mix96)
hypoB = dict(bag1=mix96, bag2=mix94)
hypotheses = dict(A=hypoA, B=hypoB)
def Likelihood(self, data, hypo):
"""Computes the likelihood of the data under the hypothesis.
hypo: string hypothesis (A or B)
data: tuple of string bag, string color
"""
bag, color = data
mix = self.hypotheses[hypo][bag]
like = mix[color]
return like
And here's an update:
In [21]:
suite = M_and_M('AB')
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()
Exercise: Suppose you draw another M&M from bag1
and it's blue. What can you conclude? Run the update to confirm your intuition.
In [22]:
suite.Update(('bag1', 'blue'))
suite.Print()
Exercise: Now suppose you draw an M&M from bag2
and it's blue. What does that mean? Run the update to see what happens.
In [23]:
# Solution goes here
Exercise: This one is from one of my favorite books, David MacKay's "Information Theory, Inference, and Learning Algorithms":
Elvis Presley had a twin brother who died at birth. What is the probability that Elvis was an identical twin?"
To answer this one, you need some background information: According to the Wikipedia article on twins: ``Twins are estimated to be approximately 1.9% of the world population, with monozygotic twins making up 0.2% of the total---and 8% of all twins.''
In [24]:
# Solution goes here
In [25]:
# Solution goes here
Exercise: Let's consider a more general version of the Monty Hall problem where Monty is more unpredictable. As before, Monty never opens the door you chose (let's call it A) and never opens the door with the prize. So if you choose the door with the prize, Monty has to decide which door to open. Suppose he opens B with probability p
and C with probability 1-p
. If you choose A and Monty opens B, what is the probability that the car is behind A, in terms of p
? What if Monty opens C?
Hint: you might want to use SymPy to do the algebra for you.
In [26]:
from sympy import symbols
p = symbols('p')
In [27]:
# Solution goes here
In [28]:
# Solution goes here
In [29]:
# Solution goes here
In [30]:
# Solution goes here
Exercise: According to the CDC, ``Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely.'' Also, among adults in the U.S. in 2014:
Nearly 19 of every 100 adult men (18.8%) Nearly 15 of every 100 adult women (14.8%)
If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?
In [31]:
# Solution goes here
Exercise In Section 2.3 I said that the solution to the cookie problem generalizes to the case where we draw multiple cookies with replacement.
But in the more likely scenario where we eat the cookies we draw, the likelihood of each draw depends on the previous draws.
Modify the solution in this chapter to handle selection without replacement. Hint: add instance variables to Cookie to represent the hypothetical state of the bowls, and modify Likelihood accordingly. You might want to define a Bowl object.
In [32]:
# Solution goes here
In [33]:
# Solution goes here
In [34]:
# Solution goes here
In [35]:
# Solution goes here
In [36]:
# Solution goes here
In [37]:
# Solution goes here
In [38]:
# Solution goes here
In [ ]: