Think Bayes: Chapter 2

This notebook presents example code and exercise solutions for Think Bayes.

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
from __future__ import print_function, division

%matplotlib inline

from thinkbayes2 import Hist, Pmf, Suite

The Pmf class

I'll start by making a Pmf that represents the outcome of a six-sided die. Initially there are 6 values with equal probability.


In [2]:
pmf = Pmf()
for x in [1,2,3,4,5,6]:
    pmf[x] = 1
    
pmf.Print()


1 1
2 1
3 1
4 1
5 1
6 1

To be true probabilities, the values have to add up to 1, so we can normalize the Pmf:


In [3]:
pmf.Normalize()


Out[3]:
6

The return value from Normalize is the sum of the probabilities before normalizing.
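The same bookkeeping can be sketched without the library, using a plain dict (`normalize` here is a hypothetical helper, not part of thinkbayes2):

```python
def normalize(probs):
    """Scale the values of a dict so they sum to 1; return the old total."""
    total = sum(probs.values())
    for x in probs:
        probs[x] /= total
    return total

probs = {x: 1 for x in range(1, 7)}
old_total = normalize(probs)
print(old_total)   # 6
print(probs[3])    # approximately 0.1667
```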


In [4]:
pmf.Print()


1 0.166666666667
2 0.166666666667
3 0.166666666667
4 0.166666666667
5 0.166666666667
6 0.166666666667

A faster way to make a Pmf is to provide a sequence of values. The constructor adds the values to the Pmf and then normalizes:


In [5]:
pmf = Pmf([1,2,3,4,5,6])
pmf.Print()


1 0.166666666667
2 0.166666666667
3 0.166666666667
4 0.166666666667
5 0.166666666667
6 0.166666666667

To look up the probability of a value, you can use Prob:


In [6]:
pmf.Prob(1)


Out[6]:
0.16666666666666666

Or you can use the bracket operator. Either way, if you ask for the probability of something that's not in the Pmf, the result is 0.


In [7]:
pmf[1]


Out[7]:
0.16666666666666666

Next, the cookie problem from the book: Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies; Bowl 2 contains 20 of each. You choose a bowl at random and draw a vanilla cookie. Here's a Pmf that represents the prior distribution of the bowls.


In [8]:
pmf = Pmf()
pmf['Bowl 1'] = 0.5
pmf['Bowl 2'] = 0.5
pmf.Print()


Bowl 1 0.5
Bowl 2 0.5

And we can update it using Mult, multiplying each prior by the corresponding likelihood:


In [9]:
pmf.Mult('Bowl 1', 0.75)
pmf.Mult('Bowl 2', 0.5)
pmf.Print()


Bowl 1 0.375
Bowl 2 0.25

Or here's the shorter way to construct the prior.


In [10]:
pmf = Pmf(['Bowl 1', 'Bowl 2'])
pmf.Print()


Bowl 1 0.5
Bowl 2 0.5

And we can use *= for the update.


In [11]:
pmf['Bowl 1'] *= 0.75
pmf['Bowl 2'] *= 0.5
pmf.Print()


Bowl 1 0.375
Bowl 2 0.25

Either way, we have to normalize the posterior distribution.


In [12]:
pmf.Normalize()
pmf.Print()


Bowl 1 0.6
Bowl 2 0.4
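The same arithmetic can be written with plain dicts, without the library at all (a sketch of the update, not the thinkbayes2 API):

```python
prior = {'Bowl 1': 0.5, 'Bowl 2': 0.5}
likelihood = {'Bowl 1': 0.75, 'Bowl 2': 0.5}   # P(vanilla | bowl)

# Multiply prior by likelihood, then normalize.
unnorm = {hypo: prior[hypo] * likelihood[hypo] for hypo in prior}
total = sum(unnorm.values())
posterior = {hypo: p / total for hypo, p in unnorm.items()}
print(posterior)   # {'Bowl 1': 0.6, 'Bowl 2': 0.4}
```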

The Bayesian framework

Here's the same computation encapsulated in a class.


In [13]:
class Cookie(Pmf):
    """A map from string bowl ID to probablity."""

    def __init__(self, hypos):
        """Initialize self.

        hypos: sequence of string bowl IDs
        """
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()

    def Update(self, data):
        """Updates the PMF with new data.

        data: string cookie type
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

    mixes = {
        'Bowl 1':dict(vanilla=0.75, chocolate=0.25),
        'Bowl 2':dict(vanilla=0.5, chocolate=0.5),
        }

    def Likelihood(self, data, hypo):
        """The likelihood of the data under the hypothesis.

        data: string cookie type
        hypo: string bowl ID
        """
        mix = self.mixes[hypo]
        like = mix[data]
        return like

We can confirm that we get the same result.


In [14]:
pmf = Cookie(['Bowl 1', 'Bowl 2'])
pmf.Update('vanilla')
pmf.Print()


Bowl 1 0.6
Bowl 2 0.4

But this implementation is more general; it can handle any sequence of data.


In [15]:
dataset = ['vanilla', 'chocolate', 'vanilla']
for data in dataset:
    pmf.Update(data)
    
pmf.Print()


Bowl 1 0.627906976744
Bowl 2 0.372093023256

The Monty Hall problem

The Monty Hall problem might be the most contentious question in the history of probability. The scenario is simple, but the correct answer is so counterintuitive that many people just can't accept it, and many smart people have embarrassed themselves not just by getting it wrong but by arguing the wrong side, aggressively, in public.

Monty Hall was the original host of the game show Let's Make a Deal. The Monty Hall problem is based on one of the regular games on the show. If you are on the show, here's what happens:

  • Monty shows you three closed doors and tells you that there is a prize behind each door: one prize is a car, the other two are less valuable prizes like peanut butter and fake finger nails. The prizes are arranged at random.

  • The object of the game is to guess which door has the car. If you guess right, you get to keep the car.

  • You pick a door, which we will call Door A. We'll call the other doors B and C.

  • Before opening the door you chose, Monty increases the suspense by opening either Door B or C, whichever does not have the car. (If the car is actually behind Door A, Monty can safely open B or C, so he chooses one at random.)

  • Then Monty offers you the option to stick with your original choice or switch to the one remaining unopened door.

The question is, should you "stick" or "switch" or does it make no difference?

Most people have the strong intuition that it makes no difference. There are two doors left, they reason, so the chance that the car is behind Door A is 50%.

But that is wrong. In fact, the chance of winning if you stick with Door A is only 1/3; if you switch, your chances are 2/3.
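This claim is easy to check by simulation before doing any Bayesian analysis (a quick Monte Carlo sketch, using the door names above):

```python
import random

random.seed(1)
trials = 100000
stick_wins = switch_wins = 0
for _ in range(trials):
    car = random.choice('ABC')         # the prize location, uniform
    if car == 'A':
        opened = random.choice('BC')   # Monty picks B or C at random
    else:
        opened = 'B' if car == 'C' else 'C'   # Monty must avoid the car
    remaining = 'C' if opened == 'B' else 'B'
    stick_wins += (car == 'A')
    switch_wins += (car == remaining)

print(stick_wins / trials)    # approximately 1/3
print(switch_wins / trials)   # approximately 2/3
```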

Here's a class that solves the Monty Hall problem.


In [16]:
class Monty(Pmf):
    """Map from string location of car to probability"""

    def __init__(self, hypos):
        """Initialize the distribution.

        hypos: sequence of hypotheses
        """
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, 1)
        self.Normalize()

    def Update(self, data):
        """Updates each hypothesis based on the data.

        data: any representation of the data
        """
        for hypo in self.Values():
            like = self.Likelihood(data, hypo)
            self.Mult(hypo, like)
        self.Normalize()

    def Likelihood(self, data, hypo):
        """Compute the likelihood of the data under the hypothesis.

        hypo: string name of the door where the prize is
        data: string name of the door Monty opened
        """
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1

And here's how we use it.


In [17]:
pmf = Monty('ABC')
pmf.Update('B')
pmf.Print()


A 0.333333333333
B 0.0
C 0.666666666667

The Suite class

Most Bayesian updates look pretty much the same, especially the Update method. So we can encapsulate the framework in a class, Suite, and create new classes that extend it.

Child classes of Suite inherit Update and provide Likelihood. So here's the short version of Monty:
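The pattern is easy to see in miniature. Here is a self-contained sketch of the idea (MiniSuite and MiniMonty are hypothetical names, not the thinkbayes2 implementation): a base class owns Update and Normalize, and each subclass supplies only Likelihood.

```python
class MiniSuite(dict):
    """Map from hypothesis to probability; subclasses define Likelihood."""

    def __init__(self, hypos):
        super().__init__((hypo, 1) for hypo in hypos)
        self.Normalize()

    def Normalize(self):
        total = sum(self.values())
        for hypo in self:
            self[hypo] /= total

    def Update(self, data):
        # The Bayesian update: multiply by the likelihood, then renormalize.
        for hypo in self:
            self[hypo] *= self.Likelihood(data, hypo)
        self.Normalize()


class MiniMonty(MiniSuite):
    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1


mini = MiniMonty('ABC')
mini.Update('B')
print(mini['C'])   # approximately 2/3
```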


In [18]:
class Monty(Suite):

    def Likelihood(self, data, hypo):
        if hypo == data:
            return 0
        elif hypo == 'A':
            return 0.5
        else:
            return 1

And it works.


In [19]:
pmf = Monty('ABC')
pmf.Update('B')
pmf.Print()


A 0.333333333333
B 0.0
C 0.666666666667

The M&M problem

M&Ms are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&Ms, changes the mixture of colors from time to time.

In 1995, they introduced blue M&Ms. Before then, the color mix in a bag of plain M&Ms was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. Afterward it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&Ms, and he tells me that one is from 1994 and one from 1996. He won't tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

Here's a solution:


In [20]:
class M_and_M(Suite):
    """Map from hypothesis (A or B) to probability."""

    mix94 = dict(brown=30,
                 yellow=20,
                 red=20,
                 green=10,
                 orange=10,
                 tan=10,
                 blue=0)

    mix96 = dict(blue=24,
                 green=20,
                 orange=16,
                 yellow=14,
                 red=13,
                 brown=13,
                 tan=0)

    hypoA = dict(bag1=mix94, bag2=mix96)
    hypoB = dict(bag1=mix96, bag2=mix94)

    hypotheses = dict(A=hypoA, B=hypoB)

    def Likelihood(self, data, hypo):
        """Computes the likelihood of the data under the hypothesis.

        hypo: string hypothesis (A or B)
        data: tuple of string bag, string color
        """
        bag, color = data
        mix = self.hypotheses[hypo][bag]
        like = mix[color]
        return like

And here's an update:


In [21]:
suite = M_and_M('AB')
suite.Update(('bag1', 'yellow'))
suite.Update(('bag2', 'green'))
suite.Print()


A 0.740740740741
B 0.259259259259
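The posteriors can be checked by hand: under hypothesis A the yellow M&M came from the 1994 mix and the green from the 1996 mix, and since the priors are equal, only the likelihoods matter:

```python
like_A = 20 * 20   # yellow from mix94 (20%), green from mix96 (20%)
like_B = 14 * 10   # yellow from mix96 (14%), green from mix94 (10%)
posterior_A = like_A / (like_A + like_B)
print(posterior_A)   # approximately 0.7407
```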

Exercise: Suppose you draw another M&M from bag1 and it's blue. What can you conclude? Run the update to confirm your intuition.


In [22]:
suite.Update(('bag1', 'blue'))
suite.Print()


A 0.0
B 1.0

Exercise: Now suppose you draw an M&M from bag2 and it's blue. What does that mean? Run the update to see what happens.


In [23]:
# Solution goes here

Exercises

Exercise: This one is from one of my favorite books, David MacKay's "Information Theory, Inference, and Learning Algorithms":

"Elvis Presley had a twin brother who died at birth. What is the probability that Elvis was an identical twin?"

To answer this one, you need some background information: According to the Wikipedia article on twins, "Twins are estimated to be approximately 1.9% of the world population, with monozygotic twins making up 0.2% of the total and 8% of all twins."
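One way to set up the update (a sketch, not the only formulation): given that Elvis had a twin, the hypotheses are "identical" and "fraternal", with priors 8% and 92%. The data is that the twin was a brother: identical twins are always the same sex, fraternal twins only about half the time.

```python
prior = {'identical': 0.08, 'fraternal': 0.92}
# Likelihood of a twin *brother* (Elvis was male) under each hypothesis.
likelihood = {'identical': 1.0, 'fraternal': 0.5}

unnorm = {hypo: prior[hypo] * likelihood[hypo] for hypo in prior}
total = sum(unnorm.values())
posterior = {hypo: p / total for hypo, p in unnorm.items()}
print(posterior['identical'])   # approximately 0.148
```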


In [24]:
# Solution goes here

In [25]:
# Solution goes here

Exercise: Let's consider a more general version of the Monty Hall problem where Monty is more unpredictable. As before, Monty never opens the door you chose (let's call it A) and never opens the door with the prize. So if you choose the door with the prize, Monty has to decide which door to open. Suppose he opens B with probability p and C with probability 1-p. If you choose A and Monty opens B, what is the probability that the car is behind A, in terms of p? What if Monty opens C?

Hint: you might want to use SymPy to do the algebra for you.
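Doing the algebra (with SymPy or by hand) gives p/(1+p) when Monty opens B, and (1-p)/(2-p) when he opens C. Here is a plain numeric sketch of the same update for checking special cases (`prob_car_behind_A` is a hypothetical helper, not part of the book's code):

```python
def prob_car_behind_A(p, opened='B'):
    """Posterior probability the car is behind A, given the door Monty opened.

    p: probability Monty opens B when the car is behind A.
    """
    if opened == 'B':
        likes = {'A': p, 'B': 0, 'C': 1}
    else:
        likes = {'A': 1 - p, 'B': 1, 'C': 0}
    # Priors are 1/3 each, so they cancel when we normalize.
    return likes['A'] / sum(likes.values())

print(prob_car_behind_A(0.5))   # approximately 1/3, the standard problem
print(prob_car_behind_A(1.0))   # 0.5: if Monty always opens B, seeing B says nothing about A vs. C
```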


In [26]:
from sympy import symbols
p = symbols('p')

In [27]:
# Solution goes here

In [28]:
# Solution goes here

In [29]:
# Solution goes here

In [30]:
# Solution goes here

Exercise: According to the CDC, "Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely." Also, among adults in the U.S. in 2014:

  • Nearly 19 of every 100 adult men (18.8%) smoked cigarettes.

  • Nearly 15 of every 100 adult women (14.8%) smoked cigarettes.

If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?
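One way to set it up (a sketch under the figures above): the prior that a woman smokes is 14.8%, and the relative risk of 13 serves as the likelihood ratio, because the absolute cancer rates cancel when we normalize:

```python
prior = {'smoker': 0.148, 'nonsmoker': 0.852}
# The relative risk (13x) acts as the likelihood ratio; the absolute
# cancer rates cancel in the normalization.
likelihood = {'smoker': 13, 'nonsmoker': 1}

unnorm = {hypo: prior[hypo] * likelihood[hypo] for hypo in prior}
posterior_smoker = unnorm['smoker'] / sum(unnorm.values())
print(posterior_smoker)   # approximately 0.693
```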


In [31]:
# Solution goes here

Exercise: In Section 2.3 I said that the solution to the cookie problem generalizes to the case where we draw multiple cookies with replacement.

But in the more likely scenario where we eat the cookies we draw, the likelihood of each draw depends on the previous draws.

Modify the solution in this chapter to handle selection without replacement. Hint: add instance variables to Cookie to represent the hypothetical state of the bowls, and modify Likelihood accordingly. You might want to define a Bowl object.
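Here is one way the hint might be realized, without the thinkbayes2 classes (Bowl and Cookie2 are hypothetical names; the bowl contents assume the book's cookie problem, where Bowl 1 holds 30 vanilla and 10 chocolate cookies and Bowl 2 holds 20 of each). Each hypothesis keeps its own copy of the bowl state, and each draw removes a cookie from that hypothetical bowl:

```python
class Bowl:
    """Tracks the hypothetical contents of one bowl."""

    def __init__(self, vanilla, chocolate):
        self.counts = {'vanilla': vanilla, 'chocolate': chocolate}

    def draw(self, flavor):
        """Return P(flavor) under the current contents, then eat the cookie."""
        total = sum(self.counts.values())
        if total == 0 or self.counts[flavor] == 0:
            return 0
        like = self.counts[flavor] / total
        self.counts[flavor] -= 1
        return like


class Cookie2:
    """Cookie problem without replacement: one Bowl per hypothesis."""

    def __init__(self, bowls):
        self.bowls = bowls                              # hypo -> Bowl
        self.probs = {hypo: 1 / len(bowls) for hypo in bowls}

    def Update(self, flavor):
        for hypo in self.probs:
            self.probs[hypo] *= self.bowls[hypo].draw(flavor)
        total = sum(self.probs.values())
        for hypo in self.probs:
            self.probs[hypo] /= total


suite = Cookie2({'Bowl 1': Bowl(30, 10), 'Bowl 2': Bowl(20, 20)})
suite.Update('vanilla')
print(suite.probs['Bowl 1'])   # 0.6, same as with replacement on the first draw
```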


In [32]:
# Solution goes here

In [33]:
# Solution goes here

In [34]:
# Solution goes here

In [35]:
# Solution goes here

In [36]:
# Solution goes here

In [37]:
# Solution goes here

In [38]:
# Solution goes here
