Bayes Theorem Problems

This notebook presents code and exercises from Think Bayes, second edition.

Copyright 2016 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [2]:
from __future__ import print_function, division

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

import numpy as np

from thinkbayes2 import Hist, Pmf, Cdf, Suite, Beta
import thinkplot

The sock problem

Yuzhong Huang

There are two drawers of socks. The first drawer has 40 white socks and 10 black socks; the second drawer has 20 white socks and 30 black socks. We randomly draw 2 socks from one drawer, and they turn out to be a pair (same color), but we don't know the color. What is the chance that we picked the first drawer?

To make calculating our likelihood easier, we start by defining a multiply function. The function is written in a functional way primarily for fun.


In [22]:
from functools import reduce
import operator

def multiply(items):
    """
    multiply takes a list of numbers, multiplies all of them, and returns the result
    
    Args:
        items (list): The list of numbers
        
    Return:
        the items multiplied together
    """
    return reduce(operator.mul, items, 1)

Next we define a Drawers suite. It lets us draw up to 10 matching socks, the smallest count of a single color in either drawer. To keep the likelihood function simple, we ignore larger draws (for example, 11 or more matching black socks, which would rule out drawer 1 entirely).


In [42]:
class Drawers(Suite):
    def Likelihood(self, data, hypo):
        """
        Likelihood returns the likelihood of the data under a given
        hypothesis. In our drawer problem, the probabilities change
        as socks are drawn without replacement, so we start by
        building lists of the remaining counts of each color in
        each drawer.
        
        Args:
            data (int): The number of socks we take
            hypo (str): The hypothesis we are updating
            
        Return:
            the likelihood for a hypothesis
        """
        
        drawer1W = []
        drawer1B = []
        drawer2W = []
        drawer2B = []
        for i in range(data):
            drawer1W.append(40-i)
            drawer1B.append(10-i)
            drawer2W.append(20-i)
            drawer2B.append(30-i)
        
        if hypo == 'drawer1':
            return multiply(drawer1W)+multiply(drawer1B)
        if hypo == 'drawer2':
            return multiply(drawer2W)+multiply(drawer2B)

Next, define our hypotheses and create the drawer Suite.


In [45]:
hypos = ['drawer1','drawer2']
drawers = Drawers(hypos)
drawers.Print()


drawer1 0.5
drawer2 0.5

Next, update the drawers by taking two matching socks.


In [44]:
drawers.Update(2)
drawers.Print()


drawer1 0.5689655172413792
drawer2 0.43103448275862066

It seems that the drawer with the more lopsided mix of colors (40 white, 10 black) is more likely after the update. To confirm this suspicion, let's reset the suite to uniform priors and update with a draw of 3 matching socks.


In [46]:
drawers = Drawers(hypos)
drawers.Update(3)
drawers.Print()


drawer1 0.6578947368421052
drawer2 0.3421052631578947

So a Bayesian update on nothing more than "the socks matched" favors the drawer with the more lopsided mix of colors, and the more matching socks we draw, the stronger that preference becomes.

Chess-playing twins

Allen Downey

Two identical twins are members of my chess club, but they never show up on the same day; in fact, they strictly alternate the days they show up. I can't tell them apart except that one is a better player than the other: Avery beats me 60% of the time and I beat Blake 70% of the time. If I play one twin on Monday and win, and the other twin on Tuesday and lose, which twin did I play on which day?


In [7]:
twins = Pmf()
twins['AB'] = 1
twins['BA'] = 1
twins.Normalize()
twins.Print()


AB 0.5
BA 0.5

In [8]:
#win against Monday's twin
twins['AB'] *= .4   # I beat Avery 40% of the time
twins['BA'] *= .7   # I beat Blake 70% of the time

#lose against Tuesday's twin
twins['AB'] *= .3   # I lose to Blake 30% of the time
twins['BA'] *= .6   # I lose to Avery 60% of the time

twins.Normalize()


Out[8]:
0.26999999999999996

In [9]:
twins.Print()


AB 0.22222222222222224
BA 0.7777777777777778

So it is more likely that I played Blake on Monday and Avery on Tuesday.

1984

by Katerina Zoltan

The place: Airstrip One. The reason: thoughtcrime. The time: ???

John's parents were taken by the Thought Police and erased from all records. John is being initiated into the Youth League and must pass a test. He is asked whether his parents are good comrades. It is not clear what John's admission officer knows:

  1. He may know that John's parents have been erased and that John did not give them away.
  2. He may know only that John's parents have been erased.
  3. He may not know that John's parents have been erased.

It is a well-known fact that children whose parents are 'good comrades' have twice the chance of passing the test. However, if the admission officer knows that the parents committed thoughtcrime (but not that they have been erased yet), a child who gave his parents away has three times the chance of getting in as a child who did not give them away.

And if the admission officer knows the specifics of the arrest, a child who insists that the records are false and that his parents existed has a 1/3 chance of getting in, while one who pretends that his parents never existed has a 2/3 chance. Lying to an admission officer who knows the parents have been erased will ensure that the child does not get in. Telling an admission officer that your parents do not exist when he does not know this will give you a 1/3 chance of getting in.

There is a 60% chance the admission officer knows nothing, a 25% chance that he knows the parents have been erased, and a 15% chance that the officer knows all of the details. John says that he never had parents and is admitted into the Youth League. What did his admission officer know?


In [10]:
# Solution goes here

In [11]:
# Solution goes here
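
One possible setup, sketched below. The priors come straight from the statement (60% knows nothing, 25% knows only about the erasure, 15% knows all the details). The likelihoods are just one reading of the admission rules for a child who, like John, claims he never had parents; other readings are defensible, so treat those numbers as placeholders rather than the answer. Under this particular reading, "knows nothing" ends up most likely.


In [ ]:
# A sketch under one reading of the rules; the likelihood values are debatable.
# Hypotheses: what the admission officer knows about John's parents.
officer = Pmf()
officer['knows nothing'] = 0.60
officer['knows they were erased'] = 0.25
officer['knows all the details'] = 0.15

# Likelihood of John being admitted, given that he claims he never had parents:
#   knows nothing          -> 1/3 (stated directly in the problem)
#   knows they were erased -> 0   (reading the claim as a lie to this officer)
#   knows all the details  -> 2/3 (pretending the parents never existed)
likelihood = {'knows nothing': 1/3,
              'knows they were erased': 0,
              'knows all the details': 2/3}

for hypo, like in likelihood.items():
    officer[hypo] *= like
officer.Normalize()
officer.Print()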

Where Am I? - The Robot Localization Problem

by Kathryn Hite

Bayes's Theorem proves to be extremely useful when building mobile robots that need to know where they are within an environment at any given time. Because of the error in motion and sensor systems, a robot's knowledge of its location in the world is based on probabilities. Let's look at a simplified example that could feasibly be scaled up to create a working localization model.

Part A: We have a robot that exists within a very simple environment. The map for this environment is a row of 6 grid cells that are colored either green or red and each labeled $x_1$, $x_2$, etc. In real life, a larger form of this grid environment could make up what is known as an occupancy grid, or a map of the world with places that the robot can go represented as green cells and obstacles as red cells.

G R R G G G
$x_1$ $x_2$ $x_3$ $x_4$ $x_5$ $x_6$

The robot has a sensor that can detect color with an 80% chance of being accurate.

Given that the robot gets dropped in the environment and senses red, what is the probability of it being in each of the six locations?


In [12]:
# Solution goes here

In [13]:
# Solution goes here

In [14]:
# Solution goes here
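
Here is one way this could be coded, as a sketch: the map and the 80% sensor accuracy come from the statement, while the cell names and the class name are just illustrative. With a uniform prior, the two red cells should each end up with probability 1/3 and the four green cells with 1/12.


In [ ]:
# Map of the world: the color of each grid cell x1..x6 (G R R G G G)
cells = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
colors = dict(zip(cells, ['G', 'R', 'R', 'G', 'G', 'G']))

class Robot(Suite):
    def Likelihood(self, data, hypo):
        """data: the sensed color ('R' or 'G'); hypo: the cell the robot is in."""
        # the sensor reports the true color 80% of the time
        return 0.8 if data == colors[hypo] else 0.2

robot = Robot(cells)   # uniform prior over the six cells
robot.Update('R')      # the robot senses red
robot.Print()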

Part B: This becomes an extremely useful tool as we begin to move around the map. Let's get a more accurate estimate of where the robot is in the world by telling it to move forward one cell.

The robot moves forward one cell from its previous position and the sensor reads green, again with an 80% accuracy rate. Update the probability of the robot having started in each location.


In [15]:
# Solution goes here

In [16]:
# Solution goes here

In [17]:
# Solution goes here
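
A possible follow-up, reusing the cells and colors defined in the sketch above: the hypotheses are still the starting cells, and the data is now the pair of readings (red before the move, green after). The statement doesn't say what happens if the robot starts in $x_6$ and steps off the map, so this sketch assumes the world wraps around; under that assumption $x_3$ comes out as the most likely starting cell.


In [ ]:
class RobotPath(Suite):
    def Likelihood(self, data, hypo):
        """data: (color sensed at the start, color sensed after moving forward);
        hypo: the cell the robot started in."""
        first, second = data
        like = 0.8 if first == colors[hypo] else 0.2
        # assume the world wraps around when the robot moves past x6
        nxt = cells[(cells.index(hypo) + 1) % len(cells)]
        like *= 0.8 if second == colors[nxt] else 0.2
        return like

path = RobotPath(cells)
path.Update(('R', 'G'))
path.Print()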

Red Dice problems

Suppose I have a six-sided die that is red on 2 sides and blue on 4 sides, and another die that's the other way around, red on 4 sides and blue on 2.

I choose a die at random and roll it, and I tell you it came up red. What is the probability that I rolled the second die (red on 4 sides)?


In [10]:
class Dice(Suite):
    def Likelihood(self, data, hypo):
        """
        Likelihood returns the likelihood of the data under a given
        hypothesis. In our dice problem, the probability that the
        die comes up red is different for each die, so we start by
        checking which die the hypothesis refers to.
        
        Args:
            data (str): the outcome of the roll, 'red' or 'blue'
            hypo (str): The hypothesis we are updating
        """
        
        if hypo == 'Dice 1':
            probRed = 2 / (2+4)
            if data == 'red':
                return probRed
            else:
                return 1-probRed
            
        if hypo == 'Dice 2':
            probRed = 4 / (2+4)
            if data == 'red':
                return probRed
            else:
                return 1-probRed
            
        return "Hypothesis not in Suite"

hypotheses = ['Dice 1', 'Dice 2']
dice = Dice(hypotheses)
dice.Print()


Dice 1 0.5
Dice 2 0.5

In [11]:
dice.Update('red')
dice.Print()


Dice 1 0.3333333333333333
Dice 2 0.6666666666666666

In [20]:
# Solution goes here

In [21]:
# Solution goes here

In [22]:
# Solution goes here

In [23]:
# Solution goes here

Scenario B

Suppose I roll the same die again. What is the probability I get red?


In [24]:
# Solution goes here
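
One way to work this out, as a sketch: after seeing red once, the posterior over dice is the one computed above (1/3 for Dice 1, 2/3 for Dice 2), and rolling the same die again mixes the two red probabilities with those weights, giving 1/3 * 1/3 + 2/3 * 2/3 = 5/9.


In [ ]:
# probability of red for each die
prob_red = {'Dice 1': 2/6, 'Dice 2': 4/6}

# weight each die's red probability by its posterior after the first red roll
total = sum(dice[hypo] * prob_red[hypo] for hypo in prob_red)
print(total)   # expect 5/9, about 0.556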

Scenario A

Instead of rolling the same die, suppose I choose a die at random and roll it. What is the probability that I get red?


In [25]:
# Solution goes here
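
Here a fresh die is chosen at random, so the posterior from the first roll does not apply; each die is equally likely, and the answer is just the average of the two red probabilities, 1/2. A minimal sketch:


In [ ]:
# a fresh die is chosen with equal probability, so average the red probabilities
prob_red = {'Dice 1': 2/6, 'Dice 2': 4/6}
print(sum(0.5 * p for p in prob_red.values()))   # expect 0.5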

Scenario C

Now let's run a different experiment. Suppose I choose a die and roll it. If the outcome is red, I report the outcome. Otherwise I choose a die again and roll again, and repeat until I get red.

What is the probability that the last die I rolled is the reddish one?


In [26]:
# Solution goes here

In [27]:
# Solution goes here
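
Because a fresh die is chosen on every attempt, the attempts are independent and the only informative fact is that the final roll came up red. That is exactly the update we already performed above, so the answer should again be 2/3. A sketch:


In [ ]:
# each attempt is "pick a die at random, roll it"; we only see the roll that came up red
last_roll = Dice(hypotheses)
last_roll.Update('red')
last_roll.Print()   # expect Dice 2 (the reddish die) at 2/3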

Scenario D

Finally, suppose I choose a die and roll it over and over until I get red, then report the outcome. What is the probability that the die I rolled is the reddish one?


In [28]:
# Solution goes here

In [29]:
# Solution goes here
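
Here the same die is rolled until red appears. Both dice produce red eventually with probability 1, so the observation "I kept rolling and finally got red" carries no information about which die it is, and the posterior stays at the 50/50 prior. A sketch that makes the trivial update explicit:


In [ ]:
until_red = Pmf()
until_red['Dice 1'] = 1
until_red['Dice 2'] = 1

# P(eventually roll red | die) = 1 for both dice, so the update changes nothing
until_red['Dice 1'] *= 1
until_red['Dice 2'] *= 1

until_red.Normalize()
until_red.Print()   # expect 0.5 and 0.5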

The bus problem

Allen Downey

Two bus routes run past my house, headed for Arlington and Billerica. In theory, the Arlington bus runs every 20 minutes and the Billerica bus every 30 minutes, but by the time they get to me, the time between buses is well-modeled by exponential distributions with means 20 and 30.

Part 1: Suppose I see a bus outside my house, but I can't read the destination. What is the probability that it is an Arlington bus?

Part 2: Suppose I see a bus go by, but I don't see the destination, and 3 minutes later I see another bus. What is the probability that the second bus is going to Arlington?


In [30]:
# Solution goes here

In [31]:
# Solution goes here

In [32]:
# Solution goes here

In [33]:
# Solution goes here

In [34]:
# Solution goes here

In [35]:
# Solution goes here

In [36]:
# Solution goes here
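
One possible approach, as a sketch, treating each route as a Poisson process (exponential gaps with means 20 and 30 minutes, so rates of 1/20 and 1/30 buses per minute). For Part 1, a randomly observed bus is an Arlington bus with probability proportional to its rate, which gives 0.6. For Part 2, the likelihood that a 3-minute gap ends with an Arlington bus is the Arlington density at t=3 times the probability that no Billerica bus arrived in those 3 minutes; both hypotheses share the factor exp(-3/20) * exp(-3/30), so the gap length cancels and the answer is again 0.6.


In [ ]:
lam_arlington = 1 / 20   # buses per minute
lam_billerica = 1 / 30

# Part 1: a passing bus is Arlington with probability proportional to its rate
part1 = Pmf()
part1['Arlington'] = lam_arlington
part1['Billerica'] = lam_billerica
part1.Normalize()
part1.Print()   # expect Arlington 0.6

# Part 2: the second bus arrives t = 3 minutes after the first;
# likelihood = density of that route at t times the survival function of the other route
t = 3
part2 = Pmf()
part2['Arlington'] = lam_arlington * np.exp(-lam_arlington * t) * np.exp(-lam_billerica * t)
part2['Billerica'] = lam_billerica * np.exp(-lam_billerica * t) * np.exp(-lam_arlington * t)
part2.Normalize()
part2.Print()   # the exponentials cancel, so again Arlington 0.6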

In [ ]: