Think Bayes

This notebook presents example code and exercise solutions for Think Bayes.

Copyright 2018 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import classes from thinkbayes2
from thinkbayes2 import Pmf, Suite, Beta
import thinkplot

import numpy as np

The social desirability problem

Whenever you survey people about sensitive issues, you have to deal with social desirability bias, which is the tendency of people to shade their answers in the direction they think shows them in the most positive light.

One of the ways to improve the quality of the results is to collect responses in indirect ways. For example, here's a clever way one research group estimated the prevalence of atheists.

Another way is randomized response, as described in this presentation or this video.

As an example, suppose you ask 100 people to flip a coin and:

  • If they get heads, they report YES.

  • If they get tails, they honestly answer the question "Do you believe in God?"

And suppose you get 80 YESes and 20 NOs.

  1. Estimate the prevalence of believers in the surveyed population (by which, as always, I mean compute a posterior distribution).

  2. How efficient is this method? That is, how does the width of the posterior distribution compare to the distribution you would get if 100 people answered the question honestly?
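
Before turning to the solution, here is a quick simulation of the protocol (an addition, not part of the original notebook). It assumes, purely for illustration, a true prevalence of 0.6; under that assumption the expected fraction of YES responses is 0.5 + 0.6/2 = 0.8, which matches the observed 80 out of 100.

In [ ]:
# Simulation of the randomized-response protocol (added).
# Assumes, for illustration only, that the true prevalence is 0.6.
np.random.seed(17)
true_p = 0.6
n = 100000

heads = np.random.random(n) < 0.5
believer = np.random.random(n) < true_p

# Report YES on heads, or on tails if the respondent is a believer.
yes = heads | (~heads & believer)

# Should be close to 0.5 + true_p/2 = 0.8
yes.mean()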


In [2]:
# Solution

class Social(Suite):
    
    def Likelihood(self, data, hypo):
        """
        data: outcome of unreliable measurement, either 'YES' or 'NO'
        hypo: actual proportion of the thing we're measuring
        """
        p = hypo
        # P(YES) = P(heads) + P(tails) * P(believer) = 0.5 + p/2
        p_yes = 0.5 + p/2
        if data == 'YES':
            return p_yes
        else:
            return 1 - p_yes
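
A quick sanity check of the likelihood above (this cell is an addition, not part of the original solution): for hypo = 0.6 the probability of a YES response should be 0.5 + 0.6/2 = 0.8.

In [ ]:
# Sanity check (added): P(YES | p=0.6) should be 0.8 and P(NO | p=0.6) should be 0.2
s = Social()
s.Likelihood('YES', 0.6), s.Likelihood('NO', 0.6)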

In [3]:
# Solution

prior = np.linspace(0, 1, 101)
suite = Social(prior)

thinkplot.Pdf(suite, label='Prior')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [4]:
# Solution

for i in range(80):
    suite.Update('YES')
    
for i in range(20):
    suite.Update('NO')
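
As a cross-check (an addition to the original solution), we can compute the same posterior directly on the grid with NumPy: with a uniform prior, the posterior is proportional to p_yes**80 * (1 - p_yes)**20.

In [ ]:
# Cross-check (added): compute the posterior directly on the uniform grid.
ps = np.linspace(0, 1, 101)
like_yes = 0.5 + ps / 2

# unnormalized posterior: uniform prior times the product of the 100 likelihoods
post = like_yes**80 * (1 - like_yes)**20
post /= post.sum()

# posterior mean; should agree with suite.Mean() below, up to floating-point error
np.sum(ps * post)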

In [5]:
# Solution

thinkplot.Pdf(suite, label='Posterior')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [6]:
# Solution

suite.Mean(), suite.MAP()


Out[6]:
(0.5882352942702531, 0.6)

In [7]:
# Solution

# For comparison, what would we think if we had been able to survey
# 100 people directly?  Since the posterior is centered near 0.6, we
# assume a direct survey would have yielded about 60 YESes out of 100.

beta = Beta(1, 1)
beta.Update((60, 40))
thinkplot.Pdf(beta.MakePmf(), label='Direct', color='gray')

thinkplot.Pdf(suite, label='Randomized')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')
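
To put a number on the widths (an addition to the original solution), we can compare credible intervals. This assumes Pmf.CredibleInterval is available, as it is in most versions of thinkbayes2.

In [ ]:
# Width comparison (added): 90% credible intervals for the two posteriors.
# Assumes Pmf.CredibleInterval exists, as in most versions of thinkbayes2.
suite.CredibleInterval(90), beta.MakePmf().CredibleInterval(90)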



In [8]:
# Solution

# To see how efficient this method is, we can shrink the sample size
# for the direct method until the two posteriors have roughly the same
# width.  It looks like we lose a factor of about $2 \sqrt{2}$.

factor = 2 * np.sqrt(2)
beta = Beta(1, 1)
beta.Update((60/factor, 40/factor))
thinkplot.Pdf(beta.MakePmf(), label='Direct', color='gray')

thinkplot.Pdf(suite, label='Randomized')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [9]:
# Solution

# So the effective sample size is about 35.

100 / 2 / np.sqrt(2)


Out[9]:
35.35533905932737
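
A rough frequentist cross-check (added, not part of the original solution): an unbiased estimator of p is p_hat = 2*ybar - 1, where ybar is the fraction of YES responses, so its variance is 4*p_yes*(1-p_yes)/n, versus p*(1-p)/n for direct questioning. At p = 0.6 the variance ratio is about 2.7, consistent with the eyeballed factor of 2*sqrt(2) ≈ 2.8.

In [ ]:
# Analytic cross-check (added): variance ratio of the randomized-response
# estimator p_hat = 2*ybar - 1 versus direct questioning, evaluated at p = 0.6.
p = 0.6
p_yes = 0.5 + p / 2
var_ratio = 4 * p_yes * (1 - p_yes) / (p * (1 - p))
var_ratio, 2 * np.sqrt(2)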
