Think Bayes

This notebook presents example code and exercise solutions for Think Bayes.

MIT License: https://opensource.org/licenses/MIT



In [1]:

    
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import classes from thinkbayes2
from thinkbayes2 import Pmf, Suite, Beta
import thinkplot

import numpy as np

Whenever you survey people about sensitive issues, you have to deal with social desirability bias, which is the tendency of people to shade their answers in the direction they think shows them in the most positive light.

One of the ways to improve the quality of the results is to collect responses in indirect ways. For example, here's a clever way one research group estimated the prevalence of atheists.

Another way is randomized response, as described in this presentation or this video.

As an example, suppose you ask 100 people to flip a coin and:

If they get heads, they honestly answer the question "Do you believe in God?"
If they get tails, they flip a second coin and report YES for heads, tails for NO.

Assume that you cannot observe whether they flip one coin or two. And suppose you get 55 YESes and 45 NOs.

Estimate the prevalence of believers in the surveyed population (by which, as always, I mean compute a posterior distribution).
How efficient is this method? That is, how does the width of the posterior distribution compare to the distribution you would get if 100 people answered the question honestly?



In [2]:

    
# Solution

class Social(Suite):
    
    def Likelihood(self, data, hypo):
        """
        data: outcome of unreliable measurement, either 'YES' or 'NO'
        hypo: actual proportion of the thing we're measuring
        """
        p = hypo
        p_yes = 0.25 + p/2
        if data == 'YES':
            return p_yes
        else:
            return 1 - p_yes



In [3]:

    
# Solution

prior = np.linspace(0, 1, 101)
suite = Social(prior)

thinkplot.Pdf(suite, label='Prior')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [4]:

    
# Solution

for i in range(55):
    suite.Update('YES')
    
for i in range(45):
    suite.Update('NO')



In [5]:

    
# Solution

thinkplot.Pdf(suite, label='Posterior')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [6]:

    
# Solution

suite.Mean(), suite.MAP()









    Out[6]:





(0.5980373585330246, 0.6)



In [7]:

    
# Solution

# For comparison, what would we think if we had been able 
# to survey 100 people directly?

beta = Beta(1, 1)
beta.Update((60, 40))
thinkplot.Pdf(beta.MakePmf(), label='Direct', color='gray')

thinkplot.Pdf(suite, label='Randomized')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [8]:

    
# Solution

# To see how efficient this method is, we can divide the sample size for
# the direct method by a factor.  It looks like we lose a factor of about 4.

factor = 4
beta = Beta(1, 1)
beta.Update((60/factor, 40/factor))
thinkplot.Pdf(beta.MakePmf(), label='Direct', color='gray')

thinkplot.Pdf(suite, label='Randomized')
thinkplot.decorate(xlabel='Fraction of the population',
                   ylabel='PDF')



In [9]:

    
# Solution

# So the effective sample size is about 25.



In [ ]:

Think Bayes

The social desirability problem