Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the female respondent file.

In [1]:
%matplotlib inline

import thinkstats2
import thinkplot
import chap01soln
resp = chap01soln.ReadFemResp()

Make a PMF of numkdhh, the number of children under 18 in the respondent's household.


In [2]:
kids = resp['numkdhh']
kids


Out[2]:
0     3
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     2
9     0
10    1
11    0
12    1
13    2
14    0
...
7628    0
7629    2
7630    0
7631    2
7632    2
7633    1
7634    2
7635    1
7636    2
7637    0
7638    0
7639    0
7640    0
7641    0
7642    0
Name: numkdhh, Length: 7643, dtype: int64
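To show roughly what `thinkstats2.Pmf` computes from this Series, here is a minimal standard-library sketch. It counts each distinct value and divides by the total number of observations. `make_pmf` and the toy `sample` are hypothetical illustrations, not the thinkstats2 implementation or the NSFG data.

```python
from collections import Counter

def make_pmf(values):
    """Hypothetical helper: map each distinct value to its relative frequency."""
    counts = Counter(values)             # value -> number of occurrences
    total = sum(counts.values())         # number of observations
    return {x: n / total for x, n in counts.items()}

# Hypothetical toy sample of children-per-household counts (not the NSFG data).
sample = [0, 0, 0, 1, 1, 2, 2, 2, 3, 0]
pmf_sketch = make_pmf(sample)
print(sorted(pmf_sketch.items()))
# → [(0, 0.4), (1, 0.2), (2, 0.3), (3, 0.1)]
```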

Display the PMF.


In [3]:
pmf = thinkstats2.Pmf(kids)
thinkplot.Pmf(pmf, label='PMF')
thinkplot.Show(xlabel='# of Children', ylabel='PMF')


[Figure: PMF of the number of children per household (x-axis: # of Children, y-axis: PMF)]

Define BiasPmf.


In [4]:
def BiasPmf(pmf, label=''):
    """Returns the Pmf with oversampling proportional to value.

    If pmf is the distribution of true values, the result is the
    distribution that would be seen if values are oversampled in
    proportion to their values; for example, if you ask students
    how big their classes are, large classes are oversampled in
    proportion to their size.

    Args:
      pmf: Pmf object.
      label: string label for the new Pmf.

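    Returns:
      Pmf object
    """
    new_pmf = pmf.Copy(label=label)

    # Weight each probability by its value x (a household with x kids
    # would be reported x times), then renormalize so the probabilities
    # sum to 1 again.
    for x, _ in pmf.Items():
        new_pmf.Mult(x, x)

    new_pmf.Normalize()
    return new_pmf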
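The same size-biasing step can be sketched on a plain dictionary, without thinkstats2. This is a minimal sketch, assuming a PMF is a `dict` from value to probability. `bias_pmf` and the toy `actual` distribution are hypothetical, and `Fraction` is used only so the results can be checked by hand.

```python
from fractions import Fraction

def bias_pmf(pmf):
    """Hypothetical dict-based analogue of BiasPmf: weight by value, renormalize."""
    weighted = {x: p * x for x, p in pmf.items()}   # multiply each probability by x
    total = sum(weighted.values())                  # normalizing constant
    return {x: w / total for x, w in weighted.items()}

# Hypothetical toy distribution of children per household.
actual = {0: Fraction(4, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(1, 10)}
biased = bias_pmf(actual)
print(biased[2])
# → 6/11
```

Note that households with zero children get probability 0 in the biased PMF, because no child can report being in one.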

Make the biased Pmf of children in the household, as it would be observed if you surveyed the children instead of the respondents.


In [8]:
biasedpmf = BiasPmf(pmf, label='BiasPMF')
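Why does weighting by value model surveying the children? A household with k children produces k child respondents, and each of them reports k. The sketch below simulates that directly on hypothetical toy data. The empirical distribution of the children's answers matches the value-weighted PMF.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy households: number of children under 18 in each.
households = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3]

# Survey the children: a household with k kids yields k respondents,
# each of whom reports k children in the household.
child_reports = [k for k in households for _ in range(k)]

counts = Counter(child_reports)
total = len(child_reports)
observed = {k: Fraction(n, total) for k, n in sorted(counts.items())}
print(observed)
# → {1: Fraction(2, 11), 2: Fraction(6, 11), 3: Fraction(3, 11)}
```

Childless households drop out entirely, and larger households are overrepresented in proportion to their size.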

Display the actual Pmf and the biased Pmf on the same axes.


In [7]:
thinkplot.PrePlot(2)
thinkplot.Pmfs([pmf, biasedpmf])
thinkplot.Show(xlabel='# of Children', ylabel='PMF')


[Figure: actual and biased PMFs of the number of children per household (x-axis: # of Children, y-axis: PMF)]

Compute the means of the two Pmfs.


In [10]:
pmf.Mean()


Out[10]:
1.0242051550438309

In [11]:
biasedpmf.Mean()


Out[11]:
2.4036791006642821
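The biased mean is always at least the actual mean. Weighting each probability by x gives a biased mean of sum(x * x * p(x)) / sum(x * p(x)) = E[X^2] / E[X]. Because E[X^2] >= E[X]^2, this ratio is at least E[X] whenever E[X] > 0. That is why the children's view (about 2.40) is so much larger than the respondents' view (about 1.02). The sketch below checks the identity on a hypothetical toy distribution, using exact fractions.

```python
from fractions import Fraction

# Hypothetical toy distribution of children per household.
actual = {0: Fraction(4, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(1, 10)}

def mean(pmf):
    """E[X] for a dict-based PMF."""
    return sum(x * p for x, p in pmf.items())

def second_moment(pmf):
    """E[X^2] for a dict-based PMF."""
    return sum(x * x * p for x, p in pmf.items())

# Size-biased PMF: weight each probability by its value and renormalize.
weighted = {x: x * p for x, p in actual.items()}
total = sum(weighted.values())
biased = {x: w / total for x, w in weighted.items()}

print(mean(actual))                           # E[X]          → 11/10
print(mean(biased))                           # biased mean   → 23/11
print(second_moment(actual) / mean(actual))   # E[X^2]/E[X]   → 23/11
```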
