Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the female respondent file and display the variable names.



In [5]:

    
%matplotlib inline

import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns









    Out[5]:





Index([u'caseid', u'rscrinf', u'rdormres', u'rostscrn', u'rscreenhisp', u'rscreenrace', u'age_a', u'age_r', u'cmbirth', u'agescrn', u'marstat', u'fmarstat', u'fmarit', u'evrmarry', u'hisp', u'hispgrp', u'numrace', u'roscnt', u'hplocale', u'manrel', u'fl_rage', u'fl_rrace', u'fl_rhisp', u'goschol', u'vaca', u'higrade', u'compgrd', u'havedip', u'dipged', u'cmhsgrad', u'havedeg', u'degrees', u'wthparnw', u'onown', u'intact', u'parmarr', u'lvsit14f', u'lvsit14m', u'womrasdu', u'momdegre', u'momworkd', u'momchild', u'momfstch', u'mom18', u'manrasdu', u'daddegre', u'bothbiol', u'intact18', u'onown18', u'numbabes', u'totplacd', u'nplaced', u'ndied', u'nadoptv', u'hasbabes', u'cmlastlb', u'cmfstprg', u'cmlstprg', u'menarche', u'pregnowq', u'maybpreg', u'numpregs', u'everpreg', u'currpreg', u'moscurrp', u'giveadpt', u'ngivenad', u'otherkid', u'nothrkid', u'sexothkd', u'relothkd', u'adptotkd', u'tryadopt', u'tryeithr', u'stilhere', u'cmokdcam', u'othkdfos', u'cmokddob', u'othkdspn', u'othkdrac1', u'othkdrac2', u'kdbstrac', u'okbornus', u'okdisabl1', u'sexothkd2', u'relothkd2', u'adptotkd2', u'tryadopt2', u'tryeithr2', u'stilhere2', u'cmokdcam2', u'othkdfos2', u'cmokddob2', u'othkdspn2', u'othkdrac6', u'okbornus2', u'okdisabl5', u'sexothkd3', u'relothkd3', u'adptotkd3', ...], dtype='object')
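chap01soln.ReadFemResp is a helper from the book's code repository. As far as I can tell, it reads the gzipped fixed-width NSFG data file using a Stata dictionary (.dct) that gives each variable's name and column positions. The sketch below shows the general idea of fixed-width parsing with the standard library only. LAYOUT, read_fixed_width and the three input lines are hypothetical illustrations, not the real NSFG layout or data.

```python
# Hypothetical layout: (variable name, start, end), 0-based and end-exclusive.
# The real NSFG column positions come from the dictionary file, not from here.
LAYOUT = [
    ("caseid", 0, 5),
    ("age_r", 5, 7),
    ("parity", 7, 9),
]


def read_fixed_width(lines, layout):
    """Parse fixed-width text lines into a list of dicts keyed by variable name."""
    records = []
    for line in lines:
        record = {}
        for name, start, end in layout:
            field = line[start:end].strip()
            # A blank field is treated as missing; everything else as an integer
            record[name] = int(field) if field else None
        records.append(record)
    return records


# Three made-up records; the last one leaves parity blank
lines = ["000012700", "000024203", "0000343  "]
records = read_fixed_width(lines, LAYOUT)

# The analogue of resp.columns: the variable names, in layout order
columns = [name for name, _, _ in LAYOUT]
print(columns)     # ['caseid', 'age_r', 'parity']
print(records[0])  # {'caseid': 1, 'age_r': 27, 'parity': 0}
```

Keeping the layout as data, separate from the parsing loop, is what lets a single dictionary file describe thousands of columns.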

Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.



In [6]:

    
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
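thinkstats2.Hist maps each value to the number of times it appears. A rough standard-library analogue (not the library's implementation) is collections.Counter. The codes below are invented for illustration, not NSFG data.

```python
from collections import Counter

# Made-up totincr codes for six respondents (not NSFG data)
codes = [14, 3, 14, 7, 3, 14]

# A histogram maps each distinct value to its frequency
hist = Counter(codes)

print(hist[14])              # 3
print(hist[99])              # 0: values never seen have frequency zero
print(sorted(hist.items()))  # [(3, 2), (7, 1), (14, 3)]
```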

Display the histogram.



In [7]:

    
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
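Roughly speaking, thinkplot.Hist draws one bar per distinct value, with height equal to that value's frequency. As a sketch of the same idea without matplotlib, the hypothetical text_hist function below renders a histogram as text, scaling each bar relative to the most frequent value. The input codes are made up.

```python
from collections import Counter


def text_hist(values, width=40):
    """Return one text line per distinct value, with a bar scaled to the max frequency."""
    hist = Counter(values)
    max_freq = max(hist.values())
    lines = []
    for value in sorted(hist):
        freq = hist[value]
        # Scale so the most common value gets a bar of `width` characters
        bar = "#" * round(freq * width / max_freq)
        lines.append(f"{value:>3} | {bar} {freq}")
    return lines


for line in text_hist([14, 3, 14, 7, 3, 14], width=6):
    print(line)
```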

Make a histogram of age_r, the respondent's age at the time of interview.



In [8]:

    
# ager: the respondent's age at interview (recode of age_r)
hist = thinkstats2.Hist(resp.ager)
thinkplot.Hist(hist, label='ager')
thinkplot.Show()
# Also display the raw values
resp.ager









    Out[8]:





0       27
1       42
2       43
3       15
4       20
5       42
6       17
7       22
8       38
9       21
10      43
11      26
12      23
13      34
14      28
15      28
16      23
17      33
18      16
19      24
20      22
21      32
22      41
23      37
24      38
25      29
26      21
27      37
28      39
29      26
        ..
7613    18
7614    24
7615    15
7616    30
7617    24
7618    34
7619    34
7620    26
7621    22
7622    19
7623    19
7624    37
7625    20
7626    23
7627    23
7628    17
7629    36
7630    44
7631    32
7632    40
7633    35
7634    35
7635    30
7636    41
7637    35
7638    34
7639    17
7640    29
7641    16
7642    28
Name: ager, dtype: int64

Make a histogram of numfmhh, the number of people in the respondent's household.



In [10]:

    
hist = thinkstats2.Hist(resp.numfmhh)
thinkplot.Hist(hist, label='numfmhh')
thinkplot.Show()

Make a histogram of parity, the number of children borne by the respondent. How would you describe this distribution?



In [11]:

    
hist = thinkstats2.Hist(resp.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()

Use Hist.Largest to find the largest values of parity.



In [13]:

    
hist.Largest(10)









    Out[13]:





[(22, 1),
 (16, 1),
 (10, 3),
 (9, 2),
 (8, 8),
 (7, 15),
 (6, 29),
 (5, 95),
 (4, 309),
 (3, 828)]
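As Out[13] shows, Hist.Largest(n) returns (value, frequency) pairs sorted by value in descending order. A plain-dict sketch of the same idea follows; the function name largest is hypothetical. The data are the ten pairs from Out[13]; the frequencies of parities 0 to 2 are not shown there, so they are left out.

```python
def largest(hist, n=10):
    """Return the n largest values of a {value: freq} dict as (value, freq) pairs, largest first."""
    return sorted(hist.items(), reverse=True)[:n]


# The ten largest parity values and their frequencies, copied from Out[13]
parity_tail = {22: 1, 16: 1, 10: 3, 9: 2, 8: 8, 7: 15, 6: 29, 5: 95, 4: 309, 3: 828}

print(largest(parity_tail, 3))  # [(22, 1), (16, 1), (10, 3)]
```

Because the keys of a histogram are unique, sorting the (value, freq) tuples sorts by value alone.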

Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high-income respondents.



In [15]:

    
# 14 is the highest totincr code; keep only respondents in that category
rich = resp[resp.totincr == 14]
hist = thinkstats2.Hist(rich.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()
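In pandas, resp[resp.totincr == 14] uses a boolean Series as a mask and keeps only the rows where the condition is True. The sketch below does the same filtering on a hypothetical list of records with the standard library, then builds the parity histogram of the subset. The records and the name TOP_CODE are illustrative assumptions.

```python
from collections import Counter

# Hypothetical respondent records: only the two fields this exercise uses
respondents = [
    {"totincr": 14, "parity": 0},
    {"totincr": 14, "parity": 2},
    {"totincr": 9, "parity": 3},
    {"totincr": 14, "parity": 2},
    {"totincr": 5, "parity": 1},
]

TOP_CODE = 14  # highest totincr code, as used in the cell above

# Keep only the top income category, like resp[resp.totincr == 14]
rich = [r for r in respondents if r["totincr"] == TOP_CODE]
rich_hist = Counter(r["parity"] for r in rich)

print(len(rich))                 # 3
print(sorted(rich_hist.items())) # [(0, 1), (2, 2)]
```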

Find the largest parities for high-income respondents.



In [16]:

    
hist.Largest(10)









    Out[16]:





[(8, 1), (7, 1), (5, 5), (4, 19), (3, 123), (2, 267), (1, 229), (0, 515)]
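One way to put a number on the shape asked about earlier ("How would you describe this distribution?") is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. This is a sketch, not something the exercise requires. Since Out[16] has fewer than ten distinct values, Largest(10) lists all of them, so the full high-income histogram can be copied from it. A positive g1 indicates a long tail toward large parities.

```python
def skewness(hist):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 for a {value: freq} histogram."""
    n = sum(hist.values())
    mean = sum(v * f for v, f in hist.items()) / n
    # Central moments, weighting each value by its frequency
    m2 = sum(f * (v - mean) ** 2 for v, f in hist.items()) / n
    m3 = sum(f * (v - mean) ** 3 for v, f in hist.items()) / n
    return m3 / m2 ** 1.5


# Parity histogram for the high-income respondents, copied from Out[16]
rich_parity = {8: 1, 7: 1, 5: 5, 4: 19, 3: 123, 2: 267, 1: 229, 0: 515}

print(skewness(rich_parity))  # positive: the tail extends to the right
```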

Compare the mean parity for high-income respondents and everyone else.



In [20]:

    
# Everyone below the top income code
notrich = resp[resp.totincr < 14]
rich.parity.mean(), notrich.parity.mean()









    Out[20]:





(1.0758620689655172, 1.2495758136665125)
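The mean of a histogram is the sum of value times frequency, divided by the total count. Applied to the complete high-income histogram from Out[16], this gives 1248 / 1160, about 1.0759, which matches rich.parity.mean() above. By the same output, the high-income mean is about 0.17 children lower than the mean for everyone else (1.2496 - 1.0759). The function name hist_mean is hypothetical.

```python
def hist_mean(hist):
    """Mean of a {value: freq} histogram: sum of value * freq over the total count."""
    total = sum(hist.values())
    return sum(value * freq for value, freq in hist.items()) / total


# Parity histogram for the high-income respondents, copied from Out[16]
rich_parity = {8: 1, 7: 1, 5: 5, 4: 19, 3: 123, 2: 267, 1: 229, 0: 515}

print(sum(rich_parity.values()))  # 1160 respondents
print(hist_mean(rich_parity))     # 1248 / 1160, about 1.0759
```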

Investigate any other variables that look interesting.



In [18]:

    
hist = thinkstats2.Hist(resp.fmarno)
thinkplot.Hist(hist, label='fmarno')
thinkplot.Show()



In [40]: