Read the female respondent file and display the variable names.


In [2]:
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns


Out[2]:
Index([u'caseid', u'rscrinf', u'rdormres', u'rostscrn', u'rscreenhisp', u'rscreenrace', u'age_a', u'age_r', u'cmbirth', u'agescrn', u'marstat', u'fmarstat', u'fmarit', u'evrmarry', u'hisp', u'hispgrp', u'numrace', u'roscnt', u'hplocale', u'manrel', u'fl_rage', u'fl_rrace', u'fl_rhisp', u'goschol', u'vaca', u'higrade', u'compgrd', u'havedip', u'dipged', u'cmhsgrad', u'havedeg', u'degrees', u'wthparnw', u'onown', u'intact', u'parmarr', u'lvsit14f', u'lvsit14m', u'womrasdu', u'momdegre', u'momworkd', u'momchild', u'momfstch', u'mom18', u'manrasdu', u'daddegre', u'bothbiol', u'intact18', u'onown18', u'numbabes', u'totplacd', u'nplaced', u'ndied', u'nadoptv', u'hasbabes', u'cmlastlb', u'cmfstprg', u'cmlstprg', u'menarche', u'pregnowq', u'maybpreg', u'numpregs', u'everpreg', u'currpreg', u'moscurrp', u'giveadpt', u'ngivenad', u'otherkid', u'nothrkid', u'sexothkd', u'relothkd', u'adptotkd', u'tryadopt', u'tryeithr', u'stilhere', u'cmokdcam', u'othkdfos', u'cmokddob', u'othkdspn', u'othkdrac1', u'othkdrac2', u'kdbstrac', u'okbornus', u'okdisabl1', u'sexothkd2', u'relothkd2', u'adptotkd2', u'tryadopt2', u'tryeithr2', u'stilhere2', u'cmokdcam2', u'othkdfos2', u'cmokddob2', u'othkdspn2', u'othkdrac6', u'okbornus2', u'okdisabl5', u'sexothkd3', u'relothkd3', u'adptotkd3', ...], dtype='object')

Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.


In [7]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
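A histogram object like `thinkstats2.Hist` is essentially a mapping from each value to the number of times it occurs. As a rough, dependency-free sketch (not the library's own implementation), the same frequency table can be built with `collections.Counter`; the income codes below are hypothetical and are not NSFG data.

```python
from collections import Counter

# Hypothetical totincr codes for a handful of respondents (illustration only).
codes = [14, 3, 7, 14, 11, 3, 14]

# Map each code to the number of respondents who reported it.
hist = Counter(codes)

print(hist[14])               # frequency of code 14 -> 3
print(hist[99])               # unseen values have frequency 0 -> 0
print(sorted(hist.items()))   # [(3, 2), (7, 1), (11, 1), (14, 3)]
```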

Display the histogram.


In [8]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
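If `thinkplot` is not available, a quick text rendering can still show the shape of a histogram. The helper `text_hist` below is hypothetical, written only for illustration; it scales each bar relative to the most frequent value and prints one row per value.

```python
from collections import Counter


def text_hist(hist, width=40):
    """Return one line per value: the value, a bar scaled to `width`, and its count."""
    biggest = max(hist.values())
    lines = []
    for value, freq in sorted(hist.items()):
        bar = "#" * round(width * freq / biggest)
        lines.append(f"{value:>4} {bar} {freq}")
    return lines


# Hypothetical income codes (illustration only, not NSFG data).
hist = Counter([14, 3, 7, 14, 11, 3, 14])
for line in text_hist(hist, width=6):
    print(line)
```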


Make a histogram of age_r, the respondent's age at the time of interview.


In [19]:
hist = thinkstats2.Hist(resp.age_r)
thinkplot.Hist(hist, label='age_r')
thinkplot.Show()


Make a histogram of numfmhh, the number of people in the respondent's household.


In [10]:
hist = thinkstats2.Hist(resp.numfmhh)
thinkplot.Hist(hist, label='numfmhh')
thinkplot.Show()


Make a histogram of parity, the number of children the respondent has given birth to. How would you describe this distribution?


In [11]:
hist = thinkstats2.Hist(resp.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()
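One way to put the shape into words is to compare the mode, median, and mean. The function `summarize` below is a hypothetical sketch that works on any `{value: frequency}` mapping; the counts are made up for illustration and are not NSFG data.

```python
def summarize(freqs):
    """Return (mode, median, mean) for a non-empty {value: frequency} mapping."""
    n = sum(freqs.values())
    mode = max(freqs, key=freqs.get)                 # most frequent value
    mean = sum(v * f for v, f in freqs.items()) / n
    # Lower median: first value whose cumulative frequency reaches half of n.
    cumulative = 0
    median = None
    for value in sorted(freqs):
        cumulative += freqs[value]
        if cumulative >= n / 2:
            median = value
            break
    return mode, median, mean


# Hypothetical parity counts (illustration only): many respondents with
# 0 or 1 children and a few with many.
toy_parity = {0: 5, 1: 3, 2: 2, 7: 1}
mode, median, mean = summarize(toy_parity)
print(mode, median, round(mean, 2))  # 0 1 1.27
```

When mode < median < mean, as in this toy example, a long right tail is often what pulls the mean upward, which is a common sign of right skew.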


Use Hist.Largest to find the largest values of parity.


In [13]:
hist.Largest(10)


Out[13]:
[(22, 1),
 (16, 1),
 (10, 3),
 (9, 2),
 (8, 8),
 (7, 15),
 (6, 29),
 (5, 95),
 (4, 309),
 (3, 828)]
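As Out[13] shows, `Hist.Largest` returns `(value, frequency)` pairs ordered from the largest value down. A minimal sketch that reproduces this ordering (a hypothetical helper, not thinkstats2's own code) sorts the pairs in descending order and keeps the first `n`:

```python
def largest(hist, n=10):
    """Return the n largest (value, frequency) pairs, largest value first."""
    return sorted(hist.items(), reverse=True)[:n]


# Frequencies for the ten largest parity values, copied from Out[13] above.
parity_top = {22: 1, 16: 1, 10: 3, 9: 2, 8: 8, 7: 15, 6: 29, 5: 95, 4: 309, 3: 828}

print(largest(parity_top, 3))    # [(22, 1), (16, 1), (10, 3)]
print(sum(parity_top.values()))  # 1291 respondents with parity 3 or more
```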

Use totincr to select the respondents in the highest income category (code 14). Compute the distribution of parity for just the high-income respondents.


In [15]:
rich = resp[resp.totincr == 14]
hist = thinkstats2.Hist(rich.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()
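In pandas, `resp.totincr == 14` produces a boolean Series, and indexing the DataFrame with it keeps only the rows where the value is True. The same filter-then-count logic can be sketched in plain Python; the dict-per-respondent layout and the records below are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical respondent records; in the notebook these are DataFrame rows.
respondents = [
    {"caseid": 1, "totincr": 14, "parity": 0},
    {"caseid": 2, "totincr": 9, "parity": 3},
    {"caseid": 3, "totincr": 14, "parity": 2},
    {"caseid": 4, "totincr": 5, "parity": 1},
]

TOP_CODE = 14  # top income code, matching the `== 14` test in the cell above

# Keep the rows that satisfy each condition, like a boolean mask does.
rich = [r for r in respondents if r["totincr"] == TOP_CODE]
notrich = [r for r in respondents if r["totincr"] < TOP_CODE]

# Distribution of parity within the selected group.
rich_parity = Counter(r["parity"] for r in rich)
print(rich_parity)
```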


Find the largest parities for high-income respondents.


In [16]:
hist.Largest(10)


Out[16]:
[(8, 1), (7, 1), (5, 5), (4, 19), (3, 123), (2, 267), (1, 229), (0, 515)]

Compare the mean parity for high-income respondents with that of all other respondents.


In [20]:
notrich = resp[resp.totincr < 14]
rich.parity.mean(), notrich.parity.mean()


Out[20]:
(1.0758620689655172, 1.2495758136665125)
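Because Out[16] lists only eight distinct parity values, it covers the whole high-income histogram, so the mean can be cross-checked by hand: multiply each value by its frequency, sum, and divide by the number of respondents. The variable names below are illustrative.

```python
# Parity frequencies for high-income respondents, copied from Out[16] above.
rich_parity = {8: 1, 7: 1, 5: 5, 4: 19, 3: 123, 2: 267, 1: 229, 0: 515}

n = sum(rich_parity.values())                                      # respondents
total = sum(value * freq for value, freq in rich_parity.items())   # total children
rich_mean = total / n

notrich_mean = 1.2495758136665125  # copied from Out[20] above

print(n, total)                    # 1160 1248
print(rich_mean)
print(notrich_mean - rich_mean)
```

The computed mean agrees with `rich.parity.mean()` in Out[20] (about 1.076), and the difference between the two groups is about 0.17 children per respondent.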

Investigate any other variables that look interesting.


In [18]:
hist = thinkstats2.Hist(resp.fmarno)
thinkplot.Hist(hist, label='fmarno')
thinkplot.Show()


