Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the female respondent file and display the variable names.
In [5]:
%matplotlib inline
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns
Out[5]:
Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.
In [6]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
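A Hist maps each distinct value to the number of times it appears. The stdlib sketch below shows the same idea with `collections.Counter`. It assumes a plain list of made-up income codes in place of `resp.totincr`; the data is hypothetical, not taken from the NSFG.

```python
from collections import Counter

# Hypothetical income codes standing in for resp.totincr (not real NSFG data).
totincr = [14, 14, 7, 3, 14, 11, 7, 14, 2, 11]

# Counter maps each distinct value to its frequency, like a histogram.
hist = Counter(totincr)

# Print value/frequency pairs in increasing order of value.
for value, freq in sorted(hist.items()):
    print(value, freq)
```

The frequencies always add up to the number of observations, which is a quick sanity check on any histogram.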
Display the histogram.
In [7]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
Make a histogram of age_r, the respondent's age at the time of interview.
In [8]:
# In this DataFrame the age column is named ager
hist = thinkstats2.Hist(resp.ager)
thinkplot.Hist(hist, label='ager')
thinkplot.Show()
Make a histogram of numfmhh, the number of people in the respondent's household.
In [10]:
hist = thinkstats2.Hist(resp.numfmhh)
thinkplot.Hist(hist, label='numfmhh')
thinkplot.Show()
Make a histogram of parity, the number of children borne by the respondent. How would you describe this distribution?
In [11]:
hist = thinkstats2.Hist(resp.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()
Use Hist.Largest to find the largest values of parity.
In [13]:
hist.Largest(10)
Out[13]:
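Largest(n) returns the n largest values together with their frequencies. The sketch below reproduces that behavior with a Counter, assuming the result is a list of (value, frequency) pairs sorted by value in descending order. The `largest` helper and the parity values are hypothetical, for illustration only.

```python
from collections import Counter

def largest(hist, n=10):
    """Return up to n (value, frequency) pairs with the largest values, highest first."""
    # Tuples sort by their first element (the value), so reverse=True puts the largest first.
    return sorted(hist.items(), reverse=True)[:n]

# Hypothetical parity values (number of children), not real survey data.
parity = [0, 0, 1, 2, 2, 2, 3, 4, 7, 9]
hist = Counter(parity)

print(largest(hist, 3))  # → [(9, 1), (7, 1), (4, 1)]
```

Looking at the largest values this way is a cheap check for outliers or data-entry errors in the tail of the distribution.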
Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high-income respondents.
In [15]:
# Code 14 is the top income category for totincr (see the codebook)
rich = resp[resp.totincr == 14]
hist = thinkstats2.Hist(rich.parity)
thinkplot.Hist(hist, label='parity')
thinkplot.Show()
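The expression `resp[resp.totincr == 14]` is pandas boolean indexing: the comparison builds a Series of True/False values, and indexing the DataFrame with it keeps only the rows where the value is True. The self-contained sketch below applies the same pattern to a small made-up DataFrame (the column names mirror the NSFG ones, the values are hypothetical) and uses pandas' own `value_counts` as a frequency table.

```python
import pandas as pd

# Hypothetical respondents; column names mirror the NSFG ones, values are made up.
resp = pd.DataFrame({
    "totincr": [14, 3, 14, 7, 14, 11],
    "parity":  [2, 4, 0, 3, 2, 1],
})

# Boolean mask: True where the respondent is in the top income code.
mask = resp.totincr == 14
rich = resp[mask]

# Frequency of each parity value among the selected rows, sorted by value.
counts = rich.parity.value_counts().sort_index()
print(counts)
```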
Find the largest parities for high-income respondents.
In [16]:
hist.Largest(10)
Out[16]:
Compare the mean parity for high-income respondents and everyone else.
In [20]:
# Everyone below the top income category
notrich = resp[resp.totincr < 14]
rich.parity.mean(), notrich.parity.mean()
Out[20]:
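The same means can be computed from a histogram: weight each value by its frequency and divide by the total count, \(\bar{x} = \frac{1}{n}\sum_x x\,f(x)\). The sketch below does this with a Counter; the `hist_mean` helper and the parity values for the two groups are hypothetical.

```python
from collections import Counter

def hist_mean(hist):
    """Mean of the values in a value -> frequency mapping."""
    n = sum(hist.values())
    # Each value contributes value * frequency to the total.
    return sum(value * freq for value, freq in hist.items()) / n

# Hypothetical parity values for two groups (made-up numbers).
rich = Counter([0, 1, 2, 2, 3])
notrich = Counter([0, 1, 2, 3, 3, 4])

print(hist_mean(rich), hist_mean(notrich))
```

This gives the same result as averaging the raw values, because a histogram keeps every value and its multiplicity.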
Investigate any other variables that look interesting.
In [18]:
hist = thinkstats2.Hist(resp.fmarno)
thinkplot.Hist(hist, label='fmarno')
thinkplot.Show()
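When screening several variables, it can help to loop over the columns and print each one's most common values before plotting. The sketch below does that with `Counter.most_common`; the column subset, the `most_common` helper, and all values are hypothetical.

```python
from collections import Counter

# Hypothetical columns; names echo NSFG variables but the values are invented.
columns = {
    "fmarno": [0, 1, 1, 0, 2, 1],
    "numfmhh": [2, 3, 3, 4, 1, 3],
}

def most_common(values, k=2):
    """Return the k most frequent values with their counts."""
    return Counter(values).most_common(k)

for name, values in columns.items():
    print(name, most_common(values))
```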