Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the female respondent file and display the variable names.
In [49]:
%matplotlib inline
from operator import itemgetter
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns
Out[49]:
Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.
In [43]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
Display the histogram.
In [3]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
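A histogram in this sense is a mapping from each value to the number of times it appears. The sketch below uses collections.Counter as a stand-in for thinkstats2.Hist; the income codes in sample_codes are made up for illustration and are not NSFG data.

```python
from collections import Counter

# Hypothetical income codes, invented for illustration (not NSFG data).
sample_codes = [14, 9, 14, 3, 9, 14, 7]

# Counter maps each distinct value to its frequency, which is the
# core of what a histogram object stores.
freq = Counter(sample_codes)

# Print value-frequency pairs in increasing order of value.
for value in sorted(freq):
    print(value, freq[value])
```

Looking up a value that never occurs returns 0 rather than raising an error, which is convenient when comparing frequencies.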
Make a histogram of age_r, the respondent's age at the time of interview.
In [4]:
hist2 = thinkstats2.Hist(resp.age_r)
thinkplot.Hist(hist2, label='age_r')
thinkplot.Show()
Make a histogram of numfmhh, the number of people in the respondent's household.
In [5]:
hist3 = thinkstats2.Hist(resp.numfmhh)
thinkplot.Hist(hist3, label='numfmhh')
thinkplot.Show()
Make a histogram of parity, the number of children the respondent has borne. How would you describe this distribution?
In [6]:
hist4 = thinkstats2.Hist(resp.parity)
thinkplot.Hist(hist4, label='parity')
thinkplot.Show()
Use Hist.Largest to find the largest values of parity.
In [7]:
hist4.Largest()
Out[7]:
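Assuming Hist.Largest returns the n largest values together with their frequencies, a rough equivalent on a plain dict might look like the sketch below. The helper name largest and the counts in parity_freq are hypothetical, chosen only to show how the right tail of a distribution can be inspected.

```python
def largest(freq, n=10):
    """Return the n largest values with their frequencies, largest first."""
    # Sorting (value, freq) pairs in reverse orders them by value, descending.
    return sorted(freq.items(), reverse=True)[:n]

# Hypothetical parity counts (value -> frequency), invented for illustration.
parity_freq = {0: 50, 1: 30, 2: 25, 3: 10, 7: 1, 22: 1}

top = largest(parity_freq, n=3)
print(top)
```

Values that are large but occur only once or twice are worth checking against the codebook, since they may be outliers or data errors.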
Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high income respondents.
In [9]:
hist5 = thinkstats2.Hist(resp.parity[resp.totincr == 14])
thinkplot.Hist(hist5, label='parity_hi')
thinkplot.Show()
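The expression resp.parity[resp.totincr == 14] keeps parity only for the rows whose totincr equals 14, the code this notebook uses for the highest income category. The same filtering idea on plain Python data, with made-up rows, might be sketched as:

```python
# Hypothetical (totincr, parity) rows, invented for illustration.
rows = [(14, 2), (5, 3), (14, 0), (9, 1), (14, 1)]

HIGH_INCOME_CODE = 14  # the code this notebook uses for the top bracket

# Keep parity only where the income code matches, mirroring the boolean mask.
hi_parity = [parity for totincr, parity in rows if totincr == HIGH_INCOME_CODE]
print(hi_parity)
```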
Find the largest parities for high income respondents.
In [10]:
hist5.Largest()
Out[10]:
Compare the mean parity for high income respondents and others.
In [11]:
hi_par = resp.parity[resp.totincr == 14]
par = resp.parity
hi_par.mean(), par.mean()
Out[11]:
Investigate any other variables that look interesting.
In [12]:
hi_par.std(), par.std()
Out[12]:
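The mean and standard deviation comparison can be reproduced without pandas. The sketch below uses the statistics module on made-up values; statistics.stdev computes the sample standard deviation, which matches the default behavior of the pandas std method.

```python
import statistics

# Hypothetical parity values, invented for illustration.
all_parity = [0, 1, 2, 3, 4, 2]
hi_parity = [0, 1, 2]

# Compare central tendency of the subgroup and the full sample.
print(statistics.mean(hi_parity), statistics.mean(all_parity))

# Compare spread, using the sample standard deviation (n - 1 denominator).
print(statistics.stdev(hi_parity), statistics.stdev(all_parity))
```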
In [34]:
def Mode(h):
    """Returns the value with the highest frequency in h."""
    mode = 0
    for val in h:
        if h.Freq(val) > h.Freq(mode):
            mode = val
    return mode
In [42]:
def AllModes(h):
    """Returns value-freq pairs in decreasing order of frequency."""
    hist = h.Copy()
    result = []
    while len(hist) > 0:
        mode = Mode(hist)
        result.append((mode, hist.Freq(mode)))
        hist.Remove(mode)
    return result
In [47]:
def AllModes2(hist):
    """Returns value-freq pairs in decreasing order of frequency.

    hist: Hist object

    returns: list of value-freq pairs
    """
    return sorted(hist.Items(), key=itemgetter(1), reverse=True)
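For comparison, the standard library offers the same ordering through collections.Counter.most_common. The sketch below applies the sort-by-frequency idea to a Counter built from made-up ages; it is a stand-in for illustration, not part of thinkstats2.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical ages, invented for illustration.
ages = [30, 22, 30, 25, 30, 22]
freq = Counter(ages)

# Sort value-frequency pairs by frequency, highest first.
by_sort = sorted(freq.items(), key=itemgetter(1), reverse=True)

# most_common() with no argument returns all pairs in the same order.
by_counter = freq.most_common()
print(by_sort)
```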
In [51]:
%timeit AllModes(hist2)
%timeit AllModes2(hist2)
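Outside IPython, the same comparison can be run with the timeit module. The sketch below times two hypothetical plain-dict versions of the approaches: one repeatedly scans for the most frequent value and removes it (quadratic in the number of distinct values), the other sorts once. The function names and the generated histogram are assumptions for illustration.

```python
import timeit
from operator import itemgetter

def modes_by_repeated_scan(freq):
    """Repeatedly find and remove the most frequent value (quadratic)."""
    remaining = dict(freq)
    result = []
    while remaining:
        mode = max(remaining, key=remaining.get)
        result.append((mode, remaining.pop(mode)))
    return result

def modes_by_sort(freq):
    """Sort all pairs once by frequency, highest first."""
    return sorted(freq.items(), key=itemgetter(1), reverse=True)

# Hypothetical histogram with distinct frequencies, invented for illustration.
freq = {value: value + 1 for value in range(200)}

# Time 20 calls of each function; absolute numbers depend on the machine.
slow = timeit.timeit(lambda: modes_by_repeated_scan(freq), number=20)
fast = timeit.timeit(lambda: modes_by_sort(freq), number=20)
print(f"repeated scan: {slow:.4f}s, sort: {fast:.4f}s")
```

The two functions agree here because every frequency is distinct; with ties, the order of tied values may differ between them.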
In [52]:
%timeit?