Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the female respondent file and display the variable names.
In [49]:
    
%matplotlib inline
from operator import itemgetter
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns
    
    Out[49]:
Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.
In [43]:
    
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
    
Display the histogram.
In [3]:
    
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
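
For readers without thinkstats2 at hand, the idea behind a histogram object can be sketched with the standard library: it maps each distinct value to the number of times it appears. This is only a stand-in, not the thinkstats2 implementation; `make_hist` is a hypothetical helper and the income codes are made up rather than taken from the NSFG data.

```python
from collections import Counter

def make_hist(values):
    """Map each distinct value to its frequency (a stand-in for a Hist)."""
    return Counter(values)

# Made-up income codes for illustration only (not NSFG data).
toy_totincr = [14, 3, 14, 7, 3, 14, 9]
toy_hist = make_hist(toy_totincr)

# Print value-frequency pairs in increasing order of value.
for value, freq in sorted(toy_hist.items()):
    print(value, freq)
```

Plotting a bar of height `freq` at each `value` gives the kind of picture thinkplot.Hist draws.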
    
    
    
Make a histogram of age_r, the respondent's age at the time of interview.
In [4]:
    
hist2 = thinkstats2.Hist(resp.age_r)
thinkplot.Hist(hist2, label='age_r')
thinkplot.Show()
    
    
    
Make a histogram of numfmhh, the number of people in the respondent's household.
In [5]:
    
hist3 = thinkstats2.Hist(resp.numfmhh)
thinkplot.Hist(hist3, label='numfmhh')
thinkplot.Show()
    
    
    
Make a histogram of parity, the number of children the respondent has borne. How would you describe this distribution?
In [6]:
    
hist4 = thinkstats2.Hist(resp.parity)
thinkplot.Hist(hist4, label='parity')
thinkplot.Show()
    
    
    
Use Hist.Largest to find the largest values of parity.
In [7]:
    
hist4.Largest()
    
    Out[7]:
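
As a rough illustration of what "largest values" means here, the sketch below assumes the result is a list of value-frequency pairs ordered by value, largest first. It uses collections.Counter as a stand-in for a Hist; the `largest` helper and the parity values are hypothetical and made up for the example.

```python
from collections import Counter

def largest(hist, n=10):
    """Return up to n (value, freq) pairs with the largest values."""
    return sorted(hist.items(), reverse=True)[:n]

# Made-up parity values for illustration only (not NSFG data).
toy_parity = Counter([0, 0, 1, 2, 2, 2, 3, 8])

# The pairs are ordered by value, not by frequency.
print(largest(toy_parity, n=3))
```

Looking at the largest values is a quick way to spot outliers or coding errors in the tail of a distribution.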
Use totincr to select the respondents with the highest income (code 14). Compute the distribution of parity for just the high-income respondents.
In [9]:
    
hist5 = thinkstats2.Hist(resp.parity[resp.totincr == 14])
thinkplot.Hist(hist5, label='parity_hi')
thinkplot.Show()
    
    
    
Find the largest parities for high-income respondents.
In [10]:
    
hist5.Largest()
    
    Out[10]:
Compare the mean parity for high-income respondents and others.
In [11]:
    
hi_par = resp.parity[resp.totincr == 14]
par = resp.parity[resp.totincr != 14]  # the others: everyone outside the top income code
hi_par.mean(), par.mean()
    
    Out[11]:
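
The comparison comes down to splitting the respondents on the income code and averaging each group. A minimal sketch with plain Python lists, assuming made-up (totincr, parity) records rather than NSFG data; `HIGH_INCOME_CODE` is a name introduced here for illustration:

```python
from statistics import mean

# Made-up (totincr, parity) records for illustration only (not NSFG data).
toy_records = [(14, 1), (14, 0), (5, 3), (9, 2), (14, 2), (2, 4)]

HIGH_INCOME_CODE = 14  # the top income code used in the cells above

# Split parity into the high-income group and everyone else.
hi_parity = [p for t, p in toy_records if t == HIGH_INCOME_CODE]
other_parity = [p for t, p in toy_records if t != HIGH_INCOME_CODE]

print(mean(hi_parity), mean(other_parity))
```

Keeping the two groups disjoint makes the difference in means easier to interpret than comparing one group against a total that already contains it.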
Investigate any other variables that look interesting.
In [12]:
    
hi_par.std(), par.std()
    
    Out[12]:
In [34]:
    
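def Mode(h):
    """Returns the most frequent value in h (a Hist), or None if h is empty."""
    mode = None  # avoid shadowing the built-in max
    for val in h:
        if mode is None or h.Freq(val) > h.Freq(mode):
            mode = val
    return mode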
    
In [42]:
    
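def AllModes(h):
    """Returns value-freq pairs in decreasing order of frequency.
    h: Hist object
    returns: list of value-freq pairs
    """
    hist = h.Copy()  # work on a copy so the caller's Hist is left intact
    result = []
    while len(hist) > 0:
        mode = Mode(hist)
        result.append((mode, hist.Freq(mode)))
        hist.Remove(mode)
    return result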
    
In [47]:
    
def AllModes2(hist):
    """Returns value-freq pairs in decreasing order of frequency.
    hist: Hist object
    returns: list of value-freq pairs
    """
    return sorted(hist.Items(), key=itemgetter(1), reverse=True)
    
In [51]:
    
%timeit AllModes(hist2)
%timeit AllModes2(hist2)
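
Outside a notebook, the same comparison can be sketched with the standard-library timeit module. The version below uses collections.Counter in place of a Hist, with synthetic counts; the function names are hypothetical. Repeatedly extracting the maximum scans the remaining values each time, so it is quadratic in the number of distinct values, while a single sort is O(n log n).

```python
import timeit
from collections import Counter
from operator import itemgetter

def all_modes_repeated_max(freqs):
    """Repeatedly pull out the most frequent remaining value (quadratic)."""
    remaining = dict(freqs)  # copy so the input is not modified
    result = []
    while remaining:
        val = max(remaining, key=remaining.get)
        result.append((val, remaining.pop(val)))
    return result

def all_modes_sorted(freqs):
    """Sort the value-freq pairs once by frequency, largest first."""
    return sorted(freqs.items(), key=itemgetter(1), reverse=True)

# Synthetic counts for illustration only: value v appears v + 1 times,
# so all frequencies are distinct and both functions return the same list.
toy_freqs = Counter({v: v + 1 for v in range(200)})
assert all_modes_repeated_max(toy_freqs) == all_modes_sorted(toy_freqs)

t_slow = timeit.timeit(lambda: all_modes_repeated_max(toy_freqs), number=20)
t_fast = timeit.timeit(lambda: all_modes_sorted(toy_freqs), number=20)
print(f"repeated max: {t_slow:.4f} s, single sort: {t_fast:.4f} s")
```

When several values share a frequency, the two approaches may order the tied pairs differently, so the equality check above relies on the frequencies being distinct.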
    
    
In [52]:
    
%timeit?
    