Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the female respondent file and display the variable names.


In [ ]:
%matplotlib inline

import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns

Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.


In [ ]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)

Display the histogram.


In [ ]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()

Make a histogram of age_r, the respondent's age at the time of interview.


In [ ]:
# Display the respondent's age as a histogram
import thinkstats2
import chap01soln
import thinkplot
    
def ex2pr1():
    df = chap01soln.ReadFemResp()
    #print(df.age_r.value_counts().sort_index())  # number of respondents by age (sorted by age)
    #print(df.age_r.value_counts())  # number of respondents by age (sorted by count)
    hist = thinkstats2.Hist(df.age_r, label='age_r')
    thinkplot.Hist(hist)
    thinkplot.Show(xlabel='Age of respondent', ylabel='Frequency')


ex2pr1()

Make a histogram of numfmhh, the number of people in the respondent's household.


In [3]:

Make a histogram of parity, the number of children the respondent has borne. How would you describe this distribution?
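One way to think about describing a distribution like this is to look at the frequency of each value and then compare the mean with the median. The sketch below uses only the standard library and a small made-up list, sample_parity, which is illustration data and not taken from the NSFG file. With the real data you would pass resp.parity to thinkstats2.Hist instead.

```python
from collections import Counter
from statistics import mean, median

# Made-up parity values for illustration only; these are NOT NSFG data.
sample_parity = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 7]

# A histogram maps each value to the number of times it appears.
parity_counts = Counter(sample_parity)

# Text rendering of the histogram, one row per value in ascending order.
for value in sorted(parity_counts):
    print(value, '#' * parity_counts[value])

# A mean above the median is one sign of a tail extending to the right.
sample_mean = mean(sample_parity)
sample_median = median(sample_parity)
print('mean:', sample_mean, 'median:', sample_median)
```

In this toy sample most values are small and one value sits far to the right, so the mean exceeds the median. Whether the real parity data has the same shape is what the exercise asks you to check.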


In [3]:

Use Hist.Largest to find the largest values of parity.
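To show the idea behind this step without depending on thinkstats2, here is a minimal standard-library sketch. The helper largest is hypothetical and is only meant to mimic the behavior Hist.Largest appears to have, namely returning the n largest values together with their frequencies; check the thinkstats2 source for the actual implementation. The values in sample_parity are made up.

```python
from collections import Counter


def largest(counts, n=10):
    """Return the n largest values with their frequencies, largest first.

    Hypothetical stand-in for Hist.Largest, written for illustration.
    """
    return sorted(counts.items(), reverse=True)[:n]


# Made-up parity values for illustration only; these are NOT NSFG data.
sample_parity = [0, 0, 1, 1, 2, 2, 3, 5, 8, 8]

top_three = largest(Counter(sample_parity), n=3)
print(top_three)
```

Looking at the largest values alongside their counts is a quick way to spot outliers or data-entry errors in the tail.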


In [3]:

Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high income respondents.
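The selection step can be sketched as a filter followed by a frequency count. The example below assumes that the highest income category is coded as 14; that value is an assumption here and should be verified against the codebook. The (totincr, parity) pairs in sample_rows are made up. With the real DataFrame, boolean indexing such as resp[resp.totincr == 14] would play the role of the list comprehension.

```python
from collections import Counter

# Assumption: 14 is the top totincr code; verify this in the codebook.
HIGHEST_INCOME_CODE = 14

# Made-up (totincr, parity) pairs for illustration only; NOT NSFG data.
sample_rows = [(14, 0), (14, 2), (9, 3), (14, 2), (5, 4), (14, 1), (11, 0)]

# Keep the parity of respondents in the highest income category only.
high_income_parity = [
    parity for totincr, parity in sample_rows
    if totincr == HIGHEST_INCOME_CODE
]

# Distribution of parity within the selected group.
high_income_counts = Counter(high_income_parity)
print(sorted(high_income_counts.items()))
```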


In [3]:

Find the largest parities for high income respondents.


In [3]:

Compare the mean parity for high income respondents and others.
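A comparison of this kind amounts to splitting the respondents into two groups and taking the mean of each. The sketch below again assumes the top income code is 14 (check the codebook) and uses made-up (totincr, parity) pairs, so the numbers it prints say nothing about the real survey. With the real data, resp[resp.totincr == 14].parity.mean() and its complement would be the pandas counterpart.

```python
from statistics import mean

# Assumption: 14 is the top totincr code; verify this in the codebook.
HIGHEST_INCOME_CODE = 14

# Made-up (totincr, parity) pairs for illustration only; NOT NSFG data.
sample_rows = [(14, 0), (14, 2), (9, 3), (14, 2), (5, 4), (14, 1), (11, 2)]

# Split parity values by whether the respondent is in the top category.
high_income = [p for t, p in sample_rows if t == HIGHEST_INCOME_CODE]
others = [p for t, p in sample_rows if t != HIGHEST_INCOME_CODE]

high_income_mean = mean(high_income)
others_mean = mean(others)
print('high income:', high_income_mean, 'others:', others_mean)
```

A difference in means alone does not say whether the gap is meaningful; group sizes and the spread within each group matter as well.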


In [3]:

Investigate any other variables that look interesting.


In [3]:


In [3]: