Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the female respondent file and display the variable names.
In [ ]:
%matplotlib inline
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns
Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.
In [ ]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
Display the histogram.
In [ ]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
Make a histogram of age_r, the respondent's age at the time of interview.
In [ ]:
# Plot a histogram of the respondent's age
import thinkstats2
import chap01soln
import thinkplot
def ex2pr1():
    df = chap01soln.ReadFemResp()
    # print(df.age_r.value_counts().sort_index())  # respondent counts by age (sorted by age)
    # print(df.age_r.value_counts())  # respondent counts by age (sorted by count)
    hist = thinkstats2.Hist(df.age_r, label='age_r')
    thinkplot.Hist(hist)
    thinkplot.Show(xlabel='Age of respondent', ylabel='Frequency')
ex2pr1()
Make a histogram of numfmhh, the number of people in the respondent's household.
In [3]:
Make a histogram of parity, the number of children the respondent has borne. How would you describe this distribution?
In [3]:
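One rough way to back up a description such as "skewed to the right" is to compare the mean with the median: a mean noticeably above the median suggests a long right tail. The sketch below uses only the standard library and a made-up list of parity values (toy_parity is hypothetical, not NSFG data); with the real data you would pass resp.parity instead.

```python
from statistics import mean, median

# Made-up parity values for illustration only; not NSFG data.
toy_parity = [0, 0, 0, 1, 1, 2, 2, 3, 5, 8]

parity_mean = mean(toy_parity)
parity_median = median(toy_parity)

# A mean above the median is a rough hint of a right-skewed distribution.
right_skewed = parity_mean > parity_median
print(parity_mean, parity_median, right_skewed)
```

This is only a heuristic; the histogram itself remains the better guide to the shape.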
Use Hist.Largest to find the largest values of parity.
In [3]:
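With a thinkstats2 histogram, hist.Largest(10) returns the ten largest values along with their frequencies. To show what that means, here is a minimal standard-library stand-in built on collections.Counter. The function name largest and the list toy_parity are hypothetical, and the values are made up rather than taken from the NSFG data.

```python
from collections import Counter

def largest(values, n=10):
    """Return up to n (value, frequency) pairs, largest values first."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)[:n]

# Made-up parity values for illustration only; not NSFG data.
toy_parity = [0, 0, 1, 2, 2, 2, 3, 5, 8]

top3 = largest(toy_parity, n=3)
print(top3)
```

Looking at the largest values this way is a quick check for outliers or data entry errors in the tail.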
Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high income respondents.
In [3]:
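The selection step can be sketched as a filter followed by a frequency count. This assumes the highest income category is coded 14 in totincr; check the codebook before relying on that. On the DataFrame, the corresponding selection would be resp[resp.totincr == 14]. The rows below are made-up (totincr, parity) pairs, not NSFG data, and HIGH_INCOME_CODE and toy_rows are hypothetical names.

```python
from collections import Counter

# Assumption: 14 is the top totincr category; confirm in the codebook.
HIGH_INCOME_CODE = 14

# Made-up (totincr, parity) pairs for illustration only; not NSFG data.
toy_rows = [(14, 0), (14, 2), (9, 1), (14, 2), (3, 4), (14, 1)]

# Keep parity only for respondents in the top income category.
high_income_parity = [parity for totincr, parity in toy_rows
                      if totincr == HIGH_INCOME_CODE]

# Frequency of each parity value within the high income group.
parity_counts = Counter(high_income_parity)
print(sorted(parity_counts.items()))
```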
Find the largest parities for high income respondents.
In [3]:
Compare the mean parity for high income respondents and others.
In [3]:
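The comparison amounts to splitting the respondents into two groups and taking the mean of parity in each. A minimal sketch, again assuming code 14 marks the highest income category and using made-up (totincr, parity) pairs rather than NSFG data; split_means and toy_rows are hypothetical names.

```python
from statistics import mean

# Assumption: 14 is the top totincr category; confirm in the codebook.
HIGH_INCOME_CODE = 14

# Made-up (totincr, parity) pairs for illustration only; not NSFG data.
toy_rows = [(14, 0), (14, 2), (9, 1), (14, 2), (3, 4), (14, 1)]

def split_means(rows, code):
    """Return (mean parity for rows with this totincr code, mean parity for the rest)."""
    high = [parity for totincr, parity in rows if totincr == code]
    rest = [parity for totincr, parity in rows if totincr != code]
    return mean(high), mean(rest)

high_mean, other_mean = split_means(toy_rows, HIGH_INCOME_CODE)
print(high_mean, other_mean)
```

With the DataFrame, the same comparison would use the parity column of each subset, for example resp[resp.totincr == 14].parity.mean() against resp[resp.totincr < 14].parity.mean().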
Investigate any other variables that look interesting.
In [3]: