Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the female respondent file and display the variable names.
In [135]:
%matplotlib inline
import chap01soln
resp = chap01soln.ReadFemResp()
resp.columns  # the variable names
Make a histogram of totincr, the total income for the respondent's family. To interpret the codes, see the codebook.
In [13]:
import thinkstats2
hist = thinkstats2.Hist(resp.totincr)
print(resp.totincr.value_counts().sort_index())  # frequency of each income code
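thinkstats2.Hist is essentially a mapping from each value to the number of times it occurs. The sketch below builds the same kind of frequency map with the standard library's collections.Counter (Python 3), to show what the histogram contains. It is a plain-Python analogue, not the thinkstats2 implementation; make_hist is a hypothetical helper name and the income codes are invented, not taken from the NSFG file.

```python
from collections import Counter


def make_hist(values):
    """Return a Counter mapping each distinct value to its frequency."""
    return Counter(values)


# Hypothetical totincr codes, invented for this example (not NSFG data).
codes = [14, 14, 3, 7, 14, 7, 1]
hist = make_hist(codes)

# Print value/frequency pairs in ascending order of value.
for value in sorted(hist):
    print(value, hist[value])
```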
Display the histogram.
In [3]:
import thinkplot
thinkplot.Hist(hist, label='totincr')
thinkplot.Show()
Make a histogram of age_r, the respondent's age at the time of interview.
In [224]:
import matplotlib.pyplot as plt
age_r = resp.age_r.value_counts().sort_index()
In [225]:
age_r.plot(kind='bar', label='age_r', legend=True)
In [226]:
# One unit-wide bin per year of age, centered on the integers 0..44
resp.age_r.plot(kind='hist', label='age_r', range=(-0.5, 44.5), bins=45, legend=True)
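For integer-valued data such as age_r, a histogram is easiest to read when every bin is one unit wide and centered on an integer, so each bar corresponds to exactly one age. A minimal plain-Python sketch of that binning follows; unit_bin_edges and count_per_bin are hypothetical helper names, and the ages are invented for illustration.

```python
import bisect


def unit_bin_edges(lo, hi):
    """Edges lo-0.5, lo+0.5, ..., hi+0.5: one unit-wide bin centered on each integer lo..hi."""
    return [k - 0.5 for k in range(lo, hi + 2)]


def count_per_bin(values, edges):
    """Count how many values fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(counts):  # ignore values outside the binned range
            counts[i] += 1
    return counts


edges = unit_bin_edges(0, 44)  # 46 edges, i.e. 45 unit-wide bins spanning -0.5 to 44.5
ages = [15, 15, 22, 44]        # hypothetical ages, not NSFG data
counts = count_per_bin(ages, edges)
print(counts[15], counts[22], counts[44])
```

Because the edges sit at half-integers, no integer value ever lands exactly on a bin boundary, so the half-open vs. closed treatment of the last bin makes no difference here.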
Make a histogram of numfmhh, the number of people in the respondent's household.
In [89]:
df_numfmhh = resp.numfmhh
df = df_numfmhh.value_counts().sort_index()
df
In [93]:
df.plot(kind='bar', label='numfmhh', legend=True)
In [106]:
df_numfmhh.plot(kind='hist', label='numfmhh', legend=True)
Make a histogram of parity, the number of children the respondent has borne. How would you describe this distribution?
In [158]:
import pandas as pd
parity_vc = resp.parity.value_counts().sort_index()
df_parity_vc = parity_vc.to_frame('frequency')  # one row per parity value
In [142]:
parity_vc.plot(kind='bar',label='parity',legend=True)
In [143]:
# Unit-wide bins centered on the integers 0..25 (parity cannot be negative)
resp.parity.plot(kind='hist', label='parity', legend=True, range=(-0.5, 25.5), ylim=(0, 3500), bins=26)
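One way to back up a verbal description of a distribution's shape is to compare its mean and median and compute a skewness statistic: a long tail to the right usually pulls the mean above the median and makes the skewness positive. The sketch below uses the moment-based sample skewness g1 = m3 / m2**1.5 (one common definition among several) on invented parity values; it illustrates the check and says nothing about the actual NSFG numbers.

```python
from statistics import mean, median


def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5, using population (1/n) moments."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical parity values, invented for this example (not NSFG data).
parity = [0, 0, 0, 1, 1, 2, 2, 3, 5, 10]
print("mean:", mean(parity))
print("median:", median(parity))
print("skewness:", round(sample_skewness(parity), 3))
```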
Use Hist.Largest to find the largest values of parity.
In [121]:
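import thinkstats2
parity_hist = thinkstats2.Hist(resp.parity, label='parity')
parity_hist.Largest(10)  # (value, frequency) pairs for the 10 largest parities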
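Hist.Largest(n) returns the n largest values together with their frequencies. The sketch below reproduces that idea with a Counter in plain Python 3; largest and smallest are hypothetical helper names, the data are invented, and the ordering shown (pairs sorted by value, largest first) is an assumption about Hist.Largest rather than a copy of its code.

```python
from collections import Counter


def largest(hist, n=10):
    """Return up to n (value, frequency) pairs with the largest values, largest first."""
    return sorted(hist.items(), reverse=True)[:n]


def smallest(hist, n=10):
    """Return up to n (value, frequency) pairs with the smallest values, smallest first."""
    return sorted(hist.items())[:n]


# Hypothetical parity values, invented for this example (not NSFG data).
hist = Counter([0, 0, 1, 3, 12, 12, 9])
print(largest(hist, 3))
print(smallest(hist, 2))
```

Unlike describe(), which reports only the maximum, this lists several of the largest values and how often each occurs, which makes outliers easier to spot.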
Use totincr to select the respondents with the highest income. Compute the distribution of parity for just the high-income respondents.
In [184]:
resp_highest_income = resp[resp.totincr == resp.totincr.max()]
resp_other_income = resp[resp.totincr != resp.totincr.max()]
resp_highest_income_parity = resp_highest_income.parity
resp_highest_income_parity_vc = resp_highest_income_parity.value_counts().sort_index()
resp_highest_income_parity_vc.plot(kind='bar', title='parity of those respondents with highest income')
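Selecting the top income group is a boolean filter followed by a frequency count of parity within the selected rows. A plain-Python 3 sketch of the same two steps, on a handful of invented (totincr, parity) records, is below; like the cell above, it assumes that the largest totincr code marks the highest income group.

```python
from collections import Counter

# Hypothetical respondent records, invented for this example (not NSFG data).
records = [
    {"totincr": 14, "parity": 0},
    {"totincr": 14, "parity": 2},
    {"totincr": 9, "parity": 3},
    {"totincr": 14, "parity": 2},
    {"totincr": 5, "parity": 1},
]

# Treat the largest income code present as the highest income group.
top_code = max(r["totincr"] for r in records)
high = [r for r in records if r["totincr"] == top_code]
others = [r for r in records if r["totincr"] != top_code]

# Distribution of parity within the high-income group only.
high_parity_hist = Counter(r["parity"] for r in high)
print(sorted(high_parity_hist.items()))
```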
Find the largest parities for high-income respondents.
In [181]:
import thinkstats2
thinkstats2.Hist(resp_highest_income_parity).Largest(10)  # largest parities among high-income respondents
Compare the mean parity for high-income respondents and others.
In [185]:
print('Mean parity, highest income: %.3f' % resp_highest_income_parity.mean())
print('Mean parity, other respondents: %.3f' % resp_other_income.parity.mean())
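A raw difference in means is easier to judge when it is scaled by the spread of the data. One common way to do that is Cohen's d: the difference in means divided by a pooled standard deviation. The Python 3 sketch below pools the two variances weighted by group size (conventions differ slightly between sources); cohen_d is a hypothetical helper and the parity values are invented stand-ins for the two groups.

```python
import math
from statistics import mean, pvariance


def cohen_d(group1, group2):
    """Difference in means divided by a size-weighted pooled standard deviation."""
    diff = mean(group1) - mean(group2)
    n1, n2 = len(group1), len(group2)
    pooled_var = (n1 * pvariance(group1) + n2 * pvariance(group2)) / (n1 + n2)
    return diff / math.sqrt(pooled_var)


# Hypothetical parity values, invented for this example (not NSFG data).
high_income = [0, 2, 2]
other_income = [3, 1]
print("difference in means:", mean(high_income) - mean(other_income))
print("Cohen's d:", round(cohen_d(high_income, other_income), 3))
```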
Investigate any other variables that look interesting.
Histogram of the age of respondents who had sex
In [208]:
resp_hadsex = resp[resp.hadsex == 1]
resp_hadsex_age_r = resp_hadsex.age_r
ax1 = resp_hadsex_age_r.plot(kind='hist')
Histogram of the age of respondents who never had sex
In [209]:
resp_nosex = resp[resp.hadsex != 1]
resp_nosex_age_r = resp_nosex.age_r
resp_nosex_age_r.plot(kind='hist')
In [312]:
# Put the two groups' age counts side by side, aligned on age; ages missing from one group get 0
df1 = pd.concat([resp_hadsex.age_r.value_counts().sort_index(),
                 resp_nosex.age_r.value_counts().sort_index()],
                axis=1, keys=['hadsex', 'nosex']).fillna(0)
df1
In [313]:
df1.plot(kind='bar', alpha=0.5, xlim=(0, 15))
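Building a table like df1 aligns the two groups' age counts on their shared age values. The Python 3 sketch below performs the same alignment in plain Python and fills in 0 where an age occurs in only one group; pandas' own index alignment would leave NaN there unless it is filled explicitly. align_counts is a hypothetical helper and the ages are invented.

```python
from collections import Counter


def align_counts(a, b):
    """Return (value, count_in_a, count_in_b) rows over the union of values, sorted."""
    keys = sorted(set(a) | set(b))
    return [(k, a[k], b[k]) for k in keys]  # Counter returns 0 for missing keys


# Hypothetical ages, invented for this example (not NSFG data).
hadsex = Counter([18, 19, 19, 25, 30])
nosex = Counter([15, 15, 16, 18])
for row in align_counts(hadsex, nosex):
    print(row)
```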