In [8]:
import nsfg
import first
live, firsts, others = first.MakeFrames()

Print value counts for birthord and compare to results published in the codebook.


In [13]:
diff_weight = firsts.totalwgt_lb.mean() - others.totalwgt_lb.mean()
diff_weight


Out[13]:
-0.12476118453549034

In [14]:
diff_age = firsts.agepreg.mean() - others.agepreg.mean()
diff_age


Out[14]:
-3.5864347661501519

In [15]:
import statsmodels.formula.api as smf
results = smf.ols('totalwgt_lb ~ agepreg', data=live).fit()
slope = results.params['agepreg']
slope


Out[15]:
0.01745385147180277

In [18]:
slope * diff_age


Out[18]:
-0.062597099721694457
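
One way to read the product above, offered as an interpretation rather than something the notebook states: if birth weight changes by `slope` pounds per year of mother's age, then the age gap between the two groups would by itself predict a weight gap of `slope * diff_age`. The sketch below reuses the values printed in Out[13], Out[14], and Out[15] as literals; the names `predicted_diff` and `explained_fraction` are introduced here for illustration only.

```python
# Values copied from the outputs above (Out[13], Out[14], Out[15]).
diff_weight = -0.12476118453549034   # first babies minus others, in pounds
diff_age = -3.5864347661501519       # first babies minus others, mother's age in years
slope = 0.01745385147180277          # pounds of birth weight per year of age

# Weight difference predicted by the age difference alone.
predicted_diff = slope * diff_age

# Share of the observed weight difference that the age gap would account for
# (hypothetical name, not from the notebook).
explained_fraction = predicted_diff / diff_weight

print(round(predicted_diff, 4), round(explained_fraction, 2))
```

Under this simple linear reading, roughly half of the observed weight difference is associated with the difference in mother's age. That is a descriptive statement about the fitted line, not a causal claim.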

Print value counts for prglngth and compare to results published in the codebook.


In [6]:
import pandas
live['isfirst'] = (live.birthord == 1)
pandas.pivot_table(live, index='isfirst', values=['agepreg', 'totalwgt_lb'])


Out[6]:
           agepreg  totalwgt_lb
isfirst
False    26.670849     7.325856
True     23.084414     7.201094

2 rows × 2 columns
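
The table of group means above can also be produced with `groupby`. The following is a minimal sketch on a small made-up frame (the `demo` data is hypothetical, not NSFG data), assuming a current pandas release, where the grouping argument of `pivot_table` is spelled `index`.

```python
import pandas as pd

# Hypothetical stand-in for the live-births DataFrame; values are made up.
demo = pd.DataFrame({
    'birthord': [1, 2, 1, 3],
    'agepreg': [22.0, 27.0, 24.0, 29.0],
    'totalwgt_lb': [7.0, 7.5, 7.2, 7.3],
})
demo['isfirst'] = (demo.birthord == 1)

# pivot_table averages each value column within each group by default.
by_pivot = pd.pivot_table(demo, index='isfirst',
                          values=['agepreg', 'totalwgt_lb'])

# The same group means via groupby.
by_group = demo.groupby('isfirst')[['agepreg', 'totalwgt_lb']].mean()

print(by_pivot)
print(by_group)
```

Both calls return one row per value of `isfirst` and one column per variable, so either can be used to compare first babies with others.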

Print value counts for agepreg and compare to results published in the codebook.

Looking at this data, please remember my comments in the book about the obligation to approach data with consideration for the context and respect for the respondents.


In [36]:

Compute the mean birthweight.


In [37]:
df.totalwgt_lb.mean()


Out[37]:
7.2656284576233681

Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.
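
As a sketch of the dictionary-syntax point, the example below adds a kilogram column to a small made-up frame. The `demo` data and the constant name `LB_TO_KG` are assumptions for illustration; the conversion factor is the standard definition of the pound.

```python
import pandas as pd

LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)

# Hypothetical stand-in for the pregnancy DataFrame; the weights are made up.
demo = pd.DataFrame({'totalwgt_lb': [6.5, 7.25, 8.0]})

# Dot notation works for reading an existing column...
pounds = demo.totalwgt_lb

# ...but a new column has to be created with dictionary (bracket) syntax.
# Writing `demo.totalwgt_kg = ...` would set an attribute, not add a column.
demo['totalwgt_kg'] = pounds * LB_TO_KG

mean_kg = demo['totalwgt_kg'].mean()
print(mean_kg)
```

Once the column exists, it can be read back with either `demo['totalwgt_kg']` or `demo.totalwgt_kg`.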


In [37]:

Look through the codebook and find a variable, other than the ones mentioned in the book, that you find interesting. Compute value counts, means, or other statistics.


In [37]:

Create a boolean Series.


In [38]:
df.outcome == 1


Out[38]:
0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
13    False
14    False
...
13578     True
13579     True
13580    False
13581     True
13582    False
13583    False
13584     True
13585    False
13586    False
13587    False
13588     True
13589    False
13590    False
13591     True
13592     True
Name: outcome, Length: 13593, dtype: bool

Use a boolean Series to select the records for the pregnancies that ended in live birth.


In [39]:
live = df[df.outcome == 1]
len(live)


Out[39]:
9148

Count the number of live births with birthwgt_lb between 0 and 5 pounds (including both). The result should be 1125.


In [40]:
len(live[(live.birthwgt_lb >= 0) & (live.birthwgt_lb <= 5)])


Out[40]:
1125
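
An inclusive range test like the one above can also be written with `Series.between`, which includes both endpoints by default. The sketch below uses a small made-up column (the `demo` values are hypothetical, not NSFG data) to show that the two forms select the same rows.

```python
import pandas as pd

# Hypothetical pound values, not NSFG data.
demo = pd.DataFrame({'birthwgt_lb': [0, 3, 5, 6, 7, 9, 12]})

# Explicit comparisons joined with &; the parentheses are required because
# & binds more tightly than >= and <=.
mask = (demo.birthwgt_lb >= 0) & (demo.birthwgt_lb <= 5)

# Series.between includes both endpoints by default.
same_mask = demo.birthwgt_lb.between(0, 5)

# True counts as 1, so summing a boolean Series counts the matches.
count = mask.sum()
print(count, len(demo[same_mask]))
```

Summing the mask avoids building the filtered DataFrame when only the count is needed.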

Count the number of live births with birthwgt_lb between 9 and 95 pounds (including both). The result should be 798.


In [40]:

Use birthord to select the records for first babies and others. How many are there of each?


In [41]:
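# birthord is missing for pregnancies that did not end in live birth,
# so these filters on df select live births only (4413 + 4735 = 9148)
firsts = df[df.birthord == 1]
others = df[df.birthord > 1]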
len(firsts), len(others)


Out[41]:
(4413, 4735)

Compute the mean weight for first babies and others.


In [42]:
firsts.totalwgt_lb.mean()


Out[42]:
7.201094430437772

In [43]:
others.totalwgt_lb.mean()


Out[43]:
7.3258556149732623

Compute the mean prglngth for first babies and others. Compute the difference in means, expressed in hours.
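
Because prglngth is recorded in weeks, expressing a difference in hours is a unit conversion: multiply by 7 days per week and 24 hours per day. The sketch below uses made-up group means (not results from the data) and a hypothetical helper named `weeks_to_hours`.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def weeks_to_hours(weeks):
    """Convert a duration in weeks to hours."""
    return weeks * HOURS_PER_WEEK


# Hypothetical group means in weeks, chosen only to illustrate the conversion.
mean_first = 39.0
mean_other = 38.5

diff_hours = weeks_to_hours(mean_first - mean_other)
print(diff_hours)
```

With the real data, `mean_first` and `mean_other` would be replaced by `firsts.prglngth.mean()` and `others.prglngth.mean()`.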


In [43]: