Copyright 2016 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
In [2]:
from __future__ import print_function, division
import nsfg
In [4]:
preg = nsfg.ReadFemPreg()
preg.head()
Out[4]:
Print the column names.
In [5]:
preg.columns
Out[5]:
Select a single column name.
In [6]:
preg.columns[1]
Out[6]:
Select a column and check what type it is.
In [7]:
pregordr = preg['pregordr']
type(pregordr)
Out[7]:
Print a column.
In [8]:
pregordr
Out[8]:
Select a single element from a column.
In [9]:
pregordr[0]
Out[9]:
Select a slice from a column.
In [10]:
pregordr[2:5]
Out[10]:
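The two lookups above rely on the Series having a default integer index, where labels and positions coincide. As a minimal sketch on made-up values (the name toy is hypothetical and the numbers are not NSFG data), the explicit loc and iloc indexers state which kind of lookup is intended:

```python
import pandas as pd

# Made-up values with a default integer index (0, 1, 2, ...).
toy = pd.Series([10, 20, 30, 40, 50, 60])

first = toy.loc[0]        # lookup by index label
middle = toy.iloc[2:5]    # lookup by position; the end position is excluded

print(first)
print(list(middle))
```

With a default index both forms return the same elements as plain brackets; they differ once the index has been filtered or reordered.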
Select a column using dot notation.
In [11]:
pregordr = preg.pregordr
Count the number of times each value occurs.
In [12]:
preg.outcome.value_counts().sort_index()
Out[12]:
Check the values of another variable.
In [13]:
preg.birthwgt_lb.value_counts().sort_index()
Out[13]:
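To see what each step of this idiom does, here is a small sketch on made-up codes (the name toy_outcome is hypothetical and the values are not NSFG data). value_counts orders its result by frequency, and sort_index reorders it by the value itself, which makes it easier to compare against a codebook table:

```python
import pandas as pd

# Made-up outcome codes, not taken from the NSFG files.
toy_outcome = pd.Series([1, 4, 1, 2, 1, 4])

counts = toy_outcome.value_counts()   # most frequent value first
by_value = counts.sort_index()        # reordered by the code: 1, 2, 4

print(by_value)
```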
Make a dictionary that maps from each respondent's caseid to a list of indices into the pregnancy DataFrame. Use it to select the pregnancy outcomes for a single respondent.
In [14]:
caseid = 10229
preg_map = nsfg.MakePregMap(preg)
indices = preg_map[caseid]
preg.outcome[indices].values
Out[14]:
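The idea behind such a map can be sketched in plain Python. This is an illustration only, not necessarily how nsfg.MakePregMap is written; the function make_case_map and the sample caseids are hypothetical:

```python
from collections import defaultdict

def make_case_map(caseids):
    """Map each caseid to the list of positions where it appears."""
    case_map = defaultdict(list)
    for index, caseid in enumerate(caseids):
        case_map[caseid].append(index)
    return case_map

# Made-up caseids, one entry per pregnancy record (not NSFG data).
toy_caseids = [1, 1, 2, 2, 2, 6]
toy_map = make_case_map(toy_caseids)

print(toy_map[2])  # [2, 3, 4]
```

Building the map once takes a single pass over the records, after which every lookup by caseid is a dictionary access rather than a scan of the whole column.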
Select the birthord column, print the value counts, and compare to results published in the codebook.
In [15]:
# Solution
preg.birthord.value_counts().sort_index()
Out[15]:
We can also use isnull to count the number of NaNs.
In [16]:
preg.birthord.isnull().sum()
Out[16]:
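The separate count is needed because value_counts leaves out NaN by default. A minimal sketch with made-up values (the name toy_birthord is hypothetical, not NSFG data):

```python
import numpy as np
import pandas as pd

# Made-up birth orders with two missing entries.
toy_birthord = pd.Series([1.0, np.nan, 2.0, np.nan, 1.0])

n_missing = toy_birthord.isnull().sum()                      # counts the NaN entries
n_counted = toy_birthord.value_counts().sum()                # NaN excluded by default
n_all = toy_birthord.value_counts(dropna=False).sum()        # NaN included

print(n_missing, n_counted, n_all)  # 2 3 5
```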
Select the prglngth column, print the value counts, and compare to results published in the codebook.
In [17]:
# Solution
preg.prglngth.value_counts().sort_index()
Out[17]:
To compute the mean of a column, you can invoke the mean method on a Series. For example, here is the mean birthweight in pounds:
In [18]:
preg.totalwgt_lb.mean()
Out[18]:
Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.
In [19]:
# Solution
preg['totalwgt_kg'] = preg.totalwgt_lb / 2.2
preg.totalwgt_kg.mean()
Out[19]:
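The reason for the reminder is easiest to see on a toy DataFrame. In this sketch (the frame toy_df and its weights are made up, and totalwgt_oz is a hypothetical name), bracket assignment adds a column, while dot assignment only sets a Python attribute on the object:

```python
import warnings
import pandas as pd

toy_df = pd.DataFrame({'totalwgt_lb': [7.5, 6.0, 8.25]})  # made-up weights

# Bracket (dictionary-style) assignment creates a real column.
toy_df['totalwgt_kg'] = toy_df.totalwgt_lb / 2.2

# Dot assignment to a new name sets an attribute instead; pandas
# issues a warning about this, which is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    toy_df.totalwgt_oz = toy_df.totalwgt_lb * 16

print(list(toy_df.columns))  # ['totalwgt_lb', 'totalwgt_kg']
```

Dot notation is still fine for reading a column that already exists, as in the cells above.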
nsfg.py also provides ReadFemResp, which reads the female respondents file and returns a DataFrame:
In [20]:
resp = nsfg.ReadFemResp()
DataFrame provides a method head that displays the first five rows:
In [21]:
resp.head()
Out[21]:
Select the age_r column from resp and print the value counts. How old are the youngest and oldest respondents?
In [22]:
# Solution
resp.age_r.value_counts().sort_index()
Out[22]:
We can use the caseid to match up rows from resp and preg. For example, we can select the row from resp for caseid 2298 like this:
In [23]:
resp[resp.caseid==2298]
Out[23]:
And we can get the corresponding rows from preg like this:
In [24]:
preg[preg.caseid==2298]
Out[24]:
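Both selections work the same way: the comparison produces a boolean Series, and indexing the DataFrame with it keeps the rows where the value is True. A sketch on two made-up tables (toy_resp, toy_preg, and their values are hypothetical, not NSFG data):

```python
import pandas as pd

# One row per respondent, several rows per respondent for pregnancies.
toy_resp = pd.DataFrame({'caseid': [1, 2, 3], 'age_r': [44, 20, 31]})
toy_preg = pd.DataFrame({'caseid': [1, 1, 3], 'prglngth': [39, 40, 38]})

mask = toy_preg.caseid == 1        # boolean Series: True, True, False
rows = toy_preg[mask]              # keeps only the rows where mask is True

print(list(mask))
print(list(rows.prglngth))  # [39, 40]
```

A respondent matches exactly one row in the respondent table but can match zero, one, or many rows in the pregnancy table.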
How old is the respondent with caseid 1?
In [25]:
# Solution
resp[resp.caseid==1].age_r
Out[25]:
What are the pregnancy lengths for the respondent with caseid 2298?
In [26]:
# Solution
preg[preg.caseid==2298].prglngth
Out[26]:
What was the birthweight of the first baby born to the respondent with caseid 5012?
In [27]:
# Solution
preg[preg.caseid==5012].birthwgt_lb
Out[27]: