Copyright 2016 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
In [1]:
from __future__ import print_function, division
import nsfg
In [2]:
preg = nsfg.ReadFemPreg()
preg.head()
Out[2]:
Print the column names.
In [3]:
preg.columns
Out[3]:
Select a single column name.
In [4]:
preg.columns[1]
Out[4]:
Select a column and check what type it is.
In [5]:
pregordr = preg['pregordr']
type(pregordr)
Out[5]:
Print a column.
In [6]:
pregordr
Out[6]:
Select a single element from a column.
In [7]:
pregordr[0]
Out[7]:
Select a slice from a column.
In [8]:
pregordr[2:5]
Out[8]:
Select a column using dot notation.
In [9]:
pregordr = preg.pregordr
Count the number of times each value occurs.
In [10]:
preg.outcome.value_counts().sort_index()
Out[10]:
Check the values of another variable.
In [11]:
preg.birthwgt_lb.value_counts().sort_index()
Out[11]:
Make a dictionary that maps from each respondent's caseid
to a list of indices into the pregnancy DataFrame
. Use it to select the pregnancy outcomes for a single respondent.
In [12]:
caseid = 10229
preg_map = nsfg.MakePregMap(preg)
indices = preg_map[caseid]
preg.outcome[indices].values
Out[12]:
Select the birthord
column, print the value counts, and compare to results published in the codebook
In [13]:
# Solution goes here
We can also use isnull
to count the number of nans.
In [14]:
preg.birthord.isnull().sum()
Out[14]:
Select the prglngth
column, print the value counts, and compare to results published in the codebook
In [15]:
# Solution goes here
To compute the mean of a column, you can invoke the mean
method on a Series. For example, here is the mean birthweight in pounds:
In [16]:
preg.totalwgt_lb.mean()
Out[16]:
Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.
In [17]:
# Solution goes here
nsfg.py
also provides ReadFemResp
, which reads the female respondents file and returns a DataFrame
:
In [18]:
resp = nsfg.ReadFemResp()
DataFrame
provides a method head
that displays the first five rows:
In [19]:
resp.head()
Out[19]:
Select the age_r
column from resp
and print the value counts. How old are the youngest and oldest respondents?
In [20]:
# Solution goes here
We can use the caseid
to match up rows from resp
and preg
. For example, we can select the row from resp
for caseid
2298 like this:
In [21]:
resp[resp.caseid==2298]
Out[21]:
And we can get the corresponding rows from preg
like this:
In [22]:
preg[preg.caseid==2298]
Out[22]:
How old is the respondent with caseid
1?
In [23]:
# Solution goes here
What are the pregnancy lengths for the respondent with caseid
2298?
In [24]:
# Solution goes here
What was the birthweight of the first baby born to the respondent with caseid
5012?
In [25]:
# Solution goes here
In [ ]: