In [8]:

    
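import nsfg
import first
df = nsfg.ReadFemPreg()  # full pregnancy table, used by later cells
live, firsts, others = first.MakeFrames()  # live births, first babies, others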

Print value counts for birthord and compare to results published in the codebook.
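A minimal sketch of the value-counts step, using a small made-up Series in place of the real birthord column (the values are invented, not NSFG data). birthord is blank for outcomes other than live birth, so `dropna=False` keeps a count of the missing values, and `sort_index()` orders the table by birth order so it lines up with a codebook listing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df.birthord; values are invented, and NaN marks
# records that are not live births.
birthord = pd.Series([1, 2, 1, np.nan, 3, 1, 2, np.nan, 1], name="birthord")

# dropna=False keeps a row for the missing values;
# sort_index() orders the table by birth order (NaN last).
counts = birthord.value_counts(dropna=False).sort_index()
print(counts)
```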



In [13]:

    
diff_weight = firsts.totalwgt_lb.mean() - others.totalwgt_lb.mean()
diff_weight

    Out[13]:

-0.12476118453549034



In [14]:

    
diff_age = firsts.agepreg.mean() - others.agepreg.mean()
diff_age

    Out[14]:

-3.5864347661501519



In [15]:

    
import statsmodels.formula.api as smf
results = smf.ols('totalwgt_lb ~ agepreg', data=live).fit()
slope = results.params['agepreg']
slope

    Out[15]:

0.01745385147180277



In [18]:

    
slope * diff_age

    Out[18]:

-0.062597099721694457

Print value counts for prglngth and compare to results published in the codebook.
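If the codebook reports pregnancy lengths as counts over ranges rather than single weeks, one way to compare is to bin the values with `pd.cut` before counting. The ranges and data below are hypothetical, chosen only to illustrate the technique; use the ranges the codebook actually lists.

```python
import pandas as pd

# Toy stand-in for df.prglngth in weeks; values are invented.
prglngth = pd.Series([39, 40, 13, 6, 22, 39, 43, 30, 0, 41])

# Hypothetical ranges; check the codebook for the ones it actually uses.
bins = [0, 13, 26, 50]
labels = ["0-13", "14-26", "27-50"]

# Bins are right-closed: (0, 13], (13, 26], (26, 50];
# include_lowest=True also puts 0 in the first bin.
ranges = pd.cut(prglngth, bins=bins, labels=labels, include_lowest=True)
range_counts = ranges.value_counts().sort_index()
print(range_counts)
```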



In [6]:

    
import pandas
live['isfirst'] = (live.birthord == 1)
# pivot_table's former `rows` argument is now named `index`
pandas.pivot_table(live, index='isfirst', values=['agepreg', 'totalwgt_lb'])

    Out[6]:

           agepreg  totalwgt_lb
isfirst
False    26.670849     7.325856
True     23.084414     7.201094

2 rows × 2 columns

Print value counts for agepreg and compare to results published in the codebook.

As you look at this data, please remember my comments in the book about the obligation to approach data with consideration for the context and respect for the respondents.
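Ages with a fractional part produce a long table of nearly unique values, which is hard to compare with a codebook. A sketch, assuming you want counts by whole year of age; the ages below are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df.agepreg (years, with fractions); values are invented.
agepreg = pd.Series([22.75, 22.08, 30.5, np.nan, 17.91, 30.0, 22.5])

# Drop missing ages, truncate to whole years, then count each year.
years = np.floor(agepreg.dropna()).astype(int)
year_counts = years.value_counts().sort_index()
print(year_counts)
```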



In [36]:

Compute the mean birthweight.



In [37]:

    
df.totalwgt_lb.mean()

    Out[37]:

7.2656284576233681

Create a new column named totalwgt_kg that contains birth weight in kilograms. Compute its mean. Remember that when you create a new column, you have to use dictionary syntax, not dot notation.
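A sketch of the conversion on a small invented frame (named `toy` here so it is not confused with the real `df`). It uses the exact definition of the international pound, 0.45359237 kg.

```python
import pandas as pd

KG_PER_LB = 0.45359237  # the international pound is defined as exactly this many kg

# Toy frame standing in for df; the weights are invented.
toy = pd.DataFrame({"totalwgt_lb": [7.5, 6.0, 8.25, None]})

# Dictionary (bracket) syntax creates the new column. Assigning to a new
# attribute name with dot notation would not add a column.
toy["totalwgt_kg"] = toy["totalwgt_lb"] * KG_PER_LB

# mean() skips the missing value, as it does for totalwgt_lb.
print(toy["totalwgt_kg"].mean())
```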



In [37]:

Look through the codebook and find a variable, other than the ones mentioned in the book, that you find interesting. Compute value counts, means, or other statistics.
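For a numeric variable, `describe()` computes several of these statistics in one call. In this sketch, `example_var` and its values are placeholders, not an NSFG variable; substitute the column you picked.

```python
import pandas as pd

# Placeholder column; substitute a variable chosen from the codebook.
example_var = pd.Series([3, 1, 4, 1, 5, 9, 2, 6], name="example_var")

summary = example_var.describe()  # count, mean, std, min, quartiles, max
print(summary)
print(example_var.value_counts().head())  # most common values first
```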



In [37]:

Create a boolean Series.



In [38]:

    
df.outcome == 1

    Out[38]:

0      True
1      True
2      True
3      True
4      True
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
13    False
14    False
...
13578     True
13579     True
13580    False
13581     True
13582    False
13583    False
13584     True
13585    False
13586    False
13587    False
13588     True
13589    False
13590    False
13591     True
13592     True
Name: outcome, Length: 13593, dtype: bool
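Because True counts as 1 and False as 0, a boolean Series can also be summed to count matches or averaged to get a fraction, without selecting any rows. A sketch with invented outcome codes:

```python
import pandas as pd

# Toy stand-in for df.outcome; 1 means live birth, the other values are invented.
outcome = pd.Series([1, 1, 2, 1, 4, 1, 3])

is_live = outcome == 1      # boolean Series with the same index as outcome
n_live = is_live.sum()      # True counts as 1
frac_live = is_live.mean()  # fraction of records that are live births
print(n_live, frac_live)
```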

Use a boolean Series to select the records for the pregnancies that ended in live birth.



In [39]:

    
live = df[df.outcome == 1]
len(live)

    Out[39]:

9148
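If you later add columns to the selected rows, taking an explicit copy avoids pandas' SettingWithCopyWarning in versions that emit it. A sketch on an invented frame; `toy` and `toy_live` are illustrative names.

```python
import pandas as pd

# Toy frame standing in for df; values are invented.
toy = pd.DataFrame({"outcome": [1, 2, 1, 1, 3],
                    "birthord": [1, None, 2, 3, None]})

# Boolean indexing keeps only the rows where the mask is True.
# copy() makes toy_live independent, so adding a column is safe.
toy_live = toy[toy.outcome == 1].copy()
toy_live["isfirst"] = toy_live.birthord == 1
print(len(toy_live))
```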

Count the number of live births with birthwgt_lb between 0 and 5 pounds (including both). The result should be 1125.



In [40]:

    
len(live[(live.birthwgt_lb >= 0) & (live.birthwgt_lb <= 5)])

    Out[40]:

1125
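The parentheses around each comparison are needed because `&` binds more tightly than `>=` and `<=` in Python. `Series.between` expresses the same inclusive range more compactly; a sketch with invented weights:

```python
import pandas as pd

# Toy stand-in for live.birthwgt_lb; values are invented.
birthwgt_lb = pd.Series([0, 5, 6, 3, 9, 5, 12])

# Explicit form: parentheses are required because & binds tighter than >= and <=.
n_explicit = ((birthwgt_lb >= 0) & (birthwgt_lb <= 5)).sum()

# between() includes both endpoints by default.
n_between = birthwgt_lb.between(0, 5).sum()
print(n_explicit, n_between)
```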

Count the number of live births with birthwgt_lb between 9 and 95 pounds (including both). The result should be 798.
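One way to write this selection is `DataFrame.query`, which accepts chained comparisons. This sketch runs on an invented frame, so it does not reproduce the 798 above; apply the same expression to `live` for that.

```python
import pandas as pd

# Toy stand-in for live; weights are invented.
toy_live = pd.DataFrame({"birthwgt_lb": [8, 9, 10, 95, 97, 7, 12]})

# query() accepts chained comparisons; <= includes both endpoints.
heavy = toy_live.query("9 <= birthwgt_lb <= 95")
print(len(heavy))
```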



In [40]:

Use birthord to select the records for first babies and others. How many are there of each?



In [41]:

    
firsts = live[live.birthord == 1]
others = live[live.birthord > 1]
len(firsts), len(others)

    Out[41]:

(4413, 4735)

Compute the mean weight for first babies and others.



In [42]:

    
firsts.totalwgt_lb.mean()

    Out[42]:

7.201094430437772



In [43]:

    
others.totalwgt_lb.mean()

    Out[43]:

7.3258556149732623
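As an alternative to building two separate frames, `groupby` on a boolean key can produce the group sizes and mean weights for both groups at once. A sketch on invented data:

```python
import pandas as pd

# Toy stand-in for live; birth orders and weights are invented.
toy_live = pd.DataFrame({"birthord": [1, 2, 1, 3, 1, 2, 4],
                         "totalwgt_lb": [7.0, 7.5, 6.5, 8.0, 7.5, 7.0, 8.5]})

isfirst = toy_live.birthord == 1                       # True for first babies
sizes = toy_live.groupby(isfirst).size()               # rows per group
means = toy_live.groupby(isfirst).totalwgt_lb.mean()   # mean weight per group
print(sizes)
print(means)
```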

Compute the mean prglngth for first babies and others. Compute the difference in means, expressed in hours.
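prglngth is measured in weeks, so the difference in means converts to hours by multiplying by 7 × 24 = 168. A sketch with invented pregnancy lengths standing in for `firsts.prglngth` and `others.prglngth`:

```python
import pandas as pd

HOURS_PER_WEEK = 7 * 24  # 168

# Toy stand-ins for firsts.prglngth and others.prglngth (weeks); values are invented.
firsts_prglngth = pd.Series([39, 40, 41, 38])
others_prglngth = pd.Series([39, 39, 38, 40])

diff_weeks = firsts_prglngth.mean() - others_prglngth.mean()
diff_hours = diff_weeks * HOURS_PER_WEEK
print(diff_weeks, diff_hours)
```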



In [43]: