Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the pregnancy file.


In [1]:
%matplotlib inline

import nsfg
preg = nsfg.ReadFemPreg()


nsfg.py:42: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.birthwgt_lb[df.birthwgt_lb > 20] = np.nan

Select live births, then make a CDF of totalwgt_lb.


In [ ]:


In [57]:
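
A minimal sketch of this step, assuming a handful of made-up records in place of the `preg` DataFrame. It relies on the NSFG convention that `outcome == 1` marks a live birth, drops missing weights, and builds the CDF as sorted values with cumulative probabilities. With the book's libraries the same step is probably a boolean filter on `outcome` followed by `thinkstats2.Cdf`; the version below uses only the standard library so it runs on its own.

```python
import math
from collections import Counter

# Hypothetical toy records standing in for rows of the pregnancy DataFrame.
# Assumption: outcome == 1 marks a live birth, as in the NSFG codebook.
records = [
    {"outcome": 1, "totalwgt_lb": 7.5},
    {"outcome": 1, "totalwgt_lb": 6.25},
    {"outcome": 2, "totalwgt_lb": float("nan")},
    {"outcome": 1, "totalwgt_lb": 7.5},
    {"outcome": 1, "totalwgt_lb": 8.0},
    {"outcome": 4, "totalwgt_lb": float("nan")},
    {"outcome": 1, "totalwgt_lb": float("nan")},
]


def make_cdf(values):
    """Return (xs, ps): sorted unique values and their cumulative probabilities."""
    clean = [v for v in values if not math.isnan(v)]  # ignore missing weights
    counts = Counter(clean)
    xs = sorted(counts)
    ps = []
    running = 0
    for x in xs:
        running += counts[x]
        ps.append(running / len(clean))
    return xs, ps


# Select live births, then build the CDF of their total weight in pounds.
live_weights = [r["totalwgt_lb"] for r in records if r["outcome"] == 1]
xs, ps = make_cdf(live_weights)
print(xs)  # [6.25, 7.5, 8.0]
print(ps)  # [0.25, 0.75, 1.0]
```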

Display the CDF.


In [43]:



Find out how much you weighed at birth, if you can, and compute CDF(x), where x is your birth weight in pounds.


In [44]:



Out[44]:
0.81422881168400085
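
As a rough illustration of what CDF(x) means here: it is the fraction of birth weights less than or equal to x. The sketch below uses a short, made-up list of weights and a made-up value for `my_weight`, so it will not reproduce the output above.

```python
from bisect import bisect_right


def cdf_prob(sample, x):
    """Fraction of values in sample that are less than or equal to x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


# Hypothetical birth weights in pounds, not the NSFG data.
weights = [5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5]
my_weight = 7.5  # hypothetical birth weight

p = cdf_prob(weights, my_weight)
print(p)  # 0.75
```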

If you are a first child, look up your birth weight in the CDF of first children; otherwise, use the CDF of other children.


In [59]:



Out[59]:
0.79657754010695192
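
One way to sketch the comparison, assuming each live birth carries a birth-order field (called `birthord` in the NSFG data, represented here by the first element of each made-up pair): split the weights into first children and others, then evaluate the CDF of whichever group applies.

```python
from bisect import bisect_right

# Hypothetical (birth order, total weight in pounds) pairs, not the NSFG data.
births = [
    (1, 6.5), (1, 7.0), (1, 7.25), (1, 8.0),
    (2, 6.75), (2, 7.5), (3, 7.75), (2, 8.25), (3, 9.0),
]


def cdf_prob(sample, x):
    """Fraction of values in sample that are less than or equal to x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


firsts = [weight for order, weight in births if order == 1]
others = [weight for order, weight in births if order != 1]

am_first_child = True  # hypothetical; set to False to use the other group
my_weight = 7.25       # hypothetical birth weight

group = firsts if am_first_child else others
p = cdf_prob(group, my_weight)
print(p)  # 0.75
```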

Compute the percentile rank of your birth weight.


In [46]:



Out[46]:
81.422881168400082
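
The percentile rank is the CDF value expressed on a 0 to 100 scale, which is consistent with the two outputs above (0.814... and 81.4...). A short sketch with made-up weights:

```python
def percentile_rank(sample, x):
    """Percentage of values in sample that are less than or equal to x."""
    count = sum(1 for value in sample if value <= x)
    return 100.0 * count / len(sample)


# Hypothetical birth weights in pounds, not the NSFG data.
weights = [5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5]

rank = percentile_rank(weights, 7.5)
print(rank)  # 75.0
```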

Compute the median birth weight by looking up the value associated with p=0.5.


In [45]:



Out[45]:
7.375
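
Looking up the value for a given probability is the inverse of the CDF. The sketch below assumes one common convention: return the smallest value x whose cumulative probability is at least p. The helper name `cdf_value` and the weights are illustrative, not taken from the original notebook.

```python
import math


def cdf_value(sample, p):
    """Smallest value x in sample with CDF(x) >= p, for 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(sample)
    # The k-th smallest value (1-based) has cumulative probability k / n,
    # so the first k with k / n >= p is ceil(p * n).
    index = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical birth weights in pounds, not the NSFG data.
weights = [5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5]

median = cdf_value(weights, 0.5)
print(median)  # 7.0
```

Under this convention the median of an even-sized sample is the lower of the two middle values rather than their average.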

Compute the interquartile range (IQR) by computing percentiles corresponding to 25 and 75.


In [47]:



Out[47]:
(6.5, 8.125)
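
The same inverse lookup gives the quartiles: the 25th and 75th percentiles are the values at p = 0.25 and p = 0.75. A sketch with made-up weights and a hypothetical `percentile` helper; note that the exercise's output reports the two quartile values, and the width of the range is their difference.

```python
import math


def percentile(sample, q):
    """Smallest value in sample whose percentile rank is at least q (0 to 100)."""
    ordered = sorted(sample)
    index = max(math.ceil(q / 100 * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical birth weights in pounds, not the NSFG data.
weights = [5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5]

iqr = (percentile(weights, 25), percentile(weights, 75))
print(iqr)              # (6.25, 7.5)
print(iqr[1] - iqr[0])  # 1.25
```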

Make a random selection from cdf.


In [48]:



Out[48]:
7.0

Draw a random sample from cdf.


In [49]:



Out[49]:
[6.25, 5.1875, 8.1875, 6.5, 7.9375, 6.6875, 5.75, 6.5625, 7.8125, 5.25]
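
Both the single selection and the sample above can be sketched as inverse transform sampling: pick a cumulative probability uniformly at random and look up the corresponding value. For an equally weighted sample this amounts to choosing a uniformly random position in the sorted data. The weights, the seed, and the helper name `random_value` are assumptions for illustration.

```python
import random

# Hypothetical birth weights in pounds, not the NSFG data.
weights = sorted([5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5])


def random_value(ordered, rng):
    """Draw one value by mapping a uniform probability onto the sorted data."""
    u = rng.random()  # uniform in [0, 1)
    index = min(int(u * len(ordered)), len(ordered) - 1)
    return ordered[index]


rng = random.Random(17)  # fixed seed so the sketch is repeatable

one_value = random_value(weights, rng)
sample = [random_value(weights, rng) for _ in range(10)]  # with replacement
print(one_value)
print(sample)
```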

Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.


In [50]:
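
The point of this exercise is that percentile ranks of values drawn from a distribution are approximately uniform between 0 and 100. The sketch below checks this without plotting, by counting ranks in ten equal-width bins. The population is synthetic (a Gaussian with made-up parameters), not the NSFG data.

```python
import random
from bisect import bisect_right

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population: roughly bell-shaped weights in pounds.
population = sorted(rng.gauss(7.3, 1.2) for _ in range(5000))


def percentile_rank(ordered, x):
    """Percentage of values in the sorted list that are <= x."""
    return 100.0 * bisect_right(ordered, x) / len(ordered)


# Draw a sample with replacement and compute each value's percentile rank.
sample = [rng.choice(population) for _ in range(1000)]
ranks = [percentile_rank(population, x) for x in sample]

# Count ranks per decile; roughly equal counts indicate a uniform distribution.
bins = [0] * 10
for rank in ranks:
    bins[min(int(rank // 10), 9)] += 1

mean_rank = sum(ranks) / len(ranks)
print(bins)
print(mean_rank)
```

Each bin should hold roughly 100 of the 1000 ranks, and the mean rank should sit near 50.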



Generate 1000 random values using random.random() and plot their PMF.


In [55]:



Assuming that the PMF doesn't work very well, try plotting the CDF instead.


In [56]:
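
A sketch of why the PMF is uninformative here while the CDF is not. Values from `random.random()` are almost surely all distinct, so every value gets the same probability of 1/1000 and the PMF is flat noise. The CDF of the same values should lie close to the diagonal CDF(x) = x, as expected for a uniform distribution. The seed and checkpoints are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the sketch is repeatable
values = [rng.random() for _ in range(1000)]

# PMF: count how often each exact value occurs.
pmf_counts = Counter(values)
largest_count = max(pmf_counts.values())
print(len(pmf_counts), largest_count)  # every value is expected to occur once


def cdf_prob(sample, x):
    """Fraction of values in sample that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


# CDF: for uniform values, CDF(x) should be close to x itself.
checkpoints = [0.25, 0.5, 0.75]
cdf_at = [cdf_prob(values, x) for x in checkpoints]
print(cdf_at)
```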




In [60]:


In [64]:



Out[64]:
0.5
