Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the pregnancy file.



In [1]:

%matplotlib inline

import nsfg
preg = nsfg.ReadFemPreg()

nsfg.py:42: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.birthwgt_lb[df.birthwgt_lb > 20] = np.nan

Select live births, then make a CDF of totalwgt_lb.
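One way to approach this, sketched with the standard library only so it runs without the book's modules. The `records` list is hypothetical stand-in data, not NSFG rows; in the pregnancy file, `outcome == 1` marks a live birth. With the book's libraries the same step is probably `live = preg[preg.outcome == 1]` followed by `thinkstats2.Cdf(live.totalwgt_lb)`.

```python
from bisect import bisect_right

# Hypothetical records standing in for rows of the pregnancy file.
records = [
    {"outcome": 1, "totalwgt_lb": 7.5},
    {"outcome": 1, "totalwgt_lb": 6.25},
    {"outcome": 4, "totalwgt_lb": None},   # not a live birth
    {"outcome": 1, "totalwgt_lb": 8.125},
    {"outcome": 1, "totalwgt_lb": None},   # live birth, weight missing
    {"outcome": 1, "totalwgt_lb": 7.5},
]

# Keep live births with a recorded weight, sorted for CDF lookups.
weights = sorted(
    r["totalwgt_lb"]
    for r in records
    if r["outcome"] == 1 and r["totalwgt_lb"] is not None
)

def cdf(x):
    """Fraction of recorded weights less than or equal to x."""
    return bisect_right(weights, x) / len(weights)

print(weights)    # [6.25, 7.5, 7.5, 8.125]
print(cdf(7.5))   # 0.75
```

Missing weights are dropped before the CDF is built; otherwise they would inflate the denominator.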



In [ ]:



In [57]:

Display the CDF.
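A CDF is drawn as a step function through the points (x, CDF(x)). The sketch below computes those points for a small hypothetical list of weights and prints them as a table; it is an illustration, not the notebook's own plotting code. With the book's libraries the plot is probably produced by `thinkplot.Cdf(cdf)`.

```python
from collections import Counter

weights = [6.25, 7.5, 7.5, 8.125]   # hypothetical values, in pounds

counts = Counter(weights)
xs = sorted(counts)                 # distinct values, ascending
ps = []                             # cumulative probability at each x
running = 0
for x in xs:
    running += counts[x]
    ps.append(running / len(weights))

for x, p in zip(xs, ps):
    print(f"{x:6.3f}  {p:.2f}")

# If matplotlib is available, plt.step(xs, ps, where="post") draws these points.
```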



In [43]:

Find out how much you weighed at birth, if you can, and compute CDF(x).



In [44]:

    Out[44]:

0.81422881168400085

If you are a first child, look up your birthweight in the CDF of first children; otherwise use the CDF of other children.
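A sketch of the split, assuming birth order is available per record (in the NSFG data the column is `birthord`, with 1 meaning a first child). The `births` pairs, `my_weight`, and `i_am_first_child` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical (birthord, totalwgt_lb) pairs for live births.
births = [(1, 7.0), (2, 7.5), (1, 6.5), (3, 8.0), (1, 7.25), (2, 6.75)]

firsts = sorted(w for order, w in births if order == 1)
others = sorted(w for order, w in births if order != 1)

def cdf(sorted_values, x):
    """Fraction of values in sorted_values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

my_weight = 7.1            # hypothetical birth weight in pounds
i_am_first_child = True    # hypothetical

group = firsts if i_am_first_child else others
print(cdf(group, my_weight))
```

Looking the same weight up in both groups shows how much the choice of reference group changes the answer.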



In [59]:

    Out[59]:

0.79657754010695192

Compute the percentile rank of your birthweight.
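The percentile rank is the CDF value expressed on a 0 to 100 scale, which is why the outputs for CDF(x) and the percentile rank in this notebook differ only by a factor of 100. A minimal sketch with hypothetical weights (with the book's libraries this is probably `cdf.PercentileRank(x)`):

```python
from bisect import bisect_right

# Hypothetical birth weights in pounds, sorted for lookup.
weights = sorted([5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5])

def percentile_rank(sorted_values, x):
    """Percentage of values less than or equal to x."""
    return 100.0 * bisect_right(sorted_values, x) / len(sorted_values)

print(percentile_rank(weights, 7.5))   # 75.0
```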



In [46]:

    Out[46]:

81.422881168400082

Compute the median birth weight by looking up the value associated with p=0.5.
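Looking up a value from a probability is the inverse of the CDF: find the smallest observed x whose cumulative probability is at least p. The sketch below uses hypothetical weights and a helper named `value` chosen for illustration; with the book's libraries the call is probably `cdf.Value(0.5)`.

```python
from bisect import bisect_left

# Hypothetical birth weights in pounds, sorted.
weights = sorted([5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5])
n = len(weights)

# Cumulative probability reached at each sorted value.
ps = [(i + 1) / n for i in range(n)]

def value(p):
    """Smallest observed x with CDF(x) >= p, for 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    return weights[bisect_left(ps, p)]

median = value(0.5)
print(median)   # 7.0
```

This lookup always returns an observed value. With an even number of observations it can differ from the textbook median, which averages the two middle values.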



In [45]:

    Out[45]:

7.375

Compute the interquartile range (IQR) by computing percentiles corresponding to 25 and 75.
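The same inverse lookup gives the quartiles; the IQR is usually reported either as the pair of values or as their difference. A sketch with hypothetical weights and a hypothetical helper `percentile` (with the book's libraries, probably `cdf.Percentile(25)` and `cdf.Percentile(75)`):

```python
from bisect import bisect_left

# Hypothetical birth weights in pounds, sorted.
weights = sorted([5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5])
n = len(weights)
ps = [(i + 1) / n for i in range(n)]   # cumulative probabilities

def percentile(q):
    """Smallest observed value whose percentile rank is at least q."""
    return weights[bisect_left(ps, q / 100)]

iqr = (percentile(25), percentile(75))
print(iqr)               # (6.25, 7.5)
print(iqr[1] - iqr[0])   # 1.25
```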



In [47]:

    Out[47]:

(6.5, 8.125)

Make a random selection from cdf.
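A random selection from a CDF can be made by drawing a uniform probability and looking up the corresponding value. The sketch below assumes hypothetical weights and a fixed seed for repeatability; with the book's libraries the call is probably `cdf.Random()`.

```python
import random
from bisect import bisect_left

# Hypothetical birth weights in pounds, sorted.
weights = sorted([5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5])
n = len(weights)
ps = [(i + 1) / n for i in range(n)]   # cumulative probabilities

def random_value(rng):
    """Draw u uniformly from [0, 1) and return the value at that probability."""
    u = rng.random()
    return weights[bisect_left(ps, u)]

rng = random.Random(17)   # seeded only so the run is repeatable
print(random_value(rng))
```

Because `u` is always below 1 and the last cumulative probability is exactly 1, the lookup index never runs past the end of the list.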



In [48]:

    Out[48]:

7.0

Draw a random sample from cdf.
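A sample is a set of independent draws from the distribution, with replacement. For an empirical CDF built from raw observations, choosing uniformly among those observations gives the same distribution, so this sketch swaps in `random.choices` instead of repeated CDF lookups. The weights and seed are hypothetical; with the book's libraries the call is probably `cdf.Sample(10)`.

```python
import random

# Hypothetical birth weights in pounds.
weights = [5.5, 6.25, 6.5, 7.0, 7.375, 7.5, 8.125, 8.5]

rng = random.Random(42)   # seeded only so the run is repeatable

# Ten independent draws, with replacement, from the observed values.
sample = rng.choices(weights, k=10)
print(sample)
```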



In [49]:

    Out[49]:

[6.25, 5.1875, 8.1875, 6.5, 7.9375, 6.6875, 5.75, 6.5625, 7.8125, 5.25]

Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.
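Whatever the shape of the original distribution, the percentile ranks of a sample drawn from it should be roughly uniform between 0 and 100. The sketch below checks this numerically instead of plotting; the population is synthetic (Gaussian with made-up parameters), not NSFG data.

```python
import random
from bisect import bisect_right

rng = random.Random(7)   # seeded only so the run is repeatable

# Hypothetical population of birth weights in pounds.
population = sorted(rng.gauss(7.3, 1.2) for _ in range(2000))

def percentile_rank(x):
    """Percentage of the population less than or equal to x."""
    return 100.0 * bisect_right(population, x) / len(population)

# Sample with replacement, then convert each value to its percentile rank.
sample = rng.choices(population, k=1000)
ranks = sorted(percentile_rank(x) for x in sample)

def fraction_at_or_below(r):
    """Empirical CDF of the percentile ranks."""
    return bisect_right(ranks, r) / len(ranks)

# For a uniform distribution, about r percent of ranks fall at or below r.
for r in (25, 50, 75):
    print(r, round(fraction_at_or_below(r), 3))
```

If the CDF of the ranks were plotted, it would lie close to a straight line from (0, 0) to (100, 1).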



In [50]:

Generate 1000 random values using random.random() and plot their PMF.
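Before plotting, it helps to look at what the PMF of these values contains. In the sketch below a plain dictionary stands in for a PMF object; the seed is arbitrary.

```python
import random
from collections import Counter

rng = random.Random(1)   # seeded only so the run is repeatable
values = [rng.random() for _ in range(1000)]

# Map each distinct value to its relative frequency.
pmf = {x: count / len(values) for x, count in Counter(values).items()}

print(len(pmf))            # number of distinct values
print(max(pmf.values()))   # largest probability assigned to any one value
```

Since `random.random()` returns floats, repeated values are extremely unlikely, so nearly every value gets the same tiny probability and a plot of the PMF shows little more than a flat band of spikes.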



In [55]:

If the PMF is not very informative (with 1000 continuous values, almost every value occurs only once), try plotting the CDF instead.
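For values that are uniform on [0, 1), the CDF should follow the diagonal, CDF(x) ≈ x. The sketch below checks this numerically at a few points and measures the largest gap from the diagonal at the observed values; the seed is arbitrary.

```python
import random
from bisect import bisect_right

rng = random.Random(1)   # seeded only so the run is repeatable
values = sorted(rng.random() for _ in range(1000))

def cdf(x):
    """Fraction of generated values less than or equal to x."""
    return bisect_right(values, x) / len(values)

# Compare the empirical CDF with the diagonal at a few points.
for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(x, round(cdf(x), 3))

# Largest distance between CDF(x) and x, measured at the observed values.
max_gap = max(abs(cdf(x) - x) for x in values)
print(round(max_gap, 3))
```

Unlike the PMF, the CDF needs no binning and makes the shape of the distribution visible directly.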



In [56]:



In [60]:



In [64]:

    Out[64]:

0.5


