Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey

Read the pregnancy file.



In [1]:

    
%matplotlib inline

# Read the NSFG pregnancy file into a pandas DataFrame, one row per pregnancy
import nsfg
preg = nsfg.ReadFemPreg()

nsfg.py:42: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.birthwgt_lb[df.birthwgt_lb > 20] = np.nan

Select live births, then make a CDF of totalwgt_lb.



In [4]:

    
import thinkstats2 as ts

# outcome == 1 marks a live birth
live = preg[preg.outcome == 1]

# Empirical CDF of total birth weight in pounds
wgt_cdf = ts.Cdf(live.totalwgt_lb, label='weight')

Display the CDF.



In [6]:

    
import thinkplot as tp

# wgt_cdf already carries the label 'weight'
tp.Cdf(wgt_cdf)
tp.Show(xlabel='birth weight (pounds)', ylabel='CDF')

Find out how much you weighed at birth, if you can, and compute CDF(x), where x is your birth weight in pounds.



In [44]:

    Out[44]:

0.81422881168400085
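CDF(x) is the fraction of values in the distribution that are less than or equal to x. The sketch below computes that lookup in plain Python rather than with thinkstats2 (where the corresponding call is `Cdf.Prob`); the helper `cdf_prob` and the list of weights are hypothetical, for illustration only.

```python
from bisect import bisect_right


def cdf_prob(values, x):
    """Fraction of values less than or equal to x (hypothetical helper)."""
    ordered = sorted(values)
    # bisect_right returns how many sorted values are <= x
    return bisect_right(ordered, x) / len(ordered)


# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]
print(cdf_prob(weights, 8.0))  # 0.7
```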

If you are a first child, look up your birth weight in the CDF of first children; otherwise use the CDF of other children.



In [59]:

    Out[59]:

0.79657754010695192
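A sketch of splitting births into first children and others before the lookup. In the NSFG data, birth order is recorded in the `birthord` column (1 for a first child), so with pandas the groups would be `live[live.birthord == 1]` and `live[live.birthord != 1]`. Here the records, the helper `cdf_prob`, and the values of `my_weight` and `i_am_first` are hypothetical stand-ins so the example runs on its own.

```python
from bisect import bisect_right


def cdf_prob(values, x):
    """Fraction of values less than or equal to x (hypothetical helper)."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


# Hypothetical (birth order, weight in pounds) records
records = [(1, 6.5), (2, 7.25), (1, 7.0), (3, 8.0), (1, 7.5),
           (2, 6.75), (1, 8.25), (2, 7.75)]

firsts = [w for order, w in records if order == 1]
others = [w for order, w in records if order != 1]

my_weight = 7.5    # hypothetical
i_am_first = True  # hypothetical

# Compare against the group you belong to
group = firsts if i_am_first else others
print(cdf_prob(group, my_weight))  # 0.75
```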

Compute the percentile rank of your birth weight.



In [46]:

    Out[46]:

81.422881168400082
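The percentile rank is CDF(x) expressed as a percentage, which is why Out[46] is 100 times Out[44]. A plain-Python sketch of that definition (thinkstats2 exposes the same quantity as `Cdf.PercentileRank`); the function name and data are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of values less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100.0 * count / len(values)


# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]
print(percentile_rank(weights, 8.0))  # 70.0
```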

Compute the median birth weight by looking up the value associated with p=0.5.



In [45]:

    Out[45]:

7.375
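Looking up p=0.5 means inverting the CDF: find the smallest value x whose CDF(x) is at least p. A minimal sketch of that inverse lookup, assuming the "smallest x with CDF(x) >= p" convention (thinkstats2 exposes this operation as `Cdf.Value`); `cdf_value` and the data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def cdf_value(values, p):
    """Smallest value x with CDF(x) >= p, for 0 <= p <= 1 (inverse CDF)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    counts = Counter(values)
    xs = sorted(counts)
    # Cumulative probability at each distinct value
    ps, running = [], 0
    for x in xs:
        running += counts[x]
        ps.append(running / len(values))
    # First index whose cumulative probability reaches p
    return xs[bisect_left(ps, p)]


# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]
print(cdf_value(weights, 0.5))  # 7.25
```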

Compute the interquartile range (IQR): find the 25th and 75th percentiles; the IQR is the difference between them.



In [47]:

    Out[47]:

(6.5, 8.125)
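The output above shows the two quartiles themselves, so the IQR here is 8.125 - 6.5 = 1.625 pounds. A plain-Python sketch that computes both percentiles with the same inverse-CDF rule and takes their difference (thinkstats2's `Cdf.Percentile` takes a percentage in the same way); the function name and data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def percentile(values, pct):
    """Value at the given percentile (0 to 100), using the inverse-CDF rule."""
    counts = Counter(values)
    xs = sorted(counts)
    ps, running = [], 0
    for x in xs:
        running += counts[x]
        ps.append(running / len(values))
    return xs[bisect_left(ps, pct / 100)]


# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]
q25, q75 = percentile(weights, 25), percentile(weights, 75)
iqr = q75 - q25
print(q25, q75, iqr)  # 6.75 8.25 1.5
```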

Make a random selection from cdf.



In [48]:

    Out[48]:

7.0
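A random selection from a CDF can be made by inverse transform sampling: draw p uniformly from [0, 1) and return the value whose cumulative probability first reaches p. A self-contained sketch of that technique (thinkstats2 offers `Cdf.Random` for this); `make_inverse_cdf`, the seed, and the data are hypothetical.

```python
import random
from bisect import bisect_left
from collections import Counter


def make_inverse_cdf(values):
    """Return a function mapping p in [0, 1] to the smallest x with CDF(x) >= p."""
    counts = Counter(values)
    xs = sorted(counts)
    ps, running = [], 0
    for x in xs:
        running += counts[x]
        ps.append(running / len(values))

    def value(p):
        return xs[bisect_left(ps, p)]

    return value


# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]
inverse_cdf = make_inverse_cdf(weights)

rng = random.Random(17)  # seeded so the draw is repeatable
choice = inverse_cdf(rng.random())  # inverse transform sampling
print(choice)
```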

Draw a random sample from cdf.



In [49]:

    Out[49]:

[6.25, 5.1875, 8.1875, 6.5, 7.9375, 6.6875, 5.75, 6.5625, 7.8125, 5.25]
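Repeating that draw n times gives a sample. Inverse transform sampling from an empirical CDF picks each original value with probability proportional to its frequency, which is the same distribution as resampling the data with replacement, so a standard-library sketch can use `random.choices` (thinkstats2 provides `Cdf.Sample(n)` for this). The data and seed are hypothetical.

```python
import random

# Hypothetical birth weights in pounds, for illustration only
weights = [5.5, 6.0, 6.75, 7.0, 7.25, 7.5, 7.75, 8.25, 8.5, 9.0]

rng = random.Random(17)  # seeded so the sample is repeatable
# Each draw picks an original value uniformly, with replacement, which
# matches sampling from the empirical CDF by inverse transform
sample = rng.choices(weights, k=10)
print(sample)
```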

Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.



In [50]:
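Because a value's percentile rank is its position within the distribution, the ranks of values drawn from that same distribution should be roughly uniform between 0 and 100, so their CDF should be close to a straight line. The sketch below checks this without plotting by counting ranks per decile; the normally distributed stand-in data (mean 7.3, standard deviation 1.2), the seed, and the sample size are all hypothetical. In the notebook, the list of ranks could instead be passed to `ts.Cdf` and plotted with `tp.Cdf`.

```python
import random
from bisect import bisect_right

rng = random.Random(42)

# Hypothetical stand-in for the birth-weight data: 10,000 normal draws
data = [rng.gauss(7.3, 1.2) for _ in range(10_000)]
ordered = sorted(data)


def percentile_rank(x):
    # Percentage of data values <= x
    return 100.0 * bisect_right(ordered, x) / len(ordered)


# Draw a sample from the data's distribution and rank each value
sample = [rng.choice(data) for _ in range(1000)]
ranks = [percentile_rank(x) for x in sample]

# Count ranks per decile; a roughly flat histogram means the ranks are ~uniform
counts = [0] * 10
for r in ranks:
    counts[min(int(r // 10), 9)] += 1
print(counts)
```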

Generate 1000 random values using random.random() and plot their PMF.



In [7]:

    
import random
random.random?



In [14]:

    
import random

# 1000 values drawn uniformly from [0, 1)
thousand = [random.random() for _ in range(1000)]
thousand_pmf = ts.Pmf(thousand, label='rando')
tp.Pmf(thousand_pmf, linewidth=0.1)
tp.Show()




In [22]:

    
# The same values as a histogram of counts
t_hist = ts.Hist(thousand)
tp.Hist(t_hist, label='rando')
tp.Show()


The PMF is not very informative here: the 1000 values are almost certainly all distinct, so each has probability 1/1000 and the plot is flat. Try plotting the CDF instead.



In [15]:

    
thousand_cdf = ts.Cdf(thousand, label='rando')
tp.Cdf(thousand_cdf)
tp.Show()

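The reasoning behind switching to the CDF can be checked numerically: a PMF of 1000 distinct floats is a flat row of 1/1000 spikes, while the CDF of uniform values should follow the line CDF(x) = x. The sketch below measures the largest gap between the empirical CDF and that line, which is the Kolmogorov-Smirnov statistic; the seed is a hypothetical choice made so the result is repeatable.

```python
import random

rng = random.Random(1)  # seeded; the notebook used the unseeded random.random()
thousand = [rng.random() for _ in range(1000)]

# Repeated floats are vanishingly unlikely, so each of the 1000 values
# gets probability 1/1000 and the PMF carries no shape information.
distinct = len(set(thousand))

# The empirical CDF jumps from i/n to (i+1)/n at the i-th smallest value.
# For uniform data it should stay close to the line CDF(x) = x; the largest
# gap on either side of each jump is the Kolmogorov-Smirnov statistic.
ordered = sorted(thousand)
n = len(ordered)
ks = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(ordered))
print(distinct, round(ks, 3))
```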



In [17]:

    
import scipy.stats
scipy.stats?



In [64]:

    Out[64]:

0.5