Exercise from Think Stats, 2nd Edition (thinkstats2.com)
Allen Downey
Read the pregnancy file.
In [1]:
%matplotlib inline
import nsfg
preg = nsfg.ReadFemPreg()
Select live births, then make a CDF of totalwgt_lb.
In [4]:
import thinkstats2 as ts
live = preg[preg.outcome == 1]
wgt_cdf = ts.Cdf(live.totalwgt_lb, label='weight')
Display the CDF.
In [6]:
import thinkplot as tp
# The Cdf already carries label='weight', so no extra label is needed here
tp.Cdf(wgt_cdf)
tp.Show(xlabel='Birth weight (lbs)', ylabel='CDF')
Find out how much you weighed at birth, if you can, and compute CDF(x).
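The cell for this step is empty in the source. With thinkstats2 the lookup would presumably be `wgt_cdf.Prob(x)`, which returns the fraction of values less than or equal to `x`. To show what that computes without the NSFG data, here is a plain-Python sketch; the helper `cdf_prob`, the toy weights, and `my_weight` are all hypothetical.

```python
import bisect


def cdf_prob(values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    ordered = sorted(values)
    # bisect_right returns how many sorted values are <= x
    return bisect.bisect_right(ordered, x) / len(ordered)


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]
my_weight = 7.5  # hypothetical birth weight

print(cdf_prob(weights, my_weight))  # 6 of 10 values are <= 7.5, so 0.6
```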
If you are a first child, look up your birth weight in the CDF of first children; otherwise use the CDF of other children.
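A sketch of the same lookup split by birth order. It assumes each record carries a birth order and a weight; in the NSFG frame that role is played by the `birthord` column, so the notebook version would be along the lines of `ts.Cdf(live[live.birthord == 1].totalwgt_lb)` for first babies and `live.birthord != 1` for the others. The records, flags, and helper name below are made up for illustration.

```python
import bisect


def cdf_prob(values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    ordered = sorted(values)
    return bisect.bisect_right(ordered, x) / len(ordered)


# Toy (birth order, weight in lbs) records; illustrative only, not NSFG data
records = [(1, 6.9), (2, 7.4), (1, 7.2), (3, 8.1), (1, 6.5),
           (2, 7.8), (1, 7.6), (2, 7.0)]

first_weights = [w for order, w in records if order == 1]
other_weights = [w for order, w in records if order != 1]

my_weight = 7.3          # hypothetical birth weight
i_am_first_child = True  # hypothetical

# Look up the weight in the group you belong to
group = first_weights if i_am_first_child else other_weights
print(cdf_prob(group, my_weight))  # 3 of 4 first-child weights are <= 7.3
```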
Compute the percentile rank of your birth weight.
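Percentile rank is the CDF rescaled to run from 0 to 100; thinkstats2's `Cdf.PercentileRank(x)` returns `100 * Prob(x)`. The sketch below computes it directly by counting, which is the definition Think Stats uses; `percentile_rank`, the toy weights, and `my_weight` are illustrative.

```python
def percentile_rank(values, x):
    """Percentile rank of x: percentage of values less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100.0 * count / len(values)


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]
my_weight = 7.5  # hypothetical birth weight

print(percentile_rank(weights, my_weight))  # 60.0
```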
Compute the median birth weight by looking up the value associated with p=0.5.
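Looking up p = 0.5 means inverting the CDF: find the smallest value whose CDF reaches p (in thinkstats2, `wgt_cdf.Value(0.5)`). A minimal sketch of that inverse, using a hypothetical `cdf_value` helper and toy weights:

```python
import bisect


def cdf_value(values, p):
    """Inverse CDF: smallest observed value x with CDF(x) >= p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    # Cumulative probability at each sorted position: 1/n, 2/n, ..., 1
    cum_probs = [(i + 1) / n for i in range(n)]
    # First position whose cumulative probability reaches p
    return ordered[bisect.bisect_left(cum_probs, p)]


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]

print(cdf_value(weights, 0.5))  # 7.3
```

Because the inverse CDF always returns an observed value, it can differ from `statistics.median`, which averages the two middle values when the count is even.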
Compute the interquartile range (IQR) by computing the 25th and 75th percentiles.
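The IQR is the 75th percentile minus the 25th. thinkstats2's `Cdf.Percentile(p)` takes p on a 0 to 100 scale, so the notebook version would be `wgt_cdf.Percentile(75) - wgt_cdf.Percentile(25)`. The sketch below rebuilds that from the inverse CDF; the helper names and data are hypothetical.

```python
import bisect


def cdf_value(values, p):
    """Inverse CDF: smallest observed value x with CDF(x) >= p."""
    ordered = sorted(values)
    n = len(ordered)
    cum_probs = [(i + 1) / n for i in range(n)]
    return ordered[bisect.bisect_left(cum_probs, p)]


def percentile(values, p):
    """Value at percentile p, where p is on a 0-100 scale."""
    return cdf_value(values, p / 100.0)


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]

q25 = percentile(weights, 25)
q75 = percentile(weights, 75)
iqr = q75 - q25
print(q25, q75, round(iqr, 2))  # 6.8 8.2 1.4
```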
Make a random selection from cdf.
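A random selection from a CDF can be made by inverse transform sampling: draw a uniform number in [0, 1) and map it through the inverse CDF. That is what `Cdf.Random()` does in thinkstats2. A sketch with hypothetical helpers and toy data:

```python
import bisect
import random


def cdf_value(values, p):
    """Inverse CDF: smallest observed value x with CDF(x) >= p."""
    ordered = sorted(values)
    n = len(ordered)
    cum_probs = [(i + 1) / n for i in range(n)]
    return ordered[bisect.bisect_left(cum_probs, p)]


def random_from_cdf(values, rng=random):
    """Map one uniform draw in [0, 1) through the inverse CDF."""
    return cdf_value(values, rng.random())


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]

rng = random.Random(17)  # seeded so the example is repeatable
print(random_from_cdf(weights, rng))
```

For an empirical CDF this gives the same distribution as `random.choice(values)`; the inverse-transform form is shown because it works for any CDF that can be inverted.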
Draw a random sample from cdf.
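Drawing a sample repeats the single draw n times (thinkstats2 offers `Cdf.Sample(n)` for this). In this sketch the sorted values and cumulative probabilities are computed once and reused for every draw. As a sanity check, the fraction of sampled values at or below 7.5 should land close to the CDF at 7.5, which is 0.6 for the toy data. The names and data are hypothetical.

```python
import bisect
import random


def make_sampler(values):
    """Return a function that draws n values from the empirical CDF of values."""
    ordered = sorted(values)
    count = len(ordered)
    cum_probs = [(i + 1) / count for i in range(count)]

    def sample(n, rng=random):
        # One uniform draw per output value, mapped through the inverse CDF
        return [ordered[bisect.bisect_left(cum_probs, rng.random())]
                for _ in range(n)]

    return sample


# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]

sample = make_sampler(weights)(10_000, random.Random(42))
frac = sum(1 for w in sample if w <= 7.5) / len(sample)
print(frac)  # should be close to CDF(7.5) = 0.6
```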
Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.
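The point of this exercise is that percentile ranks of values drawn from a distribution are spread evenly: every rank is equally likely, so their distribution is uniform and its CDF is a straight line. The sketch checks this by counting ranks instead of plotting; with 10 distinct toy weights, each of the ranks 10, 20, ..., 100 should come up about equally often. The names and data are hypothetical.

```python
import bisect
import random
from collections import Counter

# Toy birth weights in pounds (illustrative only, not NSFG data)
weights = [5.5, 6.0, 6.8, 7.1, 7.3, 7.5, 7.9, 8.2, 8.8, 9.4]
ordered = sorted(weights)
n = len(ordered)
cum_probs = [(i + 1) / n for i in range(n)]


def draw(rng):
    """Inverse transform sampling from the empirical CDF of weights."""
    return ordered[bisect.bisect_left(cum_probs, rng.random())]


def percentile_rank(x):
    """100 * CDF(x): percentage of weights less than or equal to x."""
    return 100.0 * bisect.bisect_right(ordered, x) / n


rng = random.Random(0)
sample = [draw(rng) for _ in range(10_000)]
ranks = [percentile_rank(x) for x in sample]

# Each possible rank should appear roughly 1,000 times
counts = Counter(ranks)
for rank in sorted(counts):
    print(rank, counts[rank])
```

With the notebook objects, something like `ranks = [wgt_cdf.PercentileRank(x) for x in wgt_cdf.Sample(1000)]` followed by `tp.Cdf(ts.Cdf(ranks))` and `tp.Show()` would produce the plot.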
Generate 1000 random values using random.random() and plot their PMF.
In [7]:
import random
random.random?
In [14]:
import random
thousand = [random.random() for _ in range(1000)]
thousand_pmf = ts.Pmf(thousand, label='rando')
tp.Pmf(thousand_pmf, linewidth=0.1)
tp.Show()
In [22]:
t_hist = ts.Hist(thousand)
tp.Hist(t_hist, label="rando")
tp.Show()
Assuming that the PMF doesn't work very well, try plotting the CDF instead.
In [15]:
thousand_cdf = ts.Cdf(thousand, label='rando')
tp.Cdf(thousand_cdf)
tp.Show()
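Because `random.random()` draws uniformly from [0, 1), the CDF of these values should lie close to the line y = x. One way to put a number on "close" is the Kolmogorov-Smirnov statistic, the largest vertical gap between the empirical CDF and y = x. A plain-Python sketch (the function name is hypothetical):

```python
import random


def ks_statistic_uniform(values):
    """Largest vertical gap between the empirical CDF of values and the
    uniform CDF on [0, 1), which is F(x) = x."""
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides
        gap = max(gap, (i + 1) / n - x, x - i / n)
    return gap


rng = random.Random(1)  # seeded so the example is repeatable
values = [rng.random() for _ in range(1000)]
d = ks_statistic_uniform(values)
print(round(d, 3))  # small for uniform data; large gaps suggest non-uniform data
```

`scipy.stats.kstest(thousand, 'uniform')` computes the same statistic along with a p-value.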
In [17]:
import scipy.stats
scipy.stats?