scipy.stats contains objects that represent analytic distributions.



In [2]:

    
import scipy.stats

For example, scipy.stats.norm represents a normal distribution.



In [3]:

    
mu = 178
sigma = 7.7
dist = scipy.stats.norm(loc=mu, scale=sigma)
type(dist)









    Out[3]:





scipy.stats.distributions.rv_frozen

A "frozen random variable" can compute its mean and standard deviation.



In [4]:

    
dist.mean(), dist.std()









    Out[4]:





(178.0, 7.7000000000000002)

It can also evaluate its CDF. How many people are more than one standard deviation below the mean? About 16%.



In [5]:

    
dist.cdf(mu-sigma)









    Out[5]:





0.15865525393145741
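
The 16% figure does not depend on the particular mu and sigma: one standard deviation below the mean corresponds to a z-score of -1 for any normal distribution. The following is a minimal sketch of that check; the variable names `z`, `p_frozen`, and `p_standard` are illustrative and not part of the notebook.

```python
import scipy.stats

mu, sigma = 178, 7.7
dist = scipy.stats.norm(loc=mu, scale=sigma)

# Standardize: one standard deviation below the mean is z = -1
# (up to floating-point rounding).
z = ((mu - sigma) - mu) / sigma

p_frozen = dist.cdf(mu - sigma)        # CDF of the frozen distribution
p_standard = scipy.stats.norm.cdf(z)   # CDF of the standard normal at z

print(p_frozen, p_standard)
```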

How many people are between 5'10" and 6'1"?



In [ ]:
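
One possible approach, sketched under the assumption that mu and sigma are in centimeters: convert both heights to centimeters and take the difference of the CDF at the two bounds. The names `CM_PER_INCH`, `low`, `high`, and `fraction` are illustrative.

```python
import scipy.stats

CM_PER_INCH = 2.54

mu, sigma = 178, 7.7   # assumed to be in cm
dist = scipy.stats.norm(loc=mu, scale=sigma)

low = (5 * 12 + 10) * CM_PER_INCH   # 5'10" in cm
high = (6 * 12 + 1) * CM_PER_INCH   # 6'1" in cm

# Fraction of the distribution between the two heights.
fraction = dist.cdf(high) - dist.cdf(low)
print(low, high, fraction)
```

With these parameters the result should come out to roughly a third of the population.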

scipy.stats.pareto represents a Pareto distribution. In Pareto world, the distribution of human heights has parameters alpha=1.7 and xmin=1 meter. So the shortest person is 100 cm and the median is about 150 cm.



In [7]:

    
alpha = 1.7
xmin = 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)
dist.median()









    Out[7]:





1.5034066538560549

What is the mean height in Pareto world?



In [ ]:
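
A minimal sketch of one way to answer this: the frozen distribution provides `mean()`, and for a Pareto distribution with alpha > 1 the result can be checked against the closed form alpha * xmin / (alpha - 1). The names `mean_scipy` and `mean_formula` are illustrative.

```python
import scipy.stats

alpha, xmin = 1.7, 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)

mean_scipy = dist.mean()
# Closed form for the Pareto mean; only valid when alpha > 1.
mean_formula = alpha * xmin / (alpha - 1)

print(mean_scipy, mean_formula)   # heights in meters
```

Note that the mean is well above the median, which reflects the long right tail.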

What fraction of people are shorter than the mean?



In [ ]:
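
A sketch of one approach: evaluate the CDF at the mean. The name `fraction_below_mean` is illustrative.

```python
import scipy.stats

alpha, xmin = 1.7, 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)

# The CDF at x is the fraction of the population at or below x.
fraction_below_mean = dist.cdf(dist.mean())
print(fraction_below_mean)
```

Because the distribution is skewed to the right, well over half of the population falls below the mean.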

Out of 7 billion people, how many do we expect to be taller than 1 km? You could use dist.cdf or dist.sf.



In [ ]:
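
A sketch of both routes mentioned in the prompt, assuming heights are measured in meters so that 1 km is 1000. The survival function `sf` is 1 - CDF computed directly, which tends to be more accurate far out in the tail. The names `population`, `height_m`, `expected_sf`, and `expected_cdf` are illustrative.

```python
import scipy.stats

alpha, xmin = 1.7, 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)

population = 7e9
height_m = 1000   # 1 km in meters

# Expected count = population * P(height > 1 km).
expected_sf = population * dist.sf(height_m)
expected_cdf = population * (1 - dist.cdf(height_m))

print(expected_sf, expected_cdf)
```

Both routes should agree closely here, giving an expected count in the tens of thousands.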

How tall do we expect the tallest person to be? Hint: find the height that yields about 1 person.



In [ ]:
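
Following the hint, one way to sketch this is to invert the survival function: find the height x where population * sf(x) is about 1. The frozen distribution's `isf` (inverse survival function) does that inversion. The names `population` and `tallest_m` are illustrative.

```python
import scipy.stats

alpha, xmin = 1.7, 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)

population = 7e9

# Height at which we expect about one person to be taller.
tallest_m = dist.isf(1 / population)

print(tallest_m / 1000, "km")
print(population * dist.sf(tallest_m))   # sanity check: should be close to 1
```

In Pareto world this height is on the order of hundreds of kilometers.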

Generate a sample from a Weibull distribution and plot it using a transform that makes a Weibull distribution look like a straight line.



In [13]:

    
import random
import thinkstats2
import thinkplot



In [ ]:
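
A sketch of the idea using only the standard library. For a Weibull distribution, CDF(x) = 1 - exp(-(x / scale) ** shape), so log(-log(1 - CDF(x))) is a linear function of log(x) with slope equal to the shape parameter. The scale, shape, seed, and sample size below are illustrative choices, not values from the notebook, and the least-squares slope stands in for eyeballing the plot.

```python
import math
import random

rng = random.Random(17)   # fixed seed so the sketch is repeatable
scale, shape = 1.0, 2.0   # illustrative parameters
n = 1000

# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
sample = sorted(rng.weibullvariate(scale, shape) for _ in range(n))

# Empirical CDF at mid-point plotting positions, which avoids p = 1.
ps = [(i + 0.5) / n for i in range(n)]

# Weibull transform: log(-log(1 - CDF)) versus log(x).
xs_t = [math.log(x) for x in sample]
ys_t = [math.log(-math.log(1 - p)) for p in ps]

# Least-squares slope of the transformed points; it should be near `shape`.
x_bar = sum(xs_t) / n
y_bar = sum(ys_t) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs_t, ys_t))
         / sum((x - x_bar) ** 2 for x in xs_t))
print(slope)
```

Plotting `ys_t` against `xs_t` on linear axes should give an approximately straight line. thinkplot.Cdf appears to support the same transform through a `transform='weibull'` argument, but treat that as an assumption to verify against your copy of thinkplot.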

Make a random selection from cdf.



In [1]:

Draw a random sample from cdf.



In [1]:

Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.



In [23]:









    Out[23]:





{'xscale': 'linear', 'yscale': 'linear'}
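
The percentile-rank exercise above can be sketched without thinkstats2, which makes the underlying mechanism visible. Drawing from a CDF amounts to choosing a probability uniformly and looking up the corresponding value, so the percentile ranks of the drawn values should themselves be roughly uniform between 0 and 100. The stand-in data and the helpers `draw` and `percentile_rank` are hypothetical and are not the notebook's `cdf` object or its methods.

```python
import bisect
import random

rng = random.Random(23)   # fixed seed so the sketch is repeatable

# Stand-in data; any numeric sample could play the role of the cdf's values.
values = sorted(rng.gauss(0, 1) for _ in range(1000))
n = len(values)

def draw():
    """Inverse-CDF sampling: pick a probability, return the value at that quantile."""
    p = rng.random()
    return values[min(int(p * n), n - 1)]

def percentile_rank(x):
    """Percentage of values less than or equal to x."""
    return 100 * bisect.bisect_right(values, x) / n

sample = [draw() for _ in range(1000)]
ranks = [percentile_rank(x) for x in sample]

# Count ranks per decile; roughly equal counts indicate a uniform distribution.
deciles = [0] * 10
for r in ranks:
    deciles[min(int(r // 10), 9)] += 1
print(deciles)
```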

Generate 1000 random values using random.random() and plot their PMF.



In [27]:

    
values = [random.random() for _ in range(1000)]
pmf = thinkstats2.Pmf(values)
thinkplot.Pmf(pmf, linewidth=0.1)









    



WARNING:root:Pmf: width is very small; Pmf may not be visible.

The PMF doesn't work very well here: nearly every value is unique, so each one has a probability of about 1/1000. Try plotting the CDF instead.



In [28]:

    
cdf = thinkstats2.Cdf(values)
thinkplot.Cdf(cdf)
thinkplot.Show()









    












    





[Figure: CDF of the 1000 random values]
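
For values from random.random(), the CDF should lie close to the straight line CDF(x) = x, since that is the CDF of a uniform distribution on [0, 1). A small standard-library sketch of that check follows; the seed and the name `max_gap` are illustrative.

```python
import random

rng = random.Random(27)   # fixed seed so the sketch is repeatable
values = sorted(rng.random() for _ in range(1000))
n = len(values)

# Empirical CDF at each sorted value is the fraction of values <= x.
# Compare it with the uniform CDF, which is just x on [0, 1).
max_gap = max(abs((i + 1) / n - x) for i, x in enumerate(values))
print(max_gap)
```

A small `max_gap` means the empirical CDF tracks the diagonal closely, which is much easier to see in a CDF plot than in a PMF of mostly unique values.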


