scipy.stats contains objects that represent analytic distributions.
In [2]:
import scipy.stats
For example, scipy.stats.norm represents a normal distribution.
In [3]:
mu = 178
sigma = 7.7
dist = scipy.stats.norm(loc=mu, scale=sigma)
type(dist)
Out[3]:
A "frozen random variable" can compute its mean and standard deviation.
In [4]:
dist.mean(), dist.std()
Out[4]:
It can also evaluate its CDF. What fraction of people are more than one standard deviation below the mean? About 16%.
In [5]:
dist.cdf(mu-sigma)
Out[5]:
What fraction of people are between 5'10" and 6'1" (177.8 cm and 185.4 cm)?
In [ ]:
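One possible way to answer this is to convert both heights to centimeters and subtract the two CDF values. This is a minimal sketch, assuming the same mu and sigma as above; the helper to_cm and the name height_dist are introduced here for illustration only.

```python
import scipy.stats

# Same model as above: heights in cm, normally distributed.
mu = 178
sigma = 7.7
height_dist = scipy.stats.norm(loc=mu, scale=sigma)


def to_cm(feet, inches):
    """Convert feet and inches to centimeters (1 inch = 2.54 cm)."""
    return (feet * 12 + inches) * 2.54


low = to_cm(5, 10)   # 5'10"
high = to_cm(6, 1)   # 6'1"

# Fraction of the population between the two heights.
fraction_between = height_dist.cdf(high) - height_dist.cdf(low)
print(fraction_between)
```

Under these assumptions the result should come out at roughly one third of the population.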
scipy.stats.pareto represents a Pareto distribution. In Pareto world, the distribution of human heights has parameters alpha=1.7 and xmin=1 meter. So the shortest person is 100 cm tall and the median height is about 150 cm.
In [7]:
alpha = 1.7
xmin = 1
dist = scipy.stats.pareto(b=alpha, scale=xmin)
dist.median()
Out[7]:
What is the mean height in Pareto world?
In [ ]:
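A sketch of one answer, assuming the same alpha and xmin as above. The frozen distribution can report its own mean, and for alpha > 1 it should agree with the closed form alpha * xmin / (alpha - 1). The name pareto_dist is used here only to keep the example self-contained.

```python
import scipy.stats

alpha = 1.7
xmin = 1  # meters
pareto_dist = scipy.stats.pareto(b=alpha, scale=xmin)

# Mean reported by the frozen distribution, in meters.
mean_height = pareto_dist.mean()

# Closed-form mean of a Pareto distribution, valid for alpha > 1.
closed_form = alpha * xmin / (alpha - 1)

print(mean_height, closed_form)
```

Both values should be a little over 2.4 meters, well above the median because of the long right tail.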
What fraction of people are shorter than the mean?
In [ ]:
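One way to approach this, sketched under the same assumptions (alpha=1.7, xmin=1 meter), is to evaluate the CDF at the mean. The variable names are illustrative.

```python
import scipy.stats

alpha = 1.7
xmin = 1  # meters
pareto_dist = scipy.stats.pareto(b=alpha, scale=xmin)

# The CDF evaluated at the mean gives the fraction of people
# who are shorter than the mean height.
fraction_below_mean = pareto_dist.cdf(pareto_dist.mean())
print(fraction_below_mean)
```

In a skewed distribution like this one, most people are below the mean; here the fraction should be a bit under 80%.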
Out of 7 billion people, how many do we expect to be taller than 1 km? You could use dist.cdf or dist.sf.
In [ ]:
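A possible sketch using the survival function, which is 1 - CDF. Heights are in meters because xmin is 1 meter, so 1 km corresponds to 1000. The population figure is the 7 billion given in the question.

```python
import scipy.stats

alpha = 1.7
xmin = 1  # meters
pareto_dist = scipy.stats.pareto(b=alpha, scale=xmin)

population = 7e9

# sf(x) = 1 - cdf(x) is the fraction of people taller than x.
# Using sf avoids the rounding error of computing 1 - cdf for tiny tails.
fraction_taller = pareto_dist.sf(1000)  # 1 km = 1000 m
expected_count = population * fraction_taller
print(expected_count)
```

With these parameters the expected count should be in the tens of thousands, which shows how heavy the Pareto tail is.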
How tall do we expect the tallest person to be? Hint: find the height that yields about 1 person.
In [ ]:
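Following the hint, one approach is to find the height x at which the expected number of taller people is 1, that is, where sf(x) = 1 / population. The sketch below uses isf, the inverse survival function; dist.ppf(1 - 1/population) would be an alternative. The names are illustrative.

```python
import scipy.stats

alpha = 1.7
xmin = 1  # meters
pareto_dist = scipy.stats.pareto(b=alpha, scale=xmin)

population = 7e9

# Height (in meters) exceeded by about one person out of the population.
tallest = pareto_dist.isf(1 / population)

# Check: the expected number of people taller than this should be about 1.
expected_taller = population * pareto_dist.sf(tallest)
print(tallest, expected_taller)
```

The answer should be on the order of several hundred kilometers, which is one way to see that a Pareto model is unrealistic for human height.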
Generate a sample from a Weibull distribution and plot it using a transform that makes a Weibull distribution look like a straight line.
In [13]:
import random
import thinkstats2
import thinkplot
In [ ]:
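A minimal sketch of the idea, using only the standard library. The Weibull CDF is 1 - exp(-(x/lam)^k), so log(-log(1 - CDF)) = k*log(x) - k*log(lam), which is a straight line in log(x) with slope k. The parameters lam and k, the seed, and the sample size are arbitrary choices for illustration. Instead of plotting, the sketch fits a line to the transformed points and checks the slope.

```python
import math
import random

random.seed(17)

# Illustrative parameters (assumptions, not from the exercise).
lam = 2.0  # scale
k = 1.5    # shape

# random.weibullvariate takes the scale first, then the shape.
sample = sorted(random.weibullvariate(lam, k) for _ in range(1000))
n = len(sample)

# Empirical CDF at each sorted value. Using (i + 0.5) / n keeps the
# probabilities strictly between 0 and 1, where the transform is defined.
xs = [math.log(x) for x in sample]
ys = [math.log(-math.log(1 - (i + 0.5) / n)) for i in range(n)]

# Least-squares line through the transformed points.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(slope, intercept)
```

Plotting ys against xs should show points close to a straight line with slope near k. thinkplot.Cdf may also offer a transform='weibull' option that applies a similar transform, but that is not verified here.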
Make a random selection from cdf.
In [1]:
Draw a random sample from cdf.
In [1]:
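The two exercises above can be sketched with inverse transform sampling: choose a probability p uniformly from [0, 1) and return the smallest value whose cumulative probability is at least p. The cdf object these exercises refer to is not defined in this notebook, so the sketch builds an empirical CDF from stand-in data. The data, the seed, and the function name random_from_cdf are all assumptions for illustration.

```python
import bisect
import random

random.seed(17)

# Stand-in data (hypothetical); replace with the real observations.
data = [random.gauss(7.3, 1.2) for _ in range(500)]

# Empirical CDF: ps[i] is the fraction of values <= values[i].
values = sorted(data)
n = len(values)
ps = [(i + 1) / n for i in range(n)]


def random_from_cdf():
    """Return one value chosen at random according to the empirical CDF."""
    p = random.random()                # uniform in [0, 1)
    index = bisect.bisect_left(ps, p)  # first index with ps[index] >= p
    return values[index]


# A single random selection, then a sample of 100 values.
one_value = random_from_cdf()
sample = [random_from_cdf() for _ in range(100)]
print(one_value, len(sample))
```

The thinkstats2 Cdf class is understood to provide Random() and Sample(n) methods that play the same role, but check the library source before relying on them.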
Draw a random sample from cdf, then compute the percentile rank for each value, and plot the distribution of the percentile ranks.
In [23]:
Out[23]:
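A sketch of why this exercise is interesting: if values are drawn from a distribution and each is replaced by its percentile rank in that same distribution, the ranks should be approximately uniform between 0 and 100. The stand-in data, the seed, and the helper percentile_rank are assumptions. Rather than plotting, the sketch measures how far the CDF of the ranks departs from a straight diagonal line.

```python
import bisect
import random

random.seed(17)

# Stand-in data (hypothetical); replace with the real observations.
data = [random.gauss(7.3, 1.2) for _ in range(1000)]
values = sorted(data)
n = len(values)


def percentile_rank(x):
    """Percentage of values in the data that are <= x."""
    return 100 * bisect.bisect_right(values, x) / n


# Choosing uniformly among the observed values is equivalent to
# drawing from the empirical CDF.
sample = random.choices(values, k=1000)
ranks = sorted(percentile_rank(x) for x in sample)

# For a uniform distribution on (0, 100], the fraction of ranks <= r
# is about r / 100. Record the largest departure from that line.
m = len(ranks)
max_gap = max(abs((i + 1) / m - r / 100) for i, r in enumerate(ranks))
print(max_gap)
```

A small max_gap means the CDF of the percentile ranks is close to a straight line, which is what a plot of that CDF should also show.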
Generate 1000 random values using random.random() and plot their PMF.
In [27]:
values = [random.random() for _ in range(1000)]
pmf = thinkstats2.Pmf(values)
thinkplot.Pmf(pmf, linewidth=0.1)
The PMF doesn't work very well here, because almost every value is unique and so has the same probability. Try plotting the CDF instead.
In [28]:
cdf = thinkstats2.Cdf(values)
thinkplot.Cdf(cdf)
thinkplot.Show()