empirical distribution - based on empirical observations, so the sample is necessarily finite.
analytic distribution - the CDF is a mathematical function.
model - a simplification that leaves out unneeded details.
In [10]:
%matplotlib inline
import math
import numpy as np
import pandas
import nsfg
import thinkplot
import thinkstats2
import analytic
The CDF for the exponential distribution is $$CDF(x) = 1 - e^{-\lambda x}$$
where $\lambda$ determines the shape of the distribution.
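As a rough check of the formula, the sketch below compares the analytic CDF with the fraction of simulated values that fall at or below a few points. It uses only the standard library so it runs without the book's modules. The helper names `exponential_cdf` and `empirical_cdf`, the rate `lam = 2.0`, and the seed are illustrative assumptions, not part of the notebook.

```python
import math
import random

def exponential_cdf(x, lam):
    """Analytic CDF of the exponential distribution: 1 - exp(-lam * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

def empirical_cdf(sample, x):
    """Fraction of the values in sample that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

rng = random.Random(17)  # fixed seed so the run is repeatable
lam = 2.0                # illustrative rate, not estimated from any data
sample = [rng.expovariate(lam) for _ in range(10000)]

# Print x, the analytic CDF, and the empirical CDF side by side.
for x in (0.1, 0.5, 1.0):
    print(x, round(exponential_cdf(x, lam), 3), round(empirical_cdf(sample, x), 3))
```

The two columns should agree to roughly two decimal places; the gap shrinks as the sample grows.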
In [11]:
df = analytic.ReadBabyBoom()
diffs = df.minutes.diff()
cdf = thinkstats2.Cdf(diffs, label='actual')
thinkplot.Cdf(cdf)
thinkplot.Show(xlabel='minutes', ylabel='CDF')
Plot the complementary CDF (CCDF), which is $1 - CDF(x)$, on a log-y scale. If you plot the CCDF of a dataset that you think is exponential, you expect to see
$$y \approx e^{-\lambda x} $$
taking the log of both sides:
$$\log y \approx -\lambda x$$
so on a log-y scale, the CCDF is a straight line with slope $-\lambda$.
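The straight-line claim can be checked numerically. The sketch below simulates exponential values, computes the log of the empirical CCDF, and fits a least-squares line to it. The fitted slope should come out close to $-\lambda$. The helper `fit_line`, the rate `lam = 0.5`, and the seed are assumptions made for illustration.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(3)  # fixed seed so the run is repeatable
lam = 0.5               # illustrative rate
xs = sorted(rng.expovariate(lam) for _ in range(5000))
n = len(xs)

# The empirical CCDF at the i-th smallest value is 1 - i/n.
# Drop the largest value, where the CCDF is 0 and its log is undefined.
log_ccdf = [math.log(1 - i / n) for i in range(1, n)]
inter, slope = fit_line(xs[:-1], log_ccdf)
print("fitted slope:", slope, "expected about:", -lam)
```

Only the last few points in the tail are noisy, so the fit is dominated by the bulk of the sample.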
In [12]:
thinkplot.Cdf(cdf, complement=True)
thinkplot.Show(xlabel='minutes',
               ylabel='CCDF',
               yscale='log')
$\lambda$ can be interpreted as the number of events that occur on average in a unit of time.
The mean of an exponential distribution is $1 / \lambda$.
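Because the mean is $1/\lambda$, one simple way to estimate $\lambda$ from a sample of gaps between events is to take the reciprocal of the sample mean. The sketch below does this on simulated gaps; the rate `lam = 0.25` and the seed are illustrative assumptions.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable
lam = 0.25              # illustrative rate: 0.25 events per unit of time
gaps = [rng.expovariate(lam) for _ in range(20000)]

mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / lam
lam_hat = 1 / mean_gap            # estimate of the rate
print("mean gap:", mean_gap, "1/lam:", 1 / lam)
print("estimated lam:", lam_hat)
```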
For the normal distribution, scipy.stats provides the CDF. The standard normal CDF at 0 is 0.5:
import scipy.stats
scipy.stats.norm.cdf(0)
--> 0.5
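For reference, the normal CDF can also be written with the error function: $CDF(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$. The sketch below is a standard-library version of that formula; the function name `normal_cdf` is hypothetical and is not part of scipy or thinkstats2.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Standard normal CDF at 0.
print(normal_cdf(0))
# Probability of falling within one standard deviation of the mean.
print(normal_cdf(1) - normal_cdf(-1))
```

The second value is the familiar fraction of values within one standard deviation of the mean, about 68%.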
To test whether a normal distribution is a good model for a sample, use a normal probability plot:
1. Sort the values in the sample.
2. From a standard normal distribution, generate a random sample with the same size as the sample and sort it.
3. Plot the sorted values from the sample versus the random values.

If the distribution of the sample is approximately normal, the result is a straight line with intercept $\mu$ and slope $\sigma$.
To do this, use:
xs, ys = thinkstats2.NormalProbability(sample)
`ys` contains the sorted values from `sample`; `xs` contains the sorted random values from the standard normal distribution.
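The steps above can be sketched without thinkstats2. The version below is a standard-library approximation of the procedure, not the library's implementation: it sorts the sample, draws and sorts a standard normal sample of the same size, and fits a line to check that the intercept and slope land near $\mu$ and $\sigma$. The names `normal_probability` and `fit_line`, the parameters `mu = 10` and `sigma = 2`, and the seed are assumptions.

```python
import random

def normal_probability(sample, rng):
    """Return (xs, ys): sorted standard normal draws and the sorted sample."""
    ys = sorted(sample)
    xs = sorted(rng.gauss(0, 1) for _ in ys)
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(11)  # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0    # illustrative parameters
sample = [rng.gauss(mu, sigma) for _ in range(2000)]

xs, ys = normal_probability(sample, rng)
inter, slope = fit_line(xs, ys)
print("intercept:", inter, "slope:", slope)
```

For a sample that is not normal, the plotted points bend away from the line, usually most visibly in the tails.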
In [24]:
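def MakeNormalPlot(weights):
    """Make a normal probability plot of weights with a fitted model line."""
    mean = weights.mean()
    std = weights.std()

    # Model line with intercept mean and slope std, from -4 to 4.
    xs = [-4, 4]
    fxs, fys = thinkstats2.FitLine(xs, inter=mean, slope=std)
    thinkplot.Plot(fxs, fys, color='gray', label='model')

    # Sorted sample values versus sorted standard normal values.
    xs, ys = thinkstats2.NormalProbability(weights)
    thinkplot.Plot(xs, ys, label='birth weights')

MakeNormalPlot(df.weight_g)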
A set of values has a lognormal distribution when the logarithms of the values have a normal distribution.
$$ CDF_{lognormal}(x) = CDF_{normal}(\log x) $$
$\mu$ and $\sigma$ are the parameters of the lognormal distribution, but they are not its mean and standard deviation.
The mean is $\exp(\mu + \sigma^2 / 2)$ and the standard deviation is ugly.
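To see that $\mu$ and $\sigma$ are not the mean and standard deviation, the sketch below compares the closed-form moments with a simulated sample. The standard deviation uses the standard closed form $\sqrt{(e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}}$, which the notes above do not spell out. The function names, the parameters `mu = 1.0` and `sigma = 0.5`, and the seed are illustrative assumptions.

```python
import math
import random

def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution with parameters mu and sigma."""
    return math.exp(mu + sigma ** 2 / 2)

def lognormal_std(mu, sigma):
    """Standard deviation of a lognormal distribution with parameters mu and sigma."""
    return math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2))

rng = random.Random(23)  # fixed seed so the run is repeatable
mu, sigma = 1.0, 0.5     # illustrative parameters
sample = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

sample_mean = sum(sample) / len(sample)
sample_std = math.sqrt(sum((v - sample_mean) ** 2 for v in sample) / len(sample))

print("mean:", sample_mean, "formula:", lognormal_mean(mu, sigma))
print("std: ", sample_std, "formula:", lognormal_std(mu, sigma))
```

Both sample moments should sit close to the formulas and noticeably away from `mu` and `sigma` themselves.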
If you plot a lognormal distribution's CDF on a log-x scale, it has the shape of a normal CDF.
To test it, you can make a normal probability plot using the log of the values in the sample.
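A numerical stand-in for that plot is to measure how straight the normal probability plot is, with and without the log transform. The sketch below uses the Pearson correlation between the sorted values and a sorted standard normal sample as a rough straightness score. This score is an illustrative substitute for inspecting the plot, and the helper `correlation`, the lognormal parameters, and the seed are assumptions.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(29)  # fixed seed so the run is repeatable
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
reference = sorted(rng.gauss(0, 1) for _ in sample)

# Straightness of the normal probability plot for raw values and for their logs.
r_raw = correlation(reference, sorted(sample))
r_log = correlation(reference, sorted(math.log(v) for v in sample))
print("raw values:", r_raw)
print("log values:", r_log)
```

The log-transformed values should give a correlation very close to 1, while the raw values give a clearly lower one because their plot curves upward in the right tail.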