Chapter 3: Probability and Statistical Distributions


In [190]:
from IPython.display import Image
from IPython.display import display

intersection = Image(url='http://www.astroml.org/_images/fig_prob_sum_1.png',width=500)
joint_probability = Image(url='http://www.astroml.org/_images/fig_conditional_probability_1.png',width=800)
random_conversion = Image(url='http://www.astroml.org/_images/fig_transform_distribution_1.png',width=800)
kurtosis_skew = Image(url='http://www.astroml.org/_images/fig_kurtosis_skew_1.png',width=500)
uniform_dist = Image(url='http://www.astroml.org/_images/fig_uniform_distribution_1.png',width=500)
gaussian_dist = Image(url='http://www.astroml.org/_images/fig_gaussian_distribution_1.png',width=500)
binomial_dist = Image(url='http://www.astroml.org/_images/fig_binomial_distribution_1.png',width=500)
poisson_dist = Image(url='http://www.astroml.org/_images/fig_poisson_distribution_1.png',width=500)
cauchy_dist = Image(url='http://www.astroml.org/_images/fig_cauchy_distribution_1.png',width=500)
cauchy_dist2 = Image(url='http://www.astroml.org/_images/fig_cauchy_median_mean_1.png',width=500)
laplace_dist = Image(url='http://www.astroml.org/_images/fig_laplace_distribution_1.png',width=500)
chi_squared_dist = Image(url='http://www.astroml.org/_images/fig_chi2_distribution_1.png',width=500)
student_t_dist = Image(url='http://www.astroml.org/_images/fig_student_t_distribution_1.png',width=500)
fisher_f_dist = Image(url='http://www.astroml.org/_images/fig_fisher_f_distribution_1.png',width=500)
beta_dist = Image(url='http://www.astroml.org/_images/fig_beta_distribution_1.png',width=500)
gamma_dist = Image(url='http://www.astroml.org/_images/fig_gamma_distribution_1.png',width=500)
weibull_dist = Image(url='http://www.astroml.org/_images/fig_weibull_distribution_1.png',width=500)

display(intersection)


Overview of Probability

Definitions

  • $A$, $B$ are events

  • $p(A)$ is the probability of $A$ occurring

Union

  • $p(A ∪ B) = p(A) + p(B) − p(A ∩ B)$

Intersection

  • $p(A ∩ B) = p(A|B) p(B) = p(B|A) p(A)$

where $p(A|B)$ means the probability of event $A$ occurring "given" that event $B$ has occurred.

Law of Total Probability (derives from intersection definition)

  • $p(A) = \sum_i{p(A ∩ B_i)} = \sum_i{p(A|B_i) p(B_i)}$

This holds when the events $B_i$, $i = 1,...,N$, are disjoint and their union is the set of all possible outcomes.
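
Both identities are easy to check numerically. Below is a quick Monte Carlo sanity check of my own (not from the text), using a fair die with $A$ = "roll is even" and $B$ = "roll $\ge$ 4":


In [ ]:
import numpy as np

# Simulate many rolls of a fair six-sided die
rolls = np.random.randint(1, 7, 100000)
A = (rolls % 2 == 0)   # event A: roll is even
B = (rolls >= 4)       # event B: roll is at least 4

# Union rule: p(A U B) = p(A) + p(B) - p(A ^ B)
lhs = np.mean(A | B)
rhs = np.mean(A) + np.mean(B) - np.mean(A & B)
print("p(A U B) = %.3f   p(A) + p(B) - p(A ^ B) = %.3f" % (lhs, rhs))

# Law of total probability, partitioning on the die face B_i = {roll == i}
total = sum(np.mean(A[rolls == i]) * np.mean(rolls == i) for i in range(1, 7))
print("sum_i p(A|B_i) p(B_i) = %.3f   p(A) = %.3f" % (total, np.mean(A)))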

Overview of Random Numbers

Definitions

$x$, $y$ are random variables

Independent variables

  • Joint probability = Intersection of two events
  • $p(x, y) = p(x) p(y)$

Dependent variables

  • $p(x,y) = p(x ∩ y) = p(x|y) p(y) = p(y|x) p(x)$ (see intersection above)

Marginal Probability Function

  • $p(x) = \int{p(x,y)dy}$ (see law of total probability above)

Law of Total Probability (continuous)

  • $p(x) = \int{p(x|y)p(y)dy}$ (see law of total probability above)

As an example of joint probability and conditional probability for two variables $x$ and $y$, see the image below.


In [191]:
display(joint_probability)
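
The marginalization rule is also easy to verify with a discrete joint distribution. The 3x4 probability table below is a toy example of my own:


In [ ]:
import numpy as np

# A toy discrete joint distribution p(x, y) on a 3x4 grid (rows = x, cols = y)
p_xy = np.array([[0.10, 0.05, 0.05, 0.10],
                 [0.05, 0.15, 0.10, 0.05],
                 [0.10, 0.05, 0.10, 0.10]])
print("Total probability:", p_xy.sum())  # should be 1

# Marginal p(x): sum the joint distribution over y
p_x = p_xy.sum(axis=1)
print("p(x) =", p_x)

# Conditional p(x|y) = p(x, y) / p(y); recover p(x) via total probability
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y
print("sum_y p(x|y) p(y) =", (p_x_given_y * p_y).sum(axis=1))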


Bayes Rule (Theorem)

  • $p(x|y) = \frac{p(y|x) p(x)}{p(y)}$
  • $p(x|y) = \frac{p(y|x) p(x)}{\int{p(y|x')p(x')dx'}}$, expanding $p(y)$ via the law of total probability

We can derive this using the definitions above (intersection and law of total probability).
But what does it mean?
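
Before the boxes example, here is a minimal numeric illustration of my own (the rates are made up): let $x$ = "source is real" and $y$ = "detection flagged".


In [ ]:
import numpy as np

# Hypothetical rates, for illustration only
p_real = 0.01                 # p(x): 1% of candidates are real sources
p_flag_given_real = 0.95      # p(y|x)
p_flag_given_spurious = 0.05  # p(y|not x)

# p(y) from the law of total probability
p_flag = p_flag_given_real * p_real + p_flag_given_spurious * (1 - p_real)

# Bayes' rule: p(x|y)
p_real_given_flag = p_flag_given_real * p_real / p_flag
print("p(real | flagged) = %.3f" % p_real_given_flag)  # ~0.16, not 0.95!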

Example: Monty Hall Problem

  • 1000 boxes
  • 1 prize hidden in one of the boxes
  • You guess which one has the prize, with $p$(prize in your box) = 1/1000 and $p$(prize not in your box) = 999/1000
  • Someone else, who knows where the prize is, opens 998 of the non-prize boxes, so only one other box remains
  • Do you switch boxes, or keep the one you have?
  • Still $p$(prize in your box) = 1/1000, while $p$(prize not in your box) = $p$(prize in the other box) = 999/1000
  • This is very different from the 50/50 someone would assign upon walking into the room and seeing just two boxes

There is a full derivation in the text, which I won't repeat here; instead, here is a numerical demonstration:

The Monty Hall Problem Tester


In [192]:
import numpy as np

# Number of boxes and trials
n_boxes = 3
n_trials = 1000
n_boxes_to_remove = n_boxes - 2 # removing n-2 boxes leaves 1 box besides ours

# Initializing vars
keep_box_win = 0
switch_box_win = 0
no_win = 0
debug = False

for i in range(n_trials):

    available_boxes = list(range(n_boxes))
    prize_box = np.random.randint(n_boxes)
    if debug: print("Prize", prize_box)

    # We choose a box at random
    our_choice = np.random.randint(n_boxes)
    if debug: print("Our choice", our_choice)
    available_boxes.remove(our_choice)
    if debug: print("Available boxes", available_boxes)

    # We first temporarily remove the prize, to make sure *Monty* doesn't remove it.
    if prize_box != our_choice:
        available_boxes.remove(prize_box)

    # Now Monty removes all-but-one remaining box (none have the prize).
    # This must be done serially to ensure no duplicate removals.
    for j in range(n_boxes_to_remove):
        available_boxes.remove(np.random.choice(available_boxes))
    if debug: print("Monty removes %d box(es) leaving %d box(es)." % (n_boxes_to_remove, n_boxes - n_boxes_to_remove - 1))

    # Put the prize box back among the remaining boxes if it isn't ours
    if prize_box != our_choice:
        available_boxes.append(prize_box)
    if debug: print("Available boxes", available_boxes)

    # Do we win by keeping our box, or by switching?
    if prize_box == our_choice:
        keep_box_win += 1.
    elif np.random.choice(available_boxes) == prize_box:
        switch_box_win += 1.
    else:
        no_win += 1.

n_wins = keep_box_win + switch_box_win
print("Wins by keeping our first choice box = %d / %d = %4.3f    Percent of wins = %4.3f"
      % (keep_box_win, n_trials, keep_box_win / n_trials, keep_box_win / n_wins))
print("Wins by switching to available box   = %d / %d = %4.3f    Percent of wins = %4.3f"
      % (switch_box_win, n_trials, switch_box_win / n_trials, switch_box_win / n_wins))
print("No winner                            = %d / %d = %4.3f" % (no_win, n_trials, no_win / n_trials))


Wins by keeping our first choice box = 336 / 1000 = 0.336    Percent of wins = 0.336
Wins by switching to available box   = 664 / 1000 = 0.664    Percent of wins = 0.664
No winner                            = 0 / 1000 = 0.000

The Effect of Transforming Random Variables

  • $x$ is random variable
  • $y = \Phi(x)$
  • $x = \Phi^{-1}(y)$
  • If we know the probability distribution $p(x)$, what is the probability distribution $p(y)$?
  • Start with: $\int{p(y)dy} = \int{p(x)dx}$
  • to: $p(y) = p(x)\left|\frac{dx}{dy}\right|$
  • to: $p(y) = p(\Phi^{-1}(y)) \left|\frac{d\Phi^{-1}(y)}{dy}\right|$

Using the example $y = \exp(x)$, i.e. $x = \ln(y)$, where $x$ is drawn from a uniform distribution on (0,1), we plug in:

  • $\Phi = \exp$
  • $\Phi^{-1} = \ln$
  • $\frac{d\Phi^{-1}(y)}{dy} = \frac{d(\ln y)}{dy} = \frac{1}{y}$
  • $p(y) = p(\Phi^{-1}(y)) \left|\frac{d\Phi^{-1}(y)}{dy}\right| = \frac{p(x)}{y} = \frac{1}{y}$ (since $p(x) = 1$ on (0,1))

So the distribution follows a $1/y$ profile, with $y = \Phi(x) = \exp(x)$ mapping $x \in (0,1)$ to $y \in (1,e)$, as seen below:


In [193]:
display(random_conversion)
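
This is easy to check numerically; the snippet below (my own) draws uniform $x$, transforms to $y = \exp(x)$, and compares the histogram against the predicted $1/y$ curve:


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Draw x ~ Uniform(0, 1) and transform to y = exp(x)
x = np.random.random(100000)
y = np.exp(x)

fig, ax = plt.subplots()
ax.hist(y, 50, density=True, histtype='stepfilled', label='histogram of y')

# Predicted distribution p(y) = 1/y on (1, e)
y_grid = np.linspace(1, np.e, 1000)
ax.plot(y_grid, 1. / y_grid, '-r', linewidth=3, label='p(y) = 1/y')
ax.set_xlabel('y')
ax.legend()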


Definitions of Descriptive Statistics

For a distribution $h(x)$ of values drawn from the full population, we speak of population statistics. For a partial sampling of $h(x)$, which we denote $f(x)$, we speak of sample statistics.

Arithmetic Mean (also known as the expectation value)

  • $\mu = E(x) = \int_{-\infty}^{\infty}x h(x) dx$
  • np.mean()

Variance

  • $V = \int_{-\infty}^{\infty}(x-\mu)^2 h(x) dx$
  • Broadness of distribution
  • np.var()

Standard Deviation

  • $\sigma = \sqrt V$
  • np.std()

Skewness

  • $\Sigma = \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)^3 h(x) dx$
  • Asymmetry of distribution
  • Use carefully in cases with small sample size
  • scipy.stats.skew()

Kurtosis

  • $K = \int_{-\infty}^{\infty}\left(\frac{x-\mu}{\sigma}\right)^4 h(x) dx - 3$
  • Peakedness of distribution relative to Gaussian ($K_{gauss}$ = 0)
  • Use carefully in cases with small sample size
  • scipy.stats.kurtosis()

Absolute Deviation about $d$

  • $\delta = \int_{-\infty}^{\infty}|x-d| h(x) dx$

Mode

  • The mode $x_m$ is where the distribution peaks: $\left(\frac{dh(x)}{dx}\right)_{x_m} = 0$
  • scipy.stats.mode()

Quantiles (also known as percentiles for $p$)

  • $\frac{p}{100} = \int_{-\infty}^{q_p} h(x)dx$
  • np.percentile()

In [194]:
display(kurtosis_skew)


Median/Quartile estimates vs Mean/Sigma estimates for a sample

  • Median and quartile values are much less susceptible to deviation from outliers.
  • But median and quartile values are more expensive to compute for large samples than the mean and standard deviation, since they require (at least partial) sorting.

  • Can estimate sigma based on the interquartile range:

$\sigma_G = 0.7413 (q_{75} - q_{25})$

This estimator is available in the astroML package as astroML.stats.sigmaG().
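
As a quick check of why this matters (my own snippet), compare np.std with the quartile-based estimate on a clean Gaussian sample and on one contaminated by outliers:


In [ ]:
import numpy as np

# sigma_G vs. np.std on a Gaussian sample, with and without outliers
pop = np.random.normal(0, 1, 10000)
pop_outliers = np.concatenate([pop, np.random.normal(0, 50, 500)])

for label, sample in [("clean", pop), ("with outliers", pop_outliers)]:
    q25, q75 = np.percentile(sample, [25, 75])
    sigma_G = 0.7413 * (q75 - q25)
    print("%-13s  std = %6.3f   sigma_G = %6.3f" % (label, np.std(sample), sigma_G))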

A function for plotting an arbitrary distribution and identifying its characteristics


In [280]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

def plot_dist(pop, dist=None):

    fig = plt.figure(figsize=(10, 7))
    ax = fig.add_subplot(111)
    n, bins, patches = ax.hist(pop, 15, histtype='stepfilled', density=True)

    ylim = [0, 10]
    mean = np.mean(pop)
    median = np.median(pop)
    std1 = (mean - np.std(pop)) * np.ones(2)
    std2 = (mean + np.std(pop)) * np.ones(2)
    q25 = np.percentile(pop, 25) * np.ones(2)
    q75 = np.percentile(pop, 75) * np.ones(2)
    var = np.var(pop)
    skew = stats.skew(pop)
    kurtosis = stats.kurtosis(pop)

    # Overplot the analytic PDF if a frozen scipy.stats distribution is given
    if dist is not None:
        x = np.linspace(pop.min(), pop.max(), 1000)
        pdf = dist.pdf(x)
        ax.plot(x, pdf, '-r', linewidth=3, label='PDF')

    ax.plot(mean * np.ones(2), ylim, label='mean')
    ax.plot(median * np.ones(2), ylim, label='median')
    ax.plot(std1, ylim, label='mean - std')
    ax.plot(std2, ylim, label='mean + std')
    ax.plot(q25, ylim, label='1st quartile')
    ax.plot(q75, ylim, label='3rd quartile')
    ax.text(0.05, 0.95, 'Mean = %4.1f' % mean, transform=ax.transAxes, fontsize=15)
    ax.text(0.05, 0.90, 'Median = %4.1f' % median, transform=ax.transAxes, fontsize=15)
    ax.text(0.05, 0.85, 'Variance = %4.1f' % var, transform=ax.transAxes, fontsize=15)
    ax.text(0.05, 0.80, 'Skew = %4.1f' % skew, transform=ax.transAxes, fontsize=15)
    ax.text(0.05, 0.75, 'Kurtosis = %4.1f' % kurtosis, transform=ax.transAxes, fontsize=15)

    ax.set_ylim(0, 1.4 * max(n))
    ax.legend()

In [281]:
# 10000 nums from a uniform dist
dist = stats.uniform(0,1)
pop = dist.rvs(10000)
# alternatively
# pop = np.random.random(10000)
plot_dist(pop, dist)


All of the Distributions!

Uniform Distribution (Top Hat)

  • $p(x|\mu,W) = \frac{1}{W}$ for $|x-\mu| \le \frac{W}{2}$
  • $p(x|\mu,W) = 0$ elsewhere
  • where $W$ is the width of the interval
  • scipy.stats.uniform(left_edge, width) (note: scipy parameterizes the interval by its left edge and width, not by both edges)
  • np.random.random()
  • Continuous Distribution
  • When someone asks for a random number between $x$ and $y$, this is usually the distribution they mean.

In [282]:
display(uniform_dist)



In [283]:
# 10000 nums from a uniform dist
dist = stats.uniform(0,1)
pop = dist.rvs(10000)
# alternatively
# pop = np.random.random(10000)
plot_dist(pop, dist)


Gaussian Distribution (Normal Distribution)

  • $p(x|\mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\exp\left(\frac{-(x-\mu)^2}{2\sigma^{2}}\right)$
  • scipy.stats.norm(mean, std)
  • Continuous Distribution
  • Special distribution:
    • A convolution of two Gaussians is also Gaussian
    • The Fourier transform of a Gaussian is also Gaussian
    • The sample mean and sample variance are independent
    • The Central Limit Theorem tells us that the mean of samples drawn from almost any distribution follows a Gaussian (see the demonstration after this list)
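
A minimal Central Limit Theorem demonstration (my own): histogram the means of $N$ uniform draws for increasing $N$ and watch a Gaussian shape emerge.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# Means of N uniform draws: approximately Gaussian as N grows
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, N in zip(axes, [1, 2, 10]):
    means = np.random.random((10000, N)).mean(axis=1)
    ax.hist(means, 30, density=True, histtype='stepfilled')
    ax.set_title('N = %d' % N)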

In [284]:
display(gaussian_dist)



In [285]:
# 10000 nums from a gaussian dist
dist = stats.norm(0,1)
pop = dist.rvs(10000)
# alternatively
# pop = np.random.normal(0, 1, 10000)
plot_dist(pop, dist)


Binomial Distribution

  • $p(k|b, N) = \frac{N!}{k!(N-k)!}b^k(1-b)^{N-k}$
  • scipy.stats.binom(N trials, prob b)
  • Discrete Distribution
  • Each of $N$ independent trials yields 1 (success) with probability $b$ or 0 (failure) with probability $1-b$
  • Appropriate for a number of discrete boolean events (yes or no) which are independent
  • Example: How many heads do you get when you flip a coin N times?
  • Can be generalized to multinomial distribution

In [286]:
display(binomial_dist)



In [287]:
# 10000 nums from a binomial dist
dist = stats.binom(20,0.5) # 20 trials, probability 0.5
pop = dist.rvs(10000)
plot_dist(pop)


Poisson Distribution

  • $p(k|\mu) = \frac{\mu^k exp(-\mu)}{k!}$
  • scipy.stats.poisson(mu)
  • Discrete Distribution
  • Special case of the binomial distribution as $N \to \infty$ and $b \to 0$ while the expected number of successes $\mu = Nb$ stays fixed (see the numerical check after this list)
  • Called the "law of rare events" because the per-trial probability $b$ is small, not because $\mu$ is
  • The mean $\mu$ fully describes the Poisson Distribution:
    • mode = $\lfloor \mu \rfloor$ (both $\mu - 1$ and $\mu$ when $\mu$ is an integer)
    • std = $\sqrt \mu$
    • skewness = $\frac{1}{\sqrt \mu}$
    • kurtosis = $\frac{1}{\mu}$
  • As $\mu$ increases, it approaches a Gaussian
  • Poisson distribution describes the distribution of the number of photons counted in a given interval.
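
A quick numerical check of the binomial limit (my own): hold $\mu = Nb$ fixed while $N$ grows and $b$ shrinks, and compare the two probability mass functions.


In [ ]:
import numpy as np
from scipy import stats

# Poisson as the rare-event limit of the binomial: mu = N*b = 5 held fixed
mu = 5
k = np.arange(15)
for N in [10, 100, 10000]:
    b = mu / N
    max_diff = np.max(np.abs(stats.binom(N, b).pmf(k) - stats.poisson(mu).pmf(k)))
    print("N = %5d, b = %.4f: max |binomial - poisson| = %.5f" % (N, b, max_diff))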

In [288]:
display(poisson_dist)



In [289]:
# 10000 nums from a poisson dist
dist = stats.poisson(5) # mean = 5
pop = dist.rvs(10000)
plot_dist(pop)


Cauchy (Lorentzian) Distribution

  • $p(x|\mu, \gamma) = \frac{1}{\pi \gamma} (\frac{\gamma^2}{\gamma^2 + (x - \mu)^2})$
  • scipy.stats.cauchy(mu, gamma) (i.e., loc = $\mu$, scale = $\gamma$)
  • Continuous Distribution
  • Symmetric distribution described by the location parameter $\mu$ and the scale parameter $\gamma$.
  • Median = mode = $\mu$
  • Tails decrease slowly for large $|x|$; the mean, variance, standard deviation, and higher moments do not exist
  • Can still estimate the scale from the interquartile range: $\sigma_G = 1.483\gamma$
  • A clipped (trimmed) mean, or better the median, can be used for location estimates (see the demonstration after this list)
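
A sketch of the pathology and the fix (my own demonstration): the sample mean wanders as $N$ grows, while the median and $\sigma_G$ stay put.


In [ ]:
import numpy as np
from scipy import stats

# Sample mean vs. median vs. sigma_G for Cauchy draws of increasing size
for N in [100, 10000, 1000000]:
    sample = stats.cauchy(0, 1).rvs(N)
    q25, q75 = np.percentile(sample, [25, 75])
    sigma_G = 0.7413 * (q75 - q25)  # should approach 1.483 * gamma = 1.483
    print("N = %7d: mean = %9.3f   median = %6.3f   sigma_G = %5.3f"
          % (N, sample.mean(), np.median(sample), sigma_G))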

In [290]:
display(cauchy_dist)



In [293]:
# 100 draws from a cauchy dist
dist = stats.cauchy(0,1)
pop = dist.rvs(100) #increase N and see divergence
plot_dist(pop)


Exponential (Laplace) Distribution

  • $p(x|\mu, \Delta) = \frac{1}{2\Delta}exp(\frac{-|x-\mu|}{\Delta})$
  • scipy.stats.laplace(mu, delta)
  • Continuous Distribution
  • Symmetric distribution described by the location parameter $\mu$ and the scale parameter $\Delta$.
  • Median = mode = $\mu$
  • The (one-sided) exponential describes the time between successive events that occur continuously and independently at a constant rate (see the check after this list)
  • Example: photons arriving at a detector
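
A quick check of the waiting-time claim (my own sketch, using the one-sided exponential): scatter events uniformly in time at a fixed rate and look at the gaps between successive events.


In [ ]:
import numpy as np

# Poisson process: ~rate*T events scattered uniformly over (0, T)
rate, T = 10.0, 1000.0
event_times = np.sort(np.random.random(int(rate * T)) * T)
gaps = np.diff(event_times)  # waiting times between successive events

print("mean gap = %.4f  (expect 1/rate = %.4f)" % (gaps.mean(), 1 / rate))
print("std gap  = %.4f  (exponential: std = mean)" % gaps.std())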

In [294]:
display(laplace_dist)



In [295]:
# 10000 nums from a laplace dist
dist = stats.laplace(0, 10)
pop = dist.rvs(10000)
plot_dist(pop)


$\chi^2$ Distribution

  • $p(Q|k) = \chi^2(Q|k) = \frac{1}{2^{k/2}\Gamma(k/2)} Q^{k/2-1} \exp\left(\frac{-Q}{2}\right)$ for $Q>0$
  • scipy.stats.chi2(k)
  • Continuous Distribution
  • Describes the distribution of $Q = \sum_{i=1}^{k} z_i^2$, the sum of squares of $k$ independent standard normal variables, where $k$ is the number of degrees of freedom (see the check after this list)
  • Mean = $k$, variance = $2k$; approaches a Gaussian for large $k$
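
A quick numerical check (my own): build $Q$ as the sum of squares of $k$ standard normal draws and compare its moments to the $\chi^2$ expectations.


In [ ]:
import numpy as np

# Q = sum of squares of k standard normal variables ~ chi^2 with k dof
k = 10
Q = (np.random.normal(0, 1, (100000, k)) ** 2).sum(axis=1)
print("sample mean of Q = %5.2f  (chi2 mean = k = %d)" % (Q.mean(), k))
print("sample var  of Q = %5.2f  (chi2 variance = 2k = %d)" % (Q.var(), 2 * k))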

In [296]:
display(chi_squared_dist)



In [297]:
# 10000 nums from a chi^2 dist
dist = stats.chi2(10)
pop = dist.rvs(10000)
plot_dist(pop)


