Author : Ronojoy Adhikari
Email : rjoy@imsc.res.in | Web : www.imsc.res.in/~rjoy
Github : www.github.com/ronojoy | Twitter: @phyrjoy
In [ ]:
%matplotlib inline
import numpy as np
import scipy.stats as stats
import matplotlib.pylab as pylab
from matplotlib import pyplot as plt
The probability that a random variable $X$ takes on a value $X=x$ is given by $P(x)$.
In [ ]:
# the scipy.stats package has many discrete and continuous random variables
# the coin toss random variable
X = stats.bernoulli(p=0.5)
# bernoulli mean = p, variance = p(1-p)
#X.mean(), X.var()
# entropy = -\sum_x p(x) * ln p(x)
# quantifies the uncertainty in the random variable
#X.entropy()
# lots of throws of the coin
x = X.rvs(1000)
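Frozen distributions also expose the pmf, which evaluates $P(x)$ directly. As a quick check (a minimal sketch, reusing the fair coin $X$ above), both outcomes should have probability 0.5:
In [ ]:
# evaluate P(X=0) and P(X=1) for the fair coin; both should be 0.5
X.pmf(0), X.pmf(1)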
In [ ]:
# the empirical frequency distribution - aka the histogram
pylab.rcParams['figure.figsize'] = 6, 4 # medium size figures
plt.hist(x, density = True, color="#348ABD", alpha = 0.4);
In [ ]:
# the 'central limit theorem' random variable
X = stats.norm()
# the normal deviate has zero mean and unit variance
#X.mean(), X.var()
# the normal deviate has the greatest entropy amongst all continuous, unbounded
# random variables with given mean and variance
# see Jaynes - Probability theory, the logic of science - for proof and discussion
#X.entropy()
# a million draws from the normal distribution
x = X.rvs(1000000)
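The maximum-entropy claim can be checked numerically: the differential entropy of a normal with variance $\sigma^2$ is $\frac{1}{2}\ln(2\pi e\sigma^2)$. A minimal sketch, comparing scipy's value for the standard normal against the closed form:
In [ ]:
# entropy of the standard normal vs. the closed form 0.5*ln(2*pi*e) ~ 1.4189
print(X.entropy(), 0.5 * np.log(2 * np.pi * np.e))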
In [ ]:
# the empirical frequency distribution - aka the histogram
pylab.rcParams['figure.figsize'] = 6, 4 # medium size figure
plt.hist(x, bins = 50, density = True, color="#348ABD", alpha = 0.4 );
Intuitive idea : add a lot of independent, identically distributed random variables and the result will be normally distributed.
More precisely : consider i.i.d. random variables $X_i$ with mean $\mu$ and variance $\sigma^2$.
The sample mean $S_n = \frac{X_1 + X_2 + \ldots + X_n}{n}$ approaches a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$ as $n\rightarrow \infty$.
Let's see this in Python...
In [ ]:
# let's add lots of independent, identically distributed numbers together
X = stats.uniform()          # uniform deviates
# X = stats.bernoulli(0.5)   # coin tosses
# X = stats.poisson(1)       # telephone calls
print('The mean and variance of the uniform distribution are', X.mean(), 'and', X.var(), 'respectively.')
N = [1, 2, 4, 6, 8, 16, 32, 64]
for k, n in enumerate(N):
# draw the n random variables
x = X.rvs(size=(n, 100000))
# get their sample mean
s = x.mean(axis=0)
# plot the distribution of the sum
sx = plt.subplot(len(N) // 2, 2, k + 1)
plt.setp(sx.get_yticklabels(), visible=False)
plt.hist(s,
bins = 50,
label="$\mu = $ %f \n $n\sigma^2 = $ %f " % (s.mean(), n*s.var()),
color="#348ABD",
alpha=0.4)
leg = plt.legend()
leg.get_frame().set_alpha(0.4)
plt.autoscale(tight=True)
The normal distribution $P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ is important!
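To tie the histograms back to this formula, we can overlay the analytic normal density predicted by the CLT on the empirical distribution of the sample mean. A minimal sketch, assuming uniform deviates and $n = 64$ (any of the distributions above would do):
In [ ]:
# empirical distribution of the sample mean of n uniform deviates
X, n = stats.uniform(), 64
s = X.rvs(size=(n, 100000)).mean(axis=0)
plt.hist(s, bins=50, density=True, color="#348ABD", alpha=0.4)
# CLT prediction : normal with mean mu and variance sigma^2 / n
grid = np.linspace(s.min(), s.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, loc=X.mean(), scale=np.sqrt(X.var() / n)), 'k--');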
In [ ]:
# N throws of a coin with parameter theta0
N, theta0 = 500, 0.5
# data contains the outcome of the trial
data = stats.bernoulli.rvs(p = theta0, size = N)
# theta is distributed in the unit interval
theta = np.linspace(0, 1, 128)
In [ ]:
# inspiration : Devinder Sivia and Cameron Davidson-Pilon
# compute posterior after the nthrow-th trial
nthrow = [0, 1, 2, 3, 4, 5, 8, 16, 32, N]
for k, n in enumerate(nthrow):
# number of heads
heads = data[:n].sum()
# posterior probability of theta with conjugate prior (see Sivia for derivation)
p = stats.beta.pdf(theta, 1 + heads, 1 + n - heads)
# plot the posterior
sx = plt.subplot(len(nthrow) // 2, 2, k + 1)
plt.setp(sx.get_yticklabels(), visible=False)
plt.plot(theta, p, label="tosses %d \n heads %d " % (n, heads))
plt.fill_between(theta, 0, p, color="#348ABD", alpha=0.4)
plt.vlines(theta0, 0, 4, color="k", linestyles="--", lw=1)
leg = plt.legend()
leg.get_frame().set_alpha(0.4)
plt.autoscale(tight=True)
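Because the posterior is a Beta distribution in closed form, point and interval estimates come for free. A minimal sketch (reusing data and N from above), showing the posterior mean and a central 95% credible interval for $\theta$:
In [ ]:
# posterior over theta after all N tosses
heads = data.sum()
posterior = stats.beta(1 + heads, 1 + N - heads)
# posterior mean is (1 + heads) / (2 + N); interval() gives equal-tailed bounds
print(posterior.mean(), posterior.interval(0.95))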
In [ ]:
X = stats.bernoulli(0.5) # one fair coin
Y = stats.bernoulli(0.5) # another fair coin
# we want a new random variable which is their sum, but this fails :
# scipy's frozen distributions carry no probability semantics for addition, so Python raises a TypeError
Z = Y + X
Amazing machine learning and inference applications can be built if programs have probability semantics!
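What Z = X + Y should mean is that the pmf of Z is the convolution of the pmfs of X and Y; for two fair coins the sum is Binomial(2, 0.5). A minimal sketch, doing that convolution by hand with numpy:
In [ ]:
# pmfs of the two coins on their support {0, 1}
px, py = X.pmf([0, 1]), Y.pmf([0, 1])
# pmf of the sum Z = X + Y on {0, 1, 2}, via convolution
pz = np.convolve(px, py)
print(pz)   # [0.25 0.5 0.25], i.e. Binomial(2, 0.5)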
What's happening in the Python world?