Metropolis Sandbox

This notebook is for playing with the Metropolis algorithm in the context of fitting a linear model. Tinker with how it works and see if you can make it fail.

Preliminaries

Import things


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
%matplotlib inline

Read in a pre-prepared data file (or go off and make your own). See what the data look like.

The pre-prepared data were produced from the model

  • $x \sim \mathrm{Normal}(e, 1)$
  • $y \sim \mathrm{Normal}(\pi+\phi\,x, 1)$, where $\phi$ is the Golden Ratio
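
If you want to make your own data instead, here is a minimal sketch that draws from the same model and writes the result in the same two-column format (the sample size, random seed, and output filename are arbitrary choices):


In [ ]:
# Generate synthetic data from the model above: unit Gaussian scatter in both x and y.
np.random.seed(42)                        # arbitrary seed, for reproducibility
phi = (1.0 + np.sqrt(5.0)) / 2.0          # the Golden Ratio
x = np.random.normal(np.e, 1.0, size=50)  # arbitrary sample size
y = np.random.normal(np.pi + phi*x, 1.0)
np.savetxt('my_sandbox.dat', np.vstack([x, y]).T)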

In [ ]:
data = np.loadtxt('mc1_sandbox.dat')
x = data[:,0]
y = data[:,1]
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(x, y, 'o');

It will be convenient to compare what we get from MCMC with the exact solution. Here is a class that packages that up (assuming uniform priors). Note that a and b need to be changed if you made up data according to a different model (as will some of the arange limits below).
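
For reference, this is the standard linear regression result: with uniform priors and unit Gaussian scatter, the posterior for $(a, b)$ is Gaussian with mean $(X^\mathrm{T} X)^{-1} X^\mathrm{T} y$ and covariance $(X^\mathrm{T} X)^{-1}$, where $X$ is the design matrix whose $i$th row is $(1, x_i)$; that is exactly what the class computes.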


In [ ]:
a, b = np.pi, (1.0 + np.sqrt(5.0))/2.0 # the true intercept and slope (the Golden Ratio)
class ExactPosterior:
    def __init__(self, x, y, a0, b0):
        X = np.matrix(np.vstack([np.ones(len(x)), x]).T)
        Y = np.matrix(y).T
        self.invcov = X.T * X
        self.covariance = np.linalg.inv(self.invcov)
        self.mean = self.covariance * X.T * Y
        self.a_array = np.arange(0.0, 6.0, 0.02)
        self.b_array = np.arange(0.0, 3.25, 0.02)
        self.P_of_a = np.array([self.marg_a(a) for a in self.a_array])
        self.P_of_b = np.array([self.marg_b(b) for b in self.b_array])
        self.P_of_ab = np.array([[self.lnpost(a,b) for a in self.a_array] for b in self.b_array])
        self.P_of_ab = np.exp(self.P_of_ab)
        self.renorm = 1.0/np.sum(self.P_of_ab)
        self.P_of_ab = self.P_of_ab * self.renorm
        self.levels = scipy.stats.chi2.cdf(np.arange(4,1,-1)**2, 1) # confidence levels (4, 3 and 2 sigma) for the contours below
        self.contourLevels = self.renorm*np.exp(self.lnpost(a0,b0)-0.5*scipy.stats.chi2.ppf(self.levels, 2))
    def lnpost(self, a, b): # the 2D posterior
        z = self.mean - np.matrix([[a],[b]])
        return -0.5 * (z.T * self.invcov * z)[0,0]
    def marg_a(self, a): # marginal posterior of a
        return scipy.stats.norm.pdf(a, self.mean[0,0], np.sqrt(self.covariance[0,0]))
    def marg_b(self, b): # marginal posterior of b
        return scipy.stats.norm.pdf(b, self.mean[1,0], np.sqrt(self.covariance[1,1]))
exact = ExactPosterior(x, y, a, b)

Demo some plots of the exact posterior distribution


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(exact.a_array, exact.P_of_a); plt.xlabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(exact.b_array, exact.P_of_b); plt.xlabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.contour(exact.a_array, exact.b_array, exact.P_of_ab, colors='blue', levels=exact.contourLevels);
plt.plot(a, b, 'o', color='red'); plt.xlabel('a'); plt.ylabel('b');

The model

We'll use uniform priors, for ease of comparison to the exact solution.


In [ ]:
def lnPrior(params):
    return 0.0 # uniform (improper) prior: a constant, so it adds nothing to the log-posterior

Define a likelihood function.


In [ ]:
def lnLike(params, x, y):
    # Gaussian likelihood with unit variance, up to an additive constant in the log
    a = params[0]
    b = params[1]
    return -0.5*np.sum((a + b*x - y)**2)

Package up a log-posterior function.


In [ ]:
def lnPost(params, x, y):
    return lnPrior(params) + lnLike(params, x, y)

The sampler

Improve as you see fit!

Let's use a simple Gaussian proposal distribution, for lack of any better ideas.


In [ ]:
def propose(params, width):
    # symmetric Gaussian proposal centered on the current position
    return params + width * np.random.randn(params.shape[0])

Here's a function for taking a step (or not).


In [ ]:
def step(current_params, current_lnP, width=1.0):
    trial_params = propose(current_params, width=width)
    trial_lnP = lnPost(trial_params, x, y)
    # Metropolis rule: accept with probability min(1, exp(trial_lnP - current_lnP))
    if np.log(np.random.rand()) <= trial_lnP - current_lnP:
        return (trial_params, trial_lnP)
    else:
        return (current_params, current_lnP)
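
The comparison in log space implements the Metropolis acceptance rule: the trial point is accepted with probability $\min\left[1,\; e^{\,\ln P_\mathrm{trial} - \ln P_\mathrm{current}}\right]$, and working with logs avoids numerical under/overflow when the posterior values are extreme.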

And away we go

Here we set a random initial value on [-5,5] for each parameter, then run the sampler.

Nsamples is set to 100 below, but you'll want to increase it after seeing that everything works.


In [ ]:
params = -5.0 + np.random.rand(2) * 10.0
lnP = lnPost(params, x, y)

Nsamples = 100
samples = np.zeros((Nsamples, 2))

for i in range(Nsamples):
    params, lnP = step(params, lnP)
    samples[i,:] = params

Visualize the chain in two dimensions


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 7.0)
plt.plot(samples[:,0], samples[:,1]);
plt.plot(samples[0,0], samples[0,1], 'ro');
plt.xlabel('a'); plt.ylabel('b');

Look at the traces of each parameter (vs time).


In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,0], 'o', ms=1.0); plt.ylabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,1], 'o', ms=1.0); plt.ylabel('b');

On the basis of these diagnostics, we should identify a burn-in period to throw away hereafter. We could also thin the remaining chain to reduce the number of highly correlated and therefore redundant points.

Here I've removed the first 500 points, but you should change this to whatever makes sense.


In [ ]:
samples = samples[500:,:] # discard the first 500 samples as burn-in
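
If you also decide to thin the chain, a minimal sketch (keeping every 10th post-burn-in point, an arbitrary choice) might look like this:


In [ ]:
thin = 10                     # arbitrary thinning factor; tune it to the correlation length you observe
samples = samples[::thin, :]  # keep every thin-th remaining sample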

Repeat earlier plots, "zoomed in" on the remaining samples


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 7.0)
plt.plot(samples[:,0], samples[:,1]);
plt.xlabel('a'); plt.ylabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,0], 'o', ms=1.0); plt.ylabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,1], 'o', ms=1.0); plt.ylabel('b');

Compare the marginal and joint posterior distributions to the exact solution.


In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.hist(samples[:,0], 20, density=True, color='cyan');
plt.plot(exact.a_array, exact.P_of_a, color='red');
plt.xlabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.hist(samples[:,1], 20, density=True, color='cyan');
plt.plot(exact.b_array, exact.P_of_b, color='red');
plt.xlabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.plot(samples[:,0], samples[:,1], 'o', ms=1.0);
plt.contour(exact.a_array, exact.b_array, exact.P_of_ab, colors='red', levels=exact.contourLevels);
plt.xlabel('a'); plt.ylabel('b');

Play

Mess around with the specifics of our Metropolis implementation above, in particular

  1. the initial state
  2. the proposal distribution width
  3. the chain length

You could also insert code to keep track of the acceptance fraction (or compute it afterwards).
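
As a concrete starting point, the sketch below (the name run_chain and its default settings are illustrative, not part of the notebook above) wraps the sampling loop so that the initial state, proposal width, and chain length are easy to vary, and it reports the acceptance fraction by checking whether the chain moved at each step; with a continuous proposal, an accepted move that lands exactly on the current point has probability zero, so this count is effectively exact.


In [ ]:
def run_chain(initial_params, width=1.0, Nsamples=10000):
    # Re-run the Metropolis sampler defined above and track the acceptance fraction.
    params = np.array(initial_params, dtype=float)
    lnP = lnPost(params, x, y)
    chain = np.zeros((Nsamples, len(params)))
    accepted = 0
    for i in range(Nsamples):
        new_params, new_lnP = step(params, lnP, width=width)
        if np.any(new_params != params):
            accepted += 1
        params, lnP = new_params, new_lnP
        chain[i,:] = params
    return chain, accepted / float(Nsamples)

chain, acceptance_fraction = run_chain(-5.0 + np.random.rand(2)*10.0, width=1.0, Nsamples=10000)
print(acceptance_fraction)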

In practice, we don't usually know where the best fit will be, and can't wait around while the posterior is evaluated an arbitrarily large number of times. With this in mind, brainstorm strategies for accelerating convergence. We'll come back to some concrete methods later in the course.