Week 3 Tutorial

The Metropolis Sampler

This notebook is for playing with the Metropolis algorithm in the context of fitting a linear model.

As usual, there is some code provided below, which you will need to complete in places (or, alternatively, write your own from scratch).

Reminder!

After pulling down the tutorial notebook, immediately make a copy. Then do not modify the original. Do your work in the copy. This will prevent the possibility of git conflicts should the version-controlled file change at any point in the future. (The same exhortation applies to homeworks.)

Preliminaries

Import things


In [ ]:
import numpy as np
import statsmodels.api as sm
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import scipy.stats
%matplotlib inline

class SolutionMissingError(Exception):
    def __init__(self):
        Exception.__init__(self,"You need to complete the solution for this code to work!")
def REPLACE_WITH_YOUR_SOLUTION():
    raise SolutionMissingError
REMOVE_THIS_LINE = REPLACE_WITH_YOUR_SOLUTION

First, we'll generate a mock data set to fit. You can choose different values below, but otherwise let's use some easily recognizable numbers.

  • $x \sim \mathrm{Normal}(e, 1)$
  • $y \sim \mathrm{Normal}(\pi+\sqrt{2} ~ x, 1)$

For simplicity, we'll take the $x$ values to be measured precisely, and the $y$ values to have (known) unit error bars, accounting for the scatter in $y$ above. Hence the model parameters to be fit are just $a$ and $b$, the intercept and slope of the linear model.


In [ ]:
true_a = np.pi
true_b = np.sqrt(2.0)

In [ ]:
try:
    exec(open('Solution/mock_data.py').read())
except IOError:
    ndata = 50
    x = np.random.normal(np.exp(1.0), 1.0, ndata)
    y = REPLACE_WITH_YOUR_SOLUTION()

plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(x, y, 'o');
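
If you want something to check your own completion against, here is a minimal sketch of one way to fill in y, following the model stated above (Gaussian scatter of width 1 about the line true_a + true_b * x). It could go in place of REPLACE_WITH_YOUR_SOLUTION() in the cell above, or in Solution/mock_data.py.


In [ ]:
# A sketch of one possible completion: scatter y about the true line with unit width
y = np.random.normal(true_a + true_b * x, 1.0)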

It will be convenient to compare what we get from MCMC with the exact solution, which is easy to calculate in this case (assuming uniform priors). Here is a class that packages that up. Note that if you changed the model that the data are drawn from, the plot ranges below may need to be updated to reflect that.


In [ ]:
class ExactPosterior:
    def __init__(self, x, y, a0, b0):
        # Here's the linear algebra done manually
        ##X = np.matrix(np.vstack([np.ones(len(x)), x]).T)
        ##Y = np.matrix(y).T
        ##self.invcov = X.T * X
        ##self.covariance = np.linalg.inv(self.invcov)
        ##self.mean = self.covariance * X.T * Y
        # It's more easily generalizable to use a library instead
        model = sm.OLS(y, sm.add_constant(x))
        ols = model.fit()
        self.mean = np.matrix(ols.params).T
        self.covariance = ols.normalized_cov_params
        self.invcov = np.linalg.inv(self.covariance)
        self.a_array = np.arange(0.0, 6.0, 0.02)
        self.b_array = np.arange(0.0, 3.25, 0.02)
        self.P_of_a = np.array([self.marg_a(a) for a in self.a_array])
        self.P_of_b = np.array([self.marg_b(b) for b in self.b_array])
        self.P_of_ab = np.array([[self.lnpost(a,b) for a in self.a_array] for b in self.b_array])
        self.P_of_ab = np.exp(self.P_of_ab)
        self.renorm = 1.0/np.sum(self.P_of_ab)
        self.P_of_ab = self.P_of_ab * self.renorm
        self.levels = scipy.stats.chi2.cdf(np.arange(4,1,-1)**2, 1) # confidence levels corresponding to contours below
        self.contourLevels = self.renorm*np.exp(self.lnpost(a0,b0)-0.5*scipy.stats.chi2.ppf(self.levels, 2))
    def lnpost(self, a, b): # the 2D posterior
        z = self.mean - np.matrix([[a],[b]])
        return -0.5 * (z.T * self.invcov * z)[0,0]
    def marg_a(self, a): # marginal posterior of a
        return scipy.stats.norm.pdf(a, self.mean[0,0], np.sqrt(self.covariance[0,0]))
    def marg_b(self, b): # marginal posterior of b
        return scipy.stats.norm.pdf(b, self.mean[1,0], np.sqrt(self.covariance[1,1]))
exact = ExactPosterior(x, y, true_a, true_b)

Demo some plots of the exact posterior distribution


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(exact.a_array, exact.P_of_a); plt.xlabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.plot(exact.b_array, exact.P_of_b); plt.xlabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.contour(exact.a_array, exact.b_array, exact.P_of_ab, colors='blue', levels=exact.contourLevels);
plt.plot(true_a, true_b, 'o', color='red'); plt.xlabel('a'); plt.ylabel('b');

Coding the model

Use uniform priors, for ease of comparison to the exact solution.


In [ ]:
def lnPrior(params):
    # log of an (improper) uniform prior: any constant will do, so return zero
    return 0.0

Define a likelihood function.


In [ ]:
try:
    exec(open('Solution/lnLike.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def lnLike(params, x, y):
        a = params[0]
        b = params[1]
        REPLACE_WITH_YOUR_SOLUTION()
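
If you want to check your version against something, here is a minimal sketch of one possible log-likelihood, assuming independent Gaussian errors of known width 1 on each y value (as described above) and dropping additive constants that do not depend on the parameters.


In [ ]:
# A sketch assuming unit Gaussian errors on y; parameter-independent constants dropped
def lnLike(params, x, y):
    a = params[0]
    b = params[1]
    return -0.5 * np.sum((y - a - b*x)**2)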

Package up a log-posterior function.


In [ ]:
try:
    exec(open('Solution/lnPost.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def lnPost(params, x, y):
        REPLACE_WITH_YOUR_SOLUTION()
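
For reference, a minimal sketch of one way to combine the pieces, assuming the lnPrior and lnLike functions defined above:


In [ ]:
# log-posterior = log-prior + log-likelihood, up to an additive constant
def lnPost(params, x, y):
    return lnPrior(params) + lnLike(params, x, y)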

Coding the sampler

Improve as you see fit!

Let's use a simple Gaussian proposal distribution, for lack of any better ideas. The width of the distribution might as well be an option.


In [ ]:
def propose(params, width):
    # symmetric Gaussian proposal with the same width in each parameter direction
    return params + width * np.random.randn(params.shape[0])

Next, we need a function to propose a step and decide whether to accept or reject it.


In [ ]:
try:
    exec(open('Solution/step.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def step(current_params, current_lnP, width=1.0):
        trial_params = REPLACE_WITH_YOUR_SOLUTION()
        trial_lnP = REPLACE_WITH_YOUR_SOLUTION()
        if REPLACE_WITH_YOUR_SOLUTION():
            return (trial_params, trial_lnP)
        else:
            return (current_params, current_lnP)
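
For reference, here is a minimal sketch of the Metropolis accept/reject rule, using the propose function above and assuming lnPost, x, and y are available in scope. Treat it as one possible completion rather than the definitive one.


In [ ]:
# A sketch of one possible Metropolis update
def step(current_params, current_lnP, width=1.0):
    trial_params = propose(current_params, width)
    trial_lnP = lnPost(trial_params, x, y)
    # Accept with probability min(1, exp(trial_lnP - current_lnP))
    if trial_lnP - current_lnP > np.log(np.random.rand()):
        return (trial_params, trial_lnP)
    else:
        return (current_params, current_lnP)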

And away we go

Here we set a random initial value on [-5,5] for each parameter, then run the sampler.

Start with Nsamples set to something small (like 100) to verify that the code works as expected. You'll probably need to use a larger value to get convergence, though.


In [ ]:
# choose an initial state and evaluate the posterior there
params = -5.0 + np.random.rand(2) * 10.0
lnP = lnPost(params, x, y)

# set up an array to hold the chain
try:
    exec(open('Solution/Nsamples.py').read())
except IOError:
    Nsamples = REPLACE_WITH_YOUR_SOLUTION()
samples = np.zeros((Nsamples, 2))

# run the sampler
for i in range(Nsamples):
    params, lnP = step(params, lnP)
    samples[i,:] = params

Visualize the chain in two dimensions


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 7.0)
plt.plot(samples[:,0], samples[:,1]);
plt.plot(samples[0,0], samples[0,1], 'ro');
plt.xlabel('a'); plt.ylabel('b');

Look at the traces of each parameter (vs time).


In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,0], 'o', ms=1.0); plt.ylabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,1], 'o', ms=1.0); plt.ylabel('b');

On the basis of these diagnostics, we should identify a burn-in period to throw away hereafter. We could also thin the remaining chain to reduce the number of highly correlated and therefore redundant points.

Here I've removed the first half of the chain, but you should change this to whatever makes sense.


In [ ]:
samples = samples[np.arange(int(0.5*Nsamples),Nsamples),:]
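
If you also want to thin the remaining chain, here is a minimal sketch; the factor of 10 is just an example value to adjust based on how correlated the traces look.


In [ ]:
thin_by = 10  # example value; increase if neighboring samples look highly correlated
samples = samples[::thin_by, :]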

Repeat earlier plots, "zoomed in" on the remaining samples


In [ ]:
plt.rcParams['figure.figsize'] = (7.0, 7.0)
plt.plot(samples[:,0], samples[:,1]);
plt.xlabel('a'); plt.ylabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,0], 'o', ms=1.0); plt.ylabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (12.0, 3.0)
plt.plot(samples[:,1], 'o', ms=1.0); plt.ylabel('b');

Compare the marginal and joint posterior distributions to the exact solution.


In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.hist(samples[:,0], 20, density=True, color='cyan');
plt.plot(exact.a_array, exact.P_of_a, color='red');
plt.xlabel('a');

In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.hist(samples[:,1], 20, density=True, color='cyan');
plt.plot(exact.b_array, exact.P_of_b, color='red');
plt.xlabel('b');

In [ ]:
plt.rcParams['figure.figsize'] = (5.0, 5.0)
plt.plot(samples[:,0], samples[:,1], 'o', ms=1.0);
plt.contour(exact.a_array, exact.b_array, exact.P_of_ab, colors='red', levels=exact.contourLevels);
plt.xlabel('a'); plt.ylabel('b');

Play

If you didn't make any changes to the suggested code above, chances are the result is junk (clearly not in agreement with the correct answer, which we happen to know in this case). This is not an accident. You should be able to improve things by playing with the various knobs and levers at your disposal, including

  1. the proposal distribution width
  2. the chain length
  3. the initial state

Try modifying each of these to see what happens. In each case, there are tradeoffs involved.
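
One simple diagnostic for the proposal width is the fraction of proposed steps that were accepted: very low values suggest the width is too large, while values near 1 suggest it is too small. Here is a minimal sketch that estimates it from the stored chain by counting the steps where the chain actually moved (for a continuous proposal, exact repeats essentially only arise from rejections).


In [ ]:
# Approximate acceptance fraction: fraction of steps where the chain moved
moved = np.any(np.diff(samples, axis=0) != 0, axis=1)
print("acceptance fraction ~", moved.mean())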

In practice, we don't usually know where the best fit will be, and can't wait around while the posterior is evaluated an arbitrarily large number of times. There are plenty of strategies for sampling more efficiently, which we'll come back to later in the course.


In [ ]:

Diagnostics

Since we don't normally have the correct solution to compare to, we need another way to reassure ourselves that the MCMC sampler is getting us the right answer. To that end, run multiple chains with well separated starting positions, and use the diagnostic methods covered in class to assess their convergence.


In [ ]:
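
As a concrete starting point, here is a minimal sketch of one commonly used convergence diagnostic, the Gelman-Rubin R-hat statistic, computed from several independently started chains. It assumes the lnPost and step functions above have been completed, and that discarding the first half of each chain (an example choice) is an adequate burn-in; R-hat values close to 1 for every parameter are consistent with convergence.


In [ ]:
def gelman_rubin(chains):
    # chains has shape (Nchains, Nsteps, Nparams), with burn-in already removed
    m, n = chains.shape[0], chains.shape[1]
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean(axis=0)   # mean within-chain variance
    B = n * chain_means.var(axis=0, ddof=1)       # between-chain variance
    var_hat = (n - 1.0) / n * W + B / n           # pooled variance estimate
    return np.sqrt(var_hat / W)                   # R-hat for each parameter

nchains = 4
chains = []
for _ in range(nchains):
    params = -5.0 + np.random.rand(2) * 10.0      # well separated random starts
    lnP = lnPost(params, x, y)
    chain = np.zeros((Nsamples, 2))
    for i in range(Nsamples):
        params, lnP = step(params, lnP)
        chain[i,:] = params
    chains.append(chain[int(0.5*Nsamples):,:])    # discard burn-in (example choice)
print(gelman_rubin(np.array(chains)))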