A simple MCMC example

Here we will set up a simple MCMC code to estimate the parameters of a straight line: $y=mx+c$.

We will simulate some data containing a line with parameters $m=1$ and $c=2.5$, and Gaussian noise ($\sim N(0, 2.5^2)$).


In [1]:
%matplotlib inline
import matplotlib.pyplot as pl

# import numpy
import numpy as np

# set the x-axis linearly spaced between 0 and 10 in 100 steps
x = np.linspace(0, 10, 100)
#x = np.linspace(-5,5,100)

# create our simulated line
m = 1.
c = 2.5

# a function defining our line model
def linemodel(m,c,x):
    return m*x + c
    
line = linemodel(m,c,x)

# create some noise to add to our line
sig = 2.5 #  the noise standard deviation
noise = sig*np.random.randn(100)

# create data
y = line + noise

# let's plot the data too
pl.plot(x, y, 'bo', x, line, 'k--')


Out[1]:
[Plot of the simulated noisy data (points) with the underlying true line (dashed) overlaid]

Now we need to define our likelihood function. The noise in the data is Gaussian, so we can use a Gaussian likelihood function. Due to numerical precision issues we will work with natural logarithms of the likelihood function (the likelihood value could become very, very small).

Note that in the likelihood function we do not include the normalisation factor of the Gaussian distribution. This is because the MCMC just works with probability ratios, so the normalisation would cancel out.
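Explicitly, for data points $d_i$, model values $y_i$, and known noise standard deviation $\sigma$, the (unnormalised) log-likelihood computed by the function below is

$$ \ln L = -\sum_i \frac{(d_i - y_i)^2}{2\sigma^2}. $$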


In [2]:
# a Gaussian likelihood function given some data and a model for that data,
# and a known noise standard deviation sigma 
def loglikelihood(data, model, sigma):
    return -np.sum((data-model)**2)/(2.*sigma**2)

Now we need to define our priors on the parameters $m$ and $c$. We will use flat priors for both parameters over a given range, such that:

$$ p(m|I) = \begin{cases} 1/10, & -5 \le m \le 5 \\ 0, & {\rm otherwise} \end{cases} $$

and

$$ p(c|I) = \begin{cases} 1/20, & -10 \le c \le 10 \\ 0, & {\rm otherwise} \end{cases} $$

Note that in the MCMC below we will just use the extent of these prior ranges rather than the actual values, as (again) we are only interested in probability ratios, so these (flat) prior values would cancel. If not using flat priors, the prior terms would need to be included when forming the posterior probability ratio.


In [3]:
ranges = [[-5., 5.], # the range of allowed values for m
          [-10., 10.]] # the range of allowed values for c

# create a function that returns 1 (true) if a point is within the above ranges, and 0 otherwise
def priorfunc(sample):
    for i in range(len(sample)):
        if sample[i] <= ranges[i][0] or sample[i] >= ranges[i][1]:
            return 0
    
    return 1

We now set up some parameters for the MCMC:

  • the number of burn-in points (samples at the start of the MCMC that we will discard, in case the chain has not yet converged)
  • the number of samples used to sample the posterior pdf
  • the proposal distribution for $m$ and $c$
    • this will be a multi-variate Gaussian pdf with a covariance matrix refined during burn-in
  • an initial start point for $m$ and $c$ drawn at random from within their allowed ranges

In [4]:
# the number of burn-in points
Nb = 50000

# the number of sample points
Ns = 50000

# the total number of points
N = Nb + Ns

# initial proposal widths for m and c (use a best guess!)
cov = [[1., 0.], [0., 1.]] # initial covariance matrix of proposal (uncorrelated)
propscale = 2.5 # a scale factor for the proposal covariance (to try and get an acceptance probability of ~40%)

# set the number of points to use when recalculating the proposal covariance during burn-in
Nprop = 1000

# create a random sample start point
samples = [[ranges[0][0] + np.diff(ranges[0])[0]*np.random.rand(), ranges[1][0] + np.diff(ranges[1])[0]*np.random.rand()]]

# get the log likelihood for this point
logl = [loglikelihood(y, linemodel(samples[0][0], samples[0][1], x), sig)]

# a counter to see the acceptance ratio as the MCMC progresses
acc = 0

Now we'll start our MCMC! We use $$ r = \frac{p(d|m', c', I)\,p(m', c'|I)}{p(d|m, c, I)\,p(m, c|I)} $$ for our Metropolis-Hastings ratio that determines acceptance of a proposed point $(m', c')$ from the current point $(m, c)$ (the ratio of proposal probabilities is 1 as we are using a symmetric proposal, and, since the priors are flat, within the allowed ranges the ratio reduces to the likelihood ratio).


In [5]:
# use a for loop to run the MCMC
for i in range(1, N):
    # sample a new point in m and c from the proposal distribution
    newsample = np.random.multivariate_normal(samples[i-1], cov)
    
    # check if the point is within the prior range
    if not priorfunc(newsample):
        # reject the new point (i.e. add the current point on to the chain)
        samples.append(samples[i-1])
        logl.append(logl[i-1])
        continue
    
    # get the log likelihood of the new point
    loglnew = loglikelihood(y, linemodel(newsample[0], newsample[1], x), sig)
    
    # get the Metropolis-Hastings ratio (the ratio of proposals will just be 1)
    logr = loglnew - logl[i-1] # note in log space the ratio is a difference
    
    # accept if the ratio is greater than 1, otherwise accept with probability r:
    # draw u ~ Uniform(0, 1) and accept if logr > log(u)
    if logr > np.log(np.random.rand()):
        # accepted!
        acc = acc + 1
        
        # add new sample onto chain
        samples.append(list(newsample))
        logl.append(loglnew)
    else:
        # rejected!
        # add current sample onto chain
        samples.append(samples[i-1])
        logl.append(logl[i-1])

    #  print out current acceptance probability every Nprop points
    if i%Nprop == 0:
        print('Num. accepted %d, acceptance probability = %.2f' % (acc, acc/float(Nprop)))
        acc = 0 # reset counter
        
    # every Nprop points during burn-in recalculate the proposal covariance matrix using
    # the last Nprop points (we only adapt the proposal during burn-in, so that the retained
    # samples come from a chain with a fixed proposal distribution)
    if i < Nb and i%Nprop == 0:
        cov = propscale*np.cov(np.array(samples[-Nprop:]).T)


Num. accepted 31, acceptance probability = 0.03
Num. accepted 53, acceptance probability = 0.05
Num. accepted 370, acceptance probability = 0.37
Num. accepted 410, acceptance probability = 0.41
Num. accepted 403, acceptance probability = 0.40
Num. accepted 397, acceptance probability = 0.40
Num. accepted 364, acceptance probability = 0.36
Num. accepted 398, acceptance probability = 0.40
Num. accepted 395, acceptance probability = 0.40
Num. accepted 383, acceptance probability = 0.38
Num. accepted 349, acceptance probability = 0.35
Num. accepted 378, acceptance probability = 0.38
Num. accepted 389, acceptance probability = 0.39
Num. accepted 369, acceptance probability = 0.37
Num. accepted 395, acceptance probability = 0.40
Num. accepted 381, acceptance probability = 0.38
Num. accepted 380, acceptance probability = 0.38
Num. accepted 383, acceptance probability = 0.38
Num. accepted 382, acceptance probability = 0.38
Num. accepted 407, acceptance probability = 0.41
Num. accepted 354, acceptance probability = 0.35
Num. accepted 376, acceptance probability = 0.38
Num. accepted 405, acceptance probability = 0.41
Num. accepted 370, acceptance probability = 0.37
Num. accepted 374, acceptance probability = 0.37
Num. accepted 345, acceptance probability = 0.34
Num. accepted 381, acceptance probability = 0.38
Num. accepted 371, acceptance probability = 0.37
Num. accepted 388, acceptance probability = 0.39
Num. accepted 405, acceptance probability = 0.41
Num. accepted 390, acceptance probability = 0.39
Num. accepted 363, acceptance probability = 0.36
Num. accepted 374, acceptance probability = 0.37
Num. accepted 393, acceptance probability = 0.39
Num. accepted 357, acceptance probability = 0.36
Num. accepted 397, acceptance probability = 0.40
Num. accepted 373, acceptance probability = 0.37
Num. accepted 370, acceptance probability = 0.37
Num. accepted 400, acceptance probability = 0.40
Num. accepted 321, acceptance probability = 0.32
Num. accepted 400, acceptance probability = 0.40
Num. accepted 396, acceptance probability = 0.40
Num. accepted 369, acceptance probability = 0.37
Num. accepted 388, acceptance probability = 0.39
Num. accepted 343, acceptance probability = 0.34
Num. accepted 413, acceptance probability = 0.41
Num. accepted 427, acceptance probability = 0.43
Num. accepted 326, acceptance probability = 0.33
Num. accepted 371, acceptance probability = 0.37
Num. accepted 368, acceptance probability = 0.37
Num. accepted 365, acceptance probability = 0.36
Num. accepted 361, acceptance probability = 0.36
Num. accepted 380, acceptance probability = 0.38
Num. accepted 364, acceptance probability = 0.36
Num. accepted 366, acceptance probability = 0.37
Num. accepted 361, acceptance probability = 0.36
Num. accepted 404, acceptance probability = 0.40
Num. accepted 373, acceptance probability = 0.37
Num. accepted 373, acceptance probability = 0.37
Num. accepted 338, acceptance probability = 0.34
Num. accepted 368, acceptance probability = 0.37
Num. accepted 386, acceptance probability = 0.39
Num. accepted 374, acceptance probability = 0.37
Num. accepted 381, acceptance probability = 0.38
Num. accepted 357, acceptance probability = 0.36
Num. accepted 388, acceptance probability = 0.39
Num. accepted 393, acceptance probability = 0.39
Num. accepted 375, acceptance probability = 0.38
Num. accepted 397, acceptance probability = 0.40
Num. accepted 360, acceptance probability = 0.36
Num. accepted 357, acceptance probability = 0.36
Num. accepted 379, acceptance probability = 0.38
Num. accepted 360, acceptance probability = 0.36
Num. accepted 378, acceptance probability = 0.38
Num. accepted 384, acceptance probability = 0.38
Num. accepted 361, acceptance probability = 0.36
Num. accepted 360, acceptance probability = 0.36
Num. accepted 370, acceptance probability = 0.37
Num. accepted 345, acceptance probability = 0.34
Num. accepted 396, acceptance probability = 0.40
Num. accepted 376, acceptance probability = 0.38
Num. accepted 367, acceptance probability = 0.37
Num. accepted 395, acceptance probability = 0.40
Num. accepted 362, acceptance probability = 0.36
Num. accepted 370, acceptance probability = 0.37
Num. accepted 384, acceptance probability = 0.38
Num. accepted 364, acceptance probability = 0.36
Num. accepted 361, acceptance probability = 0.36
Num. accepted 380, acceptance probability = 0.38
Num. accepted 394, acceptance probability = 0.39
Num. accepted 380, acceptance probability = 0.38
Num. accepted 386, acceptance probability = 0.39
Num. accepted 340, acceptance probability = 0.34
Num. accepted 350, acceptance probability = 0.35
Num. accepted 370, acceptance probability = 0.37
Num. accepted 376, acceptance probability = 0.38
Num. accepted 368, acceptance probability = 0.37
Num. accepted 377, acceptance probability = 0.38
Num. accepted 377, acceptance probability = 0.38

Plotting the output

Now plot the output chains, making sure not to include the burn-in points. (A nice Python module for making this sort of corner plot is triangle.py, since renamed corner.) On the plot we'll include the parameters of the line that we simulated and the maximum posterior value that we recover from our joint pdf.
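As an aside, a minimal sketch of making the corner plot with that module (assuming the corner package is installed) could be:

import corner

# corner plot of the post-burn-in samples, marking the true (simulated) parameter values
fig = corner.corner(np.array(samples)[-Ns:], labels=['$m$', '$c$'], truths=[m, c])

Below, though, we build the plots by hand with matplotlib.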


In [6]:
fig = pl.figure(figsize=(8,7), dpi=150)

# convert samples list into numpy array
nsamples = np.array(samples)

# get the index of the maximum (joint) posterior sample
imax = np.argmax(logl)

# plot the marginalised posterior of m, p(m|d,I)
pl.subplot(2,2,1)
pl.hist(nsamples[-Ns:,0], bins=20, normed=True, histtype='stepfilled', alpha=0.3)
ax = pl.gca()
pl.plot([m, m], [0., ax.get_ylim()[1]], 'k--', lw=2, label='True') # plot true value of m
pl.plot([nsamples[imax,0], nsamples[imax,0]], [0., ax.get_ylim()[1]], 'g--', lw=2, label='Max. pdf') # plot maximum posterior value
pl.legend(loc='best', fancybox=True, framealpha=0.3, prop={'size': 14})
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$p(m|d,I)$', fontsize=16)

# plot the marginalised posterior of c, p(c|d,I)
pl.subplot(2,2,4)
pl.hist(nsamples[-Ns:,1], bins=20, normed=True, histtype='stepfilled', alpha=0.3)
ax = pl.gca()
pl.plot([c, c], [0., ax.get_ylim()[1]], 'k--', lw=2) # plot true value of c
pl.plot([nsamples[imax,1], nsamples[imax,1]], [0., ax.get_ylim()[1]], 'g--', lw=2) # plot maximum posterior value
pl.xlabel('$c$', fontsize=16)
pl.ylabel('$p(c|d,I)$', fontsize=16)

# plot the joint posterior of m and c, p(m,c|d,I) (just use the samples, although you could create a contour plot)
pl.subplot(2,2,3)
pl.plot(nsamples[-Ns:,0], nsamples[-Ns:,1], 'b.', ms=1, alpha=0.1)
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$c$', fontsize=16)
pl.tight_layout()


Comparison with least squares fit

Compare these maximum posterior values, and the covariance of the parameters, with a standard least squares fit output.
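For reference, the analytic least-squares (co)variances used below are the standard results for $N$ data points and known noise variance $\sigma^2$:

$$ \Delta = N\sum_i x_i^2 - \Big(\sum_i x_i\Big)^2, \quad {\rm var}[m] = \frac{N\sigma^2}{\Delta}, \quad {\rm var}[c] = \frac{\sigma^2 \sum_i x_i^2}{\Delta}, \quad {\rm cov}[m,c] = -\frac{\sigma^2 \sum_i x_i}{\Delta}. $$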


In [8]:
# get the least squares fit values for m and c
mfit, cfit = np.linalg.lstsq(np.vstack([x, np.ones(len(x))]).T, y)[0]

print('Maximum posterior: m = %.2f, c = %.2f' % (nsamples[imax,0], nsamples[imax,1]))
print('Least squares fit: m = %.2f, c = %.2f' % (mfit, cfit))

# least squares (co)variance
denom = (len(x)*np.sum(x**2)-np.sum(x)**2)
lsmvar = sig**2*len(x)/denom
lscvar = sig**2*np.sum(x**2)/denom
lscov = -sig**2*np.sum(x)/denom

# covariance from MCMC posterior
mcmccov = np.cov(nsamples[-Ns:].T)

print('var[m] = %.2f, var[c] = %.2f, cov[m,c] = %.2f' % (mcmccov[0,0], mcmccov[1,1], mcmccov[0,1]))
print('Least squares: var[m] = %.2f, var[c] = %.2f, cov[m,c] = %.2f' % (lsmvar, lscvar, lscov))


Maximum posterior: m = 1.04, c = 2.38
Least squares fit: m = 1.04, c = 2.38
var[m] = 0.01, var[c] = 0.24, cov[m,c] = -0.04
Least squares: var[m] = 0.01, var[c] = 0.25, cov[m,c] = -0.04

Credible intervals

We might also want to get, say, 95% credible intervals on our parameters. We will use a greedy binning algorithm: histogram the posterior samples, sort the bins by decreasing height, and keep adding bins to the interval until they contain 95% of the samples.
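(As a rough cross-check, an equal-tailed 95% interval can also be read straight from the sample percentiles; the greedy-binning function below instead approximates the highest-density interval.)

# equal-tailed 95% intervals directly from the post-burn-in samples (cross-check only)
print(np.percentile(nsamples[-Ns:, 0], [2.5, 97.5]))  # m
print(np.percentile(nsamples[-Ns:, 1], [2.5, 97.5]))  # c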


In [11]:
# a function to get the credible intervals using a greedy binning method
def credible_interval(dsamples, ci):
    n, binedges = np.histogram(dsamples, bins=100)
    dbins = binedges[1]-binedges[0] # width of a histogram bin
    bins = binedges[0:-1]+dbins/2. # centres of bins

    histIndices = np.argsort(n)[::-1]  # indices of the histogram bins, sorted by decreasing count

    frac = 0.0
    j = 0
    for i in histIndices:
        frac += float(n[i])/float(len(dsamples))
        j = j+1
        if frac >= ci:
            break

    return (np.min(bins[histIndices[:j]]), np.max(bins[histIndices[:j]]))
        
# get 95% credible interval for m
minterval = credible_interval(nsamples[-Ns:,0], 0.95)

# get 95% credible interval for c
cinterval = credible_interval(nsamples[-Ns:,1], 0.95)

# now reproduce the above plot with these 95% credible intervals included
# define a function to produce a patch to fill in the interval
def intervalpatch(binedges, nbin, interval):
    lowi = np.arange(len(binedges))[binedges>interval[0]][0]
    highi = np.arange(len(binedges))[binedges<interval[1]][-1]
    binv = [interval[0]]
    nv = [nbin[lowi-1]]
    for j in range(lowi, highi+1):
        binv.append(binedges[j])
        binv.append(binedges[j])
        nv.append(nbin[j-1])
        nv.append(nbin[j])
    binv.append(interval[1])
    nv.append(nbin[highi])
    
    return binv, nv

fig = pl.figure(figsize=(6,8), dpi=150)

pl.subplot(2,1,1)
n, binedges, patch = pl.hist(nsamples[-Ns:,0], bins=25, normed=True, histtype='step')

ax = pl.gca()
pl.plot([m, m], [0., ax.get_ylim()[1]], 'k--', lw=2, label='True') # plot true value of m
pl.plot([nsamples[imax,0], nsamples[imax,0]], [0., ax.get_ylim()[1]], 'g--', lw=2, label='Max. pdf') # plot maximum posterior value

# get the patch positions for filling the credible interval
binv, nv = intervalpatch(binedges, n, minterval)

# fill in the credible interval
pl.fill_between(binv, nv, np.zeros(len(nv)), facecolor='green', alpha=0.5, edgecolor='none')
#pl.legend(loc='best', fancybox=True, framealpha=0.3, prop={'size': 14})
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$p(m|d,I)$', fontsize=16)
pl.ylim(0., ax.get_ylim()[1])

# plot the marginalised posterior of c, p(c|d,I)
pl.subplot(2,1,2)
n, binedges, patch = pl.hist(nsamples[-Ns:,1], bins=25, normed=True, histtype='step')
binmids = binedges[:-1]+(binedges[1]-binedges[0])/2.
ax = pl.gca()
pl.plot([c, c], [0., ax.get_ylim()[1]], 'k--', lw=2) # plot true value of c
pl.plot([nsamples[imax,1], nsamples[imax,1]], [0., ax.get_ylim()[1]], 'g--', lw=2) # plot maximum posterior value

# get the patch positions for filling the credible interval
binv, nv = intervalpatch(binedges, n, cinterval)

# fill in the credible interval
pl.fill_between(binv, nv, np.zeros(len(nv)), facecolor='green', alpha=0.5, edgecolor='none')
pl.xlabel('$c$', fontsize=16)
pl.ylabel('$p(c|d,I)$', fontsize=16)
pl.ylim(0., ax.get_ylim()[1])

pl.tight_layout()