Here we will set up a simple MCMC code to estimate the parameters of a straight line: $y=mx+c$.
We will simulate some data containing a line with parameters $m=1$ and $c=2.5$, and Gaussian noise ($\sim N(0, 2.5^2)$).
In [1]:
%matplotlib inline
import matplotlib.pyplot as pl
# import numpy
import numpy as np
# set the x-axis linearly spaced between 0 and 10 in 100 steps
x = np.linspace(0, 10, 100)
#x = np.linspace(-5,5,100)
# create our simulated line
m = 1.
c = 2.5
# a function defining our line model
def linemodel(m, c, x):
    return m*x + c
line = linemodel(m,c,x)
# create some noise to add to our line
sig = 2.5 # the noise standard deviation
noise = sig*np.random.randn(100)
# create data
y = line + noise
# let's plot the data too
pl.plot(x, y, 'bo', x, line, 'k--')
Out[1]:
Now we need to define our likelihood function. The noise in the data is Gaussian, so we can use a Gaussian likelihood function. Because of numerical precision issues we will work with the natural logarithm of the likelihood (the likelihood value itself can become vanishingly small).
Note that in the likelihood function we do not include the normalisation factor of the Gaussian distribution. This is because the MCMC just works with probability ratios, so the normalisation would cancel out.
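A quick standalone illustration of why working in log space matters: for a large dataset the log-likelihood is a perfectly representable (large negative) number, but exponentiating it underflows to exactly zero in double precision, which would break any ratio computed from raw likelihoods.

```python
import numpy as np

sigma = 2.5
rng = np.random.default_rng(0)

# residuals for a large simulated dataset (pure noise, same sigma as above)
residuals = sigma * rng.standard_normal(100000)

# the (unnormalised) Gaussian log-likelihood: a large but finite number
logl = -np.sum(residuals**2) / (2. * sigma**2)  # around -50000 here
print(logl)

# exponentiating it underflows to exactly 0.0 in double precision
print(np.exp(logl))
```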
In [2]:
# a Gaussian likelihood function given some data and a model for that data,
# and a known noise standard deviation sigma
def loglikelihood(data, model, sigma):
    return -np.sum((data - model)**2)/(2.*sigma**2)
Now we need to define our priors on the parameters $m$ and $c$. We will use flat priors for both parameters over a given range, such that: $$ p(m|I) = \begin{cases} 1/10, & -5 \le m \le 5 \\ 0, & \text{otherwise} \end{cases} $$ and $$ p(c|I) = \begin{cases} 1/20, & -10 \le c \le 10 \\ 0, & \text{otherwise} \end{cases} $$ Note that in the MCMC below we will just use the extent of these prior ranges rather than their actual values: as (again) we are only interested in probability ratios, these constant (flat) prior values would cancel. If you are not using flat priors then they may need to be included when forming the posterior probability ratio.
In [3]:
ranges = [[-5., 5.],    # the range of allowed values for m
          [-10., 10.]]  # the range of allowed values for c

# create a function that returns 1 if a point is within the above ranges, 0 otherwise
def priorfunc(sample):
    for i in range(len(sample)):
        if sample[i] <= ranges[i][0] or sample[i] >= ranges[i][1]:
            return 0
    return 1
We now set up some parameters for the MCMC:
In [4]:
# the number of burn-in points
Nb = 50000
# the number of sample points
Ns = 50000
# the total number of points
N = Nb + Ns
# initial proposal widths for m and c (use a best guess!)
cov = [[1., 0.], [0., 1.]] # initial covariance matrix of proposal (uncorrelated)
propscale = 2.5 # a scale factor for the proposal covariance (to try and get an acceptance probability of ~40%)
# the number of points to use in recalculating the proposal width during burn-in
Nprop = 1000
# create a random sample start point
samples = [[ranges[0][0] + np.diff(ranges[0])[0]*np.random.rand(),
            ranges[1][0] + np.diff(ranges[1])[0]*np.random.rand()]]
# get the log likelihood for this point
logl = [loglikelihood(y, linemodel(samples[0][0], samples[0][1], x), sig)]
# a counter to see the acceptance ratio as the MCMC progresses
acc = 0
Now we'll start our MCMC! As we are using a symmetric proposal, the ratio of proposal probabilities is 1, and within the flat prior range the prior ratio is also 1, so the Metropolis-Hastings ratio that determines acceptance of a new point reduces to the likelihood ratio: $$ r = \frac{p(d|m_{\rm new},c_{\rm new},I)}{p(d|m_{i-1},c_{i-1},I)}. $$
In [5]:
# use a for loop to run the MCMC
for i in range(1, N):
    # sample a new point in m and c from the proposal distribution
    newsample = np.random.multivariate_normal(samples[i-1], cov)

    # check if the point is within the prior range
    if not priorfunc(newsample):
        # reject the new point (i.e. add the current point on to the chain)
        samples.append(samples[i-1])
        logl.append(logl[i-1])
        continue

    # get the log likelihood of the new point
    loglnew = loglikelihood(y, linemodel(newsample[0], newsample[1], x), sig)

    # get the Metropolis-Hastings ratio (the ratio of proposals will just be 1)
    logr = loglnew - logl[i-1]  # note: in log space the ratio is a difference

    # accept if the ratio is greater than 1, otherwise accept with probability r;
    # in log space this means accept if logr > log(u), with u uniform on [0, 1)
    if logr > np.log(np.random.rand()):
        # accepted!
        acc = acc + 1
        # add new sample onto chain
        samples.append(list(newsample))
        logl.append(loglnew)
    else:
        # rejected!
        # add current sample onto chain
        samples.append(samples[i-1])
        logl.append(logl[i-1])

    # print out current acceptance probability every Nprop points
    if i % Nprop == 0:
        print('Num. accepted %d, acceptance probability = %.2f' % (acc, acc/float(Nprop)))
        acc = 0  # reset counter

    # every Nprop points during burn-in recalculate the proposal covariance matrix
    # using the last Nprop points
    if i < Nb and i % Nprop == 0:
        cov = propscale*np.cov(np.array(samples[-Nprop:]).T)
Now plot the output chains, making sure not to include the burn-in points. (A nice Python module for this kind of plot is triangle.py.) On the plot we'll include the parameters of the line that we simulated and the maximum posterior point that we recover from the joint pdf.
In [6]:
fig = pl.figure(figsize=(8,7), dpi=150)
# convert samples list into numpy array
nsamples = np.array(samples)
# get the index of the maximum (joint) posterior sample
imax = np.argmax(logl)
# plot the marginalised posterior of m, p(m|d,I)
pl.subplot(2,2,1)
pl.hist(nsamples[-Ns:,0], bins=20, density=True, histtype='stepfilled', alpha=0.3)
ax = pl.gca()
pl.plot([m, m], [0., ax.get_ylim()[1]], 'k--', lw=2, label='True') # plot true value of m
pl.plot([nsamples[imax,0], nsamples[imax,0]], [0., ax.get_ylim()[1]], 'g--', lw=2, label='Max. pdf') # plot maximum posterior value
pl.legend(loc='best', fancybox=True, framealpha=0.3, prop={'size': 14})
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$p(m|d,I)$', fontsize=16)
# plot the marginalised posterior of c, p(c|d,I)
pl.subplot(2,2,4)
pl.hist(nsamples[-Ns:,1], bins=20, density=True, histtype='stepfilled', alpha=0.3)
ax = pl.gca()
pl.plot([c, c], [0., ax.get_ylim()[1]], 'k--', lw=2) # plot true value of c
pl.plot([nsamples[imax,1], nsamples[imax,1]], [0., ax.get_ylim()[1]], 'g--', lw=2) # plot maximum posterior value
pl.xlabel('$c$', fontsize=16)
pl.ylabel('$p(c|d,I)$', fontsize=16)
# plot the joint posterior of m and c, p(m,c|d,I) (just use the samples, although you could create a contour plot)
pl.subplot(2,2,3)
pl.plot(nsamples[-Ns:,0], nsamples[-Ns:,1], 'b.', ms=1, alpha=0.1)
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$c$', fontsize=16)
pl.tight_layout()
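As the comment above notes, the joint posterior could instead be shown as a contour plot. A standalone sketch of that idea (using synthetic correlated Gaussian samples standing in for the $(m, c)$ chain, with a made-up covariance, so that it runs on its own):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as pl

rng = np.random.default_rng(1)
# synthetic anti-correlated samples standing in for the (m, c) chain
samps = rng.multivariate_normal([1., 2.5], [[0.02, -0.09], [-0.09, 0.6]],
                                size=50000)

# bin the samples and contour the resulting density estimate
H, xe, ye = np.histogram2d(samps[:, 0], samps[:, 1], bins=40, density=True)
xmid = xe[:-1] + np.diff(xe) / 2.  # bin centres in m
ymid = ye[:-1] + np.diff(ye) / 2.  # bin centres in c
pl.contour(xmid, ymid, H.T)  # transpose: histogram2d puts x along axis 0
pl.xlabel('$m$')
pl.ylabel('$c$')
pl.savefig('joint.png')
```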
Compare these maximum posterior values, and the covariance of the parameters, with the output of a standard least squares fit.
In [8]:
# get the least squares fit values for m and c
mfit, cfit = np.linalg.lstsq(np.vstack([x, np.ones(len(x))]).T, y, rcond=None)[0]
print('Maximum posterior: m = %.2f, c = %.2f' % (nsamples[imax,0], nsamples[imax,1]))
print('Least squares fit: m = %.2f, c = %.2f' % (mfit, cfit))
# least squares (co)variance
denom = (len(x)*np.sum(x**2)-np.sum(x)**2)
lsmvar = sig**2*len(x)/denom
lscvar = sig**2*np.sum(x**2)/denom
lscov = -sig**2*np.sum(x)/denom
# covariance from MCMC posterior
mcmccov = np.cov(nsamples[-Ns:].T)
print('MCMC posterior: var[m] = %.2f, var[c] = %.2f, cov[m,c] = %.2f' % (mcmccov[0,0], mcmccov[1,1], mcmccov[0,1]))
print('Least squares: var[m] = %.2f, var[c] = %.2f, cov[m,c] = %.2f' % (lsmvar, lscvar, lscov))
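The analytic (co)variance expressions above follow from the normal equations: for the design matrix $A = [x, \mathbf{1}]$ the parameter covariance is $\sigma^2 (A^T A)^{-1}$. A standalone check (assuming the same $x$ grid and $\sigma$ as in the notebook) that the closed-form expressions agree with the matrix inverse:

```python
import numpy as np

x = np.linspace(0, 10, 100)
sig = 2.5

# parameter covariance from the normal equations: sigma^2 (A^T A)^{-1}
A = np.vstack([x, np.ones(len(x))]).T
covfit = sig**2 * np.linalg.inv(A.T @ A)

# the closed-form expressions used in the notebook
denom = len(x)*np.sum(x**2) - np.sum(x)**2
lsmvar = sig**2*len(x)/denom
lscvar = sig**2*np.sum(x**2)/denom
lscov = -sig**2*np.sum(x)/denom

print(covfit[0, 0], lsmvar)  # these pairs should agree
print(covfit[1, 1], lscvar)
print(covfit[0, 1], lscov)
```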
We might also want to get, say, 95% credible intervals on our parameters. We will use a greedy binning algorithm to get the 95% credible intervals from a histogram of the data.
In [11]:
# a function to get the credible intervals using a greedy binning method
def credible_interval(dsamples, ci):
    n, binedges = np.histogram(dsamples, bins=100)
    dbins = binedges[1] - binedges[0]   # width of a histogram bin
    bins = binedges[0:-1] + dbins/2.    # centres of bins
    histIndices = np.argsort(n)[::-1]   # bin indices in decreasing order of counts
    frac = 0.0
    j = 0
    for i in histIndices:
        frac += float(n[i])/float(len(dsamples))
        j = j + 1
        if frac >= ci:
            break
    return (np.min(bins[histIndices[:j]]), np.max(bins[histIndices[:j]]))
# get 95% credible interval for m
minterval = credible_interval(nsamples[-Ns:,0], 0.95)
# get 95% credible interval for c
cinterval = credible_interval(nsamples[-Ns:,1], 0.95)
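As a sanity check of the greedy-binning idea (re-implemented here so the sketch runs on its own), the 95% interval for samples from a standard normal should come out close to the familiar $\pm 1.96$ bounds:

```python
import numpy as np

def greedy_ci(samples, ci, nbins=100):
    # histogram the samples and take bins in decreasing order of counts,
    # accumulating probability until the requested fraction is reached
    n, edges = np.histogram(samples, bins=nbins)
    centres = edges[:-1] + np.diff(edges)/2.
    order = np.argsort(n)[::-1]
    frac, j = 0.0, 0
    for i in order:
        frac += n[i]/float(len(samples))
        j += 1
        if frac >= ci:
            break
    return centres[order[:j]].min(), centres[order[:j]].max()

rng = np.random.default_rng(42)
lo, hi = greedy_ci(rng.standard_normal(200000), 0.95)
print(lo, hi)  # close to -1.96 and 1.96 (up to bin-width resolution)
```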
# now reproduce the above plot with these 95% credible intervals included
# define a function to produce a patch to fill in the interval
def intervalpatch(binedges, nbin, interval):
    lowi = np.arange(len(binedges))[binedges > interval[0]][0]
    highi = np.arange(len(binedges))[binedges < interval[1]][-1]
    binv = [interval[0]]
    nv = [nbin[lowi-1]]
    for j in range(lowi, highi+1):
        binv.append(binedges[j])
        binv.append(binedges[j])
        nv.append(nbin[j-1])
        nv.append(nbin[j])
    binv.append(interval[1])
    nv.append(nbin[highi])
    return binv, nv
fig = pl.figure(figsize=(6,8), dpi=150)
pl.subplot(2,1,1)
n, binedges, patch = pl.hist(nsamples[-Ns:,0], bins=25, density=True, histtype='step')
ax = pl.gca()
pl.plot([m, m], [0., ax.get_ylim()[1]], 'k--', lw=2, label='True') # plot true value of m
pl.plot([nsamples[imax,0], nsamples[imax,0]], [0., ax.get_ylim()[1]], 'g--', lw=2, label='Max. pdf') # plot maximum posterior value
# get the patch positions for filling the credible interval
binv, nv = intervalpatch(binedges, n, minterval)
# fill in the credible interval
pl.fill_between(binv, nv, np.zeros(len(nv)), facecolor='green', alpha=0.5, edgecolor='none')
#pl.legend(loc='best', fancybox=True, framealpha=0.3, prop={'size': 14})
pl.xlabel('$m$', fontsize=16)
pl.ylabel('$p(m|d,I)$', fontsize=16)
pl.ylim(0., ax.get_ylim()[1])
# plot the marginalised posterior of c, p(c|d,I)
pl.subplot(2,1,2)
n, binedges, patch = pl.hist(nsamples[-Ns:,1], bins=25, density=True, histtype='step')
binmids = binedges[:-1]+(binedges[1]-binedges[0])/2.
ax = pl.gca()
pl.plot([c, c], [0., ax.get_ylim()[1]], 'k--', lw=2) # plot true value of c
pl.plot([nsamples[imax,1], nsamples[imax,1]], [0., ax.get_ylim()[1]], 'g--', lw=2) # plot maximum posterior value
# get the patch positions for filling the credible interval
binv, nv = intervalpatch(binedges, n, cinterval)
# fill in the credible interval
pl.fill_between(binv, nv, np.zeros(len(nv)), facecolor='green', alpha=0.5, edgecolor='none')
pl.xlabel('$c$', fontsize=16)
pl.ylabel('$p(c|d,I)$', fontsize=16)
pl.ylim(0., ax.get_ylim()[1])
pl.tight_layout()