In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import emcee
emcee
In this post, I'll give a simple demo of using the Gelman-Rubin statistic (see this paper for an overview and alternatives) to quantify convergence when using the emcee software package. There are issues with this approach which remain to be addressed: the Gelman-Rubin statistic applies to cases where one runs multiple independent MCMC simulations, while emcee uses an ensemble sampler in which $n$ parallel walkers are run and, when jumping, the positions of the other walkers are used to generate proposal steps (for details, you may want to read this). As such, the $n$ walkers are certainly not independent. I won't go into details of the issues this may cause here, as I've not yet investigated them fully; instead, I will simply implement the statistic naively and see if it works.
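For concreteness, here is a schematic of the "stretch move" proposal of Goodman & Weare (2010) on which emcee's sampler is based. This is a minimal sketch of the proposal step only (no acceptance rule), not emcee's actual implementation; the scale parameter `a=2` is the commonly quoted default.

In [ ]:
import numpy as np

def stretch_move_proposal(X_k, X_others, a=2.0):
    """ Schematic stretch-move proposal (Goodman & Weare 2010)

    X_k      : position of the walker being updated, shape (ndim,)
    X_others : positions of the other walkers, shape (nwalkers-1, ndim)
    a        : scale parameter (a=2 is the usual default)
    """
    # Pick a random walker from the complementary ensemble
    X_j = X_others[np.random.randint(len(X_others))]
    # Draw z from g(z) ~ 1/sqrt(z) on [1/a, a] by inverse-CDF sampling
    z = ((a - 1.0) * np.random.uniform() + 1.0)**2 / a
    # The proposal lies on the straight line through X_j and X_k
    return X_j + z * (X_k - X_j)

Every proposal is built from the current position of another walker, which is exactly the source of the inter-walker dependence discussed above.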
The initial setup of the problem follows this post with some minor modifications.
First, we generate some fake data to which we will fit a model:
In [2]:
# Model parameters
A = 2.3
f = 1
X0 = 2.5
# Noise parameter
sigma = 0.6

# Generate N data points with additive Gaussian noise
N = 500
t = np.sort(np.random.uniform(0, 10, N))
X = X0 + A * np.sin(2 * np.pi * f * t) + np.random.normal(0, sigma, N)
plt.plot(t, X, "o")
plt.xlabel("$t$")
plt.ylabel(r"$X_{\mathrm{obs}}$")
plt.show()
In [3]:
def Generic_lnuniformprior(theta, theta_lims):
    """ Generic uniform priors on theta

    Parameters
    ----------
    theta : array_like
        Value of the parameters
    theta_lims: array_like of shape (len(theta), 2)
        Array of pairs [min, max] for each theta parameter
    """
    theta_lims = np.array(theta_lims)
    if all(theta - theta_lims[:, 0] > 0) and all(theta_lims[:, 1] - theta > 0):
        # Log of the product of the uniform densities 1/(max - min)
        return np.sum(np.log(1.0 / np.diff(theta_lims, axis=1)))
    return -np.inf
def Generic_lnlike(theta, t, x, model):
    """ Generic log-likelihood function for a signal in Gaussian noise

    Parameters
    ----------
    theta : array_like
        Value of the parameters; the noise strength `sigma` should ALWAYS be
        the last element in the list
    t : array_like
        The independent variable
    x : array_like
        The observed dependent variable
    model : func
        Signal model; calling `model(theta[:-1], t)` should
        produce the corresponding value of the signal
        alone, `x_val`, without noise.
    """
    sigma2 = theta[-1]**2
    return -0.5*(np.sum((x - model(theta[:-1], t))**2 / sigma2 + np.log(2*np.pi*sigma2)))
def PlotWalkers(sampler, symbols=None, alpha=0.4, color="k", PSRF=None, PSRFx=None):
    """ Plot all the chains from a sampler, with the PSRF overlaid in red """
    nwalkers, nsteps, ndim = sampler.chain.shape
    fig, axes = plt.subplots(ndim, 1, sharex=True, figsize=(8, 9))
    for i in range(ndim):
        axes[i].plot(sampler.chain[:, :, i].T, color=color, alpha=alpha)
        if symbols:
            axes[i].set_ylabel(symbols[i])
        if PSRF is not None:
            PSRF = np.array(PSRF)
            PSRFx = np.array(PSRFx)
            ax = axes[i].twinx()
            # Solid red until the PSRF first approaches 1.1, dashed afterwards
            idx = np.argmin(np.abs(PSRF[:, i] - 1.1))
            ax.plot(PSRFx[:idx+1], PSRF[:idx+1, i], '-r')
            ax.plot(PSRFx[idx:], PSRF[idx:, i], '--r')
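For reference, `Generic_lnlike` above is the standard log-likelihood for a signal in additive Gaussian noise of standard deviation $\sigma$:

$$\ln \mathcal{L}(\theta) = -\frac{1}{2}\sum_{i=1}^{N}\left[\frac{\left(x_i - m(\theta; t_i)\right)^{2}}{\sigma^{2}} + \ln\left(2\pi\sigma^{2}\right)\right],$$

where $m(\theta; t_i)$ is the noise-free signal model evaluated at $t_i$.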
In [23]:
def model(theta, t):
    A, f, X0 = theta
    return X0 + A * np.sin(2 * np.pi * f * t)

lnlike = lambda theta, t, x: Generic_lnlike(theta, t, x, model)

# Tie the two together
def lnprob(theta, t, x):
    """ Generic function tying together the lnprior and lnlike """
    lp = lnprior(theta)
    if not np.isfinite(lp):
        return -np.inf
    return lp + lnlike(theta, t, x)

# Prior
theta_lims = [[-100, 200],
              [0.95, 1.05],
              [0.0, 100.0],
              [0.1, 1.3]]
lnprior = lambda theta: Generic_lnuniformprior(theta, theta_lims)
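As a quick check (my addition, not part of the original fit), `lnprob` should be finite at the true parameters and $-\infty$ for any point outside the prior box:

In [ ]:
print(lnprob([2.3, 1.0, 2.5, 0.6], t, X))  # finite: inside the prior bounds
print(lnprob([2.3, 2.0, 2.5, 0.6], t, X))  # -inf: f = 2 lies outside [0.95, 1.05]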
Here, I follow the description on this blog for how to calculate the Potential Scale Reduction Factor (PSRF), a measure of the convergence. For PSRF < 1.2, Gelman & Rubin state that the chains have converged.
In practical terms, the sampler is run for a number of steps, the PSRF is calculated, and then the sampler is run again. Here I simply plot how this changes for each parameter. This becomes practically useful when some measure of the convergence (say, for example, the maximum PSRF) is used to end a run prematurely, before the maximum number of steps; a sketch of this is given at the end of the post.
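For reference, the quantities computed by the function below are, with $m$ walkers each contributing a chain segment of length $n$ (`convergence_length`),

$$W = \frac{1}{m}\sum_{j=1}^{m} s_j^{2}, \qquad \hat{\sigma}_{b}^{2} = \frac{1}{m}\sum_{j=1}^{m}\left(\bar{\theta}_{j} - \bar{\theta}\right)^{2},$$

$$\hat{V} = \frac{n-1}{n}\,W + \frac{\hat{\sigma}_{b}^{2}}{n}\left(1 + \frac{1}{m}\right), \qquad R = \frac{\hat{V}}{W},$$

where $s_j^2$ and $\bar{\theta}_j$ are the variance and mean of walker $j$ over the segment, $\bar{\theta}$ is the mean over all walkers, and $R$ is the PSRF.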
First, here is a convenience function which computes the PSRF:
In [12]:
def get_convergence_statistic(i, sampler, nwalkers, convergence_length=10,
                              convergence_period=10):
    """ Compute the PSRF at step `i`, treating each walker as a chain

    The statistic is computed over the last `convergence_length` steps;
    `convergence_period` is intended to set how often it is evaluated.
    """
    # Chain segment of the last `convergence_length` steps: (nwalkers, n, ndim)
    s = sampler.chain[:, i-convergence_length+1:i+1, :]
    # W: mean over walkers of the within-walker variance
    within_std = np.mean(np.var(s, axis=1), axis=0)
    # Spread of the per-walker means about the global mean
    per_walker_mean = np.mean(s, axis=1)
    mean = np.mean(per_walker_mean, axis=0)
    between_std = np.sqrt(np.mean((per_walker_mean-mean)**2, axis=0))
    W = within_std
    B_over_n = between_std**2 / convergence_length
    Vhat = ((convergence_length-1.)/convergence_length * W
            + B_over_n + B_over_n / float(nwalkers))
    c = Vhat / W
    # Return the mid-point of the window (for plotting) and the PSRF per parameter
    return i - convergence_length/2, c
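Before applying this to the sampler, here is a quick sanity check of my own (using a mock object with a `chain` attribute of the same shape as emcee's): well-mixed chains should give a PSRF close to 1, while chains stuck around well-separated means should give a PSRF far above 1.

In [ ]:
from types import SimpleNamespace

m, n, d = 20, 50, 1  # mock walkers, steps, and dimensions
# Converged case: every walker samples the same stationary distribution
converged = SimpleNamespace(chain=np.random.normal(0, 1, (m, n, d)))
# Stuck case: each walker wanders around its own well-separated mean
offsets = np.random.uniform(-10, 10, (m, 1, 1))
stuck = SimpleNamespace(chain=offsets + np.random.normal(0, 1, (m, n, d)))

for name, mock in [("converged", converged), ("stuck", stuck)]:
    _, c = get_convergence_statistic(n - 1, mock, m)
    print(name, c)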
Okay, now here is an example:
In [29]:
ndim = 4
nwalkers = 100
nsteps = 400
convergence_length = 10

# Initialise the walkers uniformly within the prior bounds
walker_pos = [[np.random.uniform(tl[0], tl[1]) for tl in theta_lims]
              for i in range(nwalkers)]

# Run the sampler, computing the PSRF as we go
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=(t, X))
PSRF = []
PSRFx = []
for i, output in enumerate(sampler.sample(walker_pos, iterations=nsteps)):
    if i + 1 < convergence_length:
        continue  # wait until a full window of steps is available
    idx, c = get_convergence_statistic(i, sampler, nwalkers,
                                       convergence_length=convergence_length,
                                       convergence_period=10)
    PSRF.append(c)
    PSRFx.append(idx)

symbols = ["$A$", "$f$", "$X_{0}$", r"$\sigma$"]
PlotWalkers(sampler, symbols=symbols, PSRF=PSRF, PSRFx=PSRFx)
plt.show()
The reality is that the `convergence_period` and `convergence_length` may need to be fine-tuned, and one has to decide how to interpret the four convergence tests (one per parameter). From simple experimentation, if the length is too long, the test provides evidence for convergence when visual inspection indicates the chains have not converged. It is likely this is because the Gelman & Rubin statistic assumes independent chains while, as discussed above, the ensemble walkers are not independent.
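Finally, here is a sketch of the early-stopping idea mentioned above: break out of the sampling loop once the maximum PSRF across all parameters drops below a threshold. This is my illustration rather than part of the original analysis; the threshold of 1.2 follows the Gelman & Rubin criterion quoted earlier.

In [ ]:
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=(t, X))
for i, output in enumerate(sampler.sample(walker_pos, iterations=nsteps)):
    if i + 1 < convergence_length:
        continue  # need a full window of steps first
    _, c = get_convergence_statistic(i, sampler, nwalkers,
                                     convergence_length=convergence_length)
    if np.max(c) < 1.2:
        print("Converged (max PSRF = {:.3f}) after {} steps".format(np.max(c), i + 1))
        break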