This example demonstrates how to use nested rejection sampling [1] to sample from the posterior distribution of a logistic model fitted to model-simulated data.
Nested sampling is the craziest way to calculate an integral that you'll ever come across, and it has found widespread application in physics. The idea is to repeatedly restrict the prior density to regions of parameter space where the likelihood exceeds an increasing sequence of thresholds. These repeated restrictions form a sort of Matryoshka doll of spaces, where each later region is "nested" within the earlier ones. The spaces between consecutive Matryoshka volumes constitute "shells", whose volumes can themselves be approximated. By summing the likelihood-weighted volumes of these shells, the marginal likelihood can be calculated. It's bonkers, but it works. It works especially well for multimodal distributions, where traditional methods of calculating the marginal likelihood fail. As a very useful by-product of nested sampling, posterior samples can be produced by importance sampling.
[1] "Nested Sampling for General Bayesian Computation", John Skilling, Bayesian Analysis (2006) https://projecteuclid.org/download/pdf_1/euclid.ba/1340370944.
First create fake data.
In [1]:
import pints
import pints.toy as toy
import numpy as np
import matplotlib.pyplot as plt
# Load a forward model
model = toy.LogisticModel()
# Create some toy data
r = 0.015
k = 500
real_parameters = [r, k]
times = np.linspace(0, 1000, 100)
signal_values = model.simulate(real_parameters, times)
# Add independent Gaussian noise
sigma = 10
observed_values = signal_values + pints.noise.independent(sigma, signal_values.shape)
# Plot
plt.plot(times, signal_values, label='signal')
plt.plot(times, observed_values, label='observed')
plt.xlabel('Time')
plt.ylabel('Values')
plt.legend()
plt.show()
Create the nested sampler that will be used to sample from the posterior.
In [2]:
# Create an object with links to the model and time series
problem = pints.SingleOutputProblem(model, times, observed_values)
# Create a log-likelihood function (adds an extra parameter!)
log_likelihood = pints.GaussianLogLikelihood(problem)
# Create a uniform prior over both the parameters and the new noise variable
log_prior = pints.UniformLogPrior(
    [0.01, 400, sigma * 0.5],
    [0.02, 600, sigma * 1.5])
# Create a nested rejection sampler
sampler = pints.NestedController(log_likelihood, log_prior, method=pints.NestedRejectionSampler)
# Set number of iterations
sampler.set_iterations(3000)
# Set the number of posterior samples to generate
sampler.set_n_posterior_samples(300)
Run the sampler!
In [3]:
samples = sampler.run()
print('Done!')
In [4]:
# Plot output
import pints.plot
pints.plot.histogram([samples], ref_parameters=[r, k, sigma])
plt.show()
In [5]:
pints.plot.pairwise(samples, kde=True)
plt.show()
In [6]:
# Plot the model predictions for the first 100 posterior samples
pints.plot.series(samples[:100], problem)
plt.show()
In [7]:
print('marginal log-likelihood = ' + str(sampler.marginal_log_likelihood())
      + ' ± ' + str(sampler.marginal_log_likelihood_standard_deviation()))
PINTS gives us access to the segments of the discretised integral, which means we can plot the function being integrated: the log likelihood as a function of the enclosed prior volume.
In [8]:
# Log likelihoods of the discarded (inactive) points, in order of discarding
v_log_likelihood = sampler.log_likelihood_vector()
v_log_likelihood = v_log_likelihood[:-sampler._sampler.n_active_points()]
# Estimated prior volumes enclosed by each likelihood contour
X = sampler.prior_space()
X = X[:-1]
plt.plot(X, v_log_likelihood)
plt.xlabel('enclosed prior volume X')
plt.ylabel('log likelihood')
plt.show()
At each step of the nested sampling algorithm, the point with the lowest likelihood is discarded (becoming "inactive") and a new active point is drawn from the prior, with the restriction that its likelihood exceeds the discarded one. The likelihood of the inactivated point essentially defines the height of a segment of the discretised integral for $Z$. Its width is approximately given by the trapezoidal rule, $w_i = \tfrac{1}{2}(X_{i-1} - X_{i+1})$, where $X_i = \exp(-i / N)$, $N$ is the number of active points, and $i$ is the iteration index.
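As a sanity check, we can reassemble the evidence from these segments ourselves. This is a sketch that assumes the X and v_log_likelihood arrays from the cell above line up one-to-one, with X decreasing; the result should be close to sampler.marginal_log_likelihood() printed earlier.
# Trapezoidal widths w_i for the interior segments
w = 0.5 * (X[:-2] - X[2:])
# Log of each segment's contribution, log(w_i * L_i)
log_terms = np.log(w) + v_log_likelihood[1:-1]
# Stable log-sum-exp to recover the marginal log-likelihood
log_Z = np.log(np.sum(np.exp(log_terms - log_terms.max()))) + log_terms.max()
print(log_Z)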
PINTS keeps track of the active and inactive points at the end of a nested sampling run. The active points (orange) are concentrated in a region of high likelihood, and their likelihoods always exceed those of the discarded inactive points (blue).
In [9]:
m_active = sampler.active_points()
m_inactive = sampler.inactive_points()
f, axarr = plt.subplots(1, 3, figsize=(15, 6))
axarr[0].scatter(m_inactive[:, 0], m_inactive[:, 1])
axarr[0].scatter(m_active[:, 0], m_active[:, 1], alpha=0.1)
axarr[0].set_xlim([0.008, 0.022])
axarr[0].set_xlabel('r')
axarr[0].set_ylabel('k')
axarr[1].scatter(m_inactive[:, 0], m_inactive[:, 2])
axarr[1].scatter(m_active[:, 0], m_active[:, 2], alpha=0.1)
axarr[1].set_xlim([0.008, 0.022])
axarr[1].set_xlabel('r')
axarr[1].set_ylabel('sigma')
axarr[2].scatter(m_inactive[:, 1], m_inactive[:, 2])
axarr[2].scatter(m_active[:, 1], m_active[:, 2], alpha=0.1)
axarr[2].set_xlabel('k')
axarr[2].set_ylabel('sigma')
plt.show()
In nested sampling, we can apply importance sampling to the inactivated points to generate posterior samples. The weight of each inactive point is given by $w_i \mathcal{L}_i$, where $\mathcal{L}_i$ is its likelihood. Because the posterior samples are obtained by weighted resampling, we can generate an alternative set of posterior samples at any time by re-applying this step.
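To illustrate the mechanics, here is a hand-rolled sketch of that resampling step. It assumes the widths w from the earlier check and that the rows of m_inactive align with v_log_likelihood; sampler.sample_from_posterior() in the next cell is the supported PINTS route.
# Unnormalised log importance weights, log(w_i) + log L_i
log_weights = np.log(w) + v_log_likelihood[1:-1]
# Normalise stably, then resample the inactive points with these probabilities
p = np.exp(log_weights - log_weights.max())
p /= p.sum()
idx = np.random.choice(len(p), size=1000, p=p)
manual_samples = m_inactive[1:-1][idx]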
In [10]:
samples_new = sampler.sample_from_posterior(1000)
pints.plot.pairwise(samples_new, kde=True)
plt.show()