Classically, a single ensemble suffices: sample from the prior and weight by the likelihood to obtain posterior estimates:
$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

$$p(x \mid o) \propto p(o \mid x)\, b(x)$$

In the Gaussian case:

$$p(o \mid x) \propto e^{-\frac{1}{2}\left(\frac{x-o}{\sigma_o}\right)^2}$$
Basic procedure:

- sample an ensemble of parameter values from the prior $b(x)$
- weight each sample by its likelihood $p(o \mid x)$
- compute posterior estimates (mean, quantiles, ...) from the weighted ensemble

Or alternatively, instead of carrying weights around, resample the ensemble according to the weights (both variants are sketched below).
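A minimal sketch of this classic one-shot procedure (illustrative only; the distributions and variable names here are placeholders, not part of the iis module):

In [ ]:
import numpy as np
from scipy.stats import norm, uniform

prior = uniform(loc=-10, scale=20)   # broad prior b(x) over [-10, 10]
likelihood = norm(loc=0, scale=1)    # p(o | x) as a function of x, for a fixed observation o = 0

samples = prior.rvs(size=500)        # 1. sample from the prior
weights = likelihood.pdf(samples)    # 2. weight by the likelihood
weights /= weights.sum()             #    (normalize)
posterior_mean = np.sum(weights * samples)  # 3. weighted posterior estimate

# or resample instead of carrying weights around:
ids = np.random.choice(samples.size, size=samples.size, p=weights)
resampled = samples[ids]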
The main problem with such an approach is that if the prior belief is too far from the posterior, the weights may all be (numerically) zero, or concentrated on a few samples only, yielding a poor approximation of the PDF. The problem is especially acute in high-dimensional parameter spaces, since the volume to be explored grows exponentially with the number of parameters.
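Weight concentration can be diagnosed with the effective sample size, $N_\mathrm{eff} = 1/\sum_i w_i^2$, which drops from the ensemble size towards 1 as the weights degenerate. A small sketch (illustrative helper, not part of the iis module):

In [ ]:
def effective_sample_size(weights):
    """Effective number of samples carried by normalized importance weights."""
    return 1.0 / np.sum(weights**2)

effective_sample_size(np.full(500, 1./500))      # uniform weights: 500 (no degeneracy)
effective_sample_size(np.r_[1., np.zeros(499)])  # fully collapsed: 1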
The idea behind iterative importance sampling is to repeat the weighting/resampling procedure iteratively using intermediate, or bridging, distributions, so that the sequence of distributions sampled by the particles converges smoothly toward the posterior (this has much in common with Markov chain Monte Carlo, except that here we work with an ensemble of particles at each iteration step). A challenge is to design the weighting so as to avoid an over-concentration of weights, a problem sometimes called ensemble collapse or degeneracy.
Annan and Hargreaves (2010) proposed an iterative importance sampling (IIS) scheme where the weighting function is a scaled version of the posterior, via an exponent $\epsilon$ (0.05 by default). Moreover, each weighting/resampling step is counterbalanced by the addition of noise, or jitter. The noise is drawn from a (multivariate) normal approximation of the ensemble, but with variance scaled down by the same factor $\epsilon$.
Each iteration consists of two steps (sketched in code right below):

1. weighting/resampling: weight the ensemble by the scaled posterior $f^{\epsilon}$ and resample according to these weights;
2. jittering: add Gaussian noise with variance $\epsilon$ times the variance of the resampled ensemble.
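These two steps can be written as a standalone function (a simplified 1-D version consistent with the loop used later in this notebook; `iis_step` is a hypothetical helper, not part of the iis module):

In [ ]:
import numpy as np

def iis_step(ensemble, target, epsilon=0.05):
    """One IIS iteration: scaled weighting, multinomial resampling, jitter."""
    weights = np.exp(target.logpdf(ensemble) * epsilon)  # f**epsilon
    weights /= weights.sum()
    ids = np.random.choice(ensemble.size, size=ensemble.size, p=weights)
    resampled = ensemble[ids]
    jitter = np.random.normal(scale=np.sqrt(np.var(resampled) * epsilon),
                              size=resampled.size)
    return resampled + jitter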
In the Gaussian case, these two steps correspond to the following operations on the PDF of any proposal distribution $\phi$:

- weighting/resampling: $\phi \rightarrow \phi\, f^{\epsilon}$ (up to normalization);
- jittering: inflation of the variance by a factor $(1+\epsilon)$, which for a Gaussian amounts to $\phi \rightarrow \phi^{1/(1+\epsilon)}$ (up to normalization).
It is not too difficult to prove that if one first samples the model parameters from a proposal distribution $g$, the ensemble follows at each iteration step $i$ the distribution:

$$\phi_i \propto g^{1-\alpha_i} f^{\alpha_i}$$

where $f = p(x \mid o)$ is the posterior and $\alpha_i$ is a series that satisfies:

$$\alpha_{i+1} = \frac{\alpha_i + \epsilon}{1 + \epsilon}, \qquad \alpha_0 = 0$$

(indeed, weighting adds $\epsilon$ to the exponent of $f$, and the Gaussian jitter divides all exponents by $1+\epsilon$). The recursion has the closed form $\alpha_i = 1 - (1+\epsilon)^{-i}$, so the series converges and its limits are:
$$\alpha_i \rightarrow 1$$

$$\phi_i \rightarrow f = p(x \mid o)$$

The following part of this notebook presents a simple example of the convergence of a sequence of distributions under successive weighting and jittering (variance inflation) steps. The iis module does little more than provide a simple user interface, including multivariate models, along with a couple of handy functions for plotting and diagnostics. Of course, the iis module could be extended with new methods, etc.
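As a quick sanity check of the recursion and its closed form (illustrative snippet, not part of the iis module):

In [ ]:
epsilon = 0.05
alpha = 0.
for i in range(1, 61):
    alpha = (alpha + epsilon) / (1 + epsilon)
    # matches the closed form alpha_i = 1 - (1 + epsilon)**(-i)
    assert abs(alpha - (1 - (1 + epsilon)**(-i))) < 1e-12
print(alpha)  # ~0.946 after 60 iterations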
In [ ]:
import numpy as np
from scipy.stats import norm, uniform, lognorm
from iis.resampling import multinomial_resampling, residual_resampling  # , stratified_sampling
In [ ]:
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Initialization
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++
# ensemble size
size = 500
# posterior distribution (the truth)
posterior = norm(loc=0, scale=1)
#posterior = lognorm(1) # log-normal distribution: test deviation from the theoretical Gaussian case
# proposal distribution from which to start the process (no prior here)
proposal = uniform(loc=0, scale=100)
# initialize (or reset) ensemble and a few other things
ensemble = proposal.rvs(size=size)
ensembles = [] # initialize ensembles
alpha = 0
alphas = []
iteration = 0
In [ ]:
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
# estimation (can be executed as many times as needed)
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
# algorithm parameters
epsilon = 0.05 # exponent
# number of iterations
iterations = 60
# sampling method (important!)
#
# - multinomial_resampling : random draw according to the weights (sometimes called
#   random-walk sampling); non-deterministic
# - residual_resampling : partly deterministic (the main 60% or so of the weight is just copied,
#   then multinomial resampling is applied to the residual weights)
# - stratified_sampling : NOT IMPLEMENTED (used by Annan and Hargreaves, 2010)
# - deterministic_sampling : NOT IMPLEMENTED; starts like residual resampling, but aggregates
#   residual weights into larger particles (called support particles) according to a recursive
#   partition of space, along a user-input preferred sub-space. See Li et al. (2012) in
#   Signal Processing, doi:10.1016/j.sigpro.2011.12.019
#
resample_ensemble = residual_resampling
for iteration in range(iterations):
    ensembles.append(ensemble)  # ensemble before update, for the record
    alphas.append(alpha)
    # weight the samples according to a scaled version of the likelihood
    weights = np.exp(posterior.logpdf(ensemble)*epsilon)
    weights /= weights.sum()  # normalized weights
    # resample according to weights
    ids = resample_ensemble(weights, ensemble.size)
    ensemble = ensemble[ids]  # new ensemble
    # add jitter
    variance = np.var(ensemble)  # after resampling
    jitter = np.random.normal(loc=0, scale=np.sqrt(variance*epsilon), size=ensemble.size)
    ensemble += jitter
    # diagnostic only: update alpha to check convergence properties
    # the sampled distribution should be: proposal**(1-alpha) * posterior**alpha
    alpha = (alpha + epsilon)/(1 + epsilon)
    #if alpha > 0.99999: break
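Optionally, the converged ensemble can be checked numerically before plotting (illustrative snippet): compare the ensemble mean and standard deviation with those of the target posterior.

In [ ]:
print("ensemble : mean = %.3f, std = %.3f" % (ensemble.mean(), ensemble.std()))
print("posterior: mean = %.3f, std = %.3f" % (posterior.mean(), posterior.std()))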
In [ ]:
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
# check results
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
%matplotlib inline
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, sharex=True)
bins = 10
mi, ma = -20, 20
x = np.linspace(mi, ma, 1000)
# PDF
ax = axes[0]
ax.hist(ensemble, bins, histtype='step', label='result', density=True)
ax.plot(x, posterior.pdf(x), label='truth')
ax.legend(frameon=False)
ax.set_title("Histogram (empirical PDF)")
# CDF
ax = axes[1]
ax.hist(ensemble, bins, histtype='step', label='result', density=True, cumulative=True)
ax.plot(x, posterior.cdf(x), label='true posterior')
ax.legend(frameon=False)
ax.set_title("Empirical cumulative distribution function (CDF))")
fig.set_figwidth(17)
ax.set_xlim([mi, ma])
In [ ]:
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
# check convergence properties
#+++++++++++++++++++++++++++++++++++++++++++++++++++++
%matplotlib inline
import matplotlib.pyplot as plt
pct = [50, 5, 95, 16, 84]
n = len(ensembles)
data = np.empty((n, len(pct)))
for i, ens in enumerate(ensembles):
    data[i] = np.percentile(ens, pct)
x = np.arange(n)
f90 = plt.fill_between(x, data[:, 1], data[:, 2], alpha=0.2)
f67 = plt.fill_between(x, data[:, 3], data[:, 4], alpha=0.5)
plt.plot(x, data[:, 0], label='ensemble median')
# add truth
mi, ma = plt.xlim()
truth = posterior.ppf(np.array(pct)/100.) # quantiles
plt.hlines(truth[0], mi, ma, linestyle='-', label='truth', color='red')
plt.hlines(truth[[1,2]], mi, ma, linestyle=':', color='red')
plt.hlines(truth[[3,4]], mi, ma, linestyle='--', color='red')
plt.legend(frameon=False)
plt.xlim([mi, ma])
plt.xlabel("Iteration number")
plt.ylabel("Quantiles")
# also plot alpha
fig = plt.figure()
plt.plot(x, alphas, label='alpha')
plt.legend(frameon=False, loc="upper left")
plt.title('Alpha value to check convergence')
plt.grid()
fig.set_figheight(3)
plt.ylim([0,1])
plt.xlabel("Iteration number")