Codes for some of these are in our list of MCMC packages.
(Also sometimes called Hybrid Monte Carlo)
While standard MCMC is analogous to the evolution of a thermodynamic system, HMC is (almost) analogous to the evolution of a single particle. Consider our free parameters as coordinates of a position, $\theta$, and minus the log-posterior as a potential energy, $U(\theta)$.
HMC introduces momentum parameters, $\phi$, corresponding to each position parameter, and an associated "kinetic energy",
$K(\phi) = \sum_i \frac{\phi_i^2}{2m_i}$.
Looking at the analogy the other way, the probability associated with $K(\phi)$,
$p(\phi) \propto e^{-K(\phi)}$,
is a multivariate Gaussian with mean zero and a diagonal covariance given by the "masses".
The HMC algorithm alternates Gibbs samples of $\phi|\theta$ with joint updates of $\phi$ and $\theta$.
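Concretely, the joint update evolves $\theta$ and $\phi$ under Hamilton's equations,

$\frac{d\theta_i}{dt} = \frac{\partial K}{\partial \phi_i} = \frac{\phi_i}{m_i}; \quad \frac{d\phi_i}{dt} = -\frac{\partial U}{\partial \theta_i}$,

integrated numerically for some fictitious time (usually with a leapfrog scheme), with a Metropolis accept/reject step on the total energy $U+K$ at the end to correct for integration error.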
Note that this procedure requires derivatives of the log-posterior, unlike standard MCMC methods.
There's a lot more literature on HMC and its optimization than we can cover - see e.g. this chapter. In a nutshell, the introduction of "momentum" into the evolution of the chain is supposed to reduce the random-walk behavior (autocorrelation) of traditional MCMC.
What's the catch?
To evolve the state, we need the gradient of the log-posterior (the "force"). If this can be computed analytically, great. If we need to do it numerically, it isn't obvious that HMC will be more efficient than standard MCMC.
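To make the mechanics concrete, here is a minimal sketch of a single HMC update with a leapfrog integrator. The functions `log_post` and `grad_log_post` (the log-posterior and its gradient) and the step-size settings are placeholders for illustration, not taken from any particular package.

```python
import numpy as np

def hmc_step(theta, log_post, grad_log_post, n_steps=20, eps=0.1, masses=None):
    """One HMC update of a 1D parameter array `theta`: Gibbs-sample the momenta,
    integrate Hamilton's equations with the leapfrog scheme, then
    Metropolis-accept or reject the endpoint."""
    masses = np.ones_like(theta) if masses is None else masses

    # Gibbs step: draw momenta from the Gaussian implied by K(phi)
    phi = np.random.normal(0.0, np.sqrt(masses))

    # total energy (minus the log of the joint density) at the starting point
    H0 = -log_post(theta) + 0.5 * np.sum(phi**2 / masses)

    # leapfrog integration; the "force" is the gradient of the log-posterior
    th, p = theta.copy(), phi.copy()
    p += 0.5 * eps * grad_log_post(th)      # initial half-step kick
    for _ in range(n_steps - 1):
        th += eps * p / masses              # drift
        p += eps * grad_log_post(th)        # full-step kick
    th += eps * p / masses                  # final drift
    p += 0.5 * eps * grad_log_post(th)      # final half-step kick

    # Metropolis correction for numerical integration error
    H1 = -log_post(th) + 0.5 * np.sum(p**2 / masses)
    if np.log(np.random.rand()) < H0 - H1:
        return th
    return theta
```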
This is a class of methods that evolve an ensemble of states rather than a single state. After convergence, the ensemble can be regarded as a set of samples from the target distribution.
This approach provides some of the benefits of running multiple chains - but remember that these are not independent chains!
The currently fashionable variant is coded in a Python package hilariously called emcee, and implements the evolution rule proposed by Goodman and Weare (2010).
The algorithm for moving each point $\theta_k$ in the ensemble (the "stretch move") is:

- randomly pick a different point $\theta_j$ from the current ensemble;
- draw a scale factor $z$ from a density $g(z)$ - in practice $g(z) \propto 1/\sqrt{z}$ on $[1/a, a]$, with $a \approx 2$;
- propose the new position $\theta_k' = \theta_j + z\,(\theta_k - \theta_j)$;
- accept the move with probability $\min\left[1, z^{n-1}\,\frac{p(\theta_k'|D)}{p(\theta_k|D)}\right]$, where $n$ is the number of parameters.
Note that there is some magic in the density $g$: we are not free to choose just any function. For the move to satisfy detailed balance, $g$ must obey the symmetry condition $g(1/z) = z\,g(z)$, which the $1/\sqrt{z}$ form above does.
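For concreteness, here is a sketch of one sweep of the stretch move over an ensemble, assuming a `log_post` function is available. This illustrates the update rule above rather than reproducing emcee's actual implementation, which splits the walkers into two half-ensembles so the update can be vectorized.

```python
import numpy as np

def stretch_move(walkers, log_post, a=2.0):
    """One sweep of the Goodman & Weare stretch move.
    `walkers` is an array of shape (n_walkers, n_dim); `log_post` returns
    the log-posterior of a single parameter vector."""
    n_walkers, n_dim = walkers.shape
    new = walkers.copy()
    for k in range(n_walkers):
        # pick a different walker from the (partially updated) ensemble
        j = np.random.randint(n_walkers - 1)
        if j >= k:
            j += 1
        # draw z from g(z) proportional to 1/sqrt(z) on [1/a, a]
        z = (1.0 + (a - 1.0) * np.random.rand())**2 / a
        # propose a point on the line through walkers k and j
        proposal = new[j] + z * (new[k] - new[j])
        # accept with probability min(1, z^(n_dim - 1) * p(proposal) / p(current))
        log_ratio = (n_dim - 1) * np.log(z) + log_post(proposal) - log_post(new[k])
        if np.log(np.random.rand()) < log_ratio:
            new[k] = proposal
    return new
```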
This algorithm is relatively easy to use - there is no tuning required and it's straightforward to parallelize.
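As an illustration, a minimal sketch of sampling a toy posterior with emcee, based on its version 3 interface (check the package documentation for the authoritative API):

```python
import numpy as np
import emcee

# toy target: a 3-dimensional standard Gaussian log-posterior
def log_post(theta):
    return -0.5 * np.sum(theta**2)

n_dim, n_walkers = 3, 32
# start the walkers in a small ball; they spread out during burn-in
p0 = 1e-3 * np.random.randn(n_walkers, n_dim)

sampler = emcee.EnsembleSampler(n_walkers, n_dim, log_post)
sampler.run_mcmc(p0, 5000)

# discard burn-in and flatten the (step, walker, parameter) chain into samples
samples = sampler.get_chain(discard=1000, flat=True)
```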
Important cautions:
Recall the essence of MC integration: $\int w(x)\,p(x)\,dx = \int w(x)\,dP(x) \approx \overline{w(x_i)}; ~ x_i\sim P$.
We used this (way back) to set up Simple Monte Carlo, where $p$ is the prior and $w$ is the likelihood.
The key insight carried over from SMC is that points drawn from any distribution we can sample from and evaluate can provide an unbiased estimate of an integral, as long as each point is weighted correctly.
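Spelling out the re-weighting: if the $x_i$ are instead drawn from some generating distribution $q$, then

$\int w(x)\,p(x)\,dx = \int w(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \overline{w(x_i)\,\frac{p(x_i)}{q(x_i)}}; ~ x_i\sim Q$,

so each point simply carries an importance weight $p(x_i)/q(x_i)$.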
PMC is a more refined version of this - instead of sampling from the prior once, we iteratively build up an approximation to the posterior distribution, called the generating distribution, drawing a new batch of weighted samples from it at each iteration.
Each iteration consists of:

- drawing a batch of points from the current generating distribution, $q$;
- weighting each point by (posterior density)/(generating density), evaluated at that point;
- using the weighted points to update $q$ for the next iteration, and keeping them as (weighted) samples of the posterior.
Clearly, getting this to converge to the target quickly will require a clever selection of $q$.
A nice feature of PMC is that the generating distribution can depend on the previous crops of points while still producing unbiased results, thanks to the explicit re-weighting.
With a suitably intelligent implementation (e.g. see this), the generating distribution can be evolved towards the posterior distribution, providing samples for a posteriori analysis as well as an estimate for the evidence.
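As a deliberately simple-minded sketch of the above, here is a PMC loop that uses a single multivariate Gaussian as the generating distribution and refits its mean and covariance to the weighted samples at each iteration. Real implementations use adaptive mixtures and guard against weight collapse, but the re-weighting logic is the same; `log_post` is a placeholder returning the unnormalized log-posterior.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pmc(log_post, mean0, cov0, n_samples=2000, n_iterations=10):
    """Population Monte Carlo with a single Gaussian generating distribution.
    Returns the final batch of samples and their normalized importance weights."""
    mean, cov = np.asarray(mean0, float), np.asarray(cov0, float)
    for _ in range(n_iterations):
        q = multivariate_normal(mean, cov)
        samples = q.rvs(n_samples)
        # importance weights: (unnormalized) posterior / generating density
        log_w = np.array([log_post(s) for s in samples]) - q.logpdf(samples)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # update the generating distribution from the weighted samples
        mean = np.sum(w[:, None] * samples, axis=0)
        diff = samples - mean
        cov = np.sum(w[:, None, None] * diff[:, :, None] * diff[:, None, :], axis=0)
    return samples, w
```

If `log_post` returns the log of (likelihood times prior), the mean of the raw, unnormalized weights in the final iteration is itself an estimate of the evidence.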
This is a particular form of PMC that is primarily aimed at calculating the Bayesian evidence, the integral of the likelihood function over the prior, $\int p(D|\theta)\,p(\theta)d\theta$.
We begin with a large number of points sampled from the prior, and gradually evolve them towards higher-likelihood positions, keeping track of how the volume filled changes.
The evidence can then be calculated as a numerical approximation of $\int_0^1 L(V)\,dV$ (assuming the prior volume is normalized to 1).
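Here $L(V)$ is the likelihood level that encloses a fraction $V$ of the prior volume. In practice the integral becomes a sum, $Z \approx \sum_k L_k\,\Delta V_k$, with the enclosed volume shrinking on average by a factor of roughly $e^{-1/N}$ per step for $N$ live points.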
By virtue of spamming the parameter space with points at the start, nested sampling is likely (though never guaranteed) to find multiple modes, if they exist (same for PMC).
The computational challenge here is to maximize the efficiency of moving a given point to a position of higher likelihood. Math is involved - see e.g. the MultiNest paper.
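To show the bookkeeping involved (which is not the hard part), here is a toy nested sampler that finds each replacement point by brute-force rejection sampling from the prior; `sample_prior` and `log_likelihood` are placeholders, and the leftover contribution of the final live points is neglected for brevity.

```python
import numpy as np

def nested_sampling(log_likelihood, sample_prior, n_live=500, n_iterations=5000):
    """Toy nested sampler: accumulate Z = sum_k L_k * dV_k using the standard
    volume-shrinkage approximation V_k ~ exp(-k / n_live).
    `sample_prior()` draws one point from the prior; `log_likelihood(theta)`
    evaluates the log-likelihood. Replacement points come from brute-force
    rejection sampling, which only works for toy problems; real codes
    (e.g. MultiNest) do this step cleverly."""
    live = [sample_prior() for _ in range(n_live)]
    log_L = np.array([log_likelihood(p) for p in live])

    log_Z = -np.inf   # running log-evidence
    log_V = 0.0       # log of the current enclosed prior volume (starts at 1)
    for k in range(n_iterations):
        worst = np.argmin(log_L)
        # volume of the shell between successive likelihood levels
        log_V_next = -(k + 1) / n_live
        log_dV = log_V + np.log1p(-np.exp(log_V_next - log_V))
        # Z += L_worst * dV, accumulated in log space
        log_Z = np.logaddexp(log_Z, log_L[worst] + log_dV)
        log_V = log_V_next
        # replace the worst live point with a prior draw at higher likelihood
        while True:
            candidate = sample_prior()
            cand_log_L = log_likelihood(candidate)
            if cand_log_L > log_L[worst]:
                live[worst] = candidate
                log_L[worst] = cand_log_L
                break
    # (the contribution of the final set of live points is neglected here)
    return log_Z
```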