Approximate Bayesian computation (ABC) and so-called "likelihood free" Markov chain Monte Carlo techniques are popular methods for tackling parameter inference in scenarios where the likelihood is intractable or unknown. These methods are called likelihood free because they avoid evaluating the likelihood explicitly: instead, ABC relies on simulating data from a forward model and aims to draw samples from the parameter posterior distribution directly. In traditional MCMC approaches the target distribution is the posterior distribution of interest, and in practice our estimate of this pdf is approximate because finite sampling time gives a correlated chain which we hope has converged. ABC methods are also approximate, in the sense that samples are generated from trial distributions which we hope are close to the real posterior of interest. The Wikipedia page on ABC has a good introduction to the topic.
The simplest ABC algorithm is rejection sampling. Suppose we have a set of parameters, $\theta$, with associated prior, $\pi(\theta)$, and a forward model for simulating the data,
$\pi(D|\theta)$.
We can generate samples from the posterior distribution, $P(\theta|D)$, by first drawing candidate parameters
$\theta^* \sim \pi(\theta)$,
then simulating a dataset with these parameters
$D^* \sim \pi(D|\theta^*)$.
In a simple rejection sampling algorithm, we reject the proposal $\theta^*$ unless the simulated data $D^*$ exactly match the observed data, $D$. Even for discrete data this is rarely practical, as many simulated datasets would be rejected before an exact match is found, and for continuous data an exact match occurs with probability zero. In practice we therefore make an approximation and accept simulated datasets which are "close" to the observed data. This introduces the two key ingredients of ABC: a distance metric and a tolerance level. We accept a proposed parameter $\theta^*$ if
$\rho(D^*, D) < \epsilon$
where $\rho$ is the distance metric, for example the Euclidean norm $\rho(D^*, D) = ||D^* - D||$, and $\epsilon$ is a tolerance threshold. This procedure produces samples from
$P(\theta \,|\, \rho(D^*, D) < \epsilon)$
which will be a good approximation to the true posterior provided $\epsilon$ is small.
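To make this concrete, here is a minimal sketch of ABC rejection sampling in Python, assuming a toy problem: the Gaussian model, the uniform prior, and all numbers and variable names below are illustrative choices rather than anything specified above. We infer the mean of a Gaussian with known standard deviation by comparing each simulated dataset to the observed one with the Euclidean norm.

```python
# A minimal sketch of ABC rejection sampling on a toy problem:
# observed data are n = 5 draws from N(theta_true, 1), the prior on theta
# is Uniform(-5, 5), and the distance is the Euclidean norm between the
# raw simulated and observed datasets.
import numpy as np

rng = np.random.default_rng(0)

theta_true = 1.5
n = 5
D_obs = rng.normal(theta_true, 1.0, size=n)      # the "observed" dataset

n_draws = 200_000                                # prior draws to attempt
epsilon = 2.5                                    # tolerance threshold

# Draw candidate parameters from the prior.
theta_star = rng.uniform(-5, 5, size=n_draws)

# Simulate one dataset per candidate parameter.
D_star = rng.normal(theta_star[:, None], 1.0, size=(n_draws, n))

# Keep candidates whose simulated data lie within epsilon of the observations.
dist = np.linalg.norm(D_star - D_obs, axis=1)
accepted = theta_star[dist < epsilon]

print(f"acceptance rate: {accepted.size / n_draws:.3f}")
print(f"approximate posterior mean: {accepted.mean():.2f} (true value {theta_true})")
```

Shrinking `epsilon` tightens the approximation at the cost of a lower acceptance rate, which is exactly the trade-off discussed next.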
The tolerance threshold in ABC controls which of the proposed parameters are accepted given the distance metric. There are two considerations in choosing this threshold. If the tolerance is too high then too many proposed parameters are accepted and the prior distribution dominates the results; for example, if the tolerance level is infinity then we would simply recover the prior distribution from the algorithm. If the tolerance level is too low then the sampler is very inefficient, with most proposed points being rejected. A compromise is to use a decreasing sequence of tolerance levels: in the initial iterations we accept points in parameter space which do not reproduce the data very accurately, but as the algorithm progresses the tolerance decreases and our estimate of the true posterior distribution improves.
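As a small illustration of the last point, a decreasing schedule can be as simple as a geometric decay from a loose starting tolerance to a tight final one; the numbers here are arbitrary, and in practice the schedule is often chosen adaptively (e.g. each new tolerance set to a quantile of the previous iteration's accepted distances).

```python
# Sketch of a decreasing tolerance schedule (all numbers are illustrative):
# early iterations accept broadly, later iterations refine the approximation.
import numpy as np

n_iterations = 6
eps_start, eps_final = 10.0, 0.5

# Geometric decay from eps_start down to eps_final.
epsilons = np.geomspace(eps_start, eps_final, n_iterations)
print(epsilons)   # approx. [10, 5.49, 3.02, 1.66, 0.91, 0.5]
```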
In many cases it may be simpler to work with some lower-dimensional summary statistic of the data, $S(D)$, rather than the full dataset. In this case the chosen statistic needs to be a so-called sufficient statistic, in that any information about the parameter of interest which is contained in the data is also contained in the summary statistic. More formally, a statistic $S(D)$ is sufficient for $\theta$ if the distribution $P(D|S(D))$ does not depend on $\theta$. This requirement ensures that in summarizing the data we have not thrown away constraining information about $\theta$.
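For instance, if the data are draws from a Gaussian with known variance, the sample mean is a sufficient statistic for the unknown mean, and the rejection step can compare means instead of full datasets. Here is a minimal sketch along those lines; the model, prior, and all numbers are again illustrative assumptions.

```python
# ABC rejection sampling on a summary statistic: for data ~ N(theta, 1) with
# known variance, the sample mean is sufficient for theta, so we compare
# S(D*) = mean(D*) against S(D) = mean(D) instead of the full datasets.
import numpy as np

rng = np.random.default_rng(1)

theta_true, n = 1.5, 100
D_obs = rng.normal(theta_true, 1.0, size=n)
S_obs = D_obs.mean()                     # sufficient summary of the observed data

n_draws, epsilon = 50_000, 0.05

# Draw candidates from the prior, simulate a dataset each, reduce to the summary.
theta_star = rng.uniform(-5, 5, size=n_draws)
S_star = rng.normal(theta_star[:, None], 1.0, size=(n_draws, n)).mean(axis=1)

accepted = theta_star[np.abs(S_star - S_obs) < epsilon]
print(f"acceptance rate: {accepted.size / n_draws:.3f}")
print(f"approximate posterior mean: {accepted.mean():.2f} (true value {theta_true})")
```

Because a single number now summarizes each dataset, a much tighter tolerance is affordable here than when comparing raw datasets of the same size.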
Rather than drawing candidates $\theta^*$ one at a time, we can speed up the ABC algorithm by working with a large pool of candidates, called particles, simultaneously. At each stage of the algorithm the particles are perturbed and filtered using the distance metric, and over successive stages the pool of particles moves closer and closer to the desired posterior distribution. This approach is known as Sequential Monte Carlo (SMC) or Particle Monte Carlo sampling.
Outline of the ABC SMC algorithm (following Beaumont et al. 2009):
1. Choose a decreasing sequence of tolerance levels $\epsilon_1 > \epsilon_2 > \dots > \epsilon_T$ and a number of particles $N$.
2. For the first iteration, draw candidate parameters from the prior, simulate a dataset for each candidate, and keep those within tolerance $\epsilon_1$ until $N$ particles have been accepted; give each particle an equal weight.
3. For each subsequent iteration $t$, draw a particle from the previous pool with probability proportional to its weight, perturb it with a transition kernel, discard it if it falls outside the prior support, otherwise simulate a dataset and accept the perturbed particle if the distance is below $\epsilon_t$; repeat until $N$ particles have been accepted.
4. Assign each accepted particle a weight proportional to its prior density divided by the kernel-weighted mixture over the previous pool, and normalize the weights.
5. After the final iteration the weighted particles approximate samples from the posterior distribution.
Different ABC SMC algorithms are distinguished mainly by how sampling weights are assigned to the particles in the pool. In order to perturb and filter the particles we need a transition kernel, which serves the same purpose as the proposal distribution in a standard MCMC algorithm: it specifies the distribution of the random perturbation added to each particle to move it around parameter space. For more details see Beaumont et al. 2009.
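To tie the pieces together, below is a minimal sketch of an ABC SMC sampler in the spirit of Beaumont et al. 2009, applied to the same toy Gaussian-mean problem used above. The tolerance schedule, the Gaussian perturbation kernel and its width, and every numerical choice are illustrative assumptions rather than prescriptions from the text.

```python
# A minimal sketch of ABC SMC (population Monte Carlo ABC) in the spirit of
# Beaumont et al. 2009, applied to the toy problem of inferring the mean of a
# Gaussian with known variance.  The tolerance schedule, the Gaussian
# perturbation kernel and its width, and all numbers are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy problem: data ~ N(theta_true, 1), prior theta ~ Uniform(-5, 5),
# summary statistic S(D) = mean(D).
theta_true, n_data = 1.5, 100
D_obs = rng.normal(theta_true, 1.0, size=n_data)
S_obs = D_obs.mean()
prior = stats.uniform(loc=-5, scale=10)

def distance(theta):
    """Simulate a dataset for theta and return |S(D*) - S(D)|."""
    return abs(rng.normal(theta, 1.0, size=n_data).mean() - S_obs)

n_particles = 500
epsilons = [1.0, 0.5, 0.2, 0.1, 0.05]          # decreasing tolerance schedule

particles, weights = None, None
for t, eps in enumerate(epsilons):
    new_particles = np.empty(n_particles)
    new_weights = np.empty(n_particles)

    if t > 0:
        # Gaussian transition kernel; twice the weighted variance of the
        # previous pool is one common choice for its width.
        tau = np.sqrt(2.0 * np.cov(particles, aweights=weights))

    accepted = 0
    while accepted < n_particles:
        if t == 0:
            # First generation: propose directly from the prior.
            theta_prop = prior.rvs(random_state=rng)
        else:
            # Resample a particle by weight, then perturb it with the kernel.
            theta_prop = rng.choice(particles, p=weights) + tau * rng.normal()
            if prior.pdf(theta_prop) == 0:
                continue                        # outside the prior support

        if distance(theta_prop) >= eps:
            continue                            # filtered out by the tolerance

        new_particles[accepted] = theta_prop
        if t == 0:
            new_weights[accepted] = 1.0
        else:
            # Weight = prior density / kernel-weighted mixture over the old pool.
            kernel = stats.norm.pdf(theta_prop, loc=particles, scale=tau)
            new_weights[accepted] = prior.pdf(theta_prop) / np.sum(weights * kernel)
        accepted += 1

    particles = new_particles
    weights = new_weights / new_weights.sum()
    print(f"eps = {eps:4.2f}: weighted posterior mean = "
          f"{np.sum(weights * particles):.3f}")
```

The weight assignment in the final branch (prior density over the kernel mixture) is what corrects for the fact that later generations no longer propose from the prior.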