Gibbs sampling is a specialization of Metropolis-Hastings in which we propose changes to one parameter (or block of parameters) at a time, with each proposal conditioned on the current values of all the others.
In general, this is not obviously an improvement over proposing changes to all $\theta$ simultaneously.
But something interesting happens if the fully conditional likelihood and prior are conjugate: then we know the conditional posterior exactly. If we use independent samples of the conditional posterior as proposals, the Metropolis-Hastings acceptance ratio becomes
$\frac{P(y)Q(x)}{P(x)Q(y)} = \frac{P(y)P(x)}{P(x)P(y)} = 1$
and every proposal is automatically accepted!
Suppose we have some data, $x_i$, that we're modelling as being generated by a common normal (aka Gaussian) distribution.
$P(x_i|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]$
The model parameters are the mean, $\mu$, and variance, $\sigma^2$.
Consulting Wikipedia, we see that with conjugate priors (e.g. a normal prior on $\mu$ and an inverse-gamma prior on $\sigma^2$) we can Gibbs sample this posterior in two steps: draw $\mu$ from its conditional posterior given $\sigma^2$ and the data (a normal distribution), then draw $\sigma^2$ from its conditional posterior given $\mu$ and the data (an inverse-gamma distribution). A minimal code sketch follows below.
Remember that using conjugacy limits our options for choosing priors!
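As a concrete illustration, here is a minimal sketch of such a two-step Gibbs sampler. It assumes, as choices made for this sketch rather than anything specified above, a flat (improper) prior on $\mu$ and an InvGamma($\alpha_0$, $\beta_0$) prior on $\sigma^2$; the function name and hyperparameter values are placeholders.

```python
import numpy as np

def gibbs_normal(x, n_samples, alpha0=1.0, beta0=1.0, rng=None):
    """Conjugate Gibbs sampler for the mean and variance of normal data.

    Assumes a flat prior on mu and an InvGamma(alpha0, beta0) prior on sigma^2
    (both are arbitrary choices made for this sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), np.mean(x)
    mu, sigma2 = xbar, np.var(x)              # starting guesses
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        # 1. mu | sigma^2, x  ~  Normal(xbar, sigma^2 / n)   (flat prior on mu)
        mu = rng.normal(xbar, np.sqrt(sigma2 / n))
        # 2. sigma^2 | mu, x  ~  InvGamma(alpha0 + n/2, beta0 + sum((x - mu)^2)/2)
        alpha = alpha0 + 0.5 * n
        beta = beta0 + 0.5 * np.sum((x - mu)**2)
        sigma2 = 1.0 / rng.gamma(alpha, 1.0 / beta)   # draw via a Gamma on 1/sigma^2
        samples[i] = mu, sigma2
    return samples
```

Because both conditional updates are exact draws from the conditional posterior, every step is accepted automatically, as discussed above.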
A less specialized modification of Metropolis-Hastings is slice sampling. This method also avoids having to manually tune a proposal scale, although it usually involves a rejection step.
Given a starting point $\theta_0$, we uniformly choose a probability $q\leq P(\theta_0)$. This defines a slice where $P(\theta)\geq q$, and we sample the next $\theta$ uniformly within the slice.
See also Neal 2003
start with position x
evaluate p = P(x)
guess a width W
while we want samples
    draw q from Uniform(0,p)
    choose L,R such that R-L=W and L<=x<=R
    while P(L) > q, set L = L - W
    while P(R) > q, set R = R + W
    loop forever
        draw y from Uniform(L,R)
        if P(y) < q,
            if y < x, set L = y
            otherwise, set R = y
        otherwise, break
    set x = y, p = P(y), and record x
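A possible Python translation of this pseudocode is sketched below, assuming the (unnormalized) target density $P$ can be evaluated directly; the function name and interface are invented for this sketch.

```python
import numpy as np

def slice_sample(P, x0, width, n_samples, rng=None):
    """Univariate slice sampler with stepping out, following the pseudocode above.

    P: the (unnormalized) target density; x0: starting position; width: initial bracket size W."""
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    p = P(x)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        q = rng.uniform(0.0, p)             # slice level
        L = x - rng.uniform(0.0, width)     # bracket of size W containing x
        R = L + width
        while P(L) > q:                     # step out to the left
            L -= width
        while P(R) > q:                     # step out to the right
            R += width
        while True:
            y = rng.uniform(L, R)
            if P(y) < q:                    # outside the slice: shrink the bracket
                if y < x:
                    L = y
                else:
                    R = y
            else:                           # inside the slice: accept
                break
        x, p = y, P(y)
        samples[i] = x
    return samples
```

For example, `slice_sample(lambda t: np.exp(-0.5*t**2), 0.0, 1.0, 10000)` should return draws from a standard normal, with no proposal-scale tuning required beyond the rough width guess.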
The simple Metropolis implementation we tinkered with in the Basic Monte Carlo chunk can be improved on in a number of ways.
choose a starting point and a proposal scale
while we want more samples
    propose a new point from a scaled Gaussian centered on our current position
    accept/reject the new point
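For reference, a minimal Python sketch of this algorithm appears below. This is not the sandbox notebook's implementation; the function name and interface are invented here, and the proposal is a simple isotropic Gaussian.

```python
import numpy as np

def metropolis(logP, x0, scale, n_samples, rng=None):
    """Simple Metropolis sampler with a symmetric, isotropic Gaussian proposal.

    logP: log of the target density; x0: starting point; scale: proposal width."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = logP(x)
    chain = np.empty((n_samples, len(x)))
    n_accept = 0
    for i in range(n_samples):
        y = x + scale * rng.standard_normal(len(x))   # Gaussian proposal centered on x
        ly = logP(y)
        if np.log(rng.uniform()) < ly - lp:            # Metropolis acceptance rule
            x, lp = y, ly
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_samples
```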
Degeneracies can usually be reduced or eliminated by reparametrizing the model.
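As a hypothetical illustration, if two parameters are degenerate along a nearly straight line, even a simple linear change of variables (built here from an assumed covariance estimate, with made-up names) can decorrelate them; the sampler works in the rotated coordinates and maps back to the original parameters only to evaluate the likelihood.

```python
import numpy as np

# Illustrative, strongly degenerate covariance between two parameters (a, b)
cov = np.array([[1.0, 0.99],
                [0.99, 1.0]])
evals, evecs = np.linalg.eigh(cov)

def to_rotated(ab):
    """Original (a, b) -> decorrelated coordinates, scaled to unit variance."""
    return (evecs.T @ ab) / np.sqrt(evals)

def from_rotated(uv):
    """Decorrelated coordinates -> original (a, b); use inside the likelihood call."""
    return evecs @ (uv * np.sqrt(evals))
```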
If changes to some parameters require much more costly calculations than others, straightforward diagonalization may not be optimal in practice (for example, CosmoMC by Antony Lewis addresses this by splitting parameters into fast and slow sets).
Unless we know what to do a priori, the best information for how to tune proposal lengths and directions comes from running test chains started from different positions and using the information they build up about the posterior.
Parallel, communicating chains (e.g. using MPI) that tune themselves this way can vastly decrease the time to convergence compared with single chains.
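One simple way to use that information, sketched below under the assumption that the test chains are stored as arrays of post-burn-in samples: pool them and adopt a scaled version of their sample covariance as the Gaussian proposal covariance. The default scaling uses the commonly quoted $\sim 2.4^2/d$ rule of thumb for random-walk Metropolis; the helper name is made up for this sketch.

```python
import numpy as np

def tuned_proposal_cov(test_chains, scale=None):
    """Build a Gaussian proposal covariance from test chains (hypothetical helper).

    test_chains: list of (n_i, d) arrays of post-burn-in samples.
    scale: multiplier for the pooled sample covariance; defaults to the
           ~2.4^2/d rule of thumb for random-walk Metropolis."""
    pooled = np.concatenate([np.atleast_2d(c) for c in test_chains], axis=0)
    d = pooled.shape[1]
    if scale is None:
        scale = 2.4**2 / d
    return scale * np.cov(pooled, rowvar=False)

# Proposals would then look like:
# y = x + np.random.multivariate_normal(np.zeros(d), proposal_cov)
```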
Strictly speaking, tuning the proposal distribution on the fly breaks the rules of MCMC (specifically, detailed balance). The usual workaround is to adapt only during an initial tuning/burn-in period and then freeze the proposal distribution before recording samples for inference.
Consider the function $[P(x)]^{1/T}$, where $P(x)$ is the target PDF. For a "temperature" $T>1$, this flattens the distribution, so a chain sampling it can move between widely separated peaks much more easily.
With parallel tempering, we run one chain with $T=1$ and several more chains with $T>1$. A modified Metropolis-Hastings update occasionally allows the chains to exchange positions, giving the $T=1$ chain a mechanism for sampling regions of parameter space it might otherwise have low probability of proposing. Samples from the $T=1$ chain can be used for inference.
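A minimal sketch of the swap step is shown below, assuming each chain's current position and temperature are stored in lists and that the log of the target PDF is available; the function name and interface are invented for this sketch.

```python
import numpy as np

def propose_swap(logP, states, temps, i, j, rng=None):
    """Propose exchanging the states of two tempered chains (hypothetical helper).

    Chain k samples [P(x)]^(1/temps[k]); states holds the chains' current positions.
    The swap is accepted with the tempered Metropolis-Hastings probability."""
    rng = np.random.default_rng() if rng is None else rng
    # log acceptance ratio for swapping the states of chains i and j
    log_alpha = (1.0 / temps[i] - 1.0 / temps[j]) * (logP(states[j]) - logP(states[i]))
    if np.log(rng.uniform()) < log_alpha:
        states[i], states[j] = states[j], states[i]
        return True
    return False
```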
This sandbox notebook contains code for a simple Metropolis sampler and 4 PDFs:
Choose one of the speed-ups above (something that sounds implementable in a reasonable time) and modify/replace the sampler to use it on one or more of these PDFs. Compare performance (burn-in length, acceptance rate, etc.) with simple Metropolis. (Apart from PDF 4, chances are that nothing you do will work brilliantly, but you should see a difference!)
For Gibbs sampling, interpret PDF 4 as a likelihood function and put some thought into how you define your priors. Otherwise, just take the given PDF to be the posterior distribution.