In [10]:
from IPython.display import Image
Gilad Amitai and Beau Coker
All code is available here: https://github.com/gldmt-duke/CokerAmitaiSGHMC
Hamiltonian Monte Carlo (HMC) is an MCMC algorithm that samples a target distribution more efficiently by exploiting the dynamics of a Hamiltonian system. Computation can be taxing, since simulating the time evolution requires computing the gradient of the potential energy at every step. Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) computes the gradient on a subset of the data, cutting down on computation at the cost of a noisier gradient estimate.
Input: Starting position $\theta^{(1)}$ and step size $\epsilon$
for $t=1,2,\dotsc$ do:
Resample momentum $r$
$\quad r^{(t)}\sim \mathcal{N}(0,M)$
$\quad (\theta_0,r_0) = (\theta^{(t)}, r^{(t)})$
Simulate Hamiltonian dynamics:
for $i=1$ to $m$ do:
$\quad \theta_i \leftarrow \theta_{i-1} + \epsilon_t M^{-1} r_{i-1}$
$\quad r_i \leftarrow r_{i-1} - \epsilon_t \nabla \tilde{U}(\theta_i) - \epsilon_t C M^{-1} r_{i-1} + \mathcal{N}(0, 2(C-\hat{B})\epsilon_t)$
end
$\quad (\theta^{(t+1)}, r^{(t+1)}) = (\theta_m,r_m)$
Where $M$ is the mass matrix, $C$ is a user-chosen friction matrix, $\hat{B}$ is an estimate of the variance introduced by the stochastic gradient, and $\tilde{U}$ is the potential energy evaluated on a minibatch of the data.
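The loop above can be sketched in a few lines for a scalar $\theta$. This is a minimal illustration, not the implementation from the repository linked above; the function name and default parameters are our own choices:

```python
import numpy as np

def sghmc_sample(grad_U_tilde, theta0, n_samples, m=20, eps=0.01,
                 C=1.0, B_hat=0.0, M=1.0):
    """Run the SGHMC updates for a scalar theta.

    grad_U_tilde: callable returning a (possibly noisy) gradient of U.
    C and B_hat: friction term and estimated gradient-noise variance.
    """
    rng = np.random.default_rng(0)
    theta = theta0
    noise_sd = np.sqrt(2.0 * (C - B_hat) * eps)
    samples = []
    for _ in range(n_samples):
        r = rng.normal(0.0, np.sqrt(M))        # resample momentum
        for _ in range(m):                     # simulate Hamiltonian dynamics
            theta = theta + eps * r / M
            r = (r - eps * grad_U_tilde(theta)
                   - eps * C * r / M
                   + rng.normal(0.0, noise_sd))
        samples.append(theta)                  # no Metropolis-Hastings step
    return np.array(samples)
```

Note that the final state of each trajectory is accepted unconditionally; the friction term $C M^{-1} r$ is what compensates for the missing Metropolis-Hastings correction.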
To understand Stochastic Gradient Hamiltonian Monte Carlo (SGHMC), we illustrate two modifications of Hamiltonian Monte Carlo (HMC): (1) replacing the exact gradient $\nabla U$ with a stochastic gradient $\nabla \tilde{U}$, and (2) omitting the Metropolis-Hastings correction step.
For each of the four combinations of these two modifications, we show the performance of the algorithm on data simulated from a simple target distribution with potential $U(\theta) = -2\theta^2 + \theta^4$. The gradient $\nabla U$ is calculated analytically, and we obtain a stochastic gradient $\nabla \tilde{U} = \nabla U + \mathcal{N}(0,4)$ by adding zero-mean Gaussian noise with variance 4.
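As a sketch, the target potential, its analytic gradient, and the noisy gradient can be written as follows (function names are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

def U(theta):
    # double-well potential used as the target
    return -2 * theta**2 + theta**4

def grad_U(theta):
    # analytic gradient of U
    return -4 * theta + 4 * theta**3

def grad_U_tilde(theta):
    # stochastic gradient: exact gradient plus N(0, 4) noise (std dev 2)
    return grad_U(theta) + rng.normal(0.0, 2.0)
```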
We also show the performance of SGHMC, which does not include a Metropolis-Hastings step, does include a stochastic gradient, and, importantly, includes a friction term that counteracts the effects of the noisy gradient.
As can be seen in the figure below, each algorithm successfully converges to the target distribution except the HMC with the stochastic gradient and no Metropolis-Hastings step. The introduction of the friction term is what separates this algorithm from SGHMC.
In [13]:
Image("../fig1.png")
Out[13]:
Next, for another simple system $U(\theta)=\frac{1}{2}\theta^2$, we show the samples of the momentum term $r$ and the parameter of interest $\theta$. In the figure, lighter colors indicate samples later in the simulation. The Hamiltonian samples, where energy is conserved, stay near the origin. Samples using the noisy Hamiltonian spiral away from the origin. Adding the friction term keeps the posterior samples in a reasonable range.
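This behavior can be reproduced with a short simulation. The helper below is our own sketch (a symplectic Euler step, not code from the repository): it traces $(\theta, r)$ for $U(\theta)=\frac{1}{2}\theta^2$, with optional gradient noise and friction.

```python
import numpy as np

def simulate(eps=0.1, n_steps=300, noise_sd=0.0, C=0.0, seed=2):
    """Trace (theta, r) under the dynamics for U(theta) = 0.5 * theta**2.

    noise_sd > 0 injects gradient noise (the "noisy Hamiltonian");
    C > 0 adds the friction term that damps the noise.
    """
    rng = np.random.default_rng(seed)
    theta, r = 0.0, 1.0
    path = []
    for _ in range(n_steps):
        theta = theta + eps * r
        grad = theta + rng.normal(0.0, noise_sd)  # gradient of U, possibly noisy
        r = r - eps * grad - eps * C * r          # friction term: -eps * C * r
        path.append((theta, r))
    return np.array(path)
```

With `noise_sd=0` and `C=0` the trajectory circles the origin; turning on noise alone lets it spiral outward, while adding friction keeps it bounded, matching the three regimes in the figure.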
In [12]:
Image("../fig2.png")
Out[12]: