We want to evaluate the posterior $p(z \mid x)$ given a prior $p(z)$ and a likelihood $p(x \mid z)$.
The posterior is given, using Bayes' rule, as:
$$ p(z \mid x) = \frac{p(x \mid z)\,p(z)} {p(x)} $$We have $p(x \mid z)$ and $p(z)$. How to get $p(x)$? We can get it by marginalizing the joint distribution over $z$:
$$ p(z \mid x) = \frac{p(x \mid z)p(z)} {\int_z p(x,z) \, dz} $$The issue is that the marginalization over $z$ is often intractable. For example, the space of $z$ could be combinatorially large, so the sum or integral has far too many terms to evaluate exactly, as sketched below.
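To make the intractability concrete, here is a minimal sketch on a toy model (the `log_joint` below is a made-up placeholder, not any particular prior and likelihood): if $z$ is a vector of $D$ binary latent variables, the marginal $p(x) = \sum_z p(x,z)$ is a sum over $2^D$ configurations, which explodes as $D$ grows.

```python
import itertools
import numpy as np

# Toy sketch: z is a vector of D binary latent variables, so the exact
# marginal p(x) = sum_z p(x, z) runs over all 2**D configurations of z.
def log_joint(x, z):
    # Placeholder log p(x, z); in a real model this would come from the
    # prior p(z) and the likelihood p(x | z).
    return -0.5 * np.sum((x - z) ** 2) - 0.1 * np.sum(z)

def log_marginal(x, D):
    # log p(x) via brute-force enumeration: 2**D terms in the sum.
    terms = [log_joint(x, np.array(zs)) for zs in itertools.product([0, 1], repeat=D)]
    return np.logaddexp.reduce(terms)

x = np.ones(16)
print(log_marginal(x, D=16))  # already 2**16 = 65,536 terms; D = 100 would need 2**100
```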
The variational approximation lets us choose a distribution that approximates $p(z \mid x)$, and replaces the intractable marginalization with an expectation.
Let's use a distribution $q(z)$ to approximate $p(z \mid x)$. This distribution will typically be drawn from a parametric family, i.e. $q_\phi(z)$ with parameters $\phi$. We need to minimize the discrepancy between $q_\phi(z)$ and $p(z \mid x)$. How to do this? We can measure the discrepancy with the KL-divergence, and then minimize this KL-divergence.
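As a concrete (hypothetical) choice of such a parametric family, a diagonal Gaussian $q_\phi(z) = \mathcal{N}(z \mid \mu, \operatorname{diag}(\sigma^2))$ with $\phi = (\mu, \log\sigma)$ is a common option; a minimal sketch:

```python
import numpy as np

# A minimal sketch of a parametric variational family: a diagonal Gaussian
# q_phi(z) = N(z | mu, diag(sigma^2)), with parameters phi = (mu, log_sigma).
class DiagonalGaussian:
    def __init__(self, mu, log_sigma):
        self.mu = np.asarray(mu, dtype=float)
        self.log_sigma = np.asarray(log_sigma, dtype=float)

    def sample(self, n, rng):
        # Draw n samples z ~ q_phi(z).
        sigma = np.exp(self.log_sigma)
        return self.mu + sigma * rng.standard_normal((n, self.mu.size))

    def log_prob(self, z):
        # log q_phi(z), evaluated per row of z.
        sigma = np.exp(self.log_sigma)
        return np.sum(
            -0.5 * np.log(2 * np.pi) - self.log_sigma
            - 0.5 * ((z - self.mu) / sigma) ** 2,
            axis=-1,
        )

q = DiagonalGaussian(mu=[0.0, 0.0], log_sigma=[0.0, 0.0])
rng = np.random.default_rng(0)
z = q.sample(5, rng)
print(q.log_prob(z))
```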
The KL-divergence is:
$$ \def\Exp{\mathbb{E}} D_{KL}(q_\phi(z) \| p(z \mid x)) = \Exp_{q_\phi(z)}\left[ \log \frac{q_\phi(z)} {p(z \mid x)} \right] $$Writing $p(z \mid x) = p(z,x)/p(x)$ inside the logarithm, and noting that $p(x)$ doesn't depend on $z$, we can pull $\log p(x)$ out of the expectation:
$$ D_{KL}(q_\phi(z) \| p(z \mid x)) = \Exp_{q_\phi(z)} \left[ \log q_\phi(z) - \log p(z,x) \right] + \log p(x) $$Therefore:
$$ \log p(x) = D_{KL}(q_\phi(z) \| p(z \mid x)) - \Exp_{q_\phi(z)} \left[ \log q_\phi(z) - \log p(z,x) \right] = D_{KL}(q_\phi(z) \| p(z \mid x)) + L[q_\phi], $$where $L[q_\phi] = \Exp_{q_\phi(z)}[\log p(z,x) - \log q_\phi(z)]$ is a functional of $q_\phi$.
Since $D_{KL}(\cdot \| \cdot) \geq 0$ and $\log p(x)$ is independent of $q_\phi$, maximizing $L$ minimizes the KL-divergence, bringing the approximate distribution $q_\phi(z)$ as close as possible to the posterior distribution $p(z \mid x)$. In particular, $\log p(x) \geq L[q_\phi]$, so $L$ is a lower bound on the log-evidence.
$L[q_\phi]$ is termed the Evidence Lower Bound (ELBO), or the Variational Lower Bound.
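As a sanity check, here is a numerical sketch on a toy conjugate Gaussian model, chosen only because $\log p(x)$ has a closed form (assume $z \sim \mathcal{N}(0,1)$ and $x \mid z \sim \mathcal{N}(z,1)$, so $p(x) = \mathcal{N}(x \mid 0, 2)$ and $p(z \mid x) = \mathcal{N}(x/2, 1/2)$): the ELBO is estimated by Monte Carlo with samples from $q_\phi$, it stays below $\log p(x)$, and the gap closes when $q_\phi$ equals the posterior.

```python
import numpy as np

# Toy conjugate model (illustration only): z ~ N(0, 1), x | z ~ N(z, 1).
# Then p(x) = N(x | 0, 2) and p(z | x) = N(x/2, 1/2), so the exact
# log-evidence is available to compare against the Monte Carlo ELBO.
rng = np.random.default_rng(0)
x = 1.3

def normal_logpdf(v, mean, std):
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def elbo(mu, sigma, n_samples=200_000):
    # L[q_phi] = E_{q_phi}[log p(z, x) - log q_phi(z)], estimated from samples z ~ q_phi.
    z = rng.normal(mu, sigma, size=n_samples)
    log_joint = normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x, z, 1.0)
    log_q = normal_logpdf(z, mu, sigma)
    return np.mean(log_joint - log_q)

log_px = normal_logpdf(x, 0.0, np.sqrt(2.0))  # exact log p(x)
print("log p(x)            :", log_px)
print("ELBO, q = posterior :", elbo(x / 2, np.sqrt(0.5)))  # matches log p(x); KL = 0
print("ELBO, q = N(0, 1)   :", elbo(0.0, 1.0))             # strictly below log p(x)
```

In practice one would maximize a Monte Carlo estimate like this over $\phi$ with gradient-based optimization, rather than evaluating it at hand-picked values as above.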