We want to evaluate the posterior $p(z \mid x)$ given a prior $p(z)$ and a likelihood $p(x \mid z)$.
The posterior is given, using Bayes' rule, as:
$$ p(z \mid x) = \frac{p(x \mid z)\,p(z)} {p(x)} $$We have $p(x \mid z)$ and $p(z)$. How to get $p(x)$? We can get it by marginalizing the joint distribution over $z$:
$$ p(z \mid x) = \frac{p(x \mid z)p(z)} {\int_z p(x,z) \, dz} $$The issue is that the marginalization over $z$ is often intractable. For example, the space of $z$ could be combinatorially large, so the sum or integral has far too many terms to evaluate exactly, as sketched below.
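To make the intractability concrete, here is a minimal sketch on a toy model (the `log_joint` below is a made-up placeholder, not any particular prior and likelihood): if $z$ is a vector of $D$ binary latent variables, the marginal $p(x) = \sum_z p(x,z)$ is a sum over $2^D$ configurations, which explodes as $D$ grows.

```python
import itertools
import numpy as np

# Toy sketch: z is a vector of D binary latent variables, so the exact
# marginal p(x) = sum_z p(x, z) runs over all 2**D configurations of z.
def log_joint(x, z):
    # Placeholder log p(x, z); in a real model this would come from the
    # prior p(z) and the likelihood p(x | z).
    return -0.5 * np.sum((x - z) ** 2) - 0.1 * np.sum(z)

def log_marginal(x, D):
    # log p(x) via brute-force enumeration: 2**D terms in the sum.
    terms = [log_joint(x, np.array(zs)) for zs in itertools.product([0, 1], repeat=D)]
    return np.logaddexp.reduce(terms)

x = np.ones(16)
print(log_marginal(x, D=16))  # already 2**16 = 65,536 terms; D = 100 would need 2**100
```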
The variational approximation lets us choose a distribution that approximates $p(z \mid x)$, and replaces the intractable marginalization with an expectation.
Let's use a distribution $q(z)$ to approximate $p(z \mid x)$. This distribution will typically be drawn from a parametric family, i.e. $q_\phi(z)$ with parameters $\phi$. We need to minimize the discrepancy between $q_\phi(z)$ and $p(z \mid x)$. How to do this? We can measure the discrepancy with the KL-divergence, and then minimize this KL-divergence.
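As a concrete (hypothetical) choice of such a parametric family, a diagonal Gaussian $q_\phi(z) = \mathcal{N}(z \mid \mu, \operatorname{diag}(\sigma^2))$ with $\phi = (\mu, \log\sigma)$ is a common option; a minimal sketch:

```python
import numpy as np

# A minimal sketch of a parametric variational family: a diagonal Gaussian
# q_phi(z) = N(z | mu, diag(sigma^2)), with parameters phi = (mu, log_sigma).
class DiagonalGaussian:
    def __init__(self, mu, log_sigma):
        self.mu = np.asarray(mu, dtype=float)
        self.log_sigma = np.asarray(log_sigma, dtype=float)

    def sample(self, n, rng):
        # Draw n samples z ~ q_phi(z).
        sigma = np.exp(self.log_sigma)
        return self.mu + sigma * rng.standard_normal((n, self.mu.size))

    def log_prob(self, z):
        # log q_phi(z), evaluated per row of z.
        sigma = np.exp(self.log_sigma)
        return np.sum(
            -0.5 * np.log(2 * np.pi) - self.log_sigma
            - 0.5 * ((z - self.mu) / sigma) ** 2,
            axis=-1,
        )

q = DiagonalGaussian(mu=[0.0, 0.0], log_sigma=[0.0, 0.0])
rng = np.random.default_rng(0)
z = q.sample(5, rng)
print(q.log_prob(z))
```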
The KL-divergence is:
$$ \def\Exp{\mathbb{E}} D_{KL}(q_\phi(z) \| p(z \mid x)) = \Exp_{q_\phi(z)}\left[ \log \frac{q_\phi(z)} {p(z \mid x)} \right] $$Writing $p(z \mid x) = p(z,x)/p(x)$ inside the logarithm, and noting that $p(x)$ doesn't depend on $z$, we can pull $\log p(x)$ out of the expectation:
$$ D_{KL}(q_\phi(z) \| p(z \mid x)) = \Exp_{q_\phi(z)} \left[ \log q_\phi(z) - \log p(z,x) \right] + \log p(x) $$Therefore:
$$ \log p(x) = D_{KL}(q_\phi(z) \| p(z \mid x)) - \Exp_{q_\phi(z)} \left[ \log q_\phi(z) - \log p(z,x) \right] = D_{KL}(q_\phi(z) \| p(z \mid x)) + L[q_\phi], $$where $L[q_\phi] = \Exp_{q_\phi(z)}[\log p(z,x) - \log q_\phi(z)]$ is a functional of $q_\phi$.
Since $D_{KL}(\cdot \| \cdot) \geq 0$ and $\log p(x)$ is independent of $q_\phi$, maximizing $L$ minimizes the KL-divergence, bringing the approximate distribution $q_\phi(z)$ as close as possible to the posterior distribution $p(z \mid x)$. In particular, $\log p(x) \geq L[q_\phi]$, so $L$ is a lower bound on the log-evidence.
$L[q_\phi]$ is termed the Evidence Lower Bound (ELBO), or the Variational Lower Bound.
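As a sanity check, here is a numerical sketch on a toy conjugate Gaussian model, chosen only because $\log p(x)$ has a closed form (assume $z \sim \mathcal{N}(0,1)$ and $x \mid z \sim \mathcal{N}(z,1)$, so $p(x) = \mathcal{N}(x \mid 0, 2)$ and $p(z \mid x) = \mathcal{N}(x/2, 1/2)$): the ELBO is estimated by Monte Carlo with samples from $q_\phi$, it stays below $\log p(x)$, and the gap closes when $q_\phi$ equals the posterior.

```python
import numpy as np

# Toy conjugate model (illustration only): z ~ N(0, 1), x | z ~ N(z, 1).
# Then p(x) = N(x | 0, 2) and p(z | x) = N(x/2, 1/2), so the exact
# log-evidence is available to compare against the Monte Carlo ELBO.
rng = np.random.default_rng(0)
x = 1.3

def normal_logpdf(v, mean, std):
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2

def elbo(mu, sigma, n_samples=200_000):
    # L[q_phi] = E_{q_phi}[log p(z, x) - log q_phi(z)], estimated from samples z ~ q_phi.
    z = rng.normal(mu, sigma, size=n_samples)
    log_joint = normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x, z, 1.0)
    log_q = normal_logpdf(z, mu, sigma)
    return np.mean(log_joint - log_q)

log_px = normal_logpdf(x, 0.0, np.sqrt(2.0))  # exact log p(x)
print("log p(x)            :", log_px)
print("ELBO, q = posterior :", elbo(x / 2, np.sqrt(0.5)))  # matches log p(x); KL = 0
print("ELBO, q = N(0, 1)   :", elbo(0.0, 1.0))             # strictly below log p(x)
```

In practice one would maximize a Monte Carlo estimate like this over $\phi$ with gradient-based optimization, rather than evaluating it at hand-picked values as above.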