MCMC for the West Coast, Variational Bayes for the East Coast. I prefer the variational approach over sampling in my research. In some machine learning courses and books (such as Bishop's "Pattern Recognition and Machine Learning") it looks like a bag of algebraic tricks, and I kept wondering: where does the actual VARIATIONAL part, as we know it from real analysis, hide in all of this? I also find the derivation below more direct and transparent.
Consider our model:
$$ p(X, Z, \theta) = p(X|Z)p(Z|\theta)p(\theta) $$
Our ultimate goal is the posterior:
$$ p(Z,\theta|X) = \dfrac{p(X, Z, \theta)}{\int p(X, Z, \theta) dZd\theta} $$
The hardest part is dealing with $\int p(X, Z, \theta) dZd\theta = p(X)$. Let's optimize a lower bound on its logarithm instead.
Consider $\mathcal{F}(q) = \mathbb{E}_q \ln p(X,Z|\theta) + \mathbb{E}_q \ln p(\theta) - \mathbb{E}_q\ln q(Z, \theta)$, where $q(Z,\theta)$ is an arbitrary density over $(Z,\theta)$; this is the well-known evidence lower bound (ELBO).
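Why a lower bound? Since $p(X,Z|\theta)p(\theta) = p(X,Z,\theta)$, a one-line check gives
$$ \mathcal{F}(q) = \mathbb{E}_q \ln \dfrac{p(X,Z,\theta)}{q(Z,\theta)} = \ln p(X) - \mathrm{KL}\left(q(Z,\theta)\,\|\,p(Z,\theta|X)\right) \leq \ln p(X), $$
with equality iff $q$ equals the true posterior. So maximizing $\mathcal{F}$ over $q$ attacks our hard integral indirectly.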
Then we obtain an optimization problem with a normalization constraint, with Lagrangian:
$$ \mathcal{L}(q,\lambda) = \mathcal{F}(q) + \lambda\left(\int q(Z,\theta)dZd\theta - 1 \right) = \\ = \mathbb{E}_q \ln p(X,Z|\theta) + \mathbb{E}_q \ln p(\theta) - \mathbb{E}_q\ln q(Z, \theta) + \lambda\left(\int q(Z,\theta)dZd\theta - 1 \right) $$
The first-order conditions (FOC):
$$ \dfrac{\delta}{\delta q}\mathcal{L}(q,\lambda) = \dfrac{\delta}{\delta q}\mathcal{F}(q)+\lambda = \ln p(X,Z,\theta) - \left(\ln q(Z,\theta) + 1\right) + \lambda = 0 $$
$$ \dfrac{\partial}{\partial \lambda}\mathcal{L}(q,\lambda) = \int q(Z,\theta)dZd\theta - 1 = 0 $$
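Here is where the promised calculus of variations actually lives: the functional derivative $\delta\mathcal{F}/\delta q$ comes from perturbing $q$ with an arbitrary test function $\eta$ and differentiating,
$$ \left.\dfrac{d}{d\varepsilon}\mathcal{F}(q+\varepsilon\eta)\right|_{\varepsilon=0} = \int \eta(Z,\theta)\left[\ln p(X,Z,\theta) - \ln q(Z,\theta) - 1\right]dZd\theta, $$
and demanding that this (together with the constraint term) vanishes for every admissible $\eta$ forces the bracket plus $\lambda$ to be zero pointwise.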
From the first condition:
$$ \ln p(X,Z,\theta) = \ln q(Z,\theta) + 1 - \lambda \\ q(Z,\theta) = p(X,Z,\theta)\exp(\lambda - 1) $$
Then from the second condition:
$$ \exp(\lambda - 1)\int p(X,Z,\theta) dZd\theta = 1 \\ \lambda = 1 - \ln \int p(X,Z,\theta)dZd\theta $$
Finally:
$$ q(Z,\theta) = \dfrac{p(X,Z,\theta)}{\int p(X, Z,\theta)dZd\theta} = p(Z,\theta|X) $$
We end up with the exact solution: the unconstrained optimum is the true posterior. Now we will follow the same pipeline, but with a mean-field approximation.
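As a sanity check, at this optimum the bound is tight: since $p(X,Z,\theta)/q^*(Z,\theta) = p(X)$,
$$ \mathcal{F}(q^*) = \mathbb{E}_{q^*} \ln \dfrac{p(X,Z,\theta)}{q^*(Z,\theta)} = \ln p(X). $$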
Sometimes (let's be honest: almost surely) we can't compute $\int p(X, Z,\theta)dZd\theta$ directly. Then we have to restrict $q$ to a smaller set of functions. Thus, we consider factorized densities $q(Z,\theta) = q_z(Z)q_{\theta}(\theta)$.
Then:
$$ \mathcal{F}(q) = \mathbb{E}_q \ln p(X,Z|\theta) + \mathbb{E}_q \ln p(\theta) - \mathbb{E}_q\ln q(Z, \theta) = \\ = \mathbb{E}_q \ln p(X,Z|\theta) + \mathbb{E}_{q_{\theta}} \ln p(\theta) - \mathbb{E}_{q_{\theta}}\ln q_{\theta} (\theta) - \mathbb{E}_{q_{z}} \ln q_{z}(Z) $$
As earlier, we add normalization constraints for $q_z$ and $q_\theta$ with Lagrange multipliers $\lambda_z$ and $\lambda_\theta$ and take the FOC.
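It helps to first group the terms that involve $q_\theta$ (a direct regrouping of the expression above; everything else is constant with respect to $q_\theta$):
$$ \mathcal{F}(q) = \int q_{\theta}(\theta)\left[\mathbb{E}_{q_z}\ln p(X,Z|\theta) + \ln p(\theta) - \ln q_{\theta}(\theta)\right]d\theta + \mathrm{const}. $$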
Then, for $q_\theta$:
$$ \dfrac{\delta}{\delta q_{\theta}}\mathcal{L} = \mathbb{E}_{q_{z}}\ln p(X,Z|\theta) + \ln p(\theta) - \ln q_{\theta}(\theta) - 1 + \lambda_{\theta} = 0 $$
Finally:
$$ q_{\theta}(\theta) = \dfrac{p(\theta)\exp\left(\mathbb{E}_{q_{z}}\ln p(X,Z|\theta)\right)}{\int p(\theta)\exp\left(\mathbb{E}_{q_{z}}\ln p(X,Z|\theta)\right)d\theta} $$
By symmetry, for $q_z$:
$$ q_{z}(Z) = \dfrac{\exp\left(\mathbb{E}_{q_{\theta}}\ln p(X,Z|\theta)\right)}{\int \exp\left(\mathbb{E}_{q_{\theta}}\ln p(X,Z|\theta)\right)dZ} $$
Iterating these two updates, each holding the other factor fixed, is exactly coordinate ascent on $\mathcal{F}$.
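For concreteness, here is a minimal sketch of these coordinate-ascent updates on a toy conjugate model, the univariate Gaussian with unknown mean and precision from Bishop's PRML §10.1.3 ($\mu$ plays the role of $Z$, $\tau$ the role of $\theta$). The hyperparameters, data, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Toy conjugate model (Bishop, PRML §10.1.3):
#   x_i ~ N(mu, 1/tau),  mu | tau ~ N(mu0, 1/(lam0*tau)),  tau ~ Gamma(a0, b0)
# Mean field: q(mu, tau) = q_mu(mu) * q_tau(tau)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # synthetic data (true tau ~ 0.44)
N, xbar = len(x), x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0         # prior hyperparameters (arbitrary)
E_tau = a0 / b0                                 # initial guess for E_q[tau]

for _ in range(50):
    # Update q_mu(mu) = N(mu_N, 1/lam_N), from q_mu ∝ exp(E_{q_tau} ln p(x, mu | tau))
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N   # first two moments of q_mu

    # Update q_tau(tau) = Gamma(a_N, b_N), by the symmetric formula
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (np.sum(x**2) - 2 * N * xbar * E_mu + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N

print(f"E_q[mu] = {mu_N:.3f}, E_q[tau] = {E_tau:.3f}")
```

Each update consumes only the current moments of the other factor, exactly as the two formulas above prescribe.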
And that's it!