Generative Models: Restricted Boltzmann Machines

The main difference:

  • Previous models: directed graph $x \to z \to \tilde{x}$

  • RBM: undirected graph $x$ -- $z$

Energy Function

  • Notation: here we use $h$ for the latent variable
$$E(x,h) = -h^T Wx - c^Tx - b^Th \\ = -\sum_j \sum_k W_{j,k} h_j x_k - \sum_k c_k x_k - \sum_j b_j h_j$$
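
A minimal numpy sketch of this energy function on a toy model (the names `energy`, `W`, `b`, `c` below are my own illustrative assumptions, not part of the original notes):

```python
import numpy as np

def energy(x, h, W, b, c):
    """RBM energy E(x, h) = -h^T W x - c^T x - b^T h."""
    return -h @ W @ x - c @ x - b @ h

# toy model: n = 3 visible units, m = 2 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))    # W[j, k] couples hidden unit j and visible unit k
b = rng.normal(size=2)         # hidden bias
c = rng.normal(size=3)         # visible bias
x = np.array([1.0, 0.0, 1.0])  # a binary visible vector
h = np.array([0.0, 1.0])       # a binary hidden vector
print(energy(x, h, W, b, c))
```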

Distribution:

  • Joint distribution

    $$p(x,h) = \frac{\exp(-E(x,h))}{Z}$$

  • Partition function: $Z = \sum_{x\in \{0,1\}^n} \sum_{h\in\{0,1\}^m} \exp(-E(x,h))$

  • Intractable: the sum runs over all $2^n$ configurations of $x$ and all $2^m$ configurations of $h$, so there are $2^{n+m}$ terms in total.
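
Continuing the toy sketch above, a brute-force computation of $Z$ makes the $2^{n+m}$-term cost explicit (this reuses the hypothetical `energy`, `W`, `b`, `c`, `x`, `h` defined earlier):

```python
from itertools import product

def partition_function(W, b, c):
    """Brute-force Z = sum over all x and h of exp(-E(x, h)).
    The double sum has 2^(n+m) terms, which is why this is only
    feasible for tiny models."""
    m, n = W.shape
    Z = 0.0
    for xv in product([0, 1], repeat=n):
        for hv in product([0, 1], repeat=m):
            Z += np.exp(-energy(np.array(xv, float), np.array(hv, float), W, b, c))
    return Z

Z = partition_function(W, b, c)               # 2^(3+2) = 32 terms for the toy model
p_joint = np.exp(-energy(x, h, W, b, c)) / Z  # joint probability p(x, h)
print(Z, p_joint)
```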

Graphical model

$$p(x,h) = \frac{\exp(-E(x,h))}{Z} = \frac{\exp(h^TWx + c^Tx + b^Th)}{Z} \\ = \frac{\exp(h^TWx)\exp(c^Tx)\exp(b^Th)}{Z}$$

Connection to physics and nature:

  • Interactions between atoms and molecules
  • Different energy functions

    $$p(x,h) = \frac{1}{Z} \prod_j \prod_k \exp(W_{j,k} h_j x_k) \times \prod_k \exp(c_k x_k) \times \prod_j \exp(b_j h_j)$$
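
A quick numerical check of this factorization on the same toy arrays (again just an illustrative sketch):

```python
# exp(-E(x, h)) is a product of local factors:
# one factor per (j, k) edge, one per visible unit, one per hidden unit
m, n = W.shape
lhs = np.exp(-energy(x, h, W, b, c))
rhs = (np.prod([np.exp(W[j, k] * h[j] * x[k]) for j in range(m) for k in range(n)])
       * np.prod(np.exp(c * x))
       * np.prod(np.exp(b * h)))
print(np.allclose(lhs, rhs))   # True
```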

Inference

  • Conditional distributions

    $$p(h|x) = \prod_j p(h_j|x)$$

    $p(h_j=1|x) = \frac{1}{1 + \exp(-(b_j + W_{j.}x))} = \text{sigmoid}(b_j + W_{j.}x)$

$$p(x|h) = \prod_k p(x_k|h)$$

$p(x_k=1|h) = \frac{1}{1 + \exp(-(c_k + h^T W_{.k}))} = \text{sigmoid}(c_k + h^T W_{.k})$ (both conditionals are sketched in code below)

  • Derivation:
$$p(h|x) = \frac{p(x,h)}{\sum_{\tilde{h}} p(x,\tilde{h})}$$
  • Marginal distribution

$$p(x) = \sum_h p(x,h) = \sum_h \frac{\exp(-E(x,h))}{Z} = \frac{\exp(-F(x))}{Z}$$

$F(x)$ is the free energy:

$$F(x) = -\log \sum_h \exp(-E(x,h)) = -c^Tx - \sum_j \log\left(1 + \exp(b_j + W_{j.}x)\right)$$

The term $\log(1 + \exp(\cdot))$ is called softplus$(\cdot)$. Softplus is a smooth version of ReLU.
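
A small sketch of these conditionals and of the free energy, reusing the toy model and `Z` from the earlier blocks; the helper names (`p_h_given_x`, `p_x_given_h`, `free_energy`) are my own, and the last lines only verify $p(x) = \exp(-F(x))/Z$ by brute force:

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_x(x, W, b):
    """Vector of p(h_j = 1 | x) = sigmoid(b_j + W_{j.} x)."""
    return sigmoid(b + W @ x)

def p_x_given_h(h, W, c):
    """Vector of p(x_k = 1 | h) = sigmoid(c_k + h^T W_{.k})."""
    return sigmoid(c + W.T @ h)

def free_energy(x, W, b, c):
    """F(x) = -c^T x - sum_j softplus(b_j + W_{j.} x), so p(x) = exp(-F(x)) / Z."""
    return -c @ x - np.sum(np.logaddexp(0.0, b + W @ x))  # logaddexp(0, z) = softplus(z)

# sanity check: exp(-F(x)) / Z matches the brute-force marginal sum_h p(x, h)
p_x_brute = sum(np.exp(-energy(x, np.array(hv, float), W, b, c))
                for hv in product([0, 1], repeat=W.shape[0])) / Z
print(np.allclose(p_x_brute, np.exp(-free_energy(x, W, b, c)) / Z))   # True
```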

Training

  • Loss function: negative log-likelihood

    $$\frac{1}{T} \sum_{t\in \text{training}} l(f(x^t)) = \frac{1}{T} \sum_t -\log(p(x^t))$$

  • Training: stochastic gradient descent, using the gradient

    $$\frac{\partial \left(-\log p(x^t)\right)}{\partial \theta} = \mathbf{E}_h\!\left[\frac{\partial E(x^t,h)}{\partial \theta} \,\middle|\, x^t\right] - \mathbf{E}_{x,h}\!\left[\frac{\partial E(x,h)}{\partial \theta}\right]$$

    where $\mathbf{E}$ is the expectation under the indicated distribution and $E$ is the energy function. The first term (positive phase) conditions on the training example and is tractable; the second term (negative phase) is an expectation under the model distribution and is intractable.
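
As an illustration of the two phases, here is a sketch that computes the exact gradient with respect to $W$ on the toy model from earlier, enumerating the negative phase by brute force (only possible because the model is tiny; the function name is an assumption):

```python
def nll_grad_W(x_t, W, b, c):
    """Exact gradient of -log p(x^t) with respect to W on the toy model.

    dE/dW[j, k] = -h_j x_k, so the gradient is
    (negative phase) E_{x,h}[h x^T] minus (positive phase) E_{h|x^t}[h x^T].
    The negative phase is enumerated exactly here, which is precisely the
    part that is intractable for realistically sized models."""
    m, n = W.shape
    pos = np.outer(p_h_given_x(x_t, W, b), x_t)        # E_{h | x^t}[h x^T]
    neg = np.zeros((m, n))                              # E_{x, h}[h x^T]
    Z = partition_function(W, b, c)
    for xv in product([0, 1], repeat=n):
        xv = np.array(xv, float)
        p_x = np.exp(-free_energy(xv, W, b, c)) / Z
        neg += p_x * np.outer(p_h_given_x(xv, W, b), xv)
    return neg - pos

# one stochastic gradient descent step on a single training example
lr = 0.1
W -= lr * nll_grad_W(x, W, b, c)
```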

Recall: computing expectation:

  • Compute expectation $\mathbf{E}_p[f(X)]$?

    $$\mathbf{E}[f(x)] = \int_x p(x) f(x) dx$$

    We draw a set of samples $S$ from the distribution of $X$ (i.e. from $p(x)$) and then take the average:

    $$\mathbf{E}[f(x)] = \int_x p(x) f(x)\, dx \approx \frac{1}{|S|} \sum_{s\in S} f(s)$$
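
A quick illustration of this Monte Carlo recipe (the distribution and the choice of $f$ are arbitrary, picked only for the example):

```python
# Monte Carlo estimate of E[f(X)]: draw samples from p(x), average f over them.
# Illustrative choices: X ~ N(2, 1) and f(x) = x^2, so E[f(X)] = 2^2 + 1 = 5.
rng = np.random.default_rng(1)
samples = rng.normal(loc=2.0, scale=1.0, size=100_000)
print(np.mean(samples ** 2))   # close to 5
```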

Contrastive Divergence

An inexpensive way to obtain the negative samples needed for the negative phase.

  • Replace the negative-phase expectation by a point estimate $\tilde{x}$

  • Obtain the point $\tilde{x}$ by Gibbs sampling

  • Start the sampling chain at the training example $x^t$
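
A minimal CD-$k$ sketch under the same toy setup, using the conditional helpers defined earlier; the function name, learning rate, and $k$ are illustrative assumptions:

```python
def sample_bernoulli(p, rng):
    """Draw a binary vector with independent Bernoulli(p) entries."""
    return (rng.random(p.shape) < p).astype(float)

def cd_k_update(x_t, W, b, c, rng, k=1, lr=0.1):
    """One CD-k update: run a Gibbs chain started at the training point x^t
    for k steps and use the final sample x_tilde as the point estimate of
    the negative phase."""
    x_tilde = x_t.copy()
    for _ in range(k):
        h_tilde = sample_bernoulli(p_h_given_x(x_tilde, W, b), rng)
        x_tilde = sample_bernoulli(p_x_given_h(h_tilde, W, c), rng)
    ph_data = p_h_given_x(x_t, W, b)        # positive phase: clamp x to the data
    ph_model = p_h_given_x(x_tilde, W, b)   # negative phase: use the Gibbs sample
    W += lr * (np.outer(ph_data, x_t) - np.outer(ph_model, x_tilde))
    b += lr * (ph_data - ph_model)
    c += lr * (x_t - x_tilde)
    return W, b, c

rng = np.random.default_rng(2)
W, b, c = cd_k_update(x, W, b, c, rng, k=1)
```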

Adding more layers: Deep Boltzmann Machines

Discussion: comparing modeling using directed graph vs. undirected

  • Directed: $z \to x$ we model $p(x|z)$

  • Undirected: $z$ -- $x$, we model the joint $p(x,z)$ (and hence $p(z)$ by marginalization)

    The directed version is easier, since each factor is a conditional that we can evaluate directly from a given input.

    The undirected graph should in theory give a more accurate model, since during the iterative inference process we repeatedly make both $x$ and $z$ better.


Gaussian Bernoulli RBM

  • For the case when input $x$ is real and unbounded:

    • add a quadratic term to the energy function $$E(x,h) = -h^TWx - c^Tx - b^Th + \frac{1}{2} x^Tx$$

    • $p(x|h)$ is now a Gaussian distribution

      • $\mu = c + W^Th$
      • $\Sigma = I$
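
A short sketch of sampling from this Gaussian conditional, reusing the toy `W`, `c`, `h` from the earlier blocks (illustrative only; the hidden units stay binary, so $p(h_j=1|x)$ is unchanged):

```python
def sample_x_given_h_gaussian(h, W, c, rng):
    """Gaussian-Bernoulli RBM: p(x | h) = N(mu, I) with mu = c + W^T h."""
    mu = c + W.T @ h
    return mu + rng.normal(size=mu.shape)   # identity covariance

rng = np.random.default_rng(3)
print(sample_x_given_h_gaussian(h, W, c, rng))   # a real-valued visible vector
```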
