Generative Models

  • Unsupervised learning: uses only the input data (no labels)
Example: the most basic generative model is PCA:

$$X = \mu + \phi \alpha$$

where $\mu$ is the mean, $\phi$ holds the eigenvectors, and $\alpha$ is the vector of coefficients. By sampling different coefficients $\alpha$, we can generate new $X$ values.
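
A minimal NumPy sketch of this generative view of PCA (the toy data and the choice of three components are assumptions for illustration, not from the notes):

```python
import numpy as np

# Sketch: PCA as a generative model, X = mu + phi * alpha.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                 # toy data: 500 samples, 10 dimensions

mu = X.mean(axis=0)                            # mean
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
phi = Vt[:3].T                                 # top-3 eigenvectors as columns, shape (10, 3)

# Generate new samples: draw coefficients alpha and map them back to data space.
alpha = rng.normal(size=(5, 3)) * (S[:3] / np.sqrt(len(X)))  # scale by per-component std
X_new = mu + alpha @ phi.T                     # five generated samples
```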

Different techniques for unsupervised learning:

  • Autoencoders
  • Generative Adversarial Networks
  • Restricted Boltzmann Machines

Why Generative Models?

  • Go beyond associating inputs with outputs

  • Recognize objects in the world and their factors of variation: e.g., recognize a car and its different configurations (doors open or closed)

  • Understand and imagine how the world evolves

  • Detect surprising events in the world

  • Imagine and generate rich plans for the future

  • Establish concepts as useful for reasoning and decision making

What will we learn?

  • Use deep networks for $f_\theta(\cdot)$
    • fully connected layers

Auto-encoders

  • Decoder

  • Encoder

    We can share (tie) the weights so that $W^* = W^T$

  • If the number of hidden units equals the number of input and output units, then the network can learn an identity mapping and simply reproduce the input at the output.

  • To avoid this, we can make the number of hidden units smaller, so the network is forced to learn useful information from the input.
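
A minimal sketch of a one-hidden-layer autoencoder with tied weights ($W^* = W^T$); the layer sizes, sigmoid activations, and toy batch are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Minimal tied-weight autoencoder sketch; sizes and activations are illustrative assumptions.
d_in, d_hidden = 784, 64
W = (0.01 * torch.randn(d_hidden, d_in)).requires_grad_()   # shared weight matrix
b = torch.zeros(d_hidden, requires_grad=True)               # encoder bias
c = torch.zeros(d_in, requires_grad=True)                   # decoder bias

def encode(x):
    return torch.sigmoid(F.linear(x, W, b))        # h = sigmoid(W x + b)

def decode(h):
    return torch.sigmoid(F.linear(h, W.t(), c))    # x_hat = sigmoid(W^T h + c), tied weights

x = torch.rand(32, d_in)                           # toy batch of inputs in [0, 1]
x_hat = decode(encode(x))
loss = F.binary_cross_entropy(x_hat, x)
loss.backward()   # W.grad accumulates contributions from both the encoder and the decoder
```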

Loss function

  • For binary inputs: cross-entropy loss

  • For real-valued inputs: sum-of-squared differences

    • Use a linear activation function at the output
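
A short illustration of the two loss choices (tensor shapes and values below are made up for the example):

```python
import torch
import torch.nn.functional as F

# Binary inputs: sigmoid output + cross-entropy loss.
x_bin  = torch.rand(8, 784)                      # targets in [0, 1]
logits = torch.randn(8, 784)                     # decoder pre-activations
bce = F.binary_cross_entropy_with_logits(logits, x_bin)

# Real-valued inputs: linear (identity) output activation + sum-of-squared differences.
x_real = torch.randn(8, 784)
x_hat  = torch.randn(8, 784)                     # linear output, no squashing
sse = F.mse_loss(x_hat, x_real, reduction="sum")
```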

Gradients:

  • In both cases above, the gradient of the loss with respect to the output pre-activation is $\hat{x} - x$

  • When weights are shared (tied/coupled), the gradient with respect to $W$ is the sum of the gradients from its encoder and decoder uses

Undercomplete Hidden Layers

  • When hidden layer is smaller than the input layer

    • Hidden layer compresses the input
    • Will compress well only for the training distribution

      • Example: if trained on MNIST, it learns good compressed features for digit inputs

Overcomplete Hidden Layer

  • Will learn the identity matrix

    • No compression
  • This will only be useful in certain scenarios

Linear Autoencoder

  • If the decoder is linear, what is the best encoder for the MSE loss?

    Theorem: let $A$ be any matrix, with SVD decomposition $A = U\Sigma V^T$. The best rank-$K$ approximation of $A$ (in the squared-error sense) keeps only the top $K$ singular values and vectors.

    • The corresponding rank-$K$ encoder projects onto the top-$K$ right singular vectors:

      $h(X) = V^T_{\le K} X$

      i.e. $h(X) = f(\tilde{W} X)$ with $\tilde{W} = V^T_{\le K}$ and a linear $f$.

Optimality

  • If the inputs are normalized (by subtracting the mean):
    • then the encoder corresponds to PCA
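
A small NumPy check of this equivalence (the toy data, the rank $K=3$, and the comparison against the covariance eigenvectors are assumptions for illustration):

```python
import numpy as np

# With a linear decoder and MSE loss, the optimal rank-K encoder projects onto the
# top-K right singular vectors of the mean-subtracted data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # toy correlated data
Xc = X - X.mean(axis=0)                                      # subtract the mean

K = 3
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_enc = Vt[:K]                   # encoder weights, rows = top-K directions
H = Xc @ W_enc.T                 # codes: h(X) = V_{<=K}^T X
X_rec = H @ W_enc                # linear decoder: best rank-K reconstruction

# Same subspace as PCA on the covariance matrix (up to sign).
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
top = evecs[:, np.argsort(evals)[::-1][:K]]
print(np.allclose(np.abs(W_enc @ top), np.eye(K), atol=1e-6))  # True
```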

A Probabilistic Viewpoint

Goal: modeling $p(x)$

Three ways to do this:

  • Fully observed models

    • An undirected graphical model
    • here there are no latent variables, so we can directly model the joint distribution
    • Example: recurrent neural network
  • Transformation models:

    • start from a random vector $z$ that can be generated as $z\sim \mathcal{N}(0,I)$, then learn a transformation function $f$ that maps $z \to x = f(z)$
    • transform an unobserved noise source using a parameterized function
    • Examples: many sampling functions, Generative Adversarial Networks (GANs)
  • Latent Variable Model: both $x$ and $z$ are random variables
    • modeling hidden causes
    • introduce unobserved local random variables

Review of graphical models: how to get the joint probability distribution $p(x_1, \dots, x_n)$

  • Directed graphical model
  • Undirected graphical model (here we have to deal with the partition function $Z$)

    A directed graphical model captures the independence assumptions:

    Example: $x_1\to x_2 \ \ x_2\to x_3 \ \ x_2 \to x_4 \ \ x_4\to x_3$

    $p(x_1,x_2,x_3,x_4) = p(x_1)~p(x_2|x_1)~p(x_3|x_2,x_4)~p(x_4|x_2)$

    In general, we can write $p(x_1,\dots,x_n) = \prod_{i=1}^n p(x_i|\pi(x_i))$, where $\pi(x_i)$ represents the parents of node $x_i$.

    • For undirected:

    $$p(x_1,\dots,x_n) = \frac{1}{Z}\prod_{i=1}^m f_i (\phi_i(x))$$

In undirected graphical models, we have to deal with the partition function $Z$.
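
To make the directed factorization above concrete, here is a small sketch with made-up conditional probability tables for binary variables (the numbers are assumptions; only the graph structure matches the example):

```python
# Evaluating p(x1, x2, x3, x4) = p(x1) p(x2|x1) p(x3|x2,x4) p(x4|x2) for binary variables.
p_x1      = {0: 0.6, 1: 0.4}                                          # p(x1)
p_x2_x1   = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}      # p(x2|x1), keyed by (x1, x2)
p_x4_x2   = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}      # p(x4|x2), keyed by (x2, x4)
p_x3_x2x4 = {(0, 0, 0): 0.9, (0, 0, 1): 0.1, (0, 1, 0): 0.4, (0, 1, 1): 0.6,
             (1, 0, 0): 0.3, (1, 0, 1): 0.7, (1, 1, 0): 0.2, (1, 1, 1): 0.8}  # p(x3|x2,x4), keyed by (x2, x4, x3)

def joint(x1, x2, x3, x4):
    return p_x1[x1] * p_x2_x1[(x1, x2)] * p_x3_x2x4[(x2, x4, x3)] * p_x4_x2[(x2, x4)]

# The directed factorization sums to 1 over all configurations: no partition function needed.
total = sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(total)   # 1.0
```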

Inferential Problems

  1. Evidence Estimation

    $$p(x) = \int p(x,z) dz$$

  2. Moment computation:

    $$\mathbf{E}[f(z)|x] = \int f(z)\, p(z|x)\, dz$$

  3. Prediction

    $$p(x_{t+1}) = \int p(x_{t+1}|x_t)\, p(x_t)\, dx_t$$

  4. Hypothesis testing:

    $$\mathcal{B} = \log p(x|H_1) - \log p(x|H_2)$$

Importance sampling

To estimate $p(x) = \int p(x|z)\,p(z)\,dz$, we draw samples $z^{(s)}$ from a simpler proposal distribution $q(z)$ and reweight them:

$$w^{(s)} = \frac{p(z^{(s)})}{q(z^{(s)})}, \qquad z^{(s)} \sim q(z)$$

$w^{(s)}$ is called the weight/importance. Intuitively speaking, it tells how well the proposal distribution $q(z)$ matches the true distribution $p(z)$.

Then, with sampling, we convert the integral into a summation:

$$p(x) \approx \frac{1}{S}\sum_{s}w^{(s)} p(x|z^{(s)})$$
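
A small sketch of this estimator on a toy model where the answer is known in closed form (the Gaussian prior, likelihood, and proposal are assumptions chosen for illustration):

```python
import numpy as np
from scipy import stats

# Importance sampling for the evidence p(x) = ∫ p(x|z) p(z) dz.
# Toy model: z ~ N(0, 1), x | z ~ N(z, 1), observed x = 2.0.
rng = np.random.default_rng(0)
x_obs, S = 2.0, 100_000

# Proposal q(z): a Gaussian that need not match the prior p(z).
q_mu, q_sigma = 1.0, 2.0
z = rng.normal(q_mu, q_sigma, size=S)                                 # z^(s) ~ q(z)

w = stats.norm.pdf(z, 0.0, 1.0) / stats.norm.pdf(z, q_mu, q_sigma)    # w^(s) = p(z^(s)) / q(z^(s))
p_x_given_z = stats.norm.pdf(x_obs, z, 1.0)

p_x_hat = np.mean(w * p_x_given_z)                   # (1/S) Σ w^(s) p(x | z^(s))
p_x_true = stats.norm.pdf(x_obs, 0.0, np.sqrt(2.0))  # exact: x ~ N(0, 2) after marginalizing z
print(p_x_hat, p_x_true)
```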

Importance Sampling to Variational Inference

For this, we use Jensen's inequality: $$\log\left(\int p(x) q(x) dx\right) \ge \int p(x) \log q(x) dx$$
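
Applying this inequality with the proposal $q(z)$ as the averaging distribution gives the standard variational lower bound (ELBO); the derivation below is standard and written out here for completeness:

$$\log p(x) = \log \int p(x|z)\,p(z)\,dz = \log \int q(z)\,\frac{p(x|z)\,p(z)}{q(z)}\,dz \ge \int q(z)\,\log\frac{p(x|z)\,p(z)}{q(z)}\,dz$$

$$= \mathbf{E}_{q(z)}\left[\log p(x|z)\right] - \mathrm{KL}\!\left(q(z)\,\|\,p(z)\right)$$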

Generative Models

  • Goal: we want to find the joint distribution: $$P(x,z)$$

    For this, we need to know

    • $P(x)$
    • $P(x|z)$
    • $P(z|x)$
    • $P(z)$

    • The easiest one to solve first is $P(x)$

    • The hardest one is $P(z|x)$
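
These quantities are tied together by Bayes' rule (a standard identity, noted here to connect the bullets above):

$$P(z|x) = \frac{P(x|z)\,P(z)}{P(x)}, \qquad P(x) = \int P(x|z)\,P(z)\,dz$$

so the posterior $P(z|x)$ combines all of the other three, which is one way to see why it is the hardest to obtain.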

Generative Adversarial Networks


Variational Auto-Encoders



  • We start from the input data/image ($x$)
  • Using the inference network (encoder), we model the parameters of the latent variable $z$: for example $q(z|x) = \mathcal{N}(\mu_z,\Sigma_z)$
  • Using the parameters obtained above, sample a random $z$
  • Use the sampled $z$ to reconstruct the input and obtain $\hat{x}$
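
A minimal sketch of these steps as a variational auto-encoder with a diagonal-Gaussian $q(z|x)$ and the reparameterization trick (layer sizes, activations, and the toy batch are assumptions for illustration, not a reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, d_z=20):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)             # inference network (encoder)
        self.enc_mu = nn.Linear(d_hidden, d_z)           # mu_z
        self.enc_logvar = nn.Linear(d_hidden, d_z)       # log of diagonal Sigma_z
        self.dec = nn.Sequential(nn.Linear(d_z, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))   # decoder

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)          # q(z|x) = N(mu_z, Sigma_z)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample z (reparameterization)
        x_hat_logits = self.dec(z)                               # reconstruct x_hat
        return x_hat_logits, mu, logvar

def loss_fn(x, x_hat_logits, mu, logvar):
    # Negative ELBO: reconstruction term + KL(q(z|x) || N(0, I))
    rec = F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

model = VAE()
x = torch.rand(32, 784)                     # toy batch of "images"
x_hat_logits, mu, logvar = model(x)
loss = loss_fn(x, x_hat_logits, mu, logvar)
loss.backward()
```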
