When we talk about generative models, we mainly talk about the following 3 types:
The main characteristics of generative models are:
Note: This document is meant to be a comprehensive collection of references for deep generative models. We start with a list of references for general ideas and then delve deep into each of the branches.
References for deep generative models:
There are many ways to classify generative models. Here's one of them:
Fully observed models - models that observe the data directly, without introducing any new unobserved local variables, e.g. undirected graphical models, auto-regressive models, Boltzmann machines
Latent variable models - introduce an unobserved random variable for every observed data point to explain hidden causes
Each of the above model types can be visualized based on whether they are directed or undirected and whether they work with discrete or continuous data. Here are the model space visualizations for fully observed and latent variable models:
(The above figures are from this Shakir presentation.)
Here's another way: a general taxonomy of deep generative models based on [1].
Models that define an explicit density function $p_{\text{model}}(\boldsymbol{x}; \boldsymbol{\theta})$. For these models, maximization of the likelihood is straightforward; we simply plug the model's definition of the density function into the expression for the likelihood, and follow the gradient uphill.
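To make "following the gradient uphill" concrete, here's a minimal sketch (not taken from any of the referenced implementations) that fits a one-dimensional Gaussian density by gradient ascent on the average log-likelihood; the data, learning rate, and number of steps are made up for illustration.

```python
import numpy as np

# Toy data (hypothetical); in practice x would be the training set.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Parameters of an explicit density p_model(x; theta) = N(x; mu, sigma^2).
mu, log_sigma = 0.0, 0.0

for step in range(500):
    sigma = np.exp(log_sigma)
    # Gradients of the average log-likelihood w.r.t. mu and log_sigma.
    grad_mu = np.mean((x - mu) / sigma**2)
    grad_log_sigma = np.mean((x - mu)**2 / sigma**2 - 1.0)
    # Follow the gradient uphill (gradient ascent on the likelihood).
    mu += 0.05 * grad_mu
    log_sigma += 0.05 * grad_log_sigma

print(mu, np.exp(log_sigma))  # should approach roughly 2.0 and 1.5
```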
Explicit density models can be of the following types:
Models that define an explicit density function that is computationally tractable. There are currently two popular approaches to tractable explicit density models:
FVBNs are models that use the chain rule of probability to decompose a probability distribution over an n-dimensional vector x into a product of one-dimensional probability distributions: $p_{\text {model }}(\boldsymbol{x})=\prod_{i=1}^{n} p_{\text {model }}\left(x_{i} | x_{1}, \ldots, x_{i-1}\right)$.
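To make the factorization concrete, here's a minimal toy sketch of an FVBN-style model for binary vectors, with one small conditional network per dimension. It is not any particular published model (the architecture, sizes, and class name `TinyFVBN` are made up); it only illustrates how the log-likelihood decomposes into a sum of one-dimensional conditional log-probabilities.

```python
import torch
import torch.nn as nn

class TinyFVBN(nn.Module):
    """Toy fully visible belief net: p(x) = prod_i p(x_i | x_1, ..., x_{i-1}) for binary x."""
    def __init__(self, n_dims, hidden=32):
        super().__init__()
        # One small conditional model per dimension; the first dimension gets a dummy context.
        self.conditionals = nn.ModuleList([
            nn.Sequential(nn.Linear(max(i, 1), hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for i in range(n_dims)
        ])

    def log_prob(self, x):
        # x: (batch, n_dims) tensor of 0/1 values
        total = 0.0
        for i, net in enumerate(self.conditionals):
            context = x[:, :i] if i > 0 else torch.zeros(x.shape[0], 1)
            logit = net(context).squeeze(-1)
            # Bernoulli log-likelihood of x_i given x_1, ..., x_{i-1}.
            total = total + torch.distributions.Bernoulli(logits=logit).log_prob(x[:, i])
        return total  # one log p(x) value per example

x = torch.bernoulli(torch.full((8, 5), 0.5))
model = TinyFVBN(n_dims=5)
print(model.log_prob(x))
```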
Here are some implementations of variants of FVBN:
These are also known as models with auto-regressive flows.
References for Auto-regressive Flows:
Another family of models with explicit density functions is based on defining continuous, nonlinear transformations between two different spaces. For example, if there is a vector of latent variables z and a continuous, differentiable, invertible transformation g such that g(z) yields a sample from the model in x space, then $p_{x}(\boldsymbol{x})=p_{z}\left(g^{-1}(\boldsymbol{x})\right)\left|\operatorname{det}\left(\frac{\partial g^{-1}(\boldsymbol{x})}{\partial \boldsymbol{x}}\right)\right|$
The density $p_x$ is tractable if the density $p_z$ is tractable and the determinant of the Jacobian of $g^{-1}$ is tractable. In other words, a simple distribution over z combined with a transformation g that warps space in complicated ways can yield a complicated distribution over x, and if g is carefully designed, the density is tractable too.
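As a toy numerical check of this formula (not one of the implementations listed below), take $p_z = \mathcal{N}(0, 1)$ and $g(z) = \exp(z)$; the change-of-variables density then matches the familiar log-normal distribution.

```python
import numpy as np
from scipy.stats import norm, lognorm

# Base density p_z = N(0, 1) and invertible transform g(z) = exp(z), so g^{-1}(x) = log(x).
def p_x(x):
    z = np.log(x)           # g^{-1}(x)
    d_ginv_dx = 1.0 / x     # derivative of g^{-1}, i.e. the (1x1) Jacobian
    return norm.pdf(z) * np.abs(d_ginv_dx)

xs = np.array([0.5, 1.0, 2.0, 3.0])
print(p_x(xs))                   # density via the change-of-variables formula
print(lognorm.pdf(xs, s=1.0))    # matches the standard log-normal density of exp(z)
```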
Here are some implementations:
These are also known as models with normalizing flows.
References for Normalizing Flows:
Models that provide an explicit density function but use one that is intractable, requiring the use of approximations to maximize the likelihood. These fall roughly into two categories: those using deterministic approximations, which almost always means variational methods, and those using stochastic approximations, meaning Markov chain Monte Carlo methods.
Variational methods define a lower bound $\mathcal{L}(\boldsymbol{x} ; \boldsymbol{\theta}) \leq \log p_{\operatorname{model}}(\boldsymbol{x} ; \boldsymbol{\theta})$.
A learning algorithm that maximizes $\mathcal{L}$ is guaranteed to obtain at least as high a value of the log-likelihood as it does of $\mathcal{L}$. For many families of models, it is possible to define an $\mathcal{L}$ that is computationally tractable even when the log-likelihood is not.
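Here's a minimal sketch of such a bound as it appears in a VAE-style model, with a Gaussian $q(z|x)$, a unit-Gaussian prior, and a Bernoulli decoder. The linear encoder/decoder, layer sizes, and toy data are made up for illustration; this is not the reference implementation linked below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE-style bound L(x) = E_q[log p(x|z)] - KL(q(z|x) || p(z)),
# with Gaussian q(z|x), unit-Gaussian prior p(z), Bernoulli decoder p(x|z).
enc = nn.Linear(784, 2 * 20)   # outputs mean and log-variance of q(z|x)
dec = nn.Linear(20, 784)       # outputs Bernoulli logits of p(x|z)

def elbo(x):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
    recon = -F.binary_cross_entropy_with_logits(dec(z), x, reduction="none").sum(-1)
    kl = 0.5 * (torch.exp(logvar) + mu**2 - 1.0 - logvar).sum(-1)
    return recon - kl          # maximize this lower bound on log p(x)

x = torch.rand(16, 784).round()   # toy binary "images"
print(elbo(x).mean())
```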
Implementation:
A detailed reference list for VAEs can be found here.
Models that make use of some form of stochastic approximation, at the very least in the form of using a small number of randomly selected training examples to form a minibatch used to minimize the expected loss. Some models go further and use Markov chain methods, in which samples are generated by repeatedly applying a stochastic transition operator.
Implementation:
References:
Some models use both variational and Markov chain approximations. For example, deep Boltzmann machines make use of both types of approximation.
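For a feel of the Markov chain side, here's a minimal sketch of block Gibbs sampling in a binary restricted Boltzmann machine, the kind of transition operator used to draw approximate samples when training (deep) Boltzmann machines. The weights, sizes, and chain length are made up; a real model would learn `W`, `b`, and `c`.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy binary RBM with made-up parameters.
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def gibbs_step(v):
    """One block-Gibbs transition: sample h given v, then v given h."""
    h = rng.binomial(1, sigmoid(v @ W + c))
    v_new = rng.binomial(1, sigmoid(h @ W.T + b))
    return v_new

# Run the chain for a while to draw an approximate sample from the model.
v = rng.binomial(1, 0.5, size=n_visible)
for _ in range(1000):
    v = gibbs_step(v)
print(v)
```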
Some models can be trained without even needing to explicitly define a density function. These models instead offer a way to train the model while interacting only indirectly with $p_{\text{model}}$, usually by sampling from it. These constitute the implicit density models.
Some of these implicit models based on drawing samples from $p_{\text{model}}$ define a Markov chain transition operator that must be run several times to obtain a sample from the model. From this family, the primary example is the Generative Stochastic Network.
References:
Family of implicit models that can generate a sample in a single step. At the time of their introduction, GANs were the only notable member of this family, but since then they have been joined by additional models based on kernelized moment matching.
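As a rough illustration of the single-step sampling property, here's a minimal GAN-style training loop on a toy 2-D distribution. It is not any specific published GAN; the architectures, data, and hyperparameters are made up, and a sample from the trained model is just one forward pass through the generator.

```python
import torch
import torch.nn as nn

# Minimal GAN sketch: the generator maps noise z to a sample in a single step.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))   # generator
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))    # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in data distribution (a shifted 2-D Gaussian); replace with real data.
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, -1.0])

for step in range(2000):
    x_real = real_batch()
    x_fake = G(torch.randn(64, 16))
    # Discriminator: push real samples toward 1 and generated samples toward 0.
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator into outputting 1 on generated samples.
    loss_g = bce(D(x_fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Drawing a sample from the trained model is a single forward pass through G.
print(G(torch.randn(5, 16)))
```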
References: