- Deep learning makes different design decisions from traditional graphical models, resulting in:
- Different algorithms
- Different models
- The depth of a model in deep graphical models
- The depth of a latent variable $\mathbf{h}$ is the length (number of steps) of the shortest path from $\mathbf{h}$ to an observed variable.
- Deep Learning models...
- Use Distributed Representations
- Typically have more latent variables than observed variables.
- Focus on indirect effects
- Capture nonlinear interactions via indirect connections that flow through multiple latent variables.
Traditional Graphical Models
- Focus on direct effects
- Capture nonlinear interactions between variables using higher-order terms and structure learning.
- Use only a few latent variables, if any
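
The depth definition given earlier (the number of steps on the shortest path from a latent variable to an observed variable) can be sketched as a breadth-first search that starts from all observed variables at once. This is only an illustration: it assumes the model graph can be treated as undirected for the purpose of counting steps, and the function name `latent_depths` and the variable names in the toy graph are hypothetical.

```python
from collections import deque

def latent_depths(edges, observed):
    """Return {latent variable: depth}.

    Depth is the number of edges on the shortest path from the latent
    variable to any observed variable (multi-source breadth-first search).
    Assumption: edges are treated as undirected.
    """
    # Build an undirected adjacency list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every observed variable starts at distance 0.
    dist = {v: 0 for v in observed}
    queue = deque(observed)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    observed_set = set(observed)
    return {n: d for n, d in dist.items() if n not in observed_set}

# Hypothetical layered model: visible layer, then two latent layers.
edges = [
    ("v1", "h1_a"), ("v1", "h1_b"), ("v2", "h1_a"), ("v2", "h1_b"),
    ("h1_a", "h2_a"), ("h1_a", "h2_b"), ("h1_b", "h2_a"), ("h1_b", "h2_b"),
]
depths = latent_depths(edges, observed=["v1", "v2"])
print(depths)
```

In this toy graph the first latent layer sits at depth 1 and the second at depth 2, which matches the usual layer-counting intuition.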
Design of Latent variables in Deep Graphical Models
- The latent variables do not have any specific semantics
- They are usually not easy for a human to interpret
- c.f. In the traditional models...
- Latent variables are designed with specific semantics in the designer's ("human") mind.
- Less able to scale to complex problems
- Not reusable in as many different contexts as deep models
Connectivity typically used in Deep Graphical Models
- Have large groups of units that are all connected to other groups of units.
- Interactions between two groups may be described by a single matrix.
- c.f. In the traditional models...
- There are very few connections, and the connections for each variable may be designed individually.
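
The point that one matrix can describe the interaction between two groups of units can be sketched as follows. This is a minimal illustration that assumes an RBM-style model with binary units, where the conditional of every hidden unit given the visible units is a sigmoid of a bias plus one matrix-vector product. The names `hidden_conditionals`, `W`, `c`, and `v`, and all numeric values, are hypothetical and chosen only for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_conditionals(v, W, c):
    """Return p(h_j = 1 | v) for every hidden unit j.

    Assumes an RBM-style bipartite model: W[i][j] couples visible unit i
    to hidden unit j, and c[j] is the bias of hidden unit j. The whole
    group-to-group interaction is the single matrix W.
    """
    n_hidden = len(c)
    return [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(n_hidden)
    ]

# Hypothetical model: 3 visible units fully connected to 2 hidden units.
W = [[ 1.0, -1.0],
     [ 0.5,  0.0],
     [-2.0,  2.0]]
c = [0.0, 0.0]
v = [1, 0, 1]

probs = hidden_conditionals(v, W, c)
print(probs)
```

By contrast, a traditional model with individually designed connections would need a separate, hand-chosen factor for each edge rather than one dense matrix.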
The training algorithm is "free" to invent the concepts it needs to model a particular dataset.
- c.f. In the traditional models...
- The choice of inference algorithm is "tightly linked" with the design of the model structure.
Traditional approaches typically aim to keep "exact inference" tractable.
- When that is too limiting, they use "loopy belief propagation" as an approximate inference algorithm. (Murphy Ch20)
- c.f. In deep graphical models...
- Models are typically designed so that Gibbs sampling or variational inference algorithms are efficient.
Both traditional approaches (exact inference and loopy belief propagation) often work well with very sparsely connected graphs.
- When the graph is not sparse enough, as is usual with distributed representations, exact inference and loopy belief propagation are no longer relevant.
Deep Graphical Models from the Viewpoint of Computation
- A very large number of latent variables makes efficient numerical code essential.
- => individual steps of the algorithm are implemented with efficient matrix product operations
- => or with sparsely connected generalizations of them
- such as block diagonal matrix products or convolutions
- c.f. Traditional models use one big matrix. (?)
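
As a rough illustration of why a block diagonal matrix product counts as a "sparsely connected generalization" of a dense product, the sketch below multiplies a vector by a block diagonal matrix in two ways: using only the blocks, and using the equivalent dense matrix padded with zeros. The helper names (`matvec`, `block_diag_matvec`, `to_dense`) and the numbers are hypothetical; a real implementation would rely on an optimized linear algebra library rather than Python lists.

```python
def matvec(M, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def block_diag_matvec(blocks, x):
    """Multiply a block diagonal matrix by x, touching only the blocks.

    Each block only sees its own slice of x, so units in different
    blocks never interact (the connectivity is sparse).
    """
    out, start = [], 0
    for B in blocks:
        width = len(B[0])
        out.extend(matvec(B, x[start:start + width]))
        start += width
    return out

def to_dense(blocks):
    """Build the equivalent dense matrix, with zeros outside the blocks."""
    n_rows = sum(len(B) for B in blocks)
    n_cols = sum(len(B[0]) for B in blocks)
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, val in enumerate(row):
                dense[r0 + i][c0 + j] = val
        r0 += len(B)
        c0 += len(B[0])
    return dense

# Hypothetical example: one 2x2 block and one 1x1 block.
blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0]]]
x = [1.0, 1.0, 2.0]

sparse_result = block_diag_matvec(blocks, x)
dense_result = matvec(to_dense(blocks), x)
print(sparse_result, dense_result)
```

Both routes give the same result, but the block version skips every multiplication by a structural zero.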
Trend: The deep learning approach is often to...
- Figure out the minimum amount of information we absolutely need.
- Figure out how to get a reasonable approximation of that information as quickly as possible.
- c.f. Traditional approach ...
- Simplify the model until every quantity of interest can be computed exactly.
- The deep learning approach instead increases the power of the model until it is just barely possible to train or use.
- We still train and use the model, even when:
- Marginal distributions cannot be computed.
- However, we are satisfied with drawing approximate samples.
- The objective function is intractable.
- However, we can still train approximately if we have an efficient estimate of its gradient.
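
The idea of settling for approximate samples and a gradient estimate can be sketched on a model small enough to check by brute force. The example assumes a tiny, fully visible Boltzmann machine with unnormalized log-probability $\sum_{i<j} W_{ij} x_i x_j + \sum_i b_i x_i$, for which the gradient of $\log Z$ with respect to $W_{ij}$ equals the model expectation $E[x_i x_j]$. Gibbs sampling estimates that expectation, and exhaustive enumeration (impossible at realistic sizes) gives the reference value. All names, weights, and sample counts are hypothetical.

```python
import itertools
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_unnorm(x, W, b):
    """Unnormalized log-probability: sum_{i<j} W_ij x_i x_j + sum_i b_i x_i."""
    n = len(x)
    pair = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return pair + sum(b[i] * x[i] for i in range(n))

def exact_pair_expectation(W, b, i, j):
    """E[x_i x_j] by enumerating all 2^n states (only feasible for tiny n)."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(log_unnorm(x, W, b)) for x in states]
    Z = sum(weights)
    return sum(w * x[i] * x[j] for w, x in zip(weights, states)) / Z

def gibbs_pair_expectation(W, b, i, j, n_sweeps, burn_in, rng):
    """Estimate E[x_i x_j] from approximate samples drawn by Gibbs sampling."""
    n = len(b)
    x = [rng.randint(0, 1) for _ in range(n)]
    total = 0
    for sweep in range(burn_in + n_sweeps):
        for k in range(n):
            # Conditional of unit k given all other units (W is symmetric).
            act = b[k] + sum(W[k][m] * x[m] for m in range(n) if m != k)
            x[k] = 1 if rng.random() < sigmoid(act) else 0
        if sweep >= burn_in:
            total += x[i] * x[j]
    return total / n_sweeps

# Hypothetical symmetric weights with zero diagonal, and biases.
W = [[ 0.0, 0.8, -0.5],
     [ 0.8, 0.0,  0.3],
     [-0.5, 0.3,  0.0]]
b = [0.1, -0.2, 0.0]

rng = random.Random(0)
exact = exact_pair_expectation(W, b, 0, 1)
approx = gibbs_pair_expectation(W, b, 0, 1, n_sweeps=20000, burn_in=500, rng=rng)
print("exact:", exact, "Gibbs estimate:", approx)
```

The sampler never computes $Z$ or any marginal exactly, yet its estimate of the gradient term should land close to the enumerated value, which is all a gradient-based training procedure needs.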