Multivariate Gaussian Properties

29th May 2014

Neil Lawrence

The Gaussian density has some remarkable properties. In particular, the sum of a set of independent Gaussian random variables is also Gaussian distributed. So if we are given independent variables

$$x_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$$

and we wish to know the distribution of

$$\sum_{i=1}^k x_i$$

we know the result is given by

$$\sum_{i=1}^k x_i \sim \mathcal{N}\left(\sum_{i=1}^k\mu_i, \sum_{i=1}^k \sigma_i^2\right).$$

Note that the new mean is the sum of the old means, and the new variance is the sum of the old variances.
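The sum rule can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the particular means, standard deviations, seed and sample count are illustrative choices, not values from the text. It draws independent samples, sums them, and compares the empirical mean and variance of the sum with the predicted ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Illustrative (assumed) means and standard deviations for k = 3 variables.
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 1.5, 2.0])

# Each row holds one independent draw of (x_1, x_2, x_3).
num_samples = 200_000
x = rng.normal(loc=mu, scale=sigma, size=(num_samples, 3))

# Sum across the k variables for every draw.
total = x.sum(axis=1)

print("predicted mean:", mu.sum(), " empirical mean:", total.mean())
print("predicted variance:", (sigma**2).sum(), " empirical variance:", total.var())
```

The empirical values should agree with the predictions up to sampling error, which shrinks as `num_samples` grows.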

Scaling a Gaussian

Scaling a Gaussian random variable by a constant $c$ also leads to a Gaussian form,

$$cx_i \sim \mathcal{N}(c\mu_i, c^2\sigma_i^2).$$
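The scaling rule can be checked in the same way. This is a small sketch under assumed values for $\mu_i$, $\sigma_i$ and $c$ (all hypothetical); note that the variance is scaled by $c^2$, so a negative $c$ flips the sign of the mean but not of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative values; c is negative to show the variance stays positive.
mu_i, sigma_i, c = 2.0, 3.0, -0.5

x_i = rng.normal(loc=mu_i, scale=sigma_i, size=200_000)
scaled = c * x_i

print("predicted mean:", c * mu_i, " empirical mean:", scaled.mean())
print("predicted variance:", c**2 * sigma_i**2, " empirical variance:", scaled.var())
```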

Multivariate Gaussians

Since matrix multiplication is just a series of additions and multiplications (i.e. it's a linear operation), applying matrix multiplication to Gaussian random variables also leads to Gaussian densities. So if $\mathbf{x}$ is drawn from a Gaussian density with covariance $\boldsymbol{\Sigma}$ and mean $\boldsymbol{\mu}$,

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

then

$$\mathbf{f} = \mathbf{W}\mathbf{x}$$

would also be drawn from a Gaussian density

$$\mathbf{f} \sim \mathcal{N}(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top).$$
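As a sanity check on the linear transformation rule, here is a minimal sketch assuming NumPy. The specific `mu`, `Sigma` and `W` below are made-up illustrative values (with `Sigma` chosen to be symmetric positive definite). Samples of $\mathbf{x}$ are stored as rows, so $\mathbf{W}\mathbf{x}$ for every sample is computed as `x @ W.T`.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed illustrative parameters: a 3-dimensional x mapped to a 2-dimensional f.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, -0.3],
                  [0.0, -0.3, 0.5]])
W = np.array([[1.0, 0.5, -1.0],
              [0.0, 2.0, 1.0]])

# Draw x ~ N(mu, Sigma); each row is one sample.
x = rng.multivariate_normal(mu, Sigma, size=200_000)

# Apply f = W x to every sample.
f = x @ W.T

# Mean and covariance predicted by the linear transformation rule.
pred_mean = W @ mu
pred_cov = W @ Sigma @ W.T

# Empirical estimates from the samples (rowvar=False: columns are variables).
emp_mean = f.mean(axis=0)
emp_cov = np.cov(f, rowvar=False)

print("predicted mean:", pred_mean, " empirical mean:", emp_mean)
print("predicted covariance:\n", pred_cov)
print("empirical covariance:\n", emp_cov)
```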

The simple linear algebraic relationship between these covariances is at the heart of Gaussian process models. It makes Bayesian inference in models based on Gaussian densities particularly tractable. For example, if we further assume that $\mathbf{f}$ is corrupted by additive independent zero mean Gaussian noise, with variance $\sigma^2$, to form an observation $\mathbf{y}$, then since adding an independent Gaussian variable also leads to another Gaussian distribution we know that

$$\mathbf{y} \sim \mathcal{N}(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top + \sigma^2 \mathbf{I}).$$
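The effect of the noise on the covariance can also be sketched numerically. The code below assumes NumPy, and the values of `mu`, `Sigma`, `W` and `noise_var` are hypothetical. It forms $\mathbf{y}$ by adding independent noise to $\mathbf{f} = \mathbf{W}\mathbf{x}$ and compares the sample covariance of $\mathbf{y}$ with $\mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top + \sigma^2\mathbf{I}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed illustrative parameters (Sigma is symmetric positive definite).
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, -0.3],
                  [0.0, -0.3, 0.5]])
W = np.array([[1.0, 0.5, -1.0],
              [0.0, 2.0, 1.0]])
noise_var = 0.25  # sigma^2 in the text

# Noise-free function values f = W x, one sample per row.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
f = x @ W.T

# Corrupt f with independent zero mean Gaussian noise to form y.
noise = rng.normal(scale=np.sqrt(noise_var), size=f.shape)
y = f + noise

# The noise leaves the mean unchanged and adds sigma^2 to the diagonal.
pred_cov_y = W @ Sigma @ W.T + noise_var * np.eye(W.shape[0])
emp_cov_y = np.cov(y, rowvar=False)

print("predicted covariance of y:\n", pred_cov_y)
print("empirical covariance of y:\n", emp_cov_y)
```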

If we further assume that $\boldsymbol{\Sigma} = \mathbf{I}$ and $\boldsymbol{\mu} = \mathbf{0}$ then we recover probabilistic PCA, the probabilistic model underlying principal component analysis (PCA) (Tipping and Bishop, 1999; Roweis, 1998),

$$\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}).$$

This perspective on PCA matches that introduced by Hotelling (1933).
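One way to get a feel for this covariance is to look at its eigenvalues. The sketch below assumes NumPy; the dimensions `p` and `q`, the noise variance and the random `W` are all hypothetical. Because $\mathbf{W}\mathbf{W}^\top$ has rank at most $q$ when $\mathbf{W}$ is $p \times q$, the covariance $\mathbf{W}\mathbf{W}^\top + \sigma^2\mathbf{I}$ has at least $p - q$ eigenvalues equal to $\sigma^2$, with the remaining ones lifted above it.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed illustrative sizes: p observed dimensions, q latent dimensions.
p, q = 5, 2
noise_var = 0.1  # sigma^2 in the text

# A hypothetical mapping matrix W (p x q) with random entries.
W = rng.normal(size=(p, q))

# Covariance of y under the model above.
C = W @ W.T + noise_var * np.eye(p)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals = np.linalg.eigvalsh(C)

print("noise variance:", noise_var)
print("eigenvalues of W W^T + sigma^2 I:", eigvals)
```

The smallest `p - q` eigenvalues sit at the noise variance, while the largest `q` reflect the directions spanned by the columns of `W`.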

