The Gaussian density is a remarkable density. It has some very unusual properties. In particular, the sum of a set of independent Gaussian random variables is also Gaussian distributed. So if we are given that $$y_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$$
and we wish to know the distribution of $$y = \sum_{i=1}^n y_i$$
we know the result is given by $$y \sim \mathcal{N}\left(\sum_{i=1}^n \mu_i, \sum_{i=1}^n \sigma_i^2\right).$$
Note that the new mean is the sum of the old means, and the new variance is the sum of the old variances.
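We can check this numerically. Below is a minimal sketch using NumPy; the means, variances and sample size are arbitrary choices for illustration. It samples two independent Gaussian variables and confirms that their sum has mean equal to the sum of the means and variance equal to the sum of the variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent Gaussians with illustrative (assumed) means and variances.
mu_1, var_1 = 1.0, 0.5
mu_2, var_2 = -2.0, 2.0

n_samples = 100_000
y_1 = rng.normal(mu_1, np.sqrt(var_1), n_samples)
y_2 = rng.normal(mu_2, np.sqrt(var_2), n_samples)
y = y_1 + y_2  # sum of independent Gaussians

print("empirical mean:    ", y.mean(), " theoretical:", mu_1 + mu_2)
print("empirical variance:", y.var(), " theoretical:", var_1 + var_2)
```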
Scaling a Gaussian also leads to a Gaussian form: if $y \sim \mathcal{N}(\mu, \sigma^2)$, then $$wy \sim \mathcal{N}(w\mu, w^2\sigma^2).$$
Since matrix multiplication is just a series of additions and multiplications (i.e. it's a linear operation), applying matrix multiplication to Gaussian random variables also leads to Gaussian densities. So if $\mathbf{x}$ is drawn from a Gaussian density with covariance $\boldsymbol{\Sigma}$ and mean $\boldsymbol{\mu}$, $$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$ then $$\mathbf{f} = \mathbf{W}\mathbf{x}$$ would also be drawn from a Gaussian density $$\mathbf{f} \sim \mathcal{N}(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top).$$
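The same property can be checked in the matrix case. The sketch below uses arbitrary illustrative choices of $\mathbf{W}$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$: samples of $\mathbf{f} = \mathbf{W}\mathbf{x}$ should have empirical mean close to $\mathbf{W}\boldsymbol{\mu}$ and empirical covariance close to $\mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) mean, covariance and linear map.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
W = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

x = rng.multivariate_normal(mu, Sigma, size=100_000)  # rows are samples of x
f = x @ W.T                                           # each row is W x

print("empirical mean:\n", f.mean(axis=0))
print("theoretical mean:\n", W @ mu)
print("empirical covariance:\n", np.cov(f, rowvar=False))
print("theoretical covariance:\n", W @ Sigma @ W.T)
```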
The simple linear algebraic relationship between these covariances is at the heart of Gaussian process models. It makes Bayesian inference in models based on Gaussian densities extremely straightforward. For example, if we further assume that $\mathbf{f}$ is corrupted by additive independent zero mean Gaussian noise, with variance $\sigma^2$, to form an observation $\mathbf{y}$, then, since the sum of independent Gaussian variables is also Gaussian, we know that $$\mathbf{y} \sim \mathcal{N}(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top + \sigma^2 \mathbf{I}).$$
If we further assume that $\boldsymbol{\Sigma} = \mathbf{I}$ and $\boldsymbol{\mu} = \mathbf{0}$ then we recover the probabilistic model underlying principal component analysis (PCA), probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998). $$\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}).$$ This perspective on PCA matches that introduced by Hotelling (1933).
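A minimal sketch of sampling from this probabilistic PCA model follows; the latent dimension, data dimension, $\mathbf{W}$ and $\sigma^2$ below are arbitrary illustrative choices. We draw latent variables $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, map them through $\mathbf{W}$, and add isotropic Gaussian noise; the resulting samples have covariance close to $\mathbf{W}\mathbf{W}^\top + \sigma^2\mathbf{I}$.

```python
import numpy as np

rng = np.random.default_rng(2)

latent_dim, data_dim, n_samples = 2, 5, 100_000
sigma2 = 0.1

# Illustrative (assumed) weight matrix mapping latent to data space.
W = rng.normal(size=(data_dim, latent_dim))

x = rng.normal(size=(n_samples, latent_dim))                     # x ~ N(0, I)
noise = rng.normal(scale=np.sqrt(sigma2), size=(n_samples, data_dim))
y = x @ W.T + noise                                              # y = W x + noise

print("empirical covariance:\n", np.cov(y, rowvar=False).round(2))
print("theoretical covariance:\n", (W @ W.T + sigma2 * np.eye(data_dim)).round(2))
```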