To tune the Gibbs sampler for a multivariate normal likelihood with a multivariate normal prior, it helps to go back and look at the sampler itself. Sampling from a multivariate normal eventually works its way down to a function in the mvnormal.jl source file of the Distributions package.
That function is a one-liner:
_rand!(d::MvNormal, x::VecOrMat{Float64}) = add!(unwhiten!(d.Σ, randn!(x)), d.μ)
The underscore at the beginning of the name indicates that you shouldn't expect to call this function directly. The trailing !
on the function names indicates that these functions are mutating functions. That is, they modify one or more of their arguments.
This is a common idiom in Julia. A function that generates a vector or array is often defined to call a companion function that mutates one of its arguments. The randn!
function operates on a VecOrMat{Float64}, overwriting the argument with i.i.d. standard normal variates. If you check the method of randn that returns a vector, you will see that it simply allocates the result and passes it to randn!.
The point is that, if you want to, you can overwrite an array at each step of an iteration and avoid needless allocation and garbage collection of vectors at every call to a function like randn.
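As a minimal sketch of this idiom (written for current Julia, where randn! is exported from the Random standard library; the notebook's older Julia had it in Base), you can pre-allocate a buffer once and refill it in place on every iteration:
using Random                    # randn! lives in Random in current Julia

z = Vector{Float64}(undef, 10)  # allocated once, outside the loop
for iter in 1:1000
    randn!(z)                   # overwrite z with fresh standard normal variates
    # ... use z in this iteration's computations ...
end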
The MvNormal type is a parametric type encompassing many specialized cases, distinguished by its type parameters.
In [1]:
using Distributions, PDMats
In [2]:
?MvNormal
Out[2]:
It is common to define a diffuse multivariate normal prior for unbounded vector parameters such as regression coefficients. These priors end up being special cases of the MvNormal type, determined by its type parameters.
In [3]:
d = MvNormal(2, 1000.)
Out[3]:
In [4]:
typeof(d.μ)
Out[4]:
In [5]:
typeof(d.Σ)
Out[5]:
In [6]:
?ScalMat
Out[6]:
An MvNormal of this type could have dimension 1,000,000 and would still require storing only the dimension, the scalar variance, and its inverse.
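For instance (a sketch using the same constructor as above, so the same version caveats apply):
dbig = MvNormal(1_000_000, 1000.)  # a 10^6-dimensional ZeroMeanIsoNormal
typeof(dbig.Σ)                     # still a ScalMat; storage does not grow with the dimension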
The next question to ask is what unwhiten! does. Essentially it is the multivariate equivalent of multiplying by $\sigma$ in the univariate case. If you want to take a univariate standard normal, z, and convert it to have a $\mathcal{N}(\mu, \sigma^2)$ distribution, you form $\mu + \sigma z$. The "unwhitening" here is essentially multiplying by a square root of $\Sigma$, such as its Cholesky factor.
If you have a sample from a $\mathcal{N}(\mu, \sigma^2)$ distribution you convert it to "white noise", or "whiten" it, with the Z transformation $z = (x - \mu)/\sigma$. The transformation in the other direction is called "unwhitening".
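A minimal sketch of unwhitening done by hand (written for current Julia's LinearAlgebra standard library; the names μ, Σ, L, and z below are illustrative, not part of Distributions):
using LinearAlgebra

μ = [1.0, 2.0]
Σ = [4.0 1.0; 1.0 3.0]
L = cholesky(Σ).L   # lower-triangular square root: Σ = L * L'
z = randn(2)        # "white noise": z ~ N(0, I)
x = μ + L * z       # x has a N(μ, Σ) distribution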
The Gibbs sampler for $\beta$ in the Mamba documentation uses the standard formulas for the full conditional distribution of $\beta$, which is multivariate normal with covariance $\Sigma_\beta = \left(X'X/\sigma^2 + \Sigma_0^{-1}\right)^{-1}$ and mean $\mu_\beta = \Sigma_\beta\left(X'y/\sigma^2 + \Sigma_0^{-1}\mu_0\right)$, where $\mu_0$ and $\Sigma_0$ are the prior mean and covariance.
Gibbs_beta = Sampler([:beta],
    (beta, s2, xmat, y) ->
        begin
            beta_mean = mean(beta.distr)
            beta_invcov = invcov(beta.distr)
            Sigma = inv(xmat' * xmat / s2 + beta_invcov)
            mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)
            rand(MvNormal(mu, Sigma))
        end
)
The anonymous function created by the -> operator (sometimes called the "stabby lambda" syntax) produces the next value of $\beta$. This particular usage is unusual in that it combines the "one-liner" arrow form with a begin/end block. It would be more common to write this as
Gibbs_beta = Sampler([:beta],
    function (beta, s2, xmat, y)
        beta_mean = mean(beta.distr)
        beta_invcov = invcov(beta.distr)
        Sigma = inv(xmat' * xmat / s2 + beta_invcov)
        mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)
        rand(MvNormal(mu, Sigma))
    end
)
Notice that xmat' * xmat and xmat' * y are recomputed at each iteration. A slightly less obvious inefficiency comes from losing information about the structure of the matrices involved. The matrix xmat' * xmat is symmetric and positive semi-definite, but this information is lost because the product is returned as a plain matrix.
In [7]:
xmat = hcat(ones(5), collect(1:5))
Out[7]:
In [8]:
xmat' * xmat
Out[8]:
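One way to retain at least the symmetry in the result's type is to wrap the product (a sketch; the Symmetric wrapper is in the LinearAlgebra standard library in current Julia, and was in Base's linear algebra code at the time):
using LinearAlgebra

S = Symmetric(xmat' * xmat)   # typed as Symmetric, so later factorizations can exploit it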
We also lose the special structure of the covariance of the prior.
In [9]:
invcov(d)
Out[9]:
None of these issues matters much for an example of this size. In larger examples, however, one might want to be more careful and precompute and store X'X and X'y. An update function written that way could look like
In [10]:
function updateβ(β, σ², XtX, Xty)
    d = β.distr
    if isa(d, ZeroMeanIsoNormal)
        # The full conditional precision is (X'X + σ²·Σ₀⁻¹)/σ²; factor
        # A = X'X + σ²·inv_value·I as L·L' via its lower Cholesky factor.
        cfac = cholfact!(XtX + σ² * d.Σ.inv_value * I, :L)
        # The mean is A⁻¹X'y; adding √σ²·L'⁻¹z with z ~ N(0, I) gives the
        # required covariance σ²A⁻¹. The parentheses matter: * and \ have
        # equal precedence, so the scaling must wrap the triangular solve.
        return cfac \ Xty + √σ² * (cfac[:L]' \ randn(length(β)))
    else
        throw(ArgumentError(
            "prior for coefficients should be ZeroMeanIsoNormal"))
    end
end
Out[10]:
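A hypothetical usage sketch (the precomputation and the closure over XtX and Xty are my own arrangement, not from the Mamba documentation; y here is the model's response vector):
XtX = xmat' * xmat   # computed once, outside the sampling function
Xty = xmat' * y
Gibbs_beta = Sampler([:beta],
    (beta, s2) -> updateβ(beta, s2, XtX, Xty)
)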