Sampling from a multivariate normal

To tune a Gibbs sampler for a multivariate normal likelihood and a multivariate normal prior, it helps to look at how such samples are actually generated. Sampling from a multivariate normal eventually works its way down to a function in the mvnormal.jl source file of the Distributions package.

That function is a one-liner

_rand!(d::MvNormal, x::VecOrMat{Float64}) = add!(unwhiten!(d.Σ, randn!(x)), d.μ)

The underscore at the beginning of the name indicates that you shouldn't expect to call this function directly. The trailing ! on the function names indicates that these functions are mutating functions. That is, they modify one or more of their arguments.

This is a common idiom in Julia. A function that generates a vector or array is often defined to call a companion function that mutates one of its arguments. The randn! function operates on a VecOrMat{Float64}, overwriting the argument with i.i.d. standard normal variates. If you check how the randn method that returns a vector is defined, you will see that it simply allocates the result and passes it to randn!. The point is that, if you want to, you can overwrite an array in each step of an iteration and avoid needless allocation and garbage collection of vectors at every call to a function like randn.
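A minimal sketch of the contrast, using a made-up buffer x:

x = zeros(5)     # allocate a result vector once
randn!(x)        # overwrite x in place with i.i.d. standard normal draws
randn(5)         # by contrast, this allocates a fresh vector on every call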

The MvNormal type is a parametric type whose type parameters encode many specialized cases.


In [1]:
using Distributions, PDMats

In [2]:
?MvNormal


search: MvNormal MvNormalCanon MvNormalKnownCov gmvnormal MvLogNormal

Out[2]:

No documentation found.

Summary:

immutable Distributions.MvNormal{Cov<:PDMats.AbstractPDMat{T<:AbstractFloat},Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}} <: Distributions.AbstractMvNormal

Fields:

μ :: Mean<:Union{Array{Float64,1},Distributions.ZeroVector{Float64}}
Σ :: Cov<:PDMats.AbstractPDMat{T<:AbstractFloat}

It is common to define a diffuse multivariate normal prior for unbounded vector parameters like coefficients. These end up being special cases of the MvNormal type, distinguished by its type parameters.


In [3]:
d = MvNormal(2, 1000.)


Out[3]:
ZeroMeanIsoNormal(
dim: 2
μ: [0.0,0.0]
Σ: 2x2 Array{Float64,2}:
 1.0e6  0.0  
 0.0    1.0e6
)

In [4]:
typeof(d.μ)


Out[4]:
Distributions.ZeroVector{Float64}

In [5]:
typeof(d.Σ)


Out[5]:
PDMats.ScalMat{Float64}

In [6]:
?ScalMat


search: ScalMat WalleniusNoncentralHypergeometric FisherNoncentralHypergeometric

Out[6]:

No documentation found.

Summary:

immutable PDMats.ScalMat{T<:AbstractFloat} <: PDMats.AbstractPDMat{T<:AbstractFloat}

Fields:

dim       :: Int64
value     :: T<:AbstractFloat
inv_value :: T<:AbstractFloat

This type of MvNormal could be of dimension 1,000,000 and would still only require storing these three numbers.
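For instance (a sketch mirroring the constructor call in In [3] above), a huge diffuse prior is still cheap to create:

dbig = MvNormal(10^6, 1000.)   # a million-dimensional diffuse prior
dbig.Σ                         # still a ScalMat holding only dim, value and inv_value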

The next question is what unwhiten! does. Essentially it is the multivariate equivalent of multiplying by $\sigma$ in the univariate case. If you want to take a univariate standard normal, z, and convert it to have a $\mathcal{N}(\mu, \sigma^2)$ distribution you form $\mu + \sigma z$. The "unwhitening" here is multiplication by a square root of $\Sigma$, namely its Cholesky factor.

If you have a sample from a $\mathcal{N}(\mu,\sigma^2)$ distribution you convert it to "white noise", or "whiten" it, with the Z transformation $z = (x - \mu)/\sigma$. The transformation in the other direction is called "unwhitening".
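A small check of this (a sketch, assuming the non-mutating unwhiten that PDMats exports alongside unwhiten!), using the diffuse d defined above:

z = randn(2)         # "white" noise: independent standard normals
unwhiten(d.Σ, z)     # for the ScalMat prior above this is just 1000.0 * z
# for a dense covariance Σ = L*L' (L the lower Cholesky factor),
# unwhiten(Σ, z) multiplies z by L, so μ + unwhiten(Σ, z) has mean μ
# and covariance Σ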

Creating a Gibbs sampler for coefficients

The Gibbs sampler for $\beta$ in the Mamba documentation uses the standard formulas for the mean and covariance of the full conditional distribution of $\beta$.
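For the linear model $y \sim \mathcal{N}(X\beta, \sigma^2 I)$ with prior $\beta \sim \mathcal{N}(\beta_0, \Sigma_0)$, that full conditional is the standard conjugate-normal result

$$\beta \mid y, \sigma^2 \sim \mathcal{N}(\mu_\ast, \Sigma_\ast), \qquad \Sigma_\ast = \left(\frac{X'X}{\sigma^2} + \Sigma_0^{-1}\right)^{-1}, \qquad \mu_\ast = \Sigma_\ast\left(\frac{X'y}{\sigma^2} + \Sigma_0^{-1}\beta_0\right),$$

which is exactly what the sampler function below evaluates.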

Gibbs_beta = Sampler([:beta],
  (beta, s2, xmat, y) ->
    begin
      beta_mean = mean(beta.distr)
      beta_invcov = invcov(beta.distr)
      Sigma = inv(xmat' * xmat / s2 + beta_invcov)
      mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)
      rand(MvNormal(mu, Sigma))
    end
)

The anonymous function created with the -> operator (sometimes called the "stabby lambda" syntax) produces the next value of $\beta$. This particular use of the syntax is unusual in that it combines the "one-liner" form with a begin/end block. It would be more common to write this as

Gibbs_beta = Sampler([:beta],
    function (beta, s2, xmat, y)
      beta_mean = mean(beta.distr)
      beta_invcov = invcov(beta.distr)
      Sigma = inv(xmat' * xmat / s2 + beta_invcov)
      mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)
      rand(MvNormal(mu, Sigma))
    end
)

Notice that xmat' * xmat and xmat' * y are recomputed at each iteration. A slightly less obvious inefficiency is that information about the structure of the matrices involved is discarded: xmat' * xmat is symmetric and positive semi-definite, but it ends up stored as a plain dense matrix.


In [7]:
xmat = hcat(ones(5), collect(1:5))


Out[7]:
5x2 Array{Float64,2}:
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0

In [8]:
xmat' * xmat


Out[8]:
2x2 Array{Float64,2}:
  5.0  15.0
 15.0  55.0

We also lose the special structure of the covariance of the prior.


In [9]:
invcov(d)


Out[9]:
2x2 Array{Float64,2}:
 1.0e-6  0.0   
 0.0     1.0e-6
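The scalar is still available through the fields of the ScalMat, which is what the update function below reads instead of calling invcov:

d.Σ.inv_value    # 1.0e-6, the single number behind the dense matrix above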

None of these issues matters for an example this small. In larger examples, however, one might want to be more careful and precompute X'X and X'y. An update function could look like


In [10]:
function updateβ(β, σ², XtX, Xty)
    d = β.distr
    if isa(d, ZeroMeanIsoNormal)
        # Cholesky factor of X'X + (σ²/v)I, where v is the prior variance
        cfac = cholfact!(XtX + σ² * d.Σ.inv_value * I, :L)
        # cfac \ Xty is the posterior mean; adding √σ² * (L' \ z) with z a
        # standard normal vector gives a draw with the posterior covariance
        return cfac \ Xty + √σ² * (cfac[:L]' \ randn(length(β)))
    else
        throw(ArgumentError(
            "prior for coefficients should be ZeroMeanIsoNormal"))
    end
end


Out[10]:
updateβ (generic function with 1 method)
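As a rough standalone check outside of Mamba (everything below is made up for illustration: BetaStub is a hypothetical stand-in for the stochastic node, and the response vector and σ² = 0.25 are arbitrary):

# hypothetical stand-in for a Mamba stochastic node; it carries only the
# prior distribution and a value vector, which is all updateβ uses
immutable BetaStub
    distr::ZeroMeanIsoNormal
    val::Vector{Float64}
end
Base.length(b::BetaStub) = length(b.val)

βnode = BetaStub(MvNormal(2, 1000.), zeros(2))
XtX   = xmat' * xmat                         # computed once, outside the iterations
Xty   = xmat' * [1.1, 1.9, 3.2, 3.9, 5.1]    # made-up response vector
updateβ(βnode, 0.25, XtX, Xty)               # one draw from the full conditional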