My goal is to provide fast Markov chain Monte Carlo (MCMC) methods for linear mixed-effects models, with possible extensions to generalized linear mixed models.
There are several MCMC frameworks for Julia. An interface to Stan is available and there are two native Julia implementations: Mamba and Lora. I prefer Mamba
because of its flexibility. The problem with Stan, BUGS and JAGS is that each of them reinvents all of the data structures, input/output, data manipulation, distribution definitions, etc. in its own environment. Each also defines a Domain Specific Language (DSL), for which it must provide parsers, interpreters, run-time environments, and so on.
A native implementation like Mamba can use all of the facilities of Julia and its packages.
Whenever you have a linear predictor (i.e. an $\bf X\beta$ type of expression) in a model there is a good chance that you can write out the full conditional distribution of $\beta$, or obtain a good approximation to it. If you can write out the conditional distribution you can use a multivariate Gibbs sampler to obtain a vector-valued sample from that conditional in a single step. This helps to avoid one of the underlying problems of MCMC methods: successive sampling from the conditionals of correlated parameters. Consider a case where you might have hundreds or thousands of random effects and dozens of fixed effects. You don't want to sample those coefficients one at a time when you can sample the entire vector in one step.
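For example, in the Gaussian linear model $\mathbf{y} \sim \mathcal{N}(\mathbf{X}\beta, \sigma^2\mathbf{I})$ with a flat (or very diffuse) prior on $\beta$, the full conditional given $\sigma^2$ is the familiar
$$ \beta \,|\, \mathbf{y}, \sigma^2 \sim \mathcal{N}\left((\mathbf{X'X})^{-1}\mathbf{X'y},\; \sigma^2(\mathbf{X'X})^{-1}\right), $$
which can be sampled as one multivariate draw rather than coordinate by coordinate.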
In most cases the conditional distribution of the coefficients (i.e. both random and fixed effects) is a multivariate normal with known mean and covariance matrix. It is worthwhile examining the representation of this distribution in Julia. Not surprisingly, the representation involves the mean and the covariance, but the form of the covariance is encoded in the type. For example, a common prior distribution for coefficients is a zero-mean multivariate normal with a covariance that is a large multiple of the identity, i.e. a diffuse prior.
In [1]:
versioninfo(true)
In [2]:
using Distributions, GLM, Mamba, PDMats, StatsBase
d = MvNormal(2, 1000.)
Out[2]:
If you take apart the representation itself you discover that the only values that are stored are the scalar $\sigma^2$ and its inverse.
In [3]:
fieldnames(d)
Out[3]:
In [4]:
typeof(d.μ)
Out[4]:
In [5]:
typeof(d.Σ)
Out[5]:
In [6]:
fieldnames(d.Σ)
Out[6]:
In [7]:
(d.Σ.dim, d.Σ.value, d.Σ.inv_value)
Out[7]:
In [8]:
X = hcat(ones(5), [1.:5;])
Out[8]:
In [9]:
XtX = PDMat(X'X)
Out[9]:
In [10]:
y = [1.,3,3,3,5];
Xty = X'y
Out[10]:
In [11]:
β = XtX\Xty
Out[11]:
We could obtain the residual sum of squares by forming $\mu = \bf X\beta$ and comparing it to $\bf y$, but there is a short cut that we can use later with mixed-effects models. We append $\bf y$ as an additional column to $\bf X$ and then form the Cholesky decomposition of the crossproduct of this augmented matrix.
In [12]:
const Xy = hcat(X,y)
Out[12]:
In [13]:
pp = PDMat(Xy'Xy)
Out[13]:
This PDMat object contains the positive-definite matrix itself and its (upper) Cholesky factor.
In [14]:
fieldnames(pp)
Out[14]:
In [15]:
pp.mat
Out[15]:
In [16]:
pp.chol
Out[16]:
As with many of the matrix factorizations available in Julia, the Cholesky factorization provides access to the factor itself by indexing with a symbol.
In [17]:
R = pp.chol[:U]
Out[17]:
The [3, 3] element is the square root of the residual sum of squares.
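To see why, write the crossproduct matrix and its upper Cholesky factor in block form,
$$ \begin{bmatrix}\mathbf{X'X} & \mathbf{X'y}\\ \mathbf{y'X} & \mathbf{y'y}\end{bmatrix} = \mathbf{R'R}, \qquad \mathbf{R} = \begin{bmatrix}\mathbf{R}_{11} & \mathbf{r}_{12}\\ \mathbf{0'} & r_{22}\end{bmatrix}. $$
Matching blocks gives $\mathbf{R}_{11}'\mathbf{R}_{11} = \mathbf{X'X}$ and $\mathbf{R}_{11}'\mathbf{r}_{12} = \mathbf{X'y}$, so the least squares solution is $\widehat\beta = \mathbf{R}_{11}^{-1}\mathbf{r}_{12}$ (the triangular solve used below), and $r_{22}^2 = \mathbf{y'y} - \mathbf{r}_{12}'\mathbf{r}_{12} = \mathbf{y'y} - \widehat\beta{}'\mathbf{X'y}$, which is the residual sum of squares.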
In [18]:
abs2(R[3, 3]) # abs2 is x^2 as a function
Out[18]:
In [19]:
using StatsBase
sqL2dist(y, X * β) # squared L₂ distance
Out[19]:
In [20]:
sqL2dist(y, X * β) ≈ abs2(R[3, 3])
Out[20]:
The least squares estimates of the coefficients are obtained as
In [21]:
R12 = UpperTriangular(R[1:2, 1:2]);
typeof(R12)
Out[21]:
In [22]:
bb = R12 \ R[1:2, 3]
Out[22]:
Furthermore, the covariance matrix of the least squares estimator is $\sigma^2(\mathbf{X'X})^{-1}$. Its unscaled part, $\Sigma_1 = (\mathbf{X'X})^{-1}$, can be formed from the inverse of the triangular factor.
In [23]:
R12inv = inv(R12)
Out[23]:
In [24]:
Σ₁ = unscaledΣ = PDMat(R12inv * R12inv')
Out[24]:
In [25]:
inv(PDMat(X'X))
Out[25]:
For the Gibbs update of $\beta$ we want to sample from a $\mathcal{N}(\beta, \sigma^2 \Sigma_1)$ distribution.
If you trace back through the methods for functions like rand, its mutating version rand!, and its hidden mutating version without dimension checks, _rand!, you will find that a call to
rand(MvNormal(β, abs2(σ) * Σ₁))
eventually calls
_rand!(d::MvNormal, x::VecOrMat{Float64}) = add!(unwhiten!(d.Σ, randn!(x)), d.μ)
where randn!(x) overwrites x with $\mathcal{N}(0, I)$ random variates.
This brings us to the question: what is "unwhitening"? It is the inverse of "whitening", the transformation that produces "white noise" (uncorrelated, constant-variance multivariate normal disturbances) from correlated, unequal-variance disturbances; unwhitening imposes the desired covariance structure on white noise.
It is the multivariate analogue of the inverse of the univariate "z transformation" $$ x = \mu + \sigma z $$
For the multivariate case we multiply a vector $\mathbf{z}$ by $\sigma$ and by a "square root" of $\Sigma_1$, which is given by its upper Cholesky factor $R_1$ satisfying $\Sigma_1 = R_1'R_1$. That is, $\mathrm{Var}(R_1'\mathbf{z}) = R_1'\,\mathrm{Var}(\mathbf{z})\,R_1 = R_1'R_1 = \Sigma_1$.
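As a small illustration with the $\Sigma_1$ computed above (the names z and R₁ below are introduced here for the sketch; Σ₁.chol[:U] extracts the upper factor just as pp.chol[:U] did earlier):
z = randn(2)        # white noise: zero mean, Var(z) = I
R₁ = Σ₁.chol[:U]    # upper Cholesky factor, so Σ₁ = R₁'R₁
R₁'z                # an "unwhitened" draw: Var(R₁'z) = R₁'R₁ = Σ₁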
In [30]:
σ = R[3, 3]/√3    # √(RSS/(n - p)) with n = 5 observations and p = 2 coefficients
Out[30]:
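With σ in hand, the Gibbs update of $\beta$ is the single multivariate draw traced above. A minimal sketch reusing β, σ, Σ₁ and R12 from the cells above (the by-hand version is my own illustration, not part of the Distributions code path):
rand(MvNormal(β, abs2(σ) * Σ₁))    # one block update of the whole coefficient vector
# or, equivalently in distribution, by hand:
β + σ * (R12 \ randn(2))           # Var(R12 \ randn(2)) = (R12'R12)⁻¹ = (X'X)⁻¹ = Σ₁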