Topics for March 2, 2016

My goal is to provide fast Markov chain Monte Carlo (MCMC) methods for linear mixed-effects models, with possible extensions to generalized linear mixed models.

There are several MCMC frameworks for Julia. An interface to Stan is available, and there are two native Julia implementations: Mamba and Lora. I prefer Mamba because of its flexibility. The problem with Stan, BUGS and JAGS is that each of them reinvents all the data structures, input/output, data manipulation, distribution definitions, etc. in its own environment. Each also defines a Domain Specific Language (DSL) for which it must provide parsers, interpreters, run-time environments, etc.

A native implementation like Mamba can use all of the facilities of Julia and its packages.

Linear predictor

Whenever you have a linear predictor (i.e. an $\bf X\beta$ type of expression) in a model there is a good chance that you can write out the full conditional distribution of $\beta$, or obtain a good approximation to it. If you can write out the conditional distribution you can use a multivariate Gibbs sampler to obtain a vector-valued sample from the conditional in a single step. This helps to avoid one of the underlying problems of MCMC methods, which is slow mixing when sampling successively from the conditionals of correlated parameters. Consider a case where you have hundreds or thousands of random effects and dozens of fixed effects. You don't want to sample those sequentially, one scalar at a time, when you can sample the entire vector in one step, as sketched below.
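
To make this concrete, here is a minimal sketch of such a vector-valued Gibbs update for $\beta$ under a diffuse $\mathcal{N}(\mathbf{0}, \tau^2\mathbf{I})$ prior. This is not the Mamba implementation; the function name gibbs_beta and the prior standard deviation τ are illustrative assumptions.

using Distributions, PDMats

# A sketch, not a package function: one vector-valued Gibbs update for β in
# y = Xβ + ε, ε ~ N(0, σ²I), with a diffuse prior β ~ N(0, τ²I).
# The full conditional is multivariate normal,
#     β | y, σ² ~ N(Λ⁻¹X'y/σ², Λ⁻¹)   where   Λ = X'X/σ² + I/τ²,
# so the entire coefficient vector is updated in a single draw.
function gibbs_beta(X, y, σ, τ)
    k = size(X, 2)
    Λ = PDMat(X'X ./ abs2(σ) + eye(k) ./ abs2(τ))  # conditional precision
    μ = Λ \ (X'y ./ abs2(σ))                       # conditional mean
    rand(MvNormal(μ, inv(Λ)))                      # one multivariate sample
end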

Multivariate normal conditionals

In most cases the conditional distribution of the coefficients (i.e. both the random and the fixed effects) is a multivariate normal with known mean and covariance matrix. It is worthwhile examining how Julia represents this distribution. Not surprisingly the representation involves the mean and the covariance, but the form of the covariance is encoded in the type. For example, a common prior distribution for coefficients is a zero-mean multivariate normal with a covariance that is a large multiple of the identity - a diffuse prior.


In [1]:
versioninfo(true)


Julia Version 0.5.0-dev+3149
Commit 3c72273* (2016-03-14 16:15 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
  WORD_SIZE: 64
           Ubuntu 15.10
  uname: Linux 4.2.0-34-generic #39-Ubuntu SMP Thu Mar 10 22:13:01 UTC 2016 x86_64 x86_64
Memory: 15.561424255371094 GB (10432.33203125 MB free)
Uptime: 3851.0 sec
Load Avg:  0.65771484375  0.4658203125  0.5283203125
Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz: 
       speed         user         nice          sys         idle          irq
#1  3798 MHz      21737 s        540 s       3672 s     341064 s          0 s
#2  3698 MHz      33490 s        990 s       3509 s     329140 s          0 s
#3  3689 MHz      19396 s        465 s       3304 s     350182 s          0 s
#4  3692 MHz      20761 s        454 s       3654 s     348260 s          0 s

  BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: liblapack
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)
Environment:
  XDG_SESSION_PATH = /org/freedesktop/DisplayManager/Session0
  MANDATORY_PATH = /usr/share/gconf/ubuntu.mandatory.path
  DEFAULTS_PATH = /usr/share/gconf/ubuntu.default.path
  COMPIZ_BIN_PATH = /usr/bin/
  NODE_PATH = /usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
  XDG_SEAT_PATH = /org/freedesktop/DisplayManager/Seat0
  PATH = /home/bates/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
  HOME = /home/bates
  TERM = screen-256color-bce

Package Directory: /home/bates/.julia/v0.5
12 required packages:
 - FileIO                        0.0.4
 - GLM                           0.5.0
 - Gadfly                        0.4.2
 - HDF5                          0.5.8
 - IJulia                        1.1.8
 - Mamba                         0.9.0
 - MixedModels                   0.4.5+             master
 - PDMats                        0.4.0
 - ProfileView                   0.1.1
 - RCall                         0.3.2+             master
 - VideoIO                       0.0.14
 - ZipFile                       0.2.6
59 additional packages:
 - ArrayViews                    0.6.4
 - BinDeps                       0.3.21
 - Blosc                         0.1.4
 - Cairo                         0.2.31
 - Calculus                      0.1.14
 - Codecs                        0.1.5
 - ColorTypes                    0.2.1
 - ColorVectorSpace              0.1.1
 - Colors                        0.6.3
 - Compat                        0.7.12
 - Compose                       0.4.2
 - Conda                         0.1.9
 - Contour                       0.1.0
 - Cxx                           0.0.0-             master (unregistered)
 - DataArrays                    0.2.20
 - DataFrames                    0.6.10
 - DataStructures                0.4.3
 - Dates                         0.4.4
 - Distances                     0.3.0
 - Distributions                 0.8.10
 - Docile                        0.5.23
 - DualNumbers                   0.2.2
 - FixedPointNumbers             0.1.2
 - FixedSizeArrays               0.0.10
 - GZip                          0.2.18
 - Glob                          1.0.2
 - Graphics                      0.1.3
 - Graphs                        0.6.0
 - Grid                          0.4.0
 - Gtk                           0.9.3
 - GtkUtilities                  0.0.8
 - Hexagons                      0.0.4
 - ImageView                     0.1.19
 - Images                        0.5.3
 - IniFile                       0.2.5
 - Iterators                     0.1.9
 - JSON                          0.5.0
 - KernelDensity                 0.1.2
 - Loess                         0.0.6
 - MathProgBase                  0.4.2
 - Measures                      0.0.2
 - NLopt                         0.3.1
 - NaNMath                       0.2.1
 - Nettle                        0.2.2
 - Optim                         0.4.4
 - Reexport                      0.0.3
 - SHA                           0.1.2
 - SIUnits                       0.0.6
 - Showoff                       0.0.6
 - SortingAlgorithms             0.0.6
 - StatsBase                     0.8.0
 - StatsFuns                     0.2.0
 - TexExtensions                 0.0.3
 - Tk                            0.3.7
 - URIParser                     0.1.3
 - Winston                       0.11.13
 - WoodburyMatrices              0.1.5
 - ZMQ                           0.3.1
 - Zlib                          0.1.12

In [2]:
using Distributions, GLM, Mamba, PDMats, StatsBase
d = MvNormal(2, 1000.)


Out[2]:
ZeroMeanIsoNormal(
dim: 2
μ: [0.0,0.0]
Σ: 2x2 Array{Float64,2}:
 1.0e6  0.0  
 0.0    1.0e6
)

If you take apart the representation itself you discover that the only values stored for the covariance are its dimension, the scalar $\sigma^2$, and its inverse.


In [3]:
fieldnames(d)


Out[3]:
2-element Array{Symbol,1}:
 :μ
 :Σ

In [4]:
typeof(d.μ)


Out[4]:
Distributions.ZeroVector{Float64}

In [5]:
typeof(d.Σ)


Out[5]:
PDMats.ScalMat{Float64}

In [6]:
fieldnames(d.Σ)


Out[6]:
3-element Array{Symbol,1}:
 :dim      
 :value    
 :inv_value

In [7]:
(d.Σ.dim, d.Σ.value, d.Σ.inv_value)


Out[7]:
(2,1.0e6,1.0e-6)

Sampling without a prior

Let's defer the issue of a prior for a moment and suppose that we have the matrix $\bf X'X$, the vector $\bf X'y$ and the scalar $\sigma$ defining the log-likelihood. Without the prior, the mean of the conditional distribution is the least-squares solution $\widehat{\beta} = (\bf X'X)^{-1}\bf X'y$, which we can compute in a small example.


In [8]:
X = hcat(ones(5), [1.:5;])


Out[8]:
5x2 Array{Float64,2}:
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0

In [9]:
XtX = PDMat(X'X)


Out[9]:
PDMats.PDMat{Float64,Array{Float64,2}}(2,2x2 Array{Float64,2}:
  5.0  15.0
 15.0  55.0,Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
2x2 UpperTriangular{Float64,Array{Float64,2}}:
 2.23607  6.7082 
  ⋅       3.16228)

In [10]:
y = [1.,3,3,3,5];
Xty = X'y


Out[10]:
2-element Array{Float64,1}:
 15.0
 53.0

In [11]:
β = XtX\Xty


Out[11]:
2-element Array{Float64,1}:
 0.6
 0.8

We could obtain the residual sum of squares by forming $\mu = \bf X\beta$ and accumulating $\|\mathbf{y}-\mu\|^2$, but there is a short cut that we can use later with mixed-effects models. We append $\bf y$ as an additional column of $\bf X$ and then form the Cholesky decomposition of the crossproduct.


In [12]:
const Xy = hcat(X,y)


Out[12]:
5x3 Array{Float64,2}:
 1.0  1.0  1.0
 1.0  2.0  3.0
 1.0  3.0  3.0
 1.0  4.0  3.0
 1.0  5.0  5.0

In [13]:
pp = PDMat(Xy'Xy)


Out[13]:
PDMats.PDMat{Float64,Array{Float64,2}}(3,3x3 Array{Float64,2}:
  5.0  15.0  15.0
 15.0  55.0  53.0
 15.0  53.0  53.0,Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
3x3 UpperTriangular{Float64,Array{Float64,2}}:
 2.23607  6.7082   6.7082 
  ⋅       3.16228  2.52982
  ⋅        ⋅       1.26491)

This PDMat object contains the positive-definite matrix itself and its (upper) Cholesky factor.


In [14]:
fieldnames(pp)


Out[14]:
3-element Array{Symbol,1}:
 :dim 
 :mat 
 :chol

In [15]:
pp.mat


Out[15]:
3x3 Array{Float64,2}:
  5.0  15.0  15.0
 15.0  55.0  53.0
 15.0  53.0  53.0

In [16]:
pp.chol


Out[16]:
Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
3x3 UpperTriangular{Float64,Array{Float64,2}}:
 2.23607  6.7082   6.7082 
  ⋅       3.16228  2.52982
  ⋅        ⋅       1.26491

As with many of the factorizations available in Julia, the Cholesky factorization provides access to the factor itself by indexing with a symbol.


In [17]:
R = pp.chol[:U]


Out[17]:
3x3 UpperTriangular{Float64,Array{Float64,2}}:
 2.23607  6.7082   6.7082 
  ⋅       3.16228  2.52982
  ⋅        ⋅       1.26491

The [3, 3] element is the square root of the residual sum of squares, because in the factorization $[\mathbf{X}\;\mathbf{y}]'[\mathbf{X}\;\mathbf{y}]=\mathbf{R}'\mathbf{R}$ the last diagonal element satisfies $R_{3,3}^2=\mathbf{y}'\mathbf{y}-\mathbf{y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.


In [18]:
abs2(R[3, 3]) # abs2 is x^2 as a function


Out[18]:
1.6000000000000012

In [19]:
using StatsBase
sqL2dist(y, X * β)  # squared L₂ distance


Out[19]:
1.5999999999999992

In [20]:
sqL2dist(y, X * β) ≈ abs2(R[3, 3])


Out[20]:
true

The least squares estimates of the coefficients are obtained as


In [21]:
R12 = UpperTriangular(R[1:2, 1:2]);
typeof(R12)


Out[21]:
UpperTriangular{Float64,Array{Float64,2}}

In [22]:
bb = R12 \ R[1:2, 3]


Out[22]:
2-element Array{Float64,1}:
 0.6
 0.8

Furthermore, the covariance matrix of the least squares parameter estimator is $\sigma^2\Sigma_1$, where the unscaled covariance $\Sigma_1 = (\bf X'X)^{-1}$ is computed from the inverse of $R_{12}$, the upper left $2\times 2$ block of $R$.


In [23]:
R12inv = inv(R12)


Out[23]:
2x2 UpperTriangular{Float64,Array{Float64,2}}:
 0.447214  -0.948683
  ⋅         0.316228

In [24]:
Σ₁ = unscaledΣ = PDMat(R12inv * R12inv')


Out[24]:
PDMats.PDMat{Float64,Array{Float64,2}}(2,2x2 Array{Float64,2}:
  1.1  -0.3
 -0.3   0.1,Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
2x2 UpperTriangular{Float64,Array{Float64,2}}:
 1.04881  -0.286039
  ⋅        0.13484 )

In [25]:
inv(PDMat(X'X))


Out[25]:
PDMats.PDMat{Float64,Array{Float64,2}}(2,2x2 Array{Float64,2}:
  1.1  -0.3
 -0.3   0.1,Base.LinAlg.Cholesky{Float64,Array{Float64,2}} with factor:
2x2 UpperTriangular{Float64,Array{Float64,2}}:
 1.04881  -0.286039
  ⋅        0.13484 )

We want to sample from a $\mathcal{N}(\beta, \sigma^2 \Sigma_1)$ distribution for the Gibbs update of $\beta$.

Sampling from a multivariate normal distribution

At this point we want to sample from $\mathcal{N}(\beta, \sigma^2\Sigma_1)$. If you trace back through the methods for functions like rand, its mutating version rand!, and its hidden, mutating version without dimension checks, _rand!, you will find that a call to

rand(MvNormal(β, abs2(σ) * Σ₁))

eventually calls

_rand!(d::MvNormal, x::VecOrMat{Float64}) = add!(unwhiten!(d.Σ, randn!(x)), d.μ)

where randn!(x) overwrites x with $\mathcal{N}(0, I)$ random variates.

This brings us to the question: what is "unwhitening"? It is the inverse of "whitening", which produces "white noise" (uncorrelated, constant-variance multivariate normal disturbances) from correlated, non-constant-variance disturbances.

This is the multivariate equivalent of the inverse of the univariate "z transformation" $$ x = \mu + \sigma z $$

For the multivariate case we need to multiply the vector $\mathbf{z}$ by $\sigma$ and by a "square root" of $\Sigma_1$, such as its upper Cholesky factor $R_1$ satisfying $R_1'R_1 = \Sigma_1$. That is, for $\mathbf{x} = R_1'\mathbf{z}$ we have $\mathrm{Var}(\mathbf{x}) = R_1'\,\mathrm{Var}(\mathbf{z})\,R_1 = R_1'R_1 = \Sigma_1$, because $\mathrm{Var}(\mathbf{z}) = \mathbf{I}$.
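
As a small check of the idea, here is a sketch assuming the unwhiten generic exported by PDMats and the β, σ, Σ₁ and R12 objects defined above:

z = randn(2)                  # white noise: z ~ N(0, I)
x₁ = β + σ * unwhiten(Σ₁, z)  # multiply by the Cholesky square root of Σ₁
x₂ = β + σ * (R12 \ z)        # R12⁻¹ is another square root: R12⁻¹R12⁻ᵀ = Σ₁
# x₁ and x₂ use different square roots of Σ₁, so they differ for the same z,
# but each is distributed as N(β, σ²Σ₁).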


In [30]:
σ = R[3, 3]/√3   # √(RSS/(n - p)) with n - p = 5 - 2 = 3 degrees of freedom


Out[30]:
0.7302967433402218
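
Putting the pieces together, the Gibbs update of $\beta$ is the call shown earlier, here with $\sigma$ held fixed at the value just computed (in a full sampler $\sigma^2$ would itself be updated from its conditional distribution):

βnew = rand(MvNormal(β, abs2(σ) * Σ₁))  # one draw from N(β, σ²Σ₁)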