Mixing it up - why Julia is amazing

Douglas Bates, U. of Wisconsin - Madison

Background

I'm an early user of S and a core developer of R
The "mixed-effects models" R packages I helped create, nlme and lme4, required a lot of C/C++ code for performance.
- lme4 uses R, Rcpp, and Eigen
- much effort to avoid making copies of large objects, not always successfully
- examples can take many hours/days to fit, often causing swap thrashing.

Basic structure of problem, without the "why"

observed response vector, $\bf y$, of length $n$.
two (known) model matrices
- $\bf X$ (size $n\times p$) long, skinny and dense
- $\bf Z$ (size $n\times q$) long, wide and very sparse
probability model $$ \begin{aligned} \mathcal{B} & \sim \mathcal{N}\left({\bf 0},\Sigma\right)\\ \mathcal{Y}|\left(\mathcal{B}=\bf b\right) & \sim \mathcal{N}\left({\bf X\beta+Zb}, \sigma^2\bf I_n\right) \end{aligned} $$
it is convenient to write $\bf b=\Lambda_\theta u$, where $\Sigma=\sigma^2\bf\Lambda_\theta\Lambda_\theta'$, and $\mathcal{U}\sim\mathcal{N}({\bf 0},\sigma^2\bf I_q)$

The role of terms

Both $\bf X$ and $\bf Z$ are divided into blocks of columns associated with "terms" in the model.
For $\bf Z$ the columns within a term are indicators or derived from indicators
The terms for $\bf Z$ also determine the structure of $\bf\Lambda_\theta$.
- $\bf\Lambda_\theta$ is block diagonal according to blocks of $\bf Z$.
- For "scalar random effects" each block is a multiple of the identity.
- For "vector-valued random effects" each block is itself block diagonal. The smaller blocks are repetitions of a small lower triangular matrix.

A small example

The InstEval data in lme4 contain evaluation scores, Y, by student, S, of instructor, D, in department, P. R indicates a service course.



In [1]:

    
using HDF5,ReTerms,StatsBase
inst = h5open("/var/tmp/dat.h5","r") do io g2dict(io,"inst") end;
dump(inst)









    



Dict{Symbol,Any} len 5
  R: DataArrays.PooledDataArray{Int32,UInt8,1}(73421) Int32[1,2,1,2]
  S: DataArrays.PooledDataArray{Int32,UInt16,1}(73421) Int32[1,1,1,1]
  P: DataArrays.PooledDataArray{Int32,UInt8,1}(73421) Int32[14,5,14,12]
  D: DataArrays.PooledDataArray{Int32,UInt16,1}(73421) Int32[525,560,832,1068]
  Y: Array(Float32,(73421,)) Float32



In [2]:

    
X = hcat(ones(length(inst[:Y])),inst[:R] .- 1);
size(X)









    Out[2]:





(73421,2)



In [3]:

    
m1 = LMM(X,[reterm(inst[s]) for s in [:S,:D,:P]],inst[:Y]);
for t in m1.trms println(size(t)) end









    



(73421,2972)
(73421,1128)
(73421,14)
(73421,3)

The log-likelihood function

The profiled log-likelihood, $$ -2\ell(\bf\theta|y)=\log\left|\bf\Lambda_\theta'Z'Z\Lambda_\theta+I\right|+ n\left(1+\log\left(\frac{2\pi\rho^2(\theta)}n\right)\right), $$ where $$ \rho^2(\bf\theta)= \min_{\bf\beta,u} \left\| \begin{bmatrix} \bf y\\ \bf 0 \end{bmatrix} - \begin{bmatrix} \bf Z\Lambda_\theta & \bf X \\ \bf I_q & \bf 0 \end{bmatrix} \begin{bmatrix} \bf u\\ \beta \end{bmatrix} \right\|^2, $$ can be evaluated from the Cholesky factor of $$ \begin{bmatrix} \bf\Lambda_\theta'Z'Z\Lambda_\theta+I & \bf\Lambda_\theta'Z'X & \bf\Lambda_\theta'Z'y\\ \bf X'Z\Lambda_\theta & \bf X'X & \bf X'y\\ \bf y'Z\Lambda_\theta & \bf y'X & \bf y'y \end{bmatrix} $$

In practice

concatenate $\bf X$ and $\bf y$ (hcat)
use block structure of $\bf Z$ and $\bf\Lambda_\theta$
precompute and save $\bf Z'Z$ (in blocks), $\bf[Xy]'Z$ (also in blocks) and $\bf[Xy]'[Xy]$.
for each value of $\bf\theta$
- copy stored values to $\bf L$ (lower triangle only)
- scale columns and rows by $\bf\Lambda_\theta$
- inflate diagonal of $\bf\Lambda_\theta'Z'Z\Lambda_\theta$
- Cholesky



In [4]:

    
size(m1.A)   # saved products (lower triangle)









    Out[4]:





(4,4)



In [5]:

    
for j in 1:4, i in j:4 
    println(i,",",j,": ",size(m1.A[i,j])," ",typeof(m1.A[i,j])) 
end









    



1,1: (2972,2972) Base.LinAlg.Diagonal{Float64}
2,1: (1128,2972) Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
3,1: (14,2972) Array{Float64,2}
4,1: (3,2972) Array{Float64,2}
2,2: (1128,1128) Base.LinAlg.Diagonal{Float64}
3,2: (14,1128) Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
4,2: (3,1128) Array{Float64,2}
3,3: (14,14) Base.LinAlg.Diagonal{Float64}
4,3: (3,14) Array{Float64,2}
4,4: (3,3) Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}



In [6]:

    
for j in 1:4, i in j:4 println(i,",",j,": ",typeof(m1.L[i,j])) end









    



1,1: Base.LinAlg.Diagonal{Float64}
2,1: Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
3,1: Array{Float64,2}
4,1: Array{Float64,2}
2,2: Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}
3,2: Array{Float64,2}
4,2: Array{Float64,2}
3,3: Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}
4,3: Array{Float64,2}
4,4: Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}



In [8]:

    
@doc "In-place lower Cholesky factor by blocks"->
function mycfactor!(A::AbstractMatrix)
    n = Base.LinAlg.chksquare(A)
    @inbounds begin
        for k = 1:n
            for j in 1:(k - 1)
                downdate!(A[k,k],A[k,j])  # A[k,k] -= A[k,j]*A[k,j]'
            end
            cfactor!(A[k,k])   # (lower) Cholesky factor of A[k,k]
            for i in (k + 1):n
                for j in 1:(k - 1)
                    downdate!(A[i,k],A[i,j],A[k,j]) # A[i,k] -= A[i,j]*A[k,j]
                end
                Base.LinAlg.A_rdiv_Bc!(A[i,k],A[k,k])
            end
        end
    end
    return LowerTriangular(A)
end









    Out[8]:





mycfactor! (generic function with 1 method)



In [9]:

    
@doc "Negative twice the log-likelihood"->
function myobjective(lmm::LMM)
    n = size(lmm.trms[1],1)
    logdet(lmm) + n*(1.+log(2π*abs2(lmm.L[end,end][end,end])/n))
end









    Out[9]:





myobjective (generic function with 1 method)



In [11]:

    
@doc "Log-determinant of Λ'Z'ZΛ + I"->
mylogdet(lmm::LMM) = 2.*mapreduce(logdet,(+),diag(lmm.L)[1:end-1])









    Out[11]:





mylogdet (generic function with 1 method)

Why Julia?

One Language (to rule them all). Flexibility and performance under one roof.
Rich, extensible type system.
- a matrix of matrices is no big deal to create
Multiple dispatch
- handle all the combinations, permutations of sparse, dense, diagonal, triangular in 3-argument form of downdate! by writing methods.
A friendly community of very talented people.