Mixing it up - adventures with sparse matrices and statistical models

Douglas Bates, Department of Statistics, University of Wisconsin-Madison

Abstract

For more years than I care to admit, I have been developing algorithms and implementations to fit what statisticians call mixed-effects models, most recently in Julia. These models are most useful when applied to very large data sets, which makes creating an effective, flexible implementation difficult. Evaluation of the log-likelihood to be maximized by the parameter estimates requires updating large structures, preferably in place, and some intricate linear algebra. Efficiency is best achieved by taking into account the special structure of the matrices representing the model. The Julia implementation can now out-perform earlier implementations that I and others have created, in some cases by orders of magnitude. The Julia type system, multiple dispatch and, most of all, the fact that one can achieve both flexibility and performance in a single language, have been fundamental to achieving this performance.

Mixed-effects models and penalized least squares

Mixed-effects models are similar to linear models (LMs) or generalized linear models (GLMs) in that fitting the model requires estimating coefficients in a linear predictor expression. In a linear mixed model (LMM) or a generalized linear mixed model (GLMM) there are two types of coefficients, or effects: fixed effects, which are associated with factors that have reproducible levels, such as male/female, and random effects, which are associated with factors whose levels are a sample from a population, such as subject or item.

Random effects can be viewed as nuisance parameters in that they account for sources of variability in the observations but are often not of interest in themselves. Frequently the number of random effects grows with the number of observations (e.g., a few observations on each of many subjects), so their unregularized estimates would be ill-defined. The random effects are therefore regularized, and fitting the model amounts to selecting the regularization parameters that maximize a log-likelihood.

Linear predictors and model matrices

In an LMM the $n$-dimensional response, $\bf y$, is considered to be the realization of a random variable $\mathcal{Y}$ with conditional distribution $$ {\mathcal Y}|({\mathcal B} = \bf b) \sim {\mathcal N}({\bf X\beta + Z b},\sigma^2\bf I_n) . $$ The linear predictor, $\bf X\beta+Z b$, is based on two known model matrices: the dense matrix $\bf X$ of size $n\times p$ and the sparse matrix $\bf Z$ of size $n\times q$.
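As a concrete illustration of this structure, here is a small simulation of a response from the conditional distribution. The sizes and parameter values are made up, and the sketch uses current Julia syntax rather than the pre-1.0 syntax of the notebook itself.

```julia
using SparseArrays

# Toy sizes: n observations, p fixed effects, one grouping factor with q levels
n, p, q = 10, 2, 4
X = hcat(ones(n), randn(n))        # dense n×p fixed-effects model matrix
g = rand(1:q, n)                   # level of the grouping factor per observation
Z = sparse(1:n, g, 1.0, n, q)      # sparse n×q matrix of indicator columns
β = randn(p)                       # fixed-effects coefficients
b = randn(q)                       # one realization of the random effects, B = b
σ = 0.5
y = X*β + Z*b + σ*randn(n)         # a draw from N(Xβ + Zb, σ²Iₙ)
```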

For example, we define a model for the InstEval data set from the lme4 package for R.


In [1]:
using HDF5, ReTerms, StatsBase
inst = h5open("/var/tmp/dat.h5", "r") do io g2dict(io, "inst") end;
dump(inst)


Dict{Symbol,Any} len 5
  R: DataArrays.PooledDataArray{Int32,UInt8,1}(73421) Int32[1,2,1,2]
  S: DataArrays.PooledDataArray{Int32,UInt16,1}(73421) Int32[1,1,1,1]
  P: DataArrays.PooledDataArray{Int32,UInt8,1}(73421) Int32[14,5,14,12]
  D: DataArrays.PooledDataArray{Int32,UInt16,1}(73421) Int32[525,560,832,1068]
  Y: Array(Float32,(73421,)) Float32[5.0,2.0,5.0,3.0,2.0,4.0,4.0,5.0,5.0,4.0  …  5.0,5.0,5.0,3.0,5.0,3.0,4.0,5.0,1.0,3.0]

where Y, the response, is the evaluation given by student, S, of instructor, D, in a course from department, P. The service factor, R, indicates whether or not the course was a service course. We create random-effects terms for student, instructor and department.


In [2]:
revec = [reterm(inst[s]) for s in [:S,:D,:P]];  # one random-effects term per grouping factor
for i in eachindex(revec) println(size(revec[i])) end


(73421,2972)
(73421,1128)
(73421,14)

The three sparse matrices derived from these terms are, in this case, the matrices of indicator columns for the 2972 students, the 1128 instructors, and the 14 departments. The matrix $\bf Z$, of size $73421\times 4114$, is the horizontal concatenation of these three. It is (comparatively) large but very sparse: each row has only three nonzeros.
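A sketch of how such a $\bf Z$ arises (again with hypothetical toy sizes and current Julia syntax): each factor contributes a sparse matrix of indicator columns, and $\bf Z$ is their horizontal concatenation.

```julia
using SparseArrays

n = 8
s = rand(1:5, n)                   # toy "student" factor with 5 levels
d = rand(1:3, n)                   # toy "instructor" factor with 3 levels
p = rand(1:2, n)                   # toy "department" factor with 2 levels
Zs = sparse(1:n, s, 1.0, n, 5)     # indicator columns for each factor
Zd = sparse(1:n, d, 1.0, n, 3)
Zp = sparse(1:n, p, 1.0, n, 2)
Z = hcat(Zs, Zd, Zp)               # n × 10; exactly three nonzeros per row
```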

The fixed effects in the model will be the intercept term and an indicator of a service course. The intercept corresponds to a typical evaluation in a non-service course, after adjusting for the student, instructor and department. The coefficient for R is the change in typical evaluation for service compared to non-service courses.


In [3]:
X = hcat(ones(length(inst[:R])), convert(Vector{Float64}, inst[:R] .== 2));  # intercept and service indicator
size(X)


Out[3]:
(73421,2)

Distribution of the random effects

The unconditional distribution of the random effects is also multivariate normal $$ {\mathcal B}\sim\mathcal{N}({\bf 0},\Sigma_{\theta}) . $$ The positive-definite covariance matrix, $\Sigma_\theta$, depends on a small parameter vector $\theta$. For scalar random effects such as these (a scalar effect for each student, each instructor and each department), the elements of $\mathcal B$ are independent within and between groups; that is, $\Sigma_\theta$ is diagonal. Furthermore, the variances of the effects are constant within each group. For technical reasons we write $$ \Sigma_{\theta} = \sigma^2\Lambda_{\theta}\Lambda_{\theta}' $$ where $\sigma^2$ is the same scale parameter as in the conditional distribution of $\mathcal Y$ and $\Lambda_{\theta}$ is the lower Cholesky factor of the relative covariance matrix, $\Sigma_\theta/\sigma^2$.
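For scalar random-effects terms, $\Lambda_\theta$ is therefore diagonal, with one multiplier per term repeated over that term's levels. A sketch for the InstEval structure (2972 students, 1128 instructors, 14 departments; the θ values here are only illustrative):

```julia
using LinearAlgebra

θ = [0.28, 0.44, 0.07]                 # one relative standard deviation per term
Λθ = Diagonal(vcat(fill(θ[1], 2972),   # students
                   fill(θ[2], 1128),   # instructors
                   fill(θ[3], 14)))    # departments
# Σθ = σ² Λθ Λθ' is then diagonal, with constant variance within each term
```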

Henderson's mixed-model equations

Since the 1970s it has been recognized that, given a value of the covariance parameters, $\theta$, the conditional estimate of the fixed effects, $\widehat{\bf\beta}$, and the "BLUPs" of the random effects, $\tilde{\bf b}$, are the solution of "Henderson's mixed-model equations" $$ \begin{bmatrix} \bf X'X & \bf X'Z \\ \bf Z'X & \bf Z'Z+\Sigma_{\theta}^{-1} \end{bmatrix} \begin{bmatrix} \widehat{\bf\beta} \\ \tilde{\bf b} \end{bmatrix} = \begin{bmatrix} \bf X'y \\ \bf Z'y \end{bmatrix} . $$ However, this solution by itself does not allow for direct evaluation of the log-likelihood. In [Fitting Linear Mixed-effects Models using lme4](http://arxiv.org/abs/1406.5823) we show that it is convenient to rearrange these equations, in terms of the _spherical_ random effects $\bf u$ defined by ${\bf b} = \Lambda_\theta{\bf u}$, as $$ \begin{bmatrix} \bf \Lambda_{\theta}'Z'Z\Lambda_{\theta} + I & \bf \Lambda_{\theta}'Z'X \\ \bf X'Z\Lambda_\theta & \bf X'X \end{bmatrix} \begin{bmatrix} \tilde{\bf u} \\ \widehat{\bf\beta} \end{bmatrix} = \begin{bmatrix} \bf \Lambda_\theta'Z'y \\ \bf X'y \end{bmatrix} $$ corresponding to the _pseudo-data_ representation of the penalized least squares problem $$ \rho^2(\theta) = \min_{\beta,\bf u} \left\| \begin{bmatrix} \bf y \\ \bf 0 \end{bmatrix} - \begin{bmatrix} \bf Z\Lambda_\theta & \bf X \\ \bf I_q & \bf 0 \end{bmatrix} \begin{bmatrix} \bf u \\ \beta \end{bmatrix} \right\|^2 . $$
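The pseudo-data formulation can be checked directly on a toy example: stack the model matrices as shown and solve the resulting ordinary least squares problem. The sketch below uses a plain dense solve purely for illustration; names and sizes are hypothetical.

```julia
using LinearAlgebra, SparseArrays

n, q, p = 30, 4, 2
Z = sparse(1:n, rand(1:q, n), 1.0, n, q)
X = hcat(ones(n), randn(n))
y = randn(n)
Λθ = Diagonal(fill(0.5, q))

M = [Matrix(Z * Λθ) X;                 # pseudo-data model matrix
     Matrix(1.0I, q, q) zeros(q, p)]   # identity rows implement the penalty on u
rhs = [y; zeros(q)]
uβ = M \ rhs                           # [ũ; β̂], the PLS solution
ρ² = sum(abs2, rhs - M * uβ)           # penalized residual sum of squares
```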

In this formulation it is natural to form the Cholesky factor of the system matrix, whose upper-left block satisfies $$ \bf L_\theta L_\theta' = \bf \Lambda_{\theta}'Z'Z\Lambda_{\theta} + I , $$ providing straightforward evaluation of $$ \log\left(\left|\bf \Lambda_{\theta}'Z'Z\Lambda_{\theta} + I\right|\right) = 2\log\left(\left|\bf L_\theta\right|\right). $$
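A corresponding sketch of the factorization step (toy sizes again): form the upper-left block, factor it, and read off the log-determinant from the diagonal of $\bf L_\theta$.

```julia
using LinearAlgebra, SparseArrays

n, q = 50, 6
Z = sparse(1:n, rand(1:q, n), 1.0, n, q)
Λθ = Diagonal(fill(0.5, q))
A11 = Matrix(Λθ' * (Z'Z) * Λθ + I)     # densified here only for simplicity
Lθ = cholesky(Symmetric(A11)).L        # A11 = Lθ * Lθ'
logdetA11 = 2 * sum(log, diag(Lθ))     # log|A11| = 2 log|Lθ|
```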

The profiled log-likelihood can be written as a function of $\theta$ only: $$ -2\ell(\theta|{\bf y}) = 2\log\left(\left|\bf L_\theta\right|\right)+n\left(1+\log\frac{2\pi\rho^2(\theta)}n\right) . $$
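Given these two quantities, evaluating the objective is trivial; a one-line sketch (names hypothetical):

```julia
# profiled deviance from log|Lθ| and the penalized RSS ρ²(θ)
objective(logdetL, ρ², n) = 2logdetL + n * (1 + log(2π * ρ² / n))
```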


In [4]:
m1 = LMM(X, revec, inst[:Y]);  # construct the model; fitting happens later
fieldnames(m1)


Out[4]:
7-element Array{Symbol,1}:
 :trms 
 :A    
 :L    
 :lower
 :pars 
 :gp   
 :fit  

In [5]:
size(m1.A)


Out[5]:
(4,4)

The field A stores, in blocks, the lower triangle of a symmetric crossproduct matrix. The block sizes below correspond to the three random-effects terms (2972, 1128 and 14 columns), followed by a 3-column block, apparently the fixed-effects matrix X augmented with the response y.

In [6]:
A = m1.A;
for j in 1:4, i in j:4 println(i,", ",j," ",size(A[i,j])," ",typeof(A[i,j])) end


1, 1 (2972,2972) Base.LinAlg.Diagonal{Float64}
2, 1 (1128,2972) Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
3, 1 (14,2972) Array{Float64,2}
4, 1 (3,2972) Array{Float64,2}
2, 2 (1128,1128) Base.LinAlg.Diagonal{Float64}
3, 2 (14,1128) Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
4, 2 (3,1128) Array{Float64,2}
3, 3 (14,14) Base.LinAlg.Diagonal{Float64}
4, 3 (3,14) Array{Float64,2}
4, 4 (3,3) Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}

The blocked Cholesky factor, L, has the same outer block structure as A, but factorization introduces fill-in: the Diagonal (2,2) and (3,3) blocks of A become dense LowerTriangular blocks in L, and the sparse (3,2) block fills in to a dense matrix, while the first diagonal block remains Diagonal.

In [7]:
L = m1.L
for j in 1:4, i in j:4 println(i,", ",j," ",size(L[i,j])," ",typeof(L[i,j])) end


1, 1 (2972,2972) Base.LinAlg.Diagonal{Float64}
2, 1 (1128,2972) Base.SparseMatrix.SparseMatrixCSC{Float64,Int32}
3, 1 (14,2972) Array{Float64,2}
4, 1 (3,2972) Array{Float64,2}
2, 2 (1128,1128) Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}
3, 2 (14,1128) Array{Float64,2}
4, 2 (3,1128) Array{Float64,2}
3, 3 (14,14) Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}
4, 3 (3,14) Array{Float64,2}
4, 4 (3,3) Base.LinAlg.LowerTriangular{Float64,Array{Float64,2}}

Fitting the model (the second argument requests verbose output: the objective and the current value of θ at each function evaluation) converges after 106 evaluations:

In [8]:
fit(m1, true);


f_1: 242074.39366, [1.0,1.0,1.0]
f_2: 245004.25385, [1.75,1.0,1.0]
f_3: 243120.91228, [1.0,1.75,1.0]
f_4: 242089.80485, [1.0,1.0,1.75]
f_5: 238618.67037, [0.25,1.0,1.0]
f_6: 241923.49173, [1.0,0.25,1.0]
f_7: 242040.48484, [1.0,1.0,0.25]
f_8: 240279.6813, [0.0,0.526088,0.475208]
f_9: 241303.18604, [0.0,1.11671,1.25398]
f_10: 238515.96151, [0.325,0.925,1.0]
f_11: 238727.58367, [0.356272,1.00427,0.832985]
f_12: 238751.90605, [0.372493,0.983047,1.0]
f_13: 238373.18283, [0.275129,0.871663,0.98288]
f_14: 238935.0279, [0.153782,0.847714,0.89802]
f_15: 238378.29187, [0.297217,0.868986,1.0545]
f_16: 238341.75427, [0.327602,0.820679,0.99938]
f_17: 238317.43087, [0.346982,0.776926,1.05713]
f_18: 238311.78636, [0.372166,0.724728,1.10473]
f_19: 238322.1477, [0.393995,0.679606,1.13311]
f_20: 238326.75849, [0.369695,0.738777,1.10716]
f_21: 238366.17778, [0.385395,0.727565,1.09961]
f_22: 238256.50803, [0.358984,0.718761,1.10482]
f_23: 238160.3682, [0.334948,0.702827,1.10723]
f_24: 238028.96884, [0.311516,0.651638,1.12066]
f_25: 237859.86437, [0.288225,0.548758,1.16833]
f_26: 237932.06368, [0.306849,0.334577,1.25422]
f_27: 237894.4426, [0.274781,0.578654,1.1413]
f_28: 237914.77722, [0.224122,0.513054,1.25786]
f_29: 237861.33364, [0.307408,0.533004,1.1451]
f_30: 237802.46158, [0.278546,0.492182,1.17577]
f_31: 237796.34184, [0.257003,0.41679,1.18247]
f_32: 237784.0691, [0.267903,0.443524,1.19029]
f_33: 237784.78458, [0.268563,0.449577,1.19567]
f_34: 237781.66033, [0.27528,0.44217,1.19038]
f_35: 237781.95107, [0.280279,0.444127,1.18514]
f_36: 237781.49784, [0.277115,0.435916,1.19409]
f_37: 237781.52376, [0.279139,0.435523,1.1885]
f_38: 237782.00084, [0.273605,0.441025,1.19831]
f_39: 237781.66044, [0.27912,0.440637,1.19223]
f_40: 237781.40799, [0.276961,0.437101,1.19102]
f_41: 237781.41162, [0.276458,0.436589,1.19081]
f_42: 237781.40065, [0.277217,0.437642,1.19057]
f_43: 237781.38182, [0.277066,0.437729,1.18984]
f_44: 237781.36736, [0.27731,0.438015,1.18894]
f_45: 237781.36162, [0.277453,0.438138,1.18846]
f_46: 237781.35558, [0.277651,0.438336,1.18776]
f_47: 237781.36449, [0.278165,0.43881,1.18649]
f_48: 237781.32844, [0.277441,0.438006,1.18712]
f_49: 237781.30808, [0.277191,0.437475,1.18665]
f_50: 237781.2839, [0.277092,0.436566,1.18546]
f_51: 237781.25794, [0.277561,0.435439,1.18272]
f_52: 237781.24856, [0.277598,0.435016,1.18161]
f_53: 237781.17995, [0.278104,0.436022,1.17883]
f_54: 237781.21626, [0.279663,0.436199,1.17304]
f_55: 237781.13524, [0.276761,0.435722,1.17847]
f_56: 237781.0606, [0.276824,0.435864,1.17548]
f_57: 237780.96869, [0.276637,0.43458,1.16962]
f_58: 237780.8205, [0.27637,0.432552,1.15779]
f_59: 237780.73082, [0.276699,0.429759,1.14205]
f_60: 237780.74167, [0.276833,0.429067,1.13861]
f_61: 237780.45209, [0.278225,0.432876,1.14149]
f_62: 237780.21501, [0.27736,0.435052,1.13887]
f_63: 237780.01917, [0.277301,0.43575,1.13187]
f_64: 237779.68775, [0.277043,0.435128,1.11783]
f_65: 237779.00303, [0.277414,0.434775,1.08971]
f_66: 237777.95647, [0.277088,0.429837,1.03368]
f_67: 237774.7206, [0.280724,0.436147,0.92142]
f_68: 237767.72345, [0.282273,0.431359,0.696486]
f_69: 237752.06727, [0.256391,0.438495,0.247308]
f_70: 237731.81328, [0.272322,0.43233,0.0]
f_71: 237737.45944, [0.266587,0.419807,0.0]
f_72: 237738.49473, [0.259799,0.438066,0.0]
f_73: 237731.24815, [0.28077,0.442667,0.0]
f_74: 237730.92952, [0.278141,0.437668,0.0]
f_75: 237730.69429, [0.276813,0.439609,0.0]
f_76: 237730.64802, [0.27631,0.440377,0.0]
f_77: 237730.64363, [0.27631,0.44037,0.000749968]
f_78: 237730.62451, [0.27622,0.440511,0.00148108]
f_79: 237730.56824, [0.276177,0.440552,0.00297991]
f_80: 237730.35361, [0.276172,0.440515,0.00597967]
f_81: 237729.54552, [0.276219,0.44057,0.0119792]
f_82: 237727.05053, [0.275585,0.440576,0.0239624]
f_83: 237722.79034, [0.275974,0.440931,0.0479567]
f_84: 237733.41586, [0.294743,0.452696,0.0905394]
f_85: 237730.00884, [0.260057,0.433004,0.0571825]
f_86: 237721.93715, [0.277479,0.433823,0.0708307]
f_87: 237721.79306, [0.276925,0.436962,0.069505]
f_88: 237721.7864, [0.276993,0.436741,0.0684607]
f_89: 237721.7833, [0.27703,0.436758,0.067788]
f_90: 237721.77605, [0.276845,0.436772,0.0670612]
f_91: 237721.77974, [0.276377,0.436215,0.0668785]
f_92: 237721.77151, [0.276227,0.437136,0.0672808]
f_93: 237721.76884, [0.276471,0.43729,0.0665886]
f_94: 237721.78069, [0.275837,0.437124,0.0662249]
f_95: 237721.7689, [0.276461,0.437235,0.0666621]
f_96: 237721.76901, [0.276508,0.437232,0.0665573]
f_97: 237721.76884, [0.276462,0.437325,0.0665416]
f_98: 237721.76895, [0.276528,0.437361,0.066546]
f_99: 237721.76881, [0.276467,0.437321,0.0665981]
f_100: 237721.76882, [0.276473,0.437393,0.0665798]
f_101: 237721.76879, [0.27647,0.437353,0.0666314]
f_102: 237721.76891, [0.276531,0.437375,0.0666702]
f_103: 237721.76878, [0.276468,0.43735,0.0666373]
f_104: 237721.76878, [0.276465,0.437353,0.0666554]
f_105: 237721.76877, [0.276464,0.437355,0.0666929]
f_106: 237721.76877, [0.276464,0.437354,0.0666968]
FTOL_REACHED

The optimizer converges to θ ≈ (0.2765, 0.4374, 0.0667) with a profiled objective of 237721.769. Timing a fit from a cold start, however, triggers a syntax error:

In [9]:
gc(); @time fit(LMM(X,revec,inst[:Y]));


LoadError: syntax: local declaration in global scope
while loading In[9], in expression starting on line 1
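The error appears to come from the expansion of the `@time` macro, which in this development version of Julia contains `local` declarations that the parser rejects at the top level. A sketch of a workaround, assuming the same session, is to give the macro a local scope to expand into:

```julia
# `@time` expands to code containing `local` declarations, which this Julia
# version rejects at global scope; a `let` block provides a local scope.
gc()                                   # collect garbage before timing
let
    @time fit(LMM(X, revec, inst[:Y]))
end;
```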