Checking fixed-effects rank

A MixedModels.LinearMixedModel can be constructed without being fit.

For the purposes of examples, load the dataset stored with the MixedModels package.


In [1]:
using DataFrames, MixedModels, RData
const dat = convert(Dict{Symbol,DataFrame},
    load(Pkg.dir("MixedModels", "test", "dat.rda")))


WARNING: Method definition model_response(DataFrames.ModelFrame) in module DataFrames at /home/bates/.julia/v0.6/DataFrames/src/statsmodels/formula.jl:352 overwritten in module MixedModels at /home/bates/.julia/v0.6/MixedModels/src/pls.jl:58.
Out[1]:
Dict{Symbol,DataFrames.DataFrame} with 61 entries:
  :bs10          => 1104×6 DataFrames.DataFrame…
  :Genetics      => 60×5 DataFrames.DataFrame…
  :Contraception => 1934×6 DataFrames.DataFrame…
  :Mmmec         => 354×6 DataFrames.DataFrame…
  :kb07          => 1790×10 DataFrames.DataFrame…
  :Rail          => 18×2 DataFrames.DataFrame…
  :KKL           => 53765×24 DataFrames.DataFrame…
  :Bond          => 21×3 DataFrames.DataFrame…
  :VerbAgg       => 7584×9 DataFrames.DataFrame…
  :ergoStool     => 36×3 DataFrames.DataFrame…
  :s3bbx         => 2449×6 DataFrames.DataFrame…
  :cake          => 270×5 DataFrames.DataFrame…
  :Cultivation   => 24×4 DataFrames.DataFrame…
  :Pastes        => 60×4 DataFrames.DataFrame…
  :Exam          => 4059×5 DataFrames.DataFrame…
  :Socatt        => 1056×9 DataFrames.DataFrame…
  :WWheat        => 60×3 DataFrames.DataFrame…
  :Pixel         => 102×5 DataFrames.DataFrame…
  :Arabidopsis   => 625×8 DataFrames.DataFrame…
  :TeachingII    => 96×14 DataFrames.DataFrame…
  :AvgDailyGain  => 32×6 DataFrames.DataFrame…
  :InstEval      => 73421×7 DataFrames.DataFrame…
  :Poems         => 275996×7 DataFrames.DataFrame…
  :d3            => 130418×5 DataFrames.DataFrame…
  :Hsb82         => 7185×8 DataFrames.DataFrame…
  ⋮              => ⋮

A model for the sleepstudy data with random-effects for the intercept and for the days of sleep deprivation, U, by subject, G is declared as


In [2]:
m = lmm(@formula(Y ~ 1 + U + (1+U|G)), dat[:sleepstudy]);

The fields in m


In [3]:
fieldnames(m)


Out[3]:
6-element Array{Symbol,1}:
 :formula
 :trms   
 :sqrtwts
 :A      
 :L      
 :optsum 

include trms, a vector of AbstractTerms, and two blocked arrays, A and L. The last two terms are always MatrixTerms representing the fixed-effects model matrix, $\bf X$, and the response vector, $\bf y$. The terms preceding these two are always random-effects terms.


In [4]:
typeof.(m.trms)


Out[4]:
3-element Array{DataType,1}:
 MixedModels.VectorFactorReTerm{Float64,String,UInt8}
 MixedModels.MatrixTerm{Float64,Array{Float64,2}}    
 MixedModels.MatrixTerm{Float64,Array{Float64,2}}    

The A field is a blocked matrix


In [5]:
typeof(m.A)


Out[5]:
BlockArrays.BlockArray{Float64,2,AbstractArray{Float64,2}}

whose rows and columns of blocks correspond to the terms.


In [6]:
nblocks(m.A)


Out[6]:
(3, 3)

The second last diagonal block in A is $\bf X'X$.


In [7]:
m.A[Block(2,2)]


Out[7]:
2×2 Array{Float64,2}:
 180.0   810.0
 810.0  5130.0

This can be verified by noting


In [8]:
dat[:sleepstudy]


Out[8]:
YUG
1249.560.0308
2258.70471.0308
3250.80062.0308
4321.43983.0308
5356.85194.0308
6414.69015.0308
7382.20386.0308
8290.14867.0308
9430.58538.0308
10466.35359.0308
11222.73390.0309
12205.26581.0309
13202.97782.0309
14204.7073.0309
15207.71614.0309
16215.96185.0309
17213.63036.0309
18217.72727.0309
19224.29578.0309
20237.31429.0309
21199.05390.0310
22194.33221.0310
23234.322.0310
24232.84163.0310
25229.30744.0310
26220.45795.0310
27235.42086.0310
28255.75117.0310
29261.01258.0310
30247.51539.0310
&vellip&vellip&vellip&vellip

In [9]:
X = m.trms[end - 1].x   # the model matrix X


Out[9]:
180×2 Array{Float64,2}:
 1.0  0.0
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0
 1.0  6.0
 1.0  7.0
 1.0  8.0
 1.0  9.0
 1.0  0.0
 1.0  1.0
 1.0  2.0
 ⋮       
 1.0  8.0
 1.0  9.0
 1.0  0.0
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0
 1.0  6.0
 1.0  7.0
 1.0  8.0
 1.0  9.0

In [10]:
X'X


Out[10]:
2×2 Array{Float64,2}:
 180.0   810.0
 810.0  5130.0

A pivoted Cholesky decomposition of this matrix returns the computational rank, which is also the rank of $\bf X$.


In [11]:
Lpiv = cholfact(Symmetric(m.A[Block(2,2)], :L), Val{true})


Out[11]:
Base.LinAlg.CholeskyPivoted{Float64,Array{Float64,2}}([71.624 810.0; 11.3091 7.2184], 'L', [2, 1], 2, 0.0, 0)

In [12]:
rank(Lpiv)


Out[12]:
2

An optional argument, tol, in the call to cholfact specifies the tolerance for determining the rank. It defaults to zero which would correspond to exact rank deficiency. A more reasonable value may be $\sqrt{\epsilon}$ where $\epsilon$ is the relative machine precision.


In [13]:
eps()


Out[13]:
1.4901161193847656e-8

As explained in the documentation for dpstrf, this value should be negated to be used as a relative tolerance. Otherwise it is used as an absolute tolerance which usually is not what you want.


In [14]:
Lpiv = cholfact(Symmetric(m.A[Block(2,2)], :L), Val{true}, tol = -√eps())


Out[14]:
Base.LinAlg.CholeskyPivoted{Float64,Array{Float64,2}}([71.624 810.0; 11.3091 7.2184], 'L', [2, 1], 2, -1.4901161193847656e-8, 0)