In [1]:
using DataFrames, Feather, MixedModels, RCall


INFO: Recompiling stale cache file /home/bates/.julia/lib/v0.5/DataStreams.ji for module DataStreams.
INFO: Recompiling stale cache file /home/bates/.julia/lib/v0.5/RCall.ji for module RCall.
INFO: Using R installation at /opt/anaconda3/lib/R

In [3]:
R"""
load("kb07.rda")
"""
@rget kb07;

In [4]:
@time mm1 = fit!(lmm(RTtrunc ~ 1+P+S+C+SP+SC+PC+SPC+
        (1+S+P+C+SP+SC+PC+SPC|subj)+
        (1+S+P+C+SP+SC+PC+SPC|item), kb07))


 53.580779 seconds (11.59 M allocations: 448.169 MB, 0.69% gc time)
Out[4]:
Linear mixed model fit by maximum likelihood
 Formula: RTtrunc ~ 1 + P + S + C + SP + SC + PC + SPC + ((1 + S + P + C + SP + SC + PC + SPC) | subj) + ((1 + S + P + C + SP + SC + PC + SPC) | item)
  logLik    -2 logLik     AIC        BIC    
 -1.42931726×10⁴  2.85863452×10⁴  2.87483452×10⁴  2.91930328×10⁴

Variance components:
              Column     Variance   Std.Dev.    Corr.
 subj     (Intercept)   90612.3205 301.018804
          S              5180.4346  71.975236 -0.44
          P              5530.2096  74.365379 -0.47  0.08
          C              7580.6305  87.066817  0.21 -0.20  0.41
          SP             8838.5383  94.013501  0.19 -0.76 -0.54 -0.20
          SC             1813.9315  42.590275  0.47 -0.53 -0.10 -0.44  0.28
          PC             7393.0576  85.982892 -0.10  0.13 -0.05 -0.86 -0.06  0.70
          SPC            3808.3909  61.712162 -0.48  0.41 -0.39 -0.09  0.18 -0.78 -0.39
 item     (Intercept)  129609.6341 360.013380
          S              1842.1843  42.920674 -0.35
          P             62348.4893 249.696795 -0.68 -0.45
          C              2920.9622  54.045927  0.20 -0.05 -0.18
          SP             1054.5444  32.473750  0.57 -0.78  0.02  0.02
          SC             1663.1372  40.781579  0.27  0.05 -0.26  0.45 -0.21
          PC             4721.5379  68.713448  0.08 -0.24  0.21 -0.13 -0.26  0.02
          SPC            4813.1935  69.377183  0.04 -0.50  0.32 -0.69  0.65 -0.67 -0.10
 Residual              399646.8438 632.176276
 Number of obs: 1790; levels of grouping factors: 56, 32

  Fixed-effects parameters:
             Estimate Std.Error  z value P(>|z|)
(Intercept)   2180.63   76.7573  28.4094  <1e-99
P            -333.878   47.6492 -7.00702  <1e-11
S            -66.9927   19.3229 -3.46701  0.0005
C             78.9886   21.2119  3.72378  0.0002
SP            22.1534   20.3491  1.08867  0.2763
SC           -18.9216   17.5403 -1.07875  0.2807
PC            5.25918   22.4247 0.234527  0.8146
SPC          -23.9526   21.0171 -1.13967  0.2544
  

In [5]:
mm1.optsum


Out[5]:
Initial parameter vector: [1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0  …  1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0]
Initial objective value:  30014.36976860628

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [0.0,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,-Inf,0.0,-Inf  …  0.0,-Inf,-Inf,-Inf,0.0,-Inf,-Inf,0.0,-Inf,0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10  …  1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10,1.0e-10]

Function evaluations:     1742
Final parameter vector:   [0.476163,-0.0495364,-0.055594,0.0292095,0.0289941,0.0319883,-0.0136437,-0.0464125,0.102512,-0.0167764  …  0.000173176,-0.0377045,0.0362933,0.0137344,0.029079,-0.0245135,-0.0120874,4.23458e-5,0.000132778,1.15826e-6]
Final objective value:    28586.345191687305
Return code:              FTOL_REACHED

In [6]:
mm1.Λ[1]


Out[6]:
8×8 LowerTriangular{Float64,Array{Float64,2}}:
  0.476163     ⋅           ⋅            ⋅          …   ⋅           ⋅         
 -0.0495364   0.102512     ⋅            ⋅              ⋅           ⋅         
 -0.055594   -0.0167764   0.102302      ⋅              ⋅           ⋅         
  0.0292095  -0.0160742   0.0774595    0.108889        ⋅           ⋅         
  0.0289941  -0.11122    -0.094225     0.00518803      ⋅           ⋅         
  0.0319883  -0.024384    0.00557116  -0.053748    …   ⋅           ⋅         
 -0.0136437   0.0124569  -0.0128618   -0.134133       0.0          ⋅         
 -0.0464125   0.0218546  -0.0653738    0.0512141      7.48771e-7  0.000132175

In [7]:
mm1.Λ[2]


Out[7]:
8×8 LowerTriangular{Float64,Array{Float64,2}}:
  0.569483      ⋅            ⋅         …    ⋅          ⋅            ⋅        
 -0.0234283    0.0637232     ⋅              ⋅          ⋅            ⋅        
 -0.266682    -0.288296     0.0421306       ⋅          ⋅            ⋅        
  0.0174978    0.00218032  -0.0211528       ⋅          ⋅            ⋅        
  0.0291642   -0.0320205   -0.025332        ⋅          ⋅            ⋅        
  0.017738     0.00989025   0.0212628  …   0.029079    ⋅            ⋅        
  0.00863582  -0.0246445    0.0955937     -0.0245135  4.23458e-5    ⋅        
  0.00489853  -0.0568596   -0.0284809     -0.0120874  0.000132778  1.15826e-6
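The relative covariance factors displayed above indicate that this maximal model is overparameterized: the trailing diagonal elements of Λ[1] (0.0 and ≈1.3e-4) and of Λ[2] (≈4.2e-5 and ≈1.2e-6) are essentially zero, so both estimated covariance matrices are close to singular. A quick check is to inspect the diagonals directly (a sketch against the fitted mm1 above):

```julia
# Diagonal elements of each relative covariance factor; near-zero values
# mark directions of the random-effects distribution with essentially no
# estimated variability, i.e. a degenerate covariance estimate.
for Λ in mm1.Λ
    println(diag(Λ))
end
```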

In [8]:
R"""
baboon <- within(readRDS("finaldat.tbt.rds"), {
    stimulus <- factor(stimulus)
    lex <- factor(lex)
    Outcomes <- factor(Outcomes)
    id <- factor(id)
    dec.FINAL <- factor(dec.FINAL)
    dec.GAM2 <- factor(dec.GAM2)
})
str(baboon)
"""


'data.frame':	316871 obs. of  44 variables:
 $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ trial        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ subject      : Factor w/ 6 levels "ARI","ART","CAU",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ stimulus     : Factor w/ 8139 levels "ABBS","ABBT",..: 6929 1613 7705 7220 339 339 7244 6929 5483 92 ...
 $ lex          : Factor w/ 2 levels "NW","W": 2 1 2 1 2 2 1 2 1 1 ...
 $ acc          : int  0 1 0 1 0 0 1 0 1 1 ...
 $ Response     : Factor w/ 2 levels "NW","W": 1 1 1 1 1 1 1 1 1 1 ...
 $ Outcomes     : Factor w/ 2 levels "NW","W": 2 1 2 1 2 2 1 2 1 1 ...
 $ Block        : num  1 1 1 1 1 1 1 1 1 1 ...
 $ dec.HOG      : Factor w/ 2 levels "NW","W": 2 1 1 1 1 2 2 1 1 2 ...
 $ dec.HOG.0    : logi  FALSE TRUE FALSE TRUE FALSE TRUE ...
 $ actN         : num  0 0 0.279 0.147 0.308 ...
 $ actW         : num  0 0.144 0.112 0.278 0.266 ...
 $ priorN       : num  0 0 0.59 0.596 1.094 ...
 $ priorW       : num  0 0.507 0.544 0.979 1.025 ...
 $ first        : logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
 $ agr          : num  0 1 1 1 1 0 0 1 1 0 ...
 $ acc.model    : num  1 1 0 1 0 1 0 0 1 0 ...
 $ diff         : num  0 -0.1438 -0.1666 -0.1312 -0.0422 ...
 $ diff.abs     : num  0 0.1438 0.1666 0.1312 0.0422 ...
 $ id           : Factor w/ 308 levels "ACME","AHEM",..: 266 172 279 172 9 9 172 266 172 172 ...
 $ first.app    : Factor w/ 52 levels "1","3","4","5",..: 1 NA 1 NA 1 1 NA 1 NA NA ...
 $ dist         : Factor w/ 62 levels "0","1","2","3",..: 1 NA 1 NA 1 1 NA 1 NA NA ...
 $ first.app.num: num  1 1 1 1 1 1 1 1 1 1 ...
 $ first.trial  : num  1 1 1 1 1 1 1 1 1 1 ...
 $ priorsWDS    : num  0 0 0 0 0 ...
 $ actsWDS      : num  0 0 0 0 0 ...
 $ dec.FINAL    : Factor w/ 2 levels "NW","W": 1 1 1 1 1 1 2 1 1 2 ...
 $ unique       : num  0 0 2 2 3 3 3 3 3 3 ...
 $ old20        : num  0 0 NA NA NA NA NA NA NA NA ...
 $ Correct      : logi  FALSE TRUE FALSE TRUE FALSE FALSE ...
 $ acc.f        : logi  FALSE TRUE FALSE TRUE FALSE FALSE ...
 $ first.tri    : Factor w/ 1991 levels "ABB","ABL","ACH",..: 1693 390 1873 1747 69 69 1750 1693 1354 16 ...
 $ last.tri     : Factor w/ 1539 levels "ABS","ABT","ACE",..: 942 310 270 597 40 40 633 942 1232 365 ...
 $ first.bi     : Factor w/ 184 levels "AB","AC","AD",..: 159 44 175 165 17 17 165 159 132 6 ...
 $ mid.bi       : Factor w/ 205 levels "AB","AC","AD",..: 136 56 48 94 11 11 97 136 167 63 ...
 $ last.bi      : Factor w/ 209 levels "AB","AD","AF",..: 152 178 116 86 118 118 125 152 67 107 ...
 $ id.f         : Factor w/ 308 levels "ACME","AHEM",..: 266 172 279 172 9 9 172 266 172 172 ...
 $ dec.GAM      : logi  NA NA NA NA NA NA ...
 $ GAM          : num  0.206 0.139 0.115 0.139 0.113 ...
 $ acc.gam      : num  1 1 0 0 0 0 1 1 1 1 ...
 $ GAM2         : num  0.2599 0.1071 0.0872 0.1071 0.087 ...
 $ dec.GAM2     : Factor w/ 2 levels "NW","W": 1 1 1 1 1 1 2 1 1 1 ...
 $ acc.gam2     : num  0 1 0 1 0 0 0 0 1 1 ...
Out[8]:
RCall.RObject{RCall.NilSxp}
NULL

In [9]:
@rget(baboon);

In [10]:
for (n,v) in eachcol(baboon)
    println(rpad(n, 20), typeof(v))
end


ID                  Array{Int32,1}
trial               Array{Int32,1}
subject             CategoricalArrays.CategoricalArray{String,1,UInt32}
stimulus            CategoricalArrays.CategoricalArray{String,1,UInt32}
lex                 CategoricalArrays.CategoricalArray{String,1,UInt32}
acc                 Array{Int32,1}
Response            CategoricalArrays.CategoricalArray{String,1,UInt32}
Outcomes            CategoricalArrays.CategoricalArray{String,1,UInt32}
Block               Array{Float64,1}
dec.HOG             CategoricalArrays.CategoricalArray{String,1,UInt32}
dec.HOG.0           BitArray{1}
actN                Array{Float64,1}
actW                Array{Float64,1}
priorN              Array{Float64,1}
priorW              Array{Float64,1}
first               BitArray{1}
agr                 Array{Float64,1}
acc.model           Array{Float64,1}
diff                Array{Float64,1}
diff.abs            Array{Float64,1}
id                  CategoricalArrays.CategoricalArray{String,1,UInt32}
first.app           CategoricalArrays.NullableCategoricalArray{String,1,UInt32}
dist                CategoricalArrays.NullableCategoricalArray{String,1,UInt32}
first.app.num       NullableArrays.NullableArray{Float64,1}
first.trial         Array{Float64,1}
priorsWDS           Array{Float64,1}
actsWDS             Array{Float64,1}
dec.FINAL           CategoricalArrays.CategoricalArray{String,1,UInt32}
unique              Array{Float64,1}
old20               NullableArrays.NullableArray{Float64,1}
Correct             BitArray{1}
acc.f               BitArray{1}
first.tri           CategoricalArrays.CategoricalArray{String,1,UInt32}
last.tri            CategoricalArrays.CategoricalArray{String,1,UInt32}
first.bi            CategoricalArrays.CategoricalArray{String,1,UInt32}
mid.bi              CategoricalArrays.CategoricalArray{String,1,UInt32}
last.bi             CategoricalArrays.CategoricalArray{String,1,UInt32}
id.f                CategoricalArrays.CategoricalArray{String,1,UInt32}
dec.GAM             NullableArrays.NullableArray{Bool,1}
GAM                 Array{Float64,1}
acc.gam             Array{Float64,1}
GAM2                Array{Float64,1}
dec.GAM2            CategoricalArrays.CategoricalArray{String,1,UInt32}
acc.gam2            Array{Float64,1}

In Julia, column names with embedded . characters must be quoted in an awkward way. It is better to replace the . characters with _.
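To illustrate the awkwardness (the column name is from the data above):

```julia
# A column name containing dots cannot be written as a symbol literal,
# because :dec.HOG.0 would parse as field access; the Symbol must be
# constructed from a string instead:
col = Symbol("dec.HOG.0")   # awkward but necessary before renaming
# After replacing '.' with '_' the name is an ordinary symbol literal:
col = :dec_HOG_0
```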


In [11]:
nms_ = [Symbol(replace(string(nm), '.', '_')) for nm in names(baboon)]
names!(baboon, nms_)
baboon[:dec_HOG_0] = convert(Vector{Int}, baboon[:dec_HOG_0])
for (n,v) in eachcol(baboon)
    println(rpad(n, 20), typeof(v))
end


ID                  Array{Int32,1}
trial               Array{Int32,1}
subject             CategoricalArrays.CategoricalArray{String,1,UInt32}
stimulus            CategoricalArrays.CategoricalArray{String,1,UInt32}
lex                 CategoricalArrays.CategoricalArray{String,1,UInt32}
acc                 Array{Int32,1}
Response            CategoricalArrays.CategoricalArray{String,1,UInt32}
Outcomes            CategoricalArrays.CategoricalArray{String,1,UInt32}
Block               Array{Float64,1}
dec_HOG             CategoricalArrays.CategoricalArray{String,1,UInt32}
dec_HOG_0           DataArrays.DataArray{Int64,1}
actN                Array{Float64,1}
actW                Array{Float64,1}
priorN              Array{Float64,1}
priorW              Array{Float64,1}
first               BitArray{1}
agr                 Array{Float64,1}
acc_model           Array{Float64,1}
diff                Array{Float64,1}
diff_abs            Array{Float64,1}
id                  CategoricalArrays.CategoricalArray{String,1,UInt32}
first_app           CategoricalArrays.NullableCategoricalArray{String,1,UInt32}
dist                CategoricalArrays.NullableCategoricalArray{String,1,UInt32}
first_app_num       NullableArrays.NullableArray{Float64,1}
first_trial         Array{Float64,1}
priorsWDS           Array{Float64,1}
actsWDS             Array{Float64,1}
dec_FINAL           CategoricalArrays.CategoricalArray{String,1,UInt32}
unique              Array{Float64,1}
old20               NullableArrays.NullableArray{Float64,1}
Correct             BitArray{1}
acc_f               BitArray{1}
first_tri           CategoricalArrays.CategoricalArray{String,1,UInt32}
last_tri            CategoricalArrays.CategoricalArray{String,1,UInt32}
first_bi            CategoricalArrays.CategoricalArray{String,1,UInt32}
mid_bi              CategoricalArrays.CategoricalArray{String,1,UInt32}
last_bi             CategoricalArrays.CategoricalArray{String,1,UInt32}
id_f                CategoricalArrays.CategoricalArray{String,1,UInt32}
dec_GAM             NullableArrays.NullableArray{Bool,1}
GAM                 Array{Float64,1}
acc_gam             Array{Float64,1}
GAM2                Array{Float64,1}
dec_GAM2            CategoricalArrays.CategoricalArray{String,1,UInt32}
acc_gam2            Array{Float64,1}

In [12]:
mm1 = fit!(glmm(acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject), baboon, Bernoulli()))


Out[12]:
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject)
  Distribution: Distributions.Bernoulli{Float64}
  Link: GLM.LogitLink()

  Deviance (Laplace approximation): 344103.0381

Variance components:
              Column     Variance   Std.Dev. 
 stimulus (Intercept)  0.729003384 0.8538169
 subject  (Intercept)  0.019953909 0.1412583

 Number of obs: 316871; levels of grouping factors: 8139, 6

Fixed-effects parameters:
              Estimate Std.Error  z value P(>|z|)
(Intercept)    1.06353 0.0587912    18.09  <1e-72
dec_HOG_0    -0.176357 0.0131468 -13.4144  <1e-40

In [13]:
@time mm1a = fit!(glmm(
acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject), baboon, Bernoulli()), fast = true)


 20.402963 seconds (210.08 k allocations: 1.302 GB, 1.35% gc time)
Out[13]:
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject)
  Distribution: Distributions.Bernoulli{Float64}
  Link: GLM.LogitLink()

  Deviance (Laplace approximation): 344103.8657

Variance components:
              Column     Variance   Std.Dev.  
 stimulus (Intercept)  0.728887359 0.85374900
 subject  (Intercept)  0.021818356 0.14771038

 Number of obs: 316871; levels of grouping factors: 8139, 6

Fixed-effects parameters:
              Estimate Std.Error  z value P(>|z|)
(Intercept)    1.01788 0.0613767  16.5841  <1e-61
dec_HOG_0    -0.169316 0.0131424 -12.8832  <1e-37

In [14]:
mm1a.LMM.optsum


Out[14]:
Initial parameter vector: [1.0,1.0]
Initial objective value:  344296.1817233196

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [0.0,0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10,1.0e-10]

Function evaluations:     35
Final parameter vector:   [0.853749,0.14771]
Final objective value:    344103.86574724223
Return code:              FTOL_REACHED

In [15]:
@time mm1 = fit!(glmm(
    acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject), baboon, Bernoulli()))


 89.008223 seconds (248.00 k allocations: 4.008 GB, 0.53% gc time)
Out[15]:
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | subject)
  Distribution: Distributions.Bernoulli{Float64}
  Link: GLM.LogitLink()

  Deviance (Laplace approximation): 344103.0381

Variance components:
              Column     Variance   Std.Dev. 
 stimulus (Intercept)  0.729003384 0.8538169
 subject  (Intercept)  0.019953909 0.1412583

 Number of obs: 316871; levels of grouping factors: 8139, 6

Fixed-effects parameters:
              Estimate Std.Error  z value P(>|z|)
(Intercept)    1.06353 0.0587912    18.09  <1e-72
dec_HOG_0    -0.176357 0.0131468 -13.4144  <1e-40

In [16]:
mm1.LMM.optsum


Out[16]:
Initial parameter vector: [1.0,1.0]
Initial objective value:  344295.93829161214

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [0.0,0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10,1.0e-10]

Function evaluations:     81
Final parameter vector:   [1.06353,-0.176357,0.853817,0.141258]
Final objective value:    344103.0380669706
Return code:              FTOL_REACHED
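Comparing the two optimization summaries shows what fast = true changes: the fast fit optimizes only the two relative covariance parameters (final parameter vector of length 2, 35 evaluations), determining the fixed effects by penalized iteratively reweighted least squares within each evaluation, while the default fit optimizes the fixed-effects and covariance parameters jointly (length-4 final vector, 81 evaluations). The price is a slightly larger deviance. As simple arithmetic on the reported values:

```julia
# Laplace deviances reported above for the fast fit and the joint fit
dev_fast  = 344103.8657
dev_joint = 344103.0381
dev_fast - dev_joint    # ≈ 0.83, negligible on this scale
```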

In [17]:
@time mm2 = fit!(glmm(
    acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | first_tri) + (1 | last_tri) + (1 | subject),
    baboon, Bernoulli()))


1122.436415 seconds (210.29 M allocations: 110.799 GB, 1.67% gc time)
Out[17]:
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: acc ~ 1 + dec_HOG_0 + (1 | stimulus) + (1 | first_tri) + (1 | last_tri) + (1 | subject)
  Distribution: Distributions.Bernoulli{Float64}
  Link: GLM.LogitLink()

  Deviance (Laplace approximation): 337966.5911

Variance components:
               Column     Variance   Std.Dev.  
 stimulus  (Intercept)  0.016587679 0.12879316
 first_tri (Intercept)  0.225358482 0.47471937
 last_tri  (Intercept)  0.331953946 0.57615445
 subject   (Intercept)  0.021949342 0.14815310

 Number of obs: 316871; levels of grouping factors: 8139, 1991, 1539, 6

Fixed-effects parameters:
              Estimate Std.Error  z value P(>|z|)
(Intercept)    1.09759  0.064784  16.9424  <1e-63
dec_HOG_0    -0.132331 0.0130988 -10.1025  <1e-23
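Adding crossed random intercepts for the first and last trigrams lowers the Laplace deviance from 344103.0 to 337966.6 and absorbs most of the stimulus-level variability: the stimulus intercept variance drops from 0.729 in mm1 to 0.0166 here. As arithmetic on the reported values:

```julia
# Change in deviance and in the stimulus intercept variance when the
# trigram random effects are added (values copied from the fits above)
344103.0381 - 337966.5911    # ≈ 6136.4 drop in deviance
0.729003384 - 0.016587679    # ≈ 0.712 of the stimulus variance absorbed
```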