The mixed models considered in the previous chapter had only one random-effects term, which was a simple, scalar random-effects term, and a single fixed-effects coefficient. Although such models can be useful, it is with the facility to use multiple random-effects terms and to use random-effects terms beyond a simple, scalar term that we can begin to realize the flexibility and versatility of mixed models.

In this chapter we consider models with multiple simple, scalar random-effects terms, showing examples where the grouping factors for these terms are in completely crossed or nested or partially crossed configurations. For ease of description we will refer to the random effects as being crossed or nested although, strictly speaking, the distinction between nested and non-nested refers to the grouping factors, not the random effects.

```
In [1]:
```using DataFrames, Distributions, Feather, FreqTables, Gadfly, MixedModels
const datadir = Pkg.dir("MixedModels", "test", "data");
const ppt250 = inv(500) : inv(250) : 1.;
const zquantiles = quantile(Normal(), ppt250);
function hpdinterval(v, level=0.95)
n = length(v)
if !(0 < level < 1)
throw(ArgumentError("level = $level must be in (0, 1)"))
end
if (lbd = floor(Int, (1 - level) * n)) < 2
throw(ArgumentError(
"level = $level is too large from sample size $n"))
end
ordstat = sort(v)
leftendpts = ordstat[1:lbd]
rtendpts = ordstat[(1 + n - lbd):n]
(w, ind) = findmin(rtendpts - leftendpts)
return [leftendpts[ind], rtendpts[ind]]
end

```
Out[1]:
```

One of the areas in which the methods in the package for are particularly effective is in fitting models to cross-classified data where several factors have random effects associated with them. For example, in many experiments in psychology the reaction of each of a group of subjects to each of a group of stimuli or items is measured. If the subjects are considered to be a sample from a population of subjects and the items are a sample from a population of items, then it would make sense to associate random effects with both these factors.

In the past it was difficult to fit mixed models with multiple, crossed grouping factors to large, possibly unbalanced, data sets. The methods in the package are able to do this. To introduce the methods let us first consider a small, balanced data set with crossed grouping factors.

`Penicillin`

DataThe data are derived from Table 6.6, p. 144 of where they are described as coming from an investigation to

assess the variability between samples of penicillin by the

B. subtilismethod. In this test method a bulk-innoculated nutrient agar medium is poured into a Petri dish of approximately 90 mm. diameter, known as a plate. When the medium has set, six small hollow cylinders or pots (about 4 mm. in diameter) are cemented onto the surface at equally spaced intervals. A few drops of the penicillin solutions to be compared are placed in the respective cylinders, and the whole plate is placed in an incubator for a given time. Penicillin diffuses from the pots into the agar, and this produces a clear circular zone of inhibition of growth of the organisms, which can be readily measured. The diameter of the zone is related in a known way to the concentration of penicillin in the solution.

As with the `dyestuff`

data, we load `penicillin`

from a feather-format file

```
In [2]:
```penicillin = Feather.read(joinpath(datadir, "Penicillin.feather"), nullable = false);
names(penicillin)

```
Out[2]:
```

of the data, then plot it

[fig:Penicillindot]

(Fig. [fig:Penicillindot]).

The variation in the diameter is associated with the plates and with the samples. Because each plate is used only for the six samples shown here we are not interested in the contributions of specific plates as much as we are interested in the variation due to plates and in assessing the potency of the samples after accounting for this variation. Thus, we will use random effects for the factor. We will also use random effects for the factor because, as in the dyestuff example, we are more interested in the sample-to-sample variability in the penicillin samples than in the potency of a particular sample.

In this experiment each sample is used on each plate. We say that the and factors are *crossed*, as opposed to *nested* factors, which we will describe in the next section. By itself, the designation “crossed” just means that the factors are not nested. If we wish to be more specific, we could describe these factors as being *completely crossed*, which means that we have at least one observation for each combination of a level of and a level of . We can see this in Fig. [fig:Penicillindot] and, because there are moderate numbers of levels in these factors, we can check it in a cross-tabulation

```
In [3]:
```freqtable(convert(Vector{String}, penicillin[:sample]),
convert(Vector{String}, penicillin[:plate]))

```
Out[3]:
```

Like the `dyestuff`

data, the factors in the `penicillin`

data are balanced. That is, there are exactly the same number of observations on each plate and for each sample and, furthermore, there is the same number of observations on each combination of levels. In this case there is exactly one observation for each combination of `sample`

and `plate`

. We would describe the configuration of these two factors as an unreplicated, completely balanced, crossed design.

In general, balance is a desirable but precarious property of a data set. We may be able to impose balance in a designed experiment but we typically cannot expect that data from an observation study will be balanced. Also, as anyone who analyzes real data soon finds out, expecting that balance in the design of an experiment will produce a balanced data set is contrary to “Murphy’s Law”. That’s why statisticians allow for missing data. Even when we apply each of the six samples to each of the 24 plates, something could go wrong for one of the samples on one of the plates, leaving us without a measurement for that combination of levels and thus an unbalanced data set.

`Penicillin`

DataA model incorporating random effects for both the and the is straightforward to specify — we include simple, scalar random effects terms for both these factors.

```
In [4]:
```penm = fit!(lmm(diameter ~ 1 + (1 | plate) + (1 | sample), penicillin))

```
Out[4]:
```

This model display indicates that the sample-to-sample variability has the greatest contribution, then plate-to-plate variability and finally the “residual” variability that cannot be attributed to either the sample or the plate. These conclusions are consistent with what we see in the data plot (Fig. [fig:Penicillindot]).

The prediction intervals on the random effects (Fig. [fig:fm03ranef])

[fig:fm03ranef]

confirm that the conditional distribution of the random effects for has much less variability than does the conditional distribution of the random effects for , in the sense that the dots in the bottom panel have less variability than those in the top panel. (Note the different horizontal axes for the two panels.) However, the conditional distribution of the random effect for a particular , say sample F, has less variability than the conditional distribution of the random effect for a particular plate, say plate m. That is, the lines in the bottom panel are wider than the lines in the top panel, even after taking the different axis scales into account. This is because the conditional distribution of the random effect for a particular sample depends on 24 responses while the conditional distribution of the random effect for a particular plate depends on only 6 responses.

In chapter [chap:ExamLMM] we saw that a model with a single, simple, scalar random-effects term generated a random-effects model matrix, $\mathbf Z$, that is the matrix of indicators of the levels of the grouping factor. When we have multiple, simple, scalar random-effects terms, as in model , each term generates a matrix of indicator columns and these sets of indicators are concatenated to form the model matrix $\mathbf Z$. The transpose of this matrix, shown in Fig. [fig:fm03Ztimage], contains rows of indicators for each factor.

[fig:fm03Ztimage]

[fig:fm03LambdaLimage]

The relative covariance factor, $\Lambda_\theta$, (Fig. [fig:fm03LambdaLimage], left panel) is no longer a multiple of the identity. It is now block diagonal, with two blocks, one of size 24 and one of size 6, each of which is a multiple of the identity. The diagonal elements of the two blocks are $\theta_1$ and $\theta_2$, respectively. The numeric values of these parameters can be obtained as

```
In [5]:
```getθ(penm)

```
Out[5]:
```

or as the `Final parameter vector`

in the `opsum`

field of `penm`

```
In [6]:
```penm.optsum

```
Out[6]:
```

The first parameter is the relative standard deviation of the random effects for `plate`

, which has the value `0.8455646/0.5499331 = 1.53758`

at convergence, and the second is the relative standard deviation of the random effects for `sample`

(`1.7706475/0.5499331 = 3.21975`

).

Because $\Lambda_\theta$ is diagonal, the pattern of non-zeros in $\Lambda_\theta'\mathbf Z'\mathbf Z\Lambda_\theta+\mathbf I$ will be the same as that in $\mathbf Z'\mathbf Z$, shown in the middle panel of Fig. [fig:fm03LambdaLimage]. The sparse Cholesky factor, $\mathbf L$, shown in the right panel, is lower triangular and has non-zero elements in the lower right hand corner in positions where $\mathbf Z'\mathbf Z$ has systematic zeros. We say that “fill-in” has occurred when forming the sparse Cholesky decomposition. In this case there is a relatively minor amount of fill but in other cases there can be a substantial amount of fill and we shall take precautions so as to reduce this, because fill-in adds to the computational effort in determining the MLEs or the REML estimates.

A bootstrap simulation of the model

```
In [7]:
```@time penmbstp = bootstrap(10000, penm);

```
```

provides the density plots

```
In [8]:
```plot(x = penmbstp[:β₁], Geom.density(), Guide.xlabel("β₁"))

```
Out[8]:
```

```
In [9]:
```plot(x = penmbstp[:σ], Geom.density(), Guide.xlabel("σ"))

```
Out[9]:
```

```
In [10]:
```plot(x = penmbstp[:σ₁], Geom.density(), Guide.xlabel("σ₁"))

```
Out[10]:
```

```
In [11]:
```plot(x = penmbstp[:σ₂], Geom.density(), Guide.xlabel("σ₂"))

```
Out[11]:
```

A profile zeta plot (Fig. [fig:fm03prplot]) for the parameters in model

[fig:fm03prplot]

leads to conclusions similar to those from Fig. [fig:fm1prof] for model in the previous chapter. The fixed-effect parameter, $\beta_1$, for the constant term has symmetric intervals and is over-dispersed relative to the normal distribution. The logarithm of $\sigma$ has a good normal approximation but the standard deviations of the random effects, $\sigma_1$ and $\sigma_2$, are skewed. The skewness for $\sigma_2$ is worse than that for $\sigma_1$, because the estimate of $\sigma_2$ is less precise than that of $\sigma_1$, in both absolute and relative senses. For an absolute comparison we compare the widths of the confidence intervals for these parameters.

```
2.5 % 97.5 %
.sig01 0.6335658 1.1821040
.sig02 1.0957893 3.5562919
.sigma 0.4858454 0.6294535
(Intercept) 21.2666274 24.6778176
```

In a relative comparison we examine the ratio of the endpoints of the interval divided by the estimate.

```
2.5 % 97.5 %
.sig01 0.7492746 1.397993
.sig02 0.6188634 2.008469
```

The lack of precision in the estimate of $\sigma_2$ is a consequence of only having 6 distinct levels of the factor. The factor, on the other hand, has 24 distinct levels. In general it is more difficult to estimate a measure of spread, such as the standard deviation, than to estimate a measure of location, such as a mean, especially when the number of levels of the factor is small. Six levels are about the minimum number required for obtaining sensible estimates of standard deviations for simple, scalar random effects terms.

The profile pairs plot (Fig. [fig:fm03prpairs])

[fig:fm03prpairs]

shows patterns similar to those in Fig. [fig:fm1profpair] for pairs of parameters in model fit to the data. On the $\zeta$ scale (panels below the diagonal) the profile traces are nearly straight and orthogonal with the exception of the trace of $\zeta(\sigma_2)$ on $\zeta(\beta_0)$ (the horizontal trace for the panel in the $(4,2)$ position). The pattern of this trace is similar to the pattern of the trace of $\zeta(\sigma_1)$ on $\zeta(\beta_1)$ in Fig. [fig:fm1profpair]. Moving $\beta_0$ from its estimate, $\widehat{\beta}_0$, in either direction will increase the residual sum of squares. The increase in the residual variability is reflected in an increase of one or more of the dispersion parameters. The balanced experimental design results in a fixed estimate of $\sigma$ and the extra apparent variability must be incorporated into $\sigma_1$ or $\sigma_2$.

Contours in panels of parameter pairs on the original scales (i.e. panels above the diagonal) can show considerable distortion from the ideal elliptical shape. For example, contours in the $\sigma_2$ versus $\sigma_1$ panel (the $(1,2)$ position) and the $\log(\sigma)$ versus $\sigma_2$ panel (in the $(2,3)$ position) are dramatically non-elliptical. However, the distortion of the contours is not due to these parameter estimates depending strongly on each other. It is almost entirely due to the choice of scale for $\sigma_1$ and $\sigma_2$. When we plot the contours on the scale of $\log(\sigma_1)$ and $\log(\sigma_2)$ instead (Fig. [fig:lpr03pairs])

[fig:fm03lprpairs]

they are much closer to the elliptical pattern.

Conversely, if we tried to plot contours on the scale of $\sigma_1^2$ and $\sigma_2^2$ (not shown), they would be hideously distorted.

In this section we again consider a simple example, this time fitting a model with *nested* grouping factors for the random effects.

`Pastes`

DataThe third example from Davies (1972) is described as coming from

deliveries of a chemical paste product contained in casks where, in addition to sampling and testing errors, there are variations in quality between deliveries …As a routine, three casks selected at random from each delivery were sampled and the samples were kept for reference. …Ten of the delivery batches were sampled at random and two analytical tests carried out on each of the 30 samples.

The structure and summary of the data object are

```
In [12]:
```pastes = Feather.read(joinpath(datadir, "Pastes.feather"), nullable = false);
names(pastes)

```
Out[12]:
```

`sample`

factor with the label of the `batch`

factor followed by `a`

, `b`

or `c`

to distinguish the three samples taken from that batch.

```
In [13]:
```freqtable(convert(Vector{String}, pastes[:batch]),
convert(Vector{String}, pastes[:sample]))

```
Out[13]:
```

When plotting the `strength`

versus `sample`

and in the data we should remember that we have two strength measurements on each of the 30 samples. It is tempting to use the cask designation (‘a’, ‘b’ and ‘c’) to determine, say, the plotting symbol within a `sample`

. It would be fine to do this within a batch but the plot would be misleading if we used the same symbol for cask ‘a’ in different batches. There is no relationship between cask ‘a’ in batch ‘A’ and cask ‘a’ in batch ‘B’. The labels ‘a’, ‘b’ and ‘c’ are used only to distinguish the three samples within a batch; they do not have a meaning across batches.

[fig:Pastesplot]

In Fig. [fig:Pastesplot] we plot the two strength measurements on each of the samples within each of the batches and join up the average strength for each sample. The perceptive reader will have noticed that the levels of the factors on the vertical axis in this figure, and in Fig. [fig:Dyestuffdot] and [fig:Penicillindot], have been reordered according to increasing average response. In all these cases there is no inherent ordering of the levels of the covariate such as or . Rather than confuse our interpretation of the plot by determining the vertical displacement of points according to a random ordering, we impose an ordering according to increasing mean response. This allows us to more easily check for structure in the data, including undesirable characteristics like increasing variability of the response with increasing mean level of the response.

In Fig. [fig:Pastesplot] we order the samples within each batch separately then order the batches according to increasing mean strength.

Figure [fig:Pastesplot] shows considerable variability in strength between samples relative to the variability within samples. There is some indication of variability between batches, in addition to the variability induced by the samples, but not a strong indication of a batch effect. For example, batches I and D, with low mean strength relative to the other batches, each contained one sample (I:b and D:c, respectively) that had high mean strength relative to the other samples. Also, batches H and C, with comparatively high mean batch strength, contain samples H:a and C:a with comparatively low mean sample strength. In Sect. [sec:TestingSig2is0] we will examine the need for incorporating batch-to-batch variability, in addition to sample-to-sample variability, in the statistical model.

Because each level of occurs with one and only one level of we say that is *nested within* . Some presentations of mixed-effects models, especially those related to *multilevel modeling* or *hierarchical linear models* , leave the impression that one can only define random effects with respect to factors that are nested. This is the origin of the terms “multilevel”, referring to multiple, nested levels of variability, and “hierarchical”, also invoking the concept of a hierarchy of levels. To be fair, both those references do describe the use of models with random effects associated with non-nested factors, but such models tend to be treated as a special case.

The blurring of mixed-effects models with the concept of multiple, hierarchical levels of variation results in an unwarranted emphasis on “levels” when defining a model and leads to considerable confusion. It is perfectly legitimate to define models having random effects associated with non-nested factors. The reasons for the emphasis on defining random effects with respect to nested factors only are that such cases do occur frequently in practice and that some of the computational methods for estimating the parameters in the models can only be easily applied to nested factors.

This is not the case for the methods used in the package. Indeed there is nothing special done for models with random effects for nested factors. When random effects are associated with multiple factors exactly the same computational methods are used whether the factors form a nested sequence or are partially crossed or are completely crossed. There is, however, one aspect of nested grouping factors that we should emphasize, which is the possibility of a factor that is *implicitly nested* within another factor. Suppose, for example, that the factor was defined as having three levels instead of 30 with the implicit assumption that is nested within . It may seem silly to try to distinguish 30 different batches with only three levels of a factor but, unfortunately, data are frequently organized and presented like this, especially in text books. The factor in the data is exactly such an implicitly nested factor. If we cross-tabulate `cask`

and `batch`

```
batch
cask A B C D E F G H I J
a 2 2 2 2 2 2 2 2 2 2
b 2 2 2 2 2 2 2 2 2 2
c 2 2 2 2 2 2 2 2 2 2
```

we get the impression that the and factors are crossed, not nested. If we know that the cask should be considered as nested within the batch then we should create a new categorical variable giving the batch-cask combination, which is exactly what the factor is. A simple way to create such a factor is to use the interaction operator, ‘’, on the factors. It is advisable, but not necessary, to apply to the result thereby dropping unused levels of the interaction from the set of all possible levels of the factor. (An “unused level” is a combination that does not occur in the data.) A convenient code idiom is

or

In a small data set like we can quickly detect a factor being implicitly nested within another factor and take appropriate action. In a large data set, perhaps hundreds of thousands of test scores for students in thousands of schools from hundreds of school districts, it is not always obvious if school identifiers are unique across the entire data set or just within a district. If you are not sure, the safest thing to do is to create the interaction factor, as shown above, so you can be confident that levels of the district:school interaction do indeed correspond to unique schools.

Fitting a model with simple, scalar random effects for nested factors is done in exactly the same way as fitting a model with random effects for crossed grouping factors. We include random-effects terms for each factor, as in

```
In [14]:
```pstsm = fit!(lmm(strength ~ 1 + (1 | sample) + (1 | batch), pastes))

```
Out[14]:
```

Not only is the model specification similar for nested and crossed factors, the internal calculations are performed according to the methods described in Sect. [sec:definitions] for each model type. Comparing the patterns in the matrices $\Lambda$, $\mathbf Z'\mathbf Z$ and $\mathbf L$ for this model (Fig. [fig:fm04LambdaLimage]) to those in Fig. [fig:fm03LambdaLimage] shows that models with nested factors produce simple repeated structures along the diagonal of the sparse Cholesky factor, $\mathbf L$. This type of structure has the desirable property that there is no “fill-in” during calculation of the Cholesky factor. In other words, the number of non-zeros in $\mathbf L$ is the same as the number of non-zeros in the lower triangle of the matrix being factored, $\Lambda'\mathbf Z'\mathbf Z\Lambda+\mathbf I$ (which, because $\Lambda$ is diagonal, has the same structure as $\mathbf Z'\mathbf Z$).

`pstsm`

The parameter estimates are: $\widehat{\sigma_1}=$2.904, the standard deviation of the random effects for `sample`

; $\widehat{\sigma_2}=$1.095, the standard deviation of the random effects for `batch`

; $\widehat{\sigma}=$0.823, the standard deviation of the residual noise term; and $\widehat{\beta_0}=$60.053, the overall mean response, which is labeled `(Intercept)`

in these models.

The estimated standard deviation for `sample`

is nearly three times as large as that for `batch`

, which confirms what we saw in Fig. [fig:Pastesplot]. Indeed our conclusion from Fig. [fig:Pastesplot] was that there may not be a significant batch-to-batch variability in addition to the sample-to-sample variability.

Plots of the prediction intervals of the random effects (Fig. [fig:fm04ranef])

[fig:fm04ranef]

confirm this impression in that all the prediction intervals for the random effects for contain zero. Furthermore, a bootstrap sample

```
In [15]:
```srand(4321234)
@time pstsbstp = bootstrap(10000, pstsm);

```
```

```
In [16]:
```plot(x = pstsbstp[:β₁], Geom.density(), Guide.xlabel("β₁"))

```
Out[16]:
```

```
In [17]:
```plot(x = pstsbstp[:σ], Geom.density(), Guide.xlabel("σ"))

```
Out[17]:
```