LMM predictions and prediction intervals

Below we will fit a linear mixed model using the Ruby gem mixed_models and demonstrate the available prediction methods.

Data and linear mixed model

We use the same data and model formulation as in several previous examples, where we looked at various parameter estimates (1) and demonstrated many types of hypothesis tests as well as confidence intervals (2).

The data set, which is simulated, contains two numeric variables Age and Aggression, and two categorical variables Location and Species. These data are available for 100 (human and alien) individuals.

We model the Aggression level of an individual of Species $spcs$ who is at the Location $lctn$ as:

$$Aggression = \beta_{0} + \gamma_{spcs} + Age \cdot \beta_{1} + b_{lctn,0} + Age \cdot b_{lctn,1} + \epsilon,$$

where $\epsilon$ is a random residual, and the random vector $(b_{lctn,0}, b_{lctn,1})^T$ follows a multivariate normal distribution (the same distribution but different realizations of the random vector for each Location).
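The distributional assumptions can be written out explicitly. This assumes the usual zero-mean convention for random effects and the unrestricted 2×2 covariance matrix that an lme4-style term (Age | Location) denotes. The symbols $\sigma_0$, $\sigma_1$, $\rho$ and $\sigma$ are notation introduced here for illustration, not quantities named by mixed_models:

$$\begin{pmatrix} b_{lctn,0} \\ b_{lctn,1} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix} \right), \qquad \epsilon \sim \mathcal{N}(0, \sigma^2),$$

Here $\epsilon$ is independent of the random effects, and the random-effect vectors are independent across Locations.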

We fit this model in mixed_models using a syntax familiar from the R package lme4.

In [1]:

require 'mixed_models'

alien_species = Daru::DataFrame.from_csv '../examples/data/alien_species.csv'
# mixed_models expects all variable names in the data frame to be Ruby Symbols:
alien_species.vectors = Daru::Index.new(alien_species.vectors.map { |v| v.to_sym })

model_fit = LMM.from_formula(formula: "Aggression ~ Age + Species + (Age | Location)", 
                             data: alien_species)
model_fit.fix_ef_summary

Out[1]:

Daru::DataFrame:47316264430760  rows: 5  cols: 4
                          coef                  sd                   z_score              WaldZ_p_value
intercept                 1016.2867207023459    60.19727495769054    16.882603430415077   0.0
Age                       -0.06531615342788907  0.0898848636725299   -0.7266646547504374  0.46743141066211646
Species_lvl_Human         -499.69369529020855   0.2682523406941929   -1862.774781375937   0.0
Species_lvl_Ood           -899.5693213535765    0.28144708140043684  -3196.2289922406003  0.0
Species_lvl_WeepingAngel  -199.58895804200702   0.27578357795259995  -723.7158917283725   0.0
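
The table has no Species_lvl_Dalek row. This suggests treatment (dummy) coding of Species, with Dalek as the reference level absorbed into the intercept. Under that assumption, a population-level prediction is the dot product of one design-matrix row with the estimates above. The plain-Ruby sketch below needs neither mixed_models nor Daru. It applies this to the ten individuals of the new data set used later in this example. The helpers `design_row` and `fixed_prediction` are hypothetical names, not part of the gem.

```ruby
# Fixed-effect estimates copied from the fix_ef_summary output above.
COEF = {
  intercept: 1016.2867207023459,
  Age: -0.06531615342788907,
  Species_lvl_Human: -499.69369529020855,
  Species_lvl_Ood: -899.5693213535765,
  Species_lvl_WeepingAngel: -199.58895804200702
}.freeze

# Hypothetical helper: one row of the fixed-effects design matrix,
# ASSUMING treatment coding with Dalek as the reference level of Species.
def design_row(age, species)
  {
    intercept: 1.0,
    Age: age.to_f,
    Species_lvl_Human: (species == "Human" ? 1.0 : 0.0),
    Species_lvl_Ood: (species == "Ood" ? 1.0 : 0.0),
    Species_lvl_WeepingAngel: (species == "WeepingAngel" ? 1.0 : 0.0)
  }
end

# Population-level (fixed effects only) prediction: sum of x_j * beta_j.
def fixed_prediction(age, species)
  design_row(age, species).sum { |name, x| x * COEF.fetch(name) }
end

# Age and Species of the ten individuals in the new data set.
NEW_INDIVIDUALS = [
  [209, "Dalek"], [90, "Ood"], [173, "Ood"], [153, "Human"], [255, "WeepingAngel"],
  [256, "WeepingAngel"], [37, "Dalek"], [146, "WeepingAngel"], [127, "WeepingAngel"], [41, "Ood"]
].freeze

NEW_INDIVIDUALS.each_with_index do |(age, species), i|
  puts format("%d  %-12s  age %3d  -> %.4f", i, species, age, fixed_prediction(age, species))
end
```

If the treatment-coding reading is right, these values should match the pred column of the interval outputs further below, up to floating-point rounding.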

Predictions and prediction intervals

Often, the objective of a statistical model is the prediction of future observations based on new data input.

We consider the following new data set containing age, geographic location and species for ten individuals.

In [2]:

newdata = Daru::DataFrame.from_csv '../examples/data/alien_species_newdata.csv'
newdata.vectors = Daru::Index.new(newdata.vectors.map { |v| v.to_sym })
newdata

Out[2]:

Daru::DataFrame:47316263806300  rows: 10  cols: 3
   Age  Species       Location
0  209  Dalek         OodSphere
1  90   Ood           Earth
2  173  Ood           Asylum
3  153  Human         Asylum
4  255  WeepingAngel  OodSphere
5  256  WeepingAngel  Asylum
6  37   Dalek         Earth
7  146  WeepingAngel  Earth
8  127  WeepingAngel  Asylum
9  41   Ood           Asylum

Point estimates

Based on the fitted linear mixed model, we can predict the aggression levels of these individuals. The option with_ran_ef specifies whether the estimated random effects of the Locations are included in the calculation (with_ran_ef: true) or not (with_ran_ef: false).
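
In formula terms, a population-level prediction uses only the estimated fixed effects, $\hat\beta_{0} + \hat\gamma_{spcs} + Age \cdot \hat\beta_{1}$. Including the random effects adds the estimated Location-specific deviations $\hat b_{lctn,0} + Age \cdot \hat b_{lctn,1}$. The plain-Ruby sketch below mirrors that distinction conceptually. The method `predict_one` and the values in `RAN_EF` are hypothetical placeholders. They are neither estimates from model_fit nor the gem's implementation, and Dalek is assumed to be the reference level of Species.

```ruby
# Fixed-effect estimates from the fix_ef_summary output above
# (ASSUMPTION: Dalek is the reference level, so its gamma is 0).
BETA0 = 1016.2867207023459
BETA_AGE = -0.06531615342788907
GAMMA = {
  "Dalek" => 0.0,
  "Human" => -499.69369529020855,
  "Ood" => -899.5693213535765,
  "WeepingAngel" => -199.58895804200702
}.freeze

# HYPOTHETICAL random-effect estimates [b_lctn0, b_lctn1] per Location.
# Placeholders for illustration only, not taken from the fitted model.
RAN_EF = {
  "Asylum" => [-100.0, -0.05],
  "Earth" => [50.0, 0.1],
  "OodSphere" => [40.0, 0.2]
}.freeze

# Hypothetical helper mimicking the with_ran_ef switch for one individual.
def predict_one(age, species, location, with_ran_ef:)
  # Population-level part: intercept + species effect + age slope.
  pred = BETA0 + GAMMA.fetch(species) + age * BETA_AGE
  return pred unless with_ran_ef

  # Add the Location's random intercept and random Age slope.
  b0, b1 = RAN_EF.fetch(location)
  pred + b0 + age * b1
end

puts predict_one(209, "Dalek", "OodSphere", with_ran_ef: false)
puts predict_one(209, "Dalek", "OodSphere", with_ran_ef: true)
```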

In [3]:

puts "Predictions of aggression levels on a new data set:"
pred = model_fit.predict(newdata: newdata, with_ran_ef: true)

Predictions of aggression levels on a new data set:

Out[3]:

[1070.9125752531208, 182.45206492790737, -17.06446875476354, 384.7881586199103, 876.1240725686446, 674.7113391148862, 1092.6985606350866, 871.1508855262363, 687.4629975728096, -4.016260100144294]

Next, we add the computed predictions to the data set to see more easily which of the individuals are likely to be particularly dangerous.

In [4]:

newdata = Daru::DataFrame.from_csv '../examples/data/alien_species_newdata.csv'
newdata.vectors = Daru::Index.new(newdata.vectors.map { |v| v.to_sym })
newdata[:Predicted_Aggression] = pred
newdata

Out[4]:

Daru::DataFrame:47316262633840  rows: 10  cols: 4
   Age  Species       Location   Predicted_Aggression
0  209  Dalek         OodSphere  1070.9125752531208
1  90   Ood           Earth      182.45206492790737
2  173  Ood           Asylum     -17.06446875476354
3  153  Human         Asylum     384.7881586199103
4  255  WeepingAngel  OodSphere  876.1240725686446
5  256  WeepingAngel  Asylum     674.7113391148862
6  37   Dalek         Earth      1092.6985606350866
7  146  WeepingAngel  Earth      871.1508855262363
8  127  WeepingAngel  Asylum     687.4629975728096
9  41   Ood           Asylum     -4.016260100144294
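
To turn the table into an explicit ranking, one can sort the individuals by predicted aggression. The sketch below does this in plain Ruby on values transcribed from the output above, so it runs without Daru. The constant name `PREDICTIONS` is illustrative.

```ruby
# [row index, Age, Species, Location, predicted aggression (with_ran_ef: true)],
# transcribed from the output above.
PREDICTIONS = [
  [0, 209, "Dalek", "OodSphere", 1070.9125752531208],
  [1, 90, "Ood", "Earth", 182.45206492790737],
  [2, 173, "Ood", "Asylum", -17.06446875476354],
  [3, 153, "Human", "Asylum", 384.7881586199103],
  [4, 255, "WeepingAngel", "OodSphere", 876.1240725686446],
  [5, 256, "WeepingAngel", "Asylum", 674.7113391148862],
  [6, 37, "Dalek", "Earth", 1092.6985606350866],
  [7, 146, "WeepingAngel", "Earth", 871.1508855262363],
  [8, 127, "WeepingAngel", "Asylum", 687.4629975728096],
  [9, 41, "Ood", "Asylum", -4.016260100144294]
].freeze

# Sort from most to least aggressive (descending predicted value).
ranked = PREDICTIONS.sort_by { |row| -row.last }

# Print the three individuals with the highest predicted aggression.
ranked.first(3).each do |index, age, species, location, pred|
  puts format("row %d: %s, age %d, at %s -> %.1f", index, species, age, location, pred)
end
```

With these values, the two Daleks (rows 6 and 0) come out on top, followed by the WeepingAngel in row 4.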

Interval estimates

The estimated fixed and random effects coefficients almost certainly differ from the true values. Interval estimates of the predictions are therefore more informative than the point estimates computed above.

Two types of interval estimates are currently available in LMM. A confidence interval is an interval estimate of the mean value of the response for given covariates (i.e. a population parameter). A prediction interval is an interval estimate of a future observation. Because a prediction interval also has to account for the random residual $\epsilon$ of that single observation, it is wider than the corresponding confidence interval. For further explanation of this distinction see, for example, https://stat.ethz.ch/education/semesters/ss2010/seminar/06_Handout.pdf.
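
To make the distinction concrete, here is a textbook normal-approximation sketch in plain Ruby. It is not necessarily how mixed_models computes its intervals: an LMM prediction variance can contain further terms, and the gem may use a different construction. The method names and the inputs `se_mean` (standard error of the estimated mean) and `sigma` (residual standard deviation) are illustrative assumptions. The only difference between the two intervals is the added residual variance.

```ruby
# Standard normal CDF, built from Math.erf in Ruby's core library.
def norm_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Two-sided standard normal quantile: the z with P(-z <= Z <= z) = level.
# Ruby has no built-in inverse normal CDF, so simple bisection is used.
def z_quantile(level)
  target = 1.0 - (1.0 - level) / 2.0
  lo = 0.0
  hi = 10.0
  100.times do
    mid = (lo + hi) / 2.0
    if norm_cdf(mid) < target
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0
end

# Confidence interval for the MEAN response: only the standard error
# of the estimated mean enters.
def confidence_interval(pred, se_mean, level)
  half_width = z_quantile(level) * se_mean
  [pred - half_width, pred + half_width]
end

# Prediction interval for a NEW observation: the residual variance
# sigma**2 is added to the variance of the estimated mean.
def prediction_interval(pred, se_mean, sigma, level)
  half_width = z_quantile(level) * Math.sqrt(se_mean**2 + sigma**2)
  [pred - half_width, pred + half_width]
end

# Illustrative inputs only (not taken from model_fit).
p confidence_interval(500.0, 60.0, 0.88)
p prediction_interval(500.0, 60.0, 100.0, 0.88)
```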

In [5]:

puts "88% confidence intervals for the predictions:"
ci = model_fit.predict_with_intervals(newdata: newdata, level: 0.88, type: :confidence)
Daru::DataFrame.new(ci, order: [:pred, :lower88, :upper88])

88% confidence intervals for the predictions:

Out[5]:

Daru::DataFrame:47316259596660  rows: 10  cols: 3
   pred                lower88             upper88
0  1002.6356446359171  906.275473617091    1098.995815654743
1  110.83894554025937  17.15393113018095   204.5239599503378
2  105.41770480574462  10.164687937713381  200.67072167377586
3  506.59965393767027  411.8519191795299   601.3473886958107
4  800.0421435362272   701.9091174988788   898.1751695735755
5  799.9768273827992   701.8009453018722   898.1527094637263
6  1013.870023025514   920.443931319159    1107.296114731869
7  807.1616042598671   712.571759209002    901.7514493107321
8  808.402611174997    714.191640124036    902.613582225958
9  114.03943705822599  20.614034870631627  207.46483924582034

In [6]:

puts "88% prediction intervals for the predictions:"
pi = model_fit.predict_with_intervals(newdata: newdata, level: 0.88, type: :prediction)
Daru::DataFrame.new(pi, order: [:pred, :lower88, :upper88])

88% prediction intervals for the predictions:

Out[6]:

Daru::DataFrame:47316258683700  rows: 10  cols: 3
   pred                lower88             upper88
0  1002.6356446359171  809.9100501459104   1195.3612391259237
1  110.83894554025937  -76.53615884686141  298.2140499273802
2  105.41770480574462  -85.09352864481423  295.92893825630347
3  506.59965393767027  317.0988995529618   696.1004083223787
4  800.0421435362272   603.7713980881146   996.3128889843398
5  799.9768273827992   603.6203777073699   996.3332770582285
6  1013.870023025514   827.0127232317805   1200.7273228192475
7  807.1616042598671   617.9767304115936   996.3464781081406
8  808.402611174997    619.9754792487822   996.8297431012118
9  114.03943705822599  -72.8161447158925   300.8950188323445
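
Comparing the two outputs row by row makes the difference visible. Both intervals share the same center, but the prediction intervals are considerably wider. Below is a short plain-Ruby check on values transcribed from the two tables above; the arrays are copied output, not recomputed from the model.

```ruby
# [lower88, upper88] transcribed from the 88% confidence-interval output (Out[5]).
CONF_INT = [
  [906.275473617091, 1098.995815654743],
  [17.15393113018095, 204.5239599503378],
  [10.164687937713381, 200.67072167377586],
  [411.8519191795299, 601.3473886958107],
  [701.9091174988788, 898.1751695735755],
  [701.8009453018722, 898.1527094637263],
  [920.443931319159, 1107.296114731869],
  [712.571759209002, 901.7514493107321],
  [714.191640124036, 902.613582225958],
  [20.614034870631627, 207.46483924582034]
].freeze

# [lower88, upper88] transcribed from the 88% prediction-interval output (Out[6]).
PRED_INT = [
  [809.9100501459104, 1195.3612391259237],
  [-76.53615884686141, 298.2140499273802],
  [-85.09352864481423, 295.92893825630347],
  [317.0988995529618, 696.1004083223787],
  [603.7713980881146, 996.3128889843398],
  [603.6203777073699, 996.3332770582285],
  [827.0127232317805, 1200.7273228192475],
  [617.9767304115936, 996.3464781081406],
  [619.9754792487822, 996.8297431012118],
  [-72.8161447158925, 300.8950188323445]
].freeze

# Print both interval centers and the width ratio for each individual.
CONF_INT.zip(PRED_INT).each_with_index do |((c_lo, c_hi), (p_lo, p_hi)), i|
  center_c = (c_lo + c_hi) / 2.0
  center_p = (p_lo + p_hi) / 2.0
  ratio = (p_hi - p_lo) / (c_hi - c_lo)
  puts format("row %d: centers %.4f / %.4f, width ratio PI/CI = %.3f", i, center_c, center_p, ratio)
end
```

For these outputs the width ratio comes out at roughly 2 in every row.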

Remark: Some values produced by #predict with with_ran_ef: true lie outside the confidence intervals. This is because the intervals are computed around the predictions of #predict with with_ran_ef: false, which exclude the estimated random effects. Conversely, the predictions of #predict with with_ran_ef: false should always lie at the center of the confidence and prediction intervals; they are the values in the pred column above.
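
The remark can be checked directly on the printed numbers. The plain-Ruby sketch below compares the with_ran_ef: true predictions (Out[3]) with the 88% confidence intervals (Out[5]). It reports which individuals fall outside, together with their Location. All values are transcribed from the outputs above.

```ruby
# Location of each individual in the new data set (Out[2]).
LOCATIONS = %w[OodSphere Earth Asylum Asylum OodSphere Asylum Earth Earth Asylum Asylum].freeze

# Predictions from #predict with with_ran_ef: true (Out[3]).
PRED_WITH_RAN_EF = [1070.9125752531208, 182.45206492790737, -17.06446875476354,
                    384.7881586199103, 876.1240725686446, 674.7113391148862,
                    1092.6985606350866, 871.1508855262363, 687.4629975728096,
                    -4.016260100144294].freeze

# [lower88, upper88] of the 88% confidence intervals (Out[5]).
CI88 = [
  [906.275473617091, 1098.995815654743], [17.15393113018095, 204.5239599503378],
  [10.164687937713381, 200.67072167377586], [411.8519191795299, 601.3473886958107],
  [701.9091174988788, 898.1751695735755], [701.8009453018722, 898.1527094637263],
  [920.443931319159, 1107.296114731869], [712.571759209002, 901.7514493107321],
  [714.191640124036, 902.613582225958], [20.614034870631627, 207.46483924582034]
].freeze

# Indices whose with_ran_ef: true prediction lies outside the confidence interval.
outside = PRED_WITH_RAN_EF.each_index.reject do |i|
  lower, upper = CI88[i]
  PRED_WITH_RAN_EF[i].between?(lower, upper)
end

outside.each do |i|
  lower, upper = CI88[i]
  puts format("row %d (%s): %.2f not in [%.2f, %.2f]", i, LOCATIONS[i], PRED_WITH_RAN_EF[i], lower, upper)
end
```

For these outputs, the individuals outside the intervals are exactly the five located at Asylum (rows 2, 3, 5, 8 and 9), and all of them fall below the lower bound. This is consistent with a negative estimated random effect for Asylum.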
