Machine information
In [37]:
versioninfo()
For demonstration, we generate a random data set.
In [38]:
# generate data from a d-variate response variance component model
srand(123)
n = 1000 # no. observations
d = 2 # dimension of responses
m = 2 # no. variance components
p = 2 # no. covariates
# n-by-p design matrix
X = randn(n, p)
# p-by-d mean component regression coefficient
B = ones(p, d)
# a tuple of m covariance matrices
V = ntuple(x -> zeros(n, n), m)
for i = 1:m-1
    Vi = randn(n, 50)
    copy!(V[i], Vi * Vi')
end
copy!(V[m], eye(n)) # last covariance matrix is identity
# a tuple of m d-by-d variance component parameters
Σ = ntuple(x -> zeros(d, d), m)
for i in 1:m
    Σi = randn(d, d)
    copy!(Σ[i], Σi' * Σi)
end
# form overall nd-by-nd covariance matrix Ω
Ω = zeros(n * d, n * d)
for i = 1:m
    Ω += kron(Σ[i], V[i])
end
Ωchol = cholfact(Ω)
# n-by-d responses
Y = X * B + reshape(Ωchol[:L] * randn(n*d), n, d);
To find the MLE of parameters $(B,\Sigma_1,\ldots,\Sigma_m)$, we take 3 steps:
Step 1 (Construct data). Construct an instance of VarianceComponentVariate, which consists of the fields
Y: $n$-by-$d$ response matrix
X: $n$-by-$p$ covariate matrix
V=(V[1],...,V[m]): a tuple of $n$-by-$n$ covariance matrices. The last covariance matrix must be positive definite and usually is the identity matrix.
In [39]:
using VarianceComponentModels
vcdata = VarianceComponentVariate(Y, X, V)
fieldnames(vcdata)
Out[39]:
In the absence of covariates $X$, we can simply initialize by vcdata = VarianceComponentVariate(Y, V).
Step 2 (Construct a model). Construct an instance of VarianceComponentModel, which consists of the fields
B: $p$-by-$d$ mean regression coefficients
Σ=(Σ[1],...,Σ[m]): a tuple of $d$-by-$d$ variance component parameters
When constructed from a VarianceComponentVariate instance, the mean parameters $B$ are initialized to zero and the tuple of variance component parameters $\Sigma$ to (eye(d),...,eye(d)).
In [40]:
vcmodel = VarianceComponentModel(vcdata)
fieldnames(vcmodel)
Out[40]:
In [41]:
vcmodel
Out[41]:
The remaining fields A, sense, b, lb, ub specify (optional) constraints on the mean parameters B:
$A * \text{vec}(B) \,\, =(\text{or } \ge \text{or } \le) \,\, b$
$lb \le \text{vec}(B) \le ub$
A is a constraint matrix with $pd$ columns, sense is a vector of characters taking values '<', '=', or '>', and lb and ub are the lower and upper bounds for vec(B). By default, A, sense, b are empty, lb is -Inf, and ub is Inf. If any constraints are non-trivial, the final estimates of B are enforced to satisfy them.
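For the $p = d = 2$ setting in this notebook, vec(B) stacks the columns of B, i.e. vec(B) = [B[1,1], B[2,1], B[1,2], B[2,2]]. A minimal sketch of encoding the single equality B[1,1] = B[1,2] (the same encoding appears in the constrained estimation example below):
# sketch: constraint row for B[1,1] == B[1,2] when p = d = 2
# vec(B) = [B[1,1], B[2,1], B[1,2], B[2,2]], so the row is [1 0 -1 0]
A     = [1.0 0.0 -1.0 0.0]   # 1-by-(p*d) constraint matrix
sense = '='                  # equality constraint
b     = 0.0                  # right-hand side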
When a better initial guess is available, we can initialize by calling vcmodel = VarianceComponentModel(B0, Σ0) directly.
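A minimal sketch of such a warm start, assuming hypothetical user-chosen starting values B0 ($p$-by-$d$) and Σ0 (a tuple of $m$ $d$-by-$d$ matrices):
# sketch: build a model from user-supplied starting values instead of the defaults
B0 = ones(p, d)                # hypothetical initial mean coefficients
Σ0 = ntuple(x -> eye(d), m)    # hypothetical initial variance components
vcmodel0 = VarianceComponentModel(B0, Σ0)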
Step 3 (Fit model). Call the optimization routine fit_mle!. The keyword algo dictates the optimization algorithm: :MM (minorization-maximization algorithm) or :FS (Fisher scoring algorithm).
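For instance, requesting Fisher scoring only changes the keyword (a sketch on a fresh copy of the model; solver defaults are assumed to be acceptable):
# sketch: same maximum likelihood fit, but with the Fisher scoring algorithm
vcmodel_fs = deepcopy(vcmodel)
fit_mle!(vcmodel_fs, vcdata; algo = :FS);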
In [42]:
vcmodel_mle = deepcopy(vcmodel)
@time logl, vcmodel_mle, Σse, Σcov, Bse, Bcov = fit_mle!(vcmodel_mle, vcdata; algo = :MM);
The output of fit_mle! contains the log-likelihood at the MLE, the fitted model, and the standard errors and covariances of the variance component and mean parameter estimates:
In [43]:
logl
Out[43]:
In [44]:
fieldnames(vcmodel_mle)
Out[44]:
In [45]:
vcmodel_mle
Out[45]:
In [46]:
Σse
Out[46]:
In [47]:
Σcov
Out[47]:
In [48]:
Bse
Out[48]:
In [49]:
Bcov
Out[49]:
REML (restricted maximum likelihood estimation) is a popular alternative to MLE. To find the REML estimates of a variance component model, we replace the above Step 3 by
Step 3. Call the optimization routine fit_reml!.
In [50]:
vcmodel_reml = deepcopy(vcmodel)
@time logl, vcmodel_reml, Σse, Σcov, Bse, Bcov = fit_reml!(vcmodel_reml, vcdata; algo = :MM);
The output of fit_reml! contains the same quantities for the REML fit:
In [51]:
logl
Out[51]:
In [52]:
fieldnames(vcmodel_reml)
Out[52]:
In [53]:
vcmodel_reml
Out[53]:
In [54]:
Σse
Out[54]:
In [55]:
Σcov
Out[55]:
In [56]:
Bse
Out[56]:
In [57]:
Bcov
Out[57]:
Finding the MLE or REML of variance component models is a non-trivial nonlinear optimization problem. The main complications are the non-convexity of the objective function and the positive semidefiniteness constraints on the variance component parameters $\Sigma_1,\ldots,\Sigma_m$. In specific applications, users should try different algorithms with different starting points in order to find a better solution. Here are some tips for efficient computation.
In general, the optimization algorithm needs to invert the $nd$-by-$nd$ overall covariance matrix $\Omega = \Sigma_1 \otimes V_1 + \cdots + \Sigma_m \otimes V_m$ in each iteration. Inverting a matrix is an expensive operation, costing $O(n^3 d^3)$ floating-point operations here. When there are only two variance components ($m=2$), this tedious task can be avoided by taking one generalized eigendecomposition of $(V_1, V_2)$ and rotating the data $(Y, X)$ by the eigenvectors.
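Concretely, the generalized eigendecomposition $V_1 \Phi = V_2 \Phi \Lambda$ with $\Phi^T V_2 \Phi = I_n$ gives
$(I_d \otimes \Phi)^T \Omega (I_d \otimes \Phi) = \Sigma_1 \otimes \Lambda + \Sigma_2 \otimes I_n,$
which, after a permutation, is block diagonal with $n$ blocks of the form $\lambda_i \Sigma_1 + \Sigma_2$. Rotating the data as $\tilde{Y} = \Phi^T Y$ and $\tilde{X} = \Phi^T X$ therefore reduces each iteration to $n$ inversions of $d$-by-$d$ matrices, i.e. $O(n d^3)$ instead of $O(n^3 d^3)$ operations.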
In [58]:
vcdatarot = TwoVarCompVariateRotate(vcdata)
fieldnames(vcdatarot)
Out[58]:
Two optimization algorithms are implemented: Fisher scoring (mle_fs!) and the minorization-maximization (MM) algorithm (mle_mm!). Both take the rotated data as input and give finer control over the optimization. Generally speaking, the MM algorithm is more stable, while Fisher scoring (if it converges) yields a more accurate answer.
In [59]:
vcmodel_mm = deepcopy(vcmodel)
@time mle_mm!(vcmodel_mm, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
In [60]:
# MM estimates
vcmodel_mm.B
Out[60]:
In [61]:
# MM estimates
vcmodel_mm.Σ
Out[61]:
In [62]:
# Fisher scoring using Ipopt
vcmodel_ipopt = deepcopy(vcmodel)
@time mle_fs!(vcmodel_ipopt, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
In [63]:
# Ipopt estimates
vcmodel_ipopt.B
Out[63]:
In [64]:
# Ipopt estimates
vcmodel_ipopt.Σ
Out[64]:
Knitro is commercial software, and users need to follow the instructions at KNITRO.jl for it to function properly. The following code invokes Knitro as the backend optimization solver.
using KNITRO
# Fisher scoring using Knitro
vcmodel_knitro = deepcopy(vcmodel)
@time mle_fs!(vcmodel_knitro, vcdatarot; solver=:Knitro, maxiter=1000, verbose=true);
# Knitro estimates
vcmodel_knitro.B
# Knitro estimates
vcmodel_knitro.Σ
Here are a few strategies for successful optimization.
For the two variance component case ($m=2$), pre-compute the rotated data TwoVarCompVariateRotate and use it for optimization.

Many applications invoke constraints on the mean parameters B. For demonstration, we enforce B[1,1]=B[1,2] and require all entries of B to lie within [0, 2].
In [65]:
# set up constraints on B
vcmodel_constr = deepcopy(vcmodel)
vcmodel_constr.A = [1.0 0.0 -1.0 0.0]
vcmodel_constr.sense = '='
vcmodel_constr.b = 0.0
vcmodel_constr.lb = 0.0
vcmodel_constr.ub = 2.0
vcmodel_constr
Out[65]:
We first try the MM algorithm.
In [66]:
# MM algorithm for constrained estimation of B
@time mle_mm!(vcmodel_constr, vcdatarot; maxiter=10000, funtol=1e-8, verbose = true);
In [67]:
fieldnames(vcmodel_constr)
Out[67]:
In [68]:
vcmodel_constr.B
Out[68]:
In [69]:
vcmodel_constr.Σ
Out[69]:
Now let's try Fisher scoring.
In [70]:
# Fisher scoring using Ipopt for constrained estimation of B
vcmodel_constr = deepcopy(vcmodel)
vcmodel_constr.A = [1.0 0.0 -1.0 0.0]
vcmodel_constr.sense = '='
vcmodel_constr.b = 0.0
vcmodel_constr.lb = 0.0
vcmodel_constr.ub = 2.0
vcmodel_constr
@time mle_fs!(vcmodel_constr, vcdatarot; solver=:Ipopt, maxiter=1000, verbose=true);
In [71]:
vcmodel_constr.B
Out[71]:
In [72]:
vcmodel_constr.Σ
Out[72]: