Low-rank approximation of matrices and tensors with application to dynamical and optimization problems

Ivan Oseledets

Skolkovo Institute of Science and Technology (Skoltech), Russia

Low-rank approximation

Low-rank approximation plays an increasingly important role in fast methods for large problems

  • Integral equations & fast direct solvers for sparse matrices (H-matrices)
  • High-dimensional PDEs (Fokker-Planck)
  • Stochastic PDEs and multiparametric PDEs, uncertainty quantification

Low-rank approximation of matrices

In two dimensions, we have the SVD, which gives the best low-rank approximation: $$A \approx UV^{\top}, \quad U \in \mathbb{R}^{n \times r}, \quad V \in \mathbb{R}^{m \times r}.$$ The approximation is sought on the manifold of low-rank matrices.
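A minimal numpy illustration (sizes are arbitrary): by the Eckart-Young theorem, the truncated SVD gives the best rank-$r$ approximation.

In [ ]:
import numpy as np

n, m, r = 200, 150, 10
rng = np.random.default_rng(0)
# a matrix that is exactly rank r plus a small perturbation
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
A += 1e-8 * rng.standard_normal((n, m))

# best rank-r approximation via the truncated SVD
u, s, vt = np.linalg.svd(A, full_matrices=False)
U = u[:, :r] * s[:r]      # n x r (singular values absorbed into U)
V = vt[:r].T              # m x r
print(np.linalg.norm(A - U @ V.T) / np.linalg.norm(A))   # tiny: on the order of the added perturbation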

Low-rank approximation of tensors

There are different ways to generalize SVD to tensors

  • CP-format: $A(i_1, \ldots,i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$ (see the sketch after this list)
  • Subspace approaches (Tucker format, HT-format, TT-format), that represent a "good" manifold as an intersection of "low-rank" manifolds in $\mathbb{R}^{n^d}$
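A small sketch of the CP format in numpy (all sizes below are hypothetical): the full tensor is assembled from the factor matrices with a single einsum.

In [ ]:
import numpy as np

n, d, r = 6, 4, 3                      # mode size, dimension, CP-rank (illustrative values)
rng = np.random.default_rng(0)
U = [rng.standard_normal((n, r)) for _ in range(d)]

# A(i1, i2, i3, i4) = sum_a U1(i1, a) U2(i2, a) U3(i3, a) U4(i4, a)
A = np.einsum('ia,ja,ka,la->ijkl', *U)
print(A.shape)                         # (6, 6, 6, 6)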

Example: Tensor Train decomposition

Given a tensor $A(i_1, \ldots, i_d)$ we define the unfoldings $A_1, \ldots, A_{d-1}$, where $A_k$ is the reshape of $\mathbf{A}$ into an $(n_1 \cdots n_k) \times (n_{k+1} \cdots n_d)$ matrix,

and the TT-manifold is specified by the ranks of these unfoldings.

TT-decomposition: algebraic structure

$$A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d),$$

where $G_k(i_k)$ is $r_{k-1} \times r_k$, $r_0 = r_d = 1$.
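A tiny numpy sketch of this algebraic structure (core shapes are made up): each entry of the tensor is a product of small matrices $G_k(i_k)$.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
n = [4, 5, 6, 7]                  # mode sizes
r = [1, 3, 3, 2, 1]               # TT-ranks, r_0 = r_d = 1
cores = [rng.standard_normal((r[k], n[k], r[k + 1])) for k in range(len(n))]

def tt_entry(cores, index):
    # A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d), a product of r_{k-1} x r_k matrices
    G = cores[0][:, index[0], :]
    for k in range(1, len(cores)):
        G = G @ cores[k][:, index[k], :]
    return G[0, 0]

print(tt_entry(cores, (1, 2, 3, 4)))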

Hypothesis: if there is a good algorithm for low-rank matrix approximation, then there is a good algorithm for the subspace tensor representation.

Algorithms in the TT-format

  • We can do basic linear algebra (add, multiply, compute norms)
  • We can do full to TT compression (see the TT-SVD sketch after this list)
  • We can do "rounding" (re-approximate a given tensor by another one with smaller ranks, with guaranteed complexity)
  • We can do cross approximation of the tensor
  • It is all in the open-source software (TT-Toolbox in MATLAB and Python)
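As a rough illustration of the full-to-TT compression mentioned above, here is a minimal TT-SVD-style sketch based on successive truncated SVDs of the unfoldings; the truncation rule and the test tensor are chosen only for illustration, and the production implementation lives in the TT-Toolbox.

In [ ]:
import numpy as np

def tt_svd(A, eps=1e-8):
    # compress a full tensor into TT cores by sweeping truncated SVDs over the unfoldings
    d, n = A.ndim, A.shape
    cores, r_prev = [], 1
    C = A.reshape(n[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps * np.linalg.norm(s))))
        cores.append(u[:, :r].reshape(r_prev, n[k], r))
        C = (s[:r, None] * vt[:r]).reshape(r * n[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, n[-1], 1))
    return cores

def tt_full(cores):
    # contract the cores back into a full tensor (only for checking on small examples)
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(T.ndim - 1, 0))
    return T.reshape(T.shape[1:-1])

# a 4D Hilbert-like tensor, which is well approximated with small TT-ranks
i = np.arange(10)
A = 1.0 / (1.0 + i[:, None, None, None] + i[None, :, None, None]
           + i[None, None, :, None] + i[None, None, None, :])
cores = tt_svd(A, eps=1e-8)
print([G.shape for G in cores])
print(np.linalg.norm(tt_full(cores) - A) / np.linalg.norm(A))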

General concept for solving problems

We have selected our manifold $\mathcal{M}$ of structured solutions, and we want to approximate the solution of our original problem.

For example, a linear system: $A(X) = f$,

where the solution $X$ can be indexed by $d$ indices (high-dim PDE, QTT-format).

Options for finding a manifold solution

Option 1: Run an ordinary method (e.g., preconditioned Krylov) in tensor arithmetic, $X_{k+1} = P(F(X_k))$, with rounding after each step (see the sketch below).

Option 2: Reformulate as a minimization problem, $(Ax, x) - 2(f, x) \rightarrow \min$, and replace the minimization by minimization over the manifold.

Minimization over the low-rank manifold is crucially important!
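As a cartoon of Option 1 in the matrix ($d=2$) case, the sketch below runs a plain Richardson iteration in which SVD truncation plays the role of TT rounding; the well-conditioned operator $M$ and the rank-$r$ right-hand side are made up purely for illustration.

In [ ]:
import numpy as np

def truncate(X, r):
    # "rounding": replace X by its best rank-r approximation (truncated SVD)
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

rng = np.random.default_rng(0)
n, r = 100, 5

B = rng.standard_normal((n, n))
M = np.eye(n) + 0.1 * (B @ B.T) / n              # a well-conditioned SPD operator
F = truncate(rng.standard_normal((n, n)), r)     # a rank-r right-hand side

# ordinary (Richardson) iteration in low-rank arithmetic,
# truncating the rank back to r after every step
tau = 1.0 / np.linalg.norm(M, 2)
X = np.zeros((n, n))
for k in range(200):
    X = truncate(X - tau * (M @ X - F), r)

print(np.linalg.norm(M @ X - F) / np.linalg.norm(F))   # small: the exact solution itself has rank r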

Dynamical low-rank approximation

Dynamical low-rank approximation is yet another crucial concept.

We have a time-varying matrix $A(t)$, and we want to approximate it by a low-rank matrix $X(t)$

(later, the time $t$ can be discrete, which generalizes the approach to any iterative method).

Dirac-Frenkel principle

The Dirac-Frenkel principle gives the variational formulation for dynamical low-rank approximation: $$\Vert \frac{dA}{dt} - \frac{dX}{dt} \Vert \rightarrow \min,$$ where $\frac{dX}{dt}$ is constrained to lie in the tangent space of the low-rank manifold at $X$.

Equations of motion in 2D case

(Lubich, Koch, H.-D. Meyer,...)

$$\begin{split} &\frac{dU}{dt} S = \frac{dA}{dt} V,\\ &\frac{dV}{dt} S^{\top} = \frac{dA^{\top}}{dt} U, \\ &\frac{dS}{dt} = U^{\top} \frac{dA}{dt} V. \end{split}$$

Suppose you start from rank $1$ and evolve into rank $10$ (so the approximation rank is set to $10$): the matrix $S$ will be singular at the start, while the equations above require $S^{-1}$!

Equations of motion, 2D case (2)

The equations of motion can be rewritten for the full matrix
$$ \frac{dX}{dt} = P_X(\frac{dA}{dt}), $$ where $P_X$ is the projector onto the tangent space

Projector-splitting

Theorem: $P_X$ can be represented as $$P_X(Z) = Z - (I - UU^{\top}) Z (I - VV^{\top}) = UU^{\top} Z + Z VV^{\top} - UU^{\top} Z VV^{\top},$$

$$P_X(Z) = P_1(Z) + P_2(Z) - P_3(Z), \qquad P_1(Z) = UU^{\top} Z, \quad P_2(Z) = Z VV^{\top}, \quad P_3(Z) = UU^{\top} Z VV^{\top}.$$
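A quick numerical check of the two equivalent forms of $P_X$ from the theorem (sizes are arbitrary):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 8, 7, 3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((m, r)))[0]
Z = rng.standard_normal((n, m))

# the two representations of the tangent-space projector
P_a = Z - (np.eye(n) - U @ U.T) @ Z @ (np.eye(m) - V @ V.T)
P_b = U @ U.T @ Z + Z @ V @ V.T - U @ U.T @ Z @ V @ V.T
print(np.linalg.norm(P_a - P_b))   # zero up to rounding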

Integration of the substeps

Theorem (O., Lubich, 2014): the split equations corresponding to $P_1$, $P_2$, $P_3$ can be integrated exactly, in closed form

  • Step "L": $$\frac{dX}{dt} = UU^{\top}\frac{dA}{dt} \rightarrow \frac{d(VS^{\top})}{dt} = \frac{dA^{\top}}{dt} U, \frac{dU}{dt}=0,$$

Thus $L_1 = L_0 + (A_1 - A_0)^{\top} U_0, U_1 = U_0$.

  • Step "K": $$\frac{dX}{dt} = \frac{dA}{dt} VV^{\top} \rightarrow \frac{d(US)}{dt} = \frac{dA}{dt} V, \frac{dV}{dt}=0,$$ Thus $K_1 = K_0 + (A_1 - A_0) V_0, V_1 = V_0$

  • Step "S": $$\frac{dX}{dt} = UU^{\top}\frac{dA}{dt} VV^{\top} \rightarrow \frac{d(S)}{dt} = U\frac{dA}{dt} V, \frac{dV}{dt}=0, \frac{dU}{dt} = 0,$$

Thus $S_1 = S_0 \mathbf{-} U^{\top}_0(A_1 - A_0) V_0, V_1 = V_0, U_1 = U_0$
Here $A_0 = A(t), A_1 = A(t + \tau)$

K-S-L order vs. K-L-S order

We can give a short demo of the splitting by taking two snapshots $A_0$, $A_1$; the cell below performs one projector-splitting step in the K-S-L order.


In [2]:
import numpy as np

n = 5
r = 2
m = 5
def random_usv(n, m, r):
    # random rank-r factors: orthonormal u, v and a dense r x r core s
    u = np.random.randn(n, r)
    u = np.linalg.qr(u)[0]
    v = np.random.randn(m, r)
    v = np.linalg.qr(v)[0]
    s = np.random.randn(r, r)
    return u, s, v
u0, s0, v0 = random_usv(n, m, r)
u1, s1, v1 = random_usv(n, m, r)

# Generate A0, A1 from the manifold of rank-r matrices
A0 = u0.dot(s0).dot(v0.T)
A1 = u1.dot(s1).dot(v1.T)

u = u0.copy(); v = v0.copy(); s = s0.copy()
# K-step: update u (and s) from K = u s
K = u.dot(s); K = K + (A1 - A0).dot(v)
u, s = np.linalg.qr(K)
# S-step: note the minus sign
s = s - u.T.dot(A1 - A0).dot(v)

# L-step: update v (and s) from L = v s^T
L = v.dot(s.T); L = L + (A1 - A0).T.dot(u)
v, s = np.linalg.qr(L)
s = s.T


Appr = u.dot(s).dot(v.T)
print('Error:', np.linalg.norm(A1 - Appr))


Error: 1.64942923088e-15

Exactness result

The order of the splitting steps is crucial; moreover, we have the following exactness result:

If $A_0 = U_0 S_0 V^{\top}_0$ and $A_1$ is of rank $r$, the projector-splitting scheme in the KSL order is exact.

Projector-splitting in the dD-case

$$ \frac{dY(t)}{dt} = P_{Y(t)}\left(\frac{dA(t)}{dt}\right), \qquad Y(t_0)=Y_0\in \mathcal{M}, $$

$$ P_X =\sum_{k=1}^{d-1} \left(P_{\le k-1} P_{\ge k+1} - P_{\le k} P_{\ge k+1}\right) + P_{\le d-1} P_{\ge d+1} \quad \text{(with the conventions } P_{\le 0} = P_{\ge d+1} = \mathrm{Id}\text{)}, $$

$$ \begin{split} &P_{\le k}\colon \mathbb{R}^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_k \bigl( \mathbf{P}_{\le k}\, \mathbf{Z}^{\langle k \rangle}\bigr),\\ &P_{\ge k}\colon \mathbb{R}^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_{k-1} \bigl( \mathbf{Z}^{\langle k-1 \rangle}\, \mathbf{P}_{\ge k} \bigr), \end{split} $$

where $\mathbf{Z}^{\langle k \rangle}$ is the $k$-th unfolding of $Z$, $\mathrm{Ten}_k$ is the inverse reshaping, and
$$ \mathbf{P}_{\le k} = \mathbf{Q}_{\le k}\mathbf{Q}_{\le k}^\top, \qquad \mathbf{P}_{\ge k} = \mathbf{Q}_{\ge k}\mathbf{Q}_{\ge k}^\top, $$
with $\mathbf{Q}_{\le k}$, $\mathbf{Q}_{\ge k}$ orthonormal bases of the corresponding left and right interface matrices of the TT decomposition.

Lubich, Vandereycken, O., SINUM, 2015.

KSL as retraction

Thus, KSL can be used as a retraction from the tangent space to the manifold: if $A_1 = A_0 + \delta A$ with $\delta A$ in the tangent space, then KSL gives simple formulas for such a retraction (and it is a second-order retraction, as shown in O., Absil, 2014).

Riemannian optimization

The simplest "geometrical method" for the optimization over the low-rank manifold is the projected gradient iteration:

$$ X_{k+1} = R\bigl(X_k - \alpha\, P(\mathrm{grad}\, F(X_k))\bigr), $$

where $\alpha$ is a step size, $P$ is the projection onto the tangent space, and $R$ is a retraction that brings the iterate back to the manifold. Suggestion: use KSL for the retraction. The gradient method often has slow convergence.
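A minimal sketch of this iteration for the toy functional $F(X) = \frac12\Vert X - A\Vert_F^2$, with a truncated SVD used as the retraction in place of KSL (all names and sizes are illustrative):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 60, 50, 5
# a target that is close to (but not exactly of) rank r
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m)) + 1e-3 * rng.standard_normal((n, m))

def retract(Y, r):
    # truncated SVD as a simple retraction back to the rank-r manifold
    u, s, vt = np.linalg.svd(Y, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r], u[:, :r], vt[:r].T

X, U, V = retract(rng.standard_normal((n, m)), r)      # starting point on the manifold
alpha = 1.0
for k in range(20):
    G = X - A                                          # Euclidean gradient of F
    PG = U @ (U.T @ G) + (G @ V) @ V.T - U @ (U.T @ G @ V) @ V.T   # tangent-space projection
    X, U, V = retract(X - alpha * PG, r)               # gradient step + retraction

print(np.linalg.norm(X - A) / np.linalg.norm(A))       # approaches the best rank-r approximation error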

Riemannian optimization "for the poor"

There are several second-order methods; a simpler way is a Gauss-Newton type method. Suppose our current point is $x_0$, and a vector in the tangent space at $x_0$ can be parametrized as $Qp$.

Then, consider the next step as a minimization of the target functional over the full tangent space.

For quadratic functionals (e.g., tensor completion, linear systems, eigenvalue problems)

we have $$Q^{\top} A Q p = Q^{\top} f$$

as the system for $p$. For the TT-case, the vector $p$ consists of the core corrections $\delta G_k$, so the system is solved iteratively (and preconditioned by the ALS method). The matrix-by-vector product has a simple meaning.
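A small dense sketch of such a projected system; the matrix, the right-hand side, and the subspace basis $Q$ below are all made up for illustration (in the TT case $Q$ parametrizes the tangent space and the system is solved iteratively):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
N, k = 500, 20

A = np.diag(np.linspace(1.0, 10.0, N)) + 0.01 * rng.standard_normal((N, N))
A = 0.5 * (A + A.T)                               # a symmetric, well-conditioned test matrix
f = rng.standard_normal(N)

Q = np.linalg.qr(rng.standard_normal((N, k)))[0]  # orthonormal basis of a (random) subspace

p = np.linalg.solve(Q.T @ A @ Q, Q.T @ f)         # the small k x k projected system
x = Q @ p                                         # correction expressed in the full space

print(np.linalg.norm(Q.T @ (A @ x - f)))          # Galerkin condition: residual is orthogonal to the subspace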

A preliminary implementation for tensor completion shows that it works well (whereas ALS may not always work well!).

Numerical experiments

Now we can move on to the numerical experiments.

Vibrational spectra

We compare the TT-KSL scheme with the MCTDH package on a benchmark problem with the Henon-Heiles potential \begin{equation} \frac{d\psi}{dt} = i H \psi, \quad \psi(0) = \psi_0, \end{equation} where $H$ has the form \begin{equation} H = -\frac12 \Delta + \underbrace{\frac12 \sum_{k=1}^f q^2_k + \overbrace{\lambda \sum_{k=1}^{f-1}\left(q^2_k q_{k+1} - \frac13 q^3_{k+1} \right)}^{\textrm{anharmonic part}}}_{\textrm{Henon-Heiles potential}~V(q_1,\ldots,q_f)}. \end{equation}
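For concreteness, here is a small function evaluating the Henon-Heiles potential above; the default coupling $\lambda = 1/\sqrt{80} \approx 0.1118$ is only the common benchmark choice and is an assumption here, since the value used in the runs is not stated on this slide.

In [ ]:
import numpy as np

def henon_heiles_potential(q, lam=1.0 / np.sqrt(80.0)):
    # V(q) = 1/2 sum_k q_k^2 + lam * sum_{k<f} (q_k^2 q_{k+1} - q_{k+1}^3 / 3)
    q = np.asarray(q, dtype=float)
    harmonic = 0.5 * np.sum(q ** 2)
    anharmonic = np.sum(q[:-1] ** 2 * q[1:] - q[1:] ** 3 / 3.0)
    return harmonic + lam * anharmonic

f = 10                                            # number of degrees of freedom
print(henon_heiles_potential(np.zeros(f)))        # 0.0 at the origin
print(henon_heiles_potential(0.1 * np.ones(f)))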

Spectra comparison

Spectra for a 10D problem


Zoomed spectra for a 10D problem


Computational time: 54354 (MCTDH) vs 4425 (TT-KSL)

Low-rank in space & time for dynamical problems

We want to utilize the low-rank structure both in space and in time:

Given an ODE

$\frac{dy}{dt} = Ay, \quad y(0) = y_0$

we want to find an optimal small-dimensional subspace $U$ for the solution:

$$y(t) \approx U c(t), \quad U^* U = I_r.$$

Lyapunov equation

The optimal $U$ can be recovered from the leading eigenvectors of the solution of the Lyapunov equation: $$ A X + XA^* = -y_0 y^*_0, $$

$$ X \approx U Z U^{\top}. $$

In http://arxiv.org/abs/1410.3335 (D. Kolesnikov, O.) we have proposed to use $$ F(U) = \int^{\infty}_0 \Vert y(t) - U e^{Bt} U^{*} y_0 \Vert^2 dt, $$

where $B = U^* A U$, as the functional for the low-rank solution of the Lyapunov equation.

This functional is computable, given a $U$ with orthonormal columns!
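As a minimal sketch of the last ingredient (not the ALR method itself, which is described next): once an orthonormal $U$ is fixed, here simply a Krylov basis chosen for illustration, a factor $Z$ for $X \approx U Z U^{\top}$ can be obtained from a small projected Lyapunov equation.

In [ ]:
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, r = 400, 20

A = -np.diag(np.linspace(1.0, 100.0, n)) + 0.01 * rng.standard_normal((n, n))
A = 0.5 * (A + A.T)                              # a stable symmetric test matrix
y0 = rng.standard_normal((n, 1))

# an orthonormal basis U (here simply a Krylov basis for A and y0)
U = y0 / np.linalg.norm(y0)
for _ in range(r - 1):
    w = A @ U[:, -1:]
    w -= U @ (U.T @ w)
    U = np.hstack([U, w / np.linalg.norm(w)])

B = U.T @ A @ U
c0 = U.T @ y0
Z = solve_continuous_lyapunov(B, -c0 @ c0.T)     # small equation: B Z + Z B^T = -c0 c0^T
X = U @ Z @ U.T                                  # low-rank approximation X ~ U Z U^T

print(np.linalg.norm(A @ X + X @ A.T + y0 @ y0.T) / np.linalg.norm(y0 @ y0.T))   # relative residual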

ALR method

The final alternating low-rank method is very simple to implement:

$$U_{k+1} = \mathrm{orth}([U_k, v_k, w_k]),$$

where $w_k$ is the next Arnoldi vector, and $v_k$ is the rational Krylov vector

$$v_k = (A + \lambda_k I)^{-1} y_0,$$

and $\lambda_k$ is computed as

$$ \lambda_k = q^* B_k q, $$

where $q$ is the last column of the solution $Z$ of the "small" Lyapunov equation $$ B_k Z + ZB_k^* = c_0 c^*_0$$

Papers and software

My webpage: http://oseledets.github.io


Conclusions

  • Manifold optimization is the key to fast algorithms
  • Projector-splitting (KSL) scheme is extremely simple and efficient
Questions?
