Time integration of tensor trains

Ivan Oseledets*, Christian Lubich, Bart Vandereycken

*Skolkovo Institute of Science and Technology

$\def\matC {{\boldsymbol{\mathrm{C}}}}$ $\def\matX {{\boldsymbol{\mathrm{X}}}}$ $\def\matI {{\boldsymbol{\mathrm{I}}}}$ $\def\matZ {{\boldsymbol{\mathrm{Z}}}}$ $\def\matQ {{\boldsymbol{\mathrm{Q}}}}$ $\def\matP {{\boldsymbol{\mathrm{P}}}}$

$\def\R {\mathbb{R}}$

Outline of the talk

  • What is dynamical low-rank approximation
  • What is a tensor train
  • Projector-splitting integrator in 2D
  • Projector-splitting integrator in dD

Dynamical low-rank approximation

Dynamical low-rank approximation is a low-rank approximation of a time-varying matrix:

$$A(t) \approx Y(t) = U(t) S(t) V^{\top}(t), $$

where $U(t) \in \mathbb{R}^{n \times r}, V(t) \in \mathbb{R}^{m \times r}, S(t) \in \mathbb{R}^{r \times r}$

Simple DLR

You can use a point-wise SVD:

$$\min_{\mathrm{rank}(X(t)) = r}\Vert A(t) - X(t) \Vert $$

but this has some drawbacks:

  • The dependence on $t$ is non-smooth
  • The energy is not conserved
  • No history of $A(t)$ is taken into account!
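These drawbacks are easy to see on a toy example (an illustration, not from the talk): for $A(t) = \mathrm{diag}(1-t,\, t)$ the best rank-$1$ approximation jumps when the singular values cross at $t = 1/2$, even though $A(t)$ itself depends smoothly on $t$.

```python
import numpy as np

# A(t) = diag(1 - t, t) is smooth in t, but its best rank-1
# approximation jumps at t = 1/2 when the singular values cross.
def best_rank1(t):
    A = np.diag([1.0 - t, t])
    u, s, vt = np.linalg.svd(A)
    return s[0] * np.outer(u[:, 0], vt[0])

X_before = best_rank1(0.49)  # ~ 0.51 * e1 e1^T
X_after = best_rank1(0.51)   # ~ 0.51 * e2 e2^T
jump = np.linalg.norm(X_after - X_before)
print(jump)  # O(1) jump for an O(0.01) change in t
```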

Instead, use Dirac-Frenkel principle

The Dirac-Frenkel variational principle can be used instead:

$$ \langle \frac{dA}{dt} - \frac{dX}{dt}, Z \rangle_F = 0, $$

for all $Z$ in the tangent space of the manifold $\mathcal{M}_r$ of all matrices of rank $r$,
where a tangent vector has the form

$$ Z = \delta U S V^{\top} + U \delta S V^{\top} + U S \delta V^{\top}. $$

Equations of motion in 2D case

(Lubich, Koch, Meyer,...)

$$\begin{split} &\frac{dU}{dt} S = \frac{dA}{dt} V,\\ &\frac{dV}{dt} S^{\top} = \frac{dA^{\top}}{dt} U, \\ &\frac{dS}{dt} = U^{\top} \frac{dA}{dt} V. \end{split}$$

Suppose you start from rank $1$ and evolve into rank $10$: the matrix $S$ will be singular!

Equations of motion, 2D case (2)

The equations of motion can be rewritten for the full matrix

$$ \frac{dX}{dt} = P_X(\frac{dA}{dt}), $$

where $P_X$ is the projector onto the tangent space.

Projector-splitting

Theorem: $P_X$ can be represented as

$$P_X(Z) = Z - (I - UU^{\top}) Z (I - VV^{\top}) = UU^{\top} Z + Z VV^{\top} - UU^{\top} Z VV^{\top},$$

that is, $P_X(Z) = P_1(Z) + P_2(Z) - P_3(Z)$ with $P_1(Z) = UU^{\top} Z$, $P_2(Z) = Z VV^{\top}$, $P_3(Z) = UU^{\top} Z VV^{\top}$.
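A quick numerical sanity check (an illustration, not from the talk) confirms that the two forms agree and that $P_X$ is indeed a projector, $P_X^2 = P_X$:

```python
import numpy as np

np.random.seed(0)
n, m, r = 6, 5, 2
U = np.linalg.qr(np.random.randn(n, r))[0]
V = np.linalg.qr(np.random.randn(m, r))[0]

def P(Z):
    # Second form: P_X(Z) = U U^T Z + Z V V^T - U U^T Z V V^T
    UUt = U.dot(U.T); VVt = V.dot(V.T)
    return UUt.dot(Z) + Z.dot(VVt) - UUt.dot(Z).dot(VVt)

Z = np.random.randn(n, m)
# First form: Z - (I - U U^T) Z (I - V V^T)
form1 = Z - (np.eye(n) - U.dot(U.T)).dot(Z).dot(np.eye(m) - V.dot(V.T))
print(np.linalg.norm(P(Z) - form1))    # agreement up to round-off
print(np.linalg.norm(P(P(Z)) - P(Z)))  # projector property, up to round-off
```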

Integration of the substeps

Theorem (O., Lubich, 2014): The equations with $P_1$, $P_2$, $P_3$ can be efficiently integrated

  • Step "L": $$\frac{dX}{dt} = UU^{\top}\frac{dA}{dt} \rightarrow \frac{d(VS^{\top})}{dt} = \frac{dA^{\top}}{dt} U, \frac{dU}{dt}=0,$$

    Thus $L_1 = L_0 + (A_1 - A_0)^{\top} U_0, U_1 = U_0$.

  • Step "K": $$\frac{dX}{dt} = \frac{dA}{dt} VV^{\top} \rightarrow \frac{d(US)}{dt} = \frac{dA}{dt} V, \frac{dV}{dt}=0,$$

    Thus $K_1 = K_0 + (A_1 - A_0) V_0, V_1 = V_0$

  • Step "S": $$\frac{dX}{dt} = -UU^{\top}\frac{dA}{dt} VV^{\top} \rightarrow \frac{dS}{dt} = -U^{\top}\frac{dA}{dt} V, \frac{dV}{dt}=0, \frac{dU}{dt} = 0,$$

    Thus $S_1 = S_0 \mathbf{-} U^{\top}_0(A_1 - A_0) V_0, V_1 = V_0, U_1 = U_0$.
    Here $A_0 = A(t), A_1 = A(t + \tau)$.

Splitting integrator

The final splitting steps are

  • Step "L": $$\begin{split}&L_0 = V_0 S^{\top}_0, \\ &L_1 = L_0 + (A_1 - A_0)^{\top} U_0, \\ &[V_1, S^{\top}_1] = \mathrm{QR}(L_1)\end{split}$$
  • Step "K": $$\begin{split} &K_0 = U_0 S_0, \\ &K_1 = K_0 + (A_1 - A_0) V_0, \\ &[U_1, S_1] = \mathrm{QR}(K_1)\end{split}$$
  • Step "S": $$S_1 = S_0 - U^{\top}_0 (A_1 - A_0) V_0$$ (note the minus!)

The order of the steps matters.

K-S-L order vs. K-L-S order

We can do a short demo of the different orders of splitting by taking two values $A_0, A_1$.


In [4]:
import numpy as np

n = 5
r = 2
m = 5
def random_usv(n, m, r):
    u = np.random.randn(n, r)
    u = np.linalg.qr(u)[0]
    v = np.random.randn(m, r)
    v = np.linalg.qr(v)[0]
    s = np.random.randn(r, r)
    return u, s, v
u0, s0, v0 = random_usv(n, m, r)
u1, s1, v1 = random_usv(n, m, r)

#Generate A0, A1 from the manifold
A0 = u0.dot(s0).dot(v0.T)
A1 = u1.dot(s1).dot(v1.T)
u = u0.copy(); v = v0.copy(); s = s0.copy()
# K-step
K = u.dot(s); K = K + (A1 - A0).dot(v)
u, s = np.linalg.qr(K)
# S-step
s = s - u.T.dot(A1 - A0).dot(v)
# L-step
L = v.dot(s.T); L = L + (A1 - A0).T.dot(u)
v, s = np.linalg.qr(L)
s = s.T

Appr = u.dot(s).dot(v.T)
print('Error:', np.linalg.norm(A1 - Appr))


Error: 1.69076748252e-15

A note on minus

(O., Lubich, Tübingen 2012)

Exactness result

If $A_0 = U_0 S_0 V^{\top}_0$ and $A_1$ has rank $r$, then the projector-splitting scheme in the KSL order is exact: $Y_1 = A_1$.

Generalizing to dD

  • How to generalize to many dimensions?

  • Then we have to replace "low-rank approximation" by a suitable tensor format.

  • High-dimensional problems are actually the main application of DLR

Canonical format: bad idea

The simplest (but naive) way to generalize low rank to many dimensions is to use the sum-of-products definition of the matrix rank:

$$ A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha) $$

This is called the canonical format, and $r$ the canonical rank.

The canonical format can be unstable and in general cannot be computed efficiently!

SVD-based formats: good idea

In 2009 active development of stable tensor formats began.

Two "similar" formats we proposed

  • Tree Tucker (O., Tyrtyshnikov) -> Tensor Train (O.),
    later found as Matrix Product States (MPS) in physics
    (White, Schollwöck, Vidal, Verstraete)
  • Hierarchical Tucker (Hackbusch, Grasedyck), later found as ML-MCTDH (Meyer, Manthe) in chemistry

Tensor-train format

A tensor is said to be in the TT-format, if

$$X(i_1, \ldots, i_d) = C_1(i_1) \ldots C_d(i_d),$$

where $C_k(i_k)$ is an $r_{k-1} \times r_k$ matrix for each fixed $i_k$ and $r_0 = r_d = 1$.
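As a small illustration (assuming cores stored as `(r_{k-1}, n_k, r_k)` NumPy arrays, which is a convention of this sketch), evaluating a TT entry is just a chain of small matrix products:

```python
import numpy as np

# Three random TT cores with r_0 = r_3 = 1
np.random.seed(0)
n, r = 4, 3
cores = [np.random.randn(1, n, r),
         np.random.randn(r, n, r),
         np.random.randn(r, n, 1)]

def tt_entry(cores, idx):
    # X(i1,...,id) = C_1(i_1) C_2(i_2) ... C_d(i_d): a chain of small matrix products
    M = cores[0][:, idx[0], :]
    for C, i in zip(cores[1:], idx[1:]):
        M = M.dot(C[:, i, :])
    return M[0, 0]

def tt_full(cores):
    # Contract all cores into the full tensor (only feasible for small examples)
    X = cores[0]
    for C in cores[1:]:
        X = np.tensordot(X, C, axes=([X.ndim - 1], [0]))
    return X.reshape([C.shape[1] for C in cores])

X = tt_full(cores)
print(abs(X[1, 2, 3] - tt_entry(cores, (1, 2, 3))))  # agreement up to round-off
```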

Some notations for the TT-format

Notation is hard; we use the one from (Lubich, Vandereycken, O., 2014).

We can introduce the left partial product $\matX_{\le k} \in \R^{n_1 \times \cdots \times n_k \times r_k}$,

$$ \matX_{\le k} (i_1,\dots,i_k,:) = \matC_1(i_1) \cdots \matC_k(i_k), $$

and the right partial product $\matX_{\ge k+1} \in \R^{n_{k+1} \times \cdots \times n_d \times r_k}$,

$$ \matX_{\ge k+1} (i_{k+1},\dots,i_d,:) = \matC_{k+1}(i_{k+1}) \cdots \matC_d(i_d). $$

Some notations (2)

The $k$-th unfolding $\matX^{\langle k \rangle}$ satisfies

$$\matX^{\langle k \rangle} = \matX_{\leq k} \matX_{\geq k+1}^{\top}, \qquad \text{i.e.} \qquad X = \mathrm{Ten}_k\left( \matX_{\leq k} \matX_{\geq k+1}^{\top} \right),$$

and there is a recursive connection

$$\matX_{\ge k} = ( \matX_{\ge k+1} \otimes \matI_{n_k} ) \matC_k^{>}$$

This forms the basis for the orthogonalization:

given a QR decomposition of $\matX_{\ge k+1}$, it is very simple
to orthogonalize $\matX_{\ge k}$ just by orthogonalizing the next core, sweeping from right to left.
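A NumPy sketch of this right-to-left orthogonalization (the `(r_{k-1}, n_k, r_k)` core layout and helper names are assumptions of this illustration): QR-factorize each core's unfolding and absorb the triangular factor into the left neighbor, which leaves the product unchanged.

```python
import numpy as np

np.random.seed(1)
n, r = 4, 3
cores = [np.random.randn(1, n, r),
         np.random.randn(r, n, r),
         np.random.randn(r, n, 1)]

def right_orthogonalize(cores):
    cores = [C.copy() for C in cores]
    for k in range(len(cores) - 1, 0, -1):
        r0, nk, r1 = cores[k].shape
        # QR of the transposed unfolding: the rows of the (r0 x nk*r1)
        # unfolded core become orthonormal
        Q, R = np.linalg.qr(cores[k].reshape(r0, nk * r1).T)
        cores[k] = Q.T.reshape(-1, nk, r1)
        # Absorb the non-orthogonal factor R^T into the neighbor on the left,
        # so the overall TT product is unchanged
        cores[k - 1] = np.einsum('anb,cb->anc', cores[k - 1], R)
    return cores

oc = right_orthogonalize(cores)
C2 = oc[1].reshape(oc[1].shape[0], -1)  # unfolding of the middle core
print(np.linalg.norm(C2.dot(C2.T) - np.eye(C2.shape[0])))  # ~ 0: orthonormal rows
```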

Dynamical low-rank in the TT-format

The DLR problem for the TT-format is formulated in the same way!

Projector-splitting in the dD-case

$$ \frac{dY(t)}{dt} = P_{Y(t)}\left(\frac{dA(t)}{dt}\right), \qquad Y(t_0)=Y_0\in \mathcal{M}, $$

$$ P_X =\sum_{k=1}^{d-1} (P_{\le k-1} P_{\ge k+1} - P_{\le k} P_{\ge k+1}) + P_{\le d-1} P_{\ge d+1}, $$

$$ \begin{split} &P_{\le k}\colon \R^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_k ( \matP_{\le k} \matZ^{\langle k \rangle}),\\ &P_{\ge k}\colon \R^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_{k-1} ( \matZ^{\langle k-1 \rangle} \matP_{\ge k} ), \end{split} $$

with

$$ \matP_{\le k} = \matQ_{\le k}\matQ_{\le k}^\top \qquad\hbox{and}\qquad \matP_{\ge k} = \matQ_{\ge k}\matQ_{\ge k}^\top. $$

The scheme of the algorithm

The algorithm is organized in sweeps over the cores, updating one core at a time.

Solving equations

It is very easy to use KSL for solving equations of the form

$$ \frac{dY}{dt} = F(Y): $$

you just replace $\frac{dA}{dt}$ by $F(Y)$.

In the case of linear $F(Y)$, the local systems for the cores are also linear,
which is a huge advantage over other methods!
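A minimal sketch of this substitution in the 2D (matrix) case, with each substep solved by one explicit Euler step; the linear right-hand side $F(Y) = MY + YN$ and all sizes are illustrative assumptions, not the TT version used in the talk.

```python
import numpy as np

# One projector-splitting (KSL) step for dY/dt = F(Y): the increment
# A1 - A0 of the approximation problem is replaced by tau * F(Y).
np.random.seed(2)
n, r, tau = 6, 2, 1e-3
M = np.random.randn(n, n)
N = np.random.randn(n, n)
F = lambda Y: M.dot(Y) + Y.dot(N)  # hypothetical linear right-hand side

U = np.linalg.qr(np.random.randn(n, r))[0]
V = np.linalg.qr(np.random.randn(n, r))[0]
S = np.eye(r) + 0.1 * np.random.randn(r, r)
Y0 = U.dot(S).dot(V.T)

# K-step: d(US)/dt = F(Y) V, with V frozen
K = U.dot(S) + tau * F(U.dot(S).dot(V.T)).dot(V)
U, S = np.linalg.qr(K)
# S-step: dS/dt = -U^T F(Y) V (note the minus)
S = S - tau * U.T.dot(F(U.dot(S).dot(V.T))).dot(V)
# L-step: d(VS^T)/dt = F(Y)^T U, with U frozen
L = V.dot(S.T) + tau * F(U.dot(S).dot(V.T)).T.dot(U)
V, St = np.linalg.qr(L)
S = St.T

Y1 = U.dot(S).dot(V.T)
# For this F, F(Y0) lies in the tangent space at Y0, so one KSL step
# agrees with a full explicit Euler step up to O(tau^2)
print(np.linalg.norm(Y1 - (Y0 + tau * F(Y0))))
```

The result stays exactly on the rank-$r$ manifold by construction, which a full Euler step would not guarantee.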

Numerical experiments

Now we can go to numerical experiments

Application: Vibrational spectra

We compare the TT-KSL scheme with the MCTDH package on a benchmark problem with the Hénon-Heiles potential \begin{equation} \frac{d\psi}{dt} = i H \psi, \quad \psi(0) = \psi_0, \end{equation} where $H$ has the form \begin{equation} \def\Hlap{-\frac12 \Delta} \def\Hhar{\frac12 \sum_{k=1}^f q^2_k} \def\Hanh{\sum_{k=1}^{f-1}\left(q^2_k q_{k+1} - \frac13 q^3_{k+1} \right)} H = \Hlap + \underbrace{\Hhar + \overbrace{\lambda \Hanh}^{\textrm{anharmonic part}}}_{\textrm{Hénon-Heiles potential}~V(q_1,\ldots,q_f)}. \end{equation}

Spectra comparison

Spectra for a 10D problem


Zoomed spectra for a 10D problem


Computational time: 54354 sec (MCTDH) vs 4425 sec (TT-KSL)

Conclusions

  • A stable explicit scheme for DLR based on the QR decomposition, with a very cheap step
  • Direct applications (quantum spin systems, chemical master equation)
  • Indirect applications (projection of the update, optimization, ...)
