In [1]:
from notebook.services.config import ConfigManager
from IPython.paths import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
              'theme': 'sky',
              'transition': 'zoom',
              'start_slideshow_at': 'selected',
})
%load_ext tikzmagic



Low-rank and time integration

Ivan Oseledets
Skolkovo Institute of Science and Technology
Based on joint work with C. Lubich, B. Vandereycken, D. Kolesnikov

Outline of the talk

I would like to talk about three results:

  • A new method for the low-rank solution of the Lyapunov equation
  • Time integration of tensor trains (dynamical low-rank approximation, projector-splitting scheme)
  • Curvature-independent convergence on low-rank manifolds (we need to rethink our understanding of the topology of low-rank manifolds!)

Lyapunov equation

Consider the continuous-time Lyapunov equation

$$AX + XA^{\top} = -y_0 y^{\top}_0.$$

We want to compute a low-rank approximation to the solution:

$$X \approx U Z U^{\top}, \quad U \in \mathbb{R}^{n \times r}, \quad U^{\top} U = I.$$

Lyapunov equation as a minimization problem

$ \def\vec{\mathop{\mathrm{vec}}\nolimits} \def\trace{\mathop{\mathrm{tr}}\nolimits} \def\rk{\mathop{\mathrm{rank}}\nolimits} \def\Span{\mathop{\mathrm{span}}\nolimits} $ A typical way to define the ``best'' rank-$r$ approximation is to formulate it as a minimization problem

$$R(X) \rightarrow \min,$$

where typically

$$ T_1(X) = \Vert AX + XA^{\top} + y_0 y^{\top}_0 \Vert_F. $$

For $A = A^{\top} > 0$ we can use the energy functional

$$ T_2(X) = \trace X A X + \trace X y_0 y^{\top}_0. $$

For $A$ being a discrete differential operator, these functionals can be ``ill-conditioned''!

Our approach

In Kolesnikov, Oseledets, arXiv:1410.3335

Motivated by the seminal paper by Y. Saad, we propose to use the following functional:

$$F(U) = \int_{0}^{\infty} \Vert y - \widehat{y} \Vert^2 dt,$$

where $y = e^{At} y_0$, $\widehat{y} = U e^{Bt} U^{\top} y_0$, and $B = U^{\top} A U$.

Connection to the Lyapunov equation

This is well known in model order reduction (although for this particular case we are still missing the exact reference). Consider the ODE $$\frac{dy}{dt} = Ay, \quad y(0) = y_0,$$ and look for the optimal subspace approximation $$y(t) \approx \widetilde{y}(t) = U c(t).$$ With the error measure $$F_0(U) = \int^{\infty}_0 \Vert y - \widetilde{y} \Vert^2 dt,$$ the best $U$ is obtained from the $r$ leading eigenvectors of the solution $X$ of $$AX + XA^{\top} = -y_0 y^{\top}_0, \quad F_0(U) = \Vert X - X_r \Vert.$$

Our functional

Given $U$, the optimal $c(t) = U^{\top} y(t)$ is not computable.

We replace it with the Galerkin projection:

$$\frac{dc}{dt} = B c, \quad c(0) = c_0, \quad B = U^{\top} A U, \quad c_0 = U^{\top} y_0.$$

and introduce the final functional

$$F(U) = \int_{0}^{\infty} \Vert y - \widehat{y} \Vert^2 dt,$$

where $y = e^{At} y_0$, $\widehat{y} = U e^{Bt} U^{\top} y_0$, and $B = U^{\top} A U$.

It can be computed!

This functional can be computed

$$ F(U) = \mathrm{tr}(X) - 2 \mathrm{tr}(U^{\top}(P - UZ)) - \mathrm{tr}(Z)$$

where $$B Z + Z B^{\top} = -c_0 c^{\top}_0, \quad A P + P B^{\top} = -y_0 c^{\top}_0, \quad c_0 = U^{\top} y_0.$$

In principle, we could now optimize over the Stiefel manifold, but we do not know the rank in advance.

Solution accuracy

If $F(U)$ is small, we obtain an approximation to the solution ($P$ is the solution of the Sylvester equation above):

$$\Vert X - UP^{\top} - PU^{\top} - UZU^{\top} \Vert_F \leq F(U).$$

``Doubling'' method

To adapt the rank, we can use a ``fixed-point'' iteration:

Solve the Sylvester equation $$A P + P B^{\top} = -y_0 c^{\top}_0$$ for $P$.

Add $P$ to the basis:

$$U := \mathrm{orth}[U, P]$$

(Similar to IRKA, Sorensen, ...)

This is a rational Krylov method in disguise, but it requires too many linear solves ($r$ per step).
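The doubling step is easy to prototype. Below is a minimal dense-matrix sketch (the function name and setup are ours, not from the paper; a practical implementation would use sparse or iterative solvers):

In [ ]:
import numpy as np
from scipy.linalg import solve_sylvester

def doubling_step(A, U, y0):
    # Projected matrix and right-hand side
    B = U.T @ A @ U
    c0 = U.T @ y0
    # Sylvester equation A P + P B^T = -y0 c0^T for the n x r correction P
    P = solve_sylvester(A, B.T, -np.outer(y0, c0))
    # Add P to the basis and re-orthogonalize (the rank roughly doubles)
    U_new, _ = np.linalg.qr(np.hstack([U, P]))
    return U_new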

Useful residual theorems

$$T_1(U) = \Vert A UZU^{\top} + UZU^{\top} A^{\top} + y_0 y^{\top}_0 \Vert,$$

$$T_2(U) = \Vert A UZ + UZ B^{\top} + y_0 c^{\top}_0 \Vert.$$

Theorem
If $UU^{\top} y_0 = y_0$, then

$$T_1(U) = \sqrt{2} T_2(U) = \sqrt{2} \Vert (AU - UB) Z \Vert.$$

Based on this theorem, it is a good idea to add a Krylov and a rational Krylov vector at each step.

Adding Krylov & Rat. Krylov

Theorem
If we add a Krylov vector and a rational Krylov vector at each step to $U$, the residual of the Sylvester equation stays rank-$1$:

$$A P_1 + P_1 B^{\top} = w q^{\top},$$

where $P_1 = P - UZ$.

Alternating low-rank (ALR) method

Assume a rank-$1$ approximation to $P_1$, namely $P_1 \approx v q^{\top}$ with $q = \frac{z}{\Vert z \Vert}$, where $z$ is the last column of $Z$.

It leads to the following equation for $v$:

$$(A + (q^{\top} B^{\top} q) I) v = w.$$

Note the remarkably simple formula for the new shift!

We only need to solve the small Lyapunov equation ($\mathcal{O}(r^3)$ cost) $$BZ + ZB^{\top} = -c_0 c^{\top}_0$$ and take the last column of $Z$ (and normalize it).
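A sketch of the shift computation in Python, with toy data standing in for the actual $A$, $U$, $y_0$ (the assumption is a stable $A$ and an orthonormal $U$; names are ours):

In [ ]:
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, r = 200, 10
A = -2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # toy stable matrix
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
y0 = rng.standard_normal(n)

B = U.T @ A @ U
c0 = U.T @ y0
# Small Lyapunov equation B Z + Z B^T = -c0 c0^T: O(r^3) cost
Z = solve_continuous_lyapunov(B, -np.outer(c0, c0))
q = Z[:, -1] / np.linalg.norm(Z[:, -1])  # normalized last column of Z
shift = q @ B.T @ q                      # the new shift q^T B^T q
# One ALR solve would then be (A + shift * I) v = w, appending v to the basis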

Comparison with other methods

  • RKSM/ERKSM (Druskin, Simoncini): adapt the shifts globally
  • KPIK / extended Krylov (Knizhnerman, Druskin, Simoncini): use the $A$ and $A^{-1}$ Krylov spaces

The shifts produced by ALR lie in the range of $B = U^{\top} A U$ and are thus much less spread out than those of RKSM;
we can reuse the AMG hierarchy for the shifted systems.

Numerical example

$$ Lu = u_{xx} + u_{yy} + u_{zz} - 10x u_{x} - 1000y u_{y} - u_{z}$$

with Dirichlet boundary conditions on the unit cube, discretized with the standard 7-point stencil.

$$ \begin{array}{lllllr} \textrm{grid} &\textrm{method} &\textrm{precomp.} &\textrm{time per solve} &\textrm{final time} &\textrm{iterations/rank}\\ &ALR & -- &0.6424 &3.2881 &6/13 \\ &ALR(AMG) &0.2231 &0.0466 &\bf{0.5020} &6/13 \\ &KPIK(LU) &0.6687 &\bf{0.0086} &0.8351 &9/19 \\ &KPIK(AMG) &0.2231 &0.0436 &0.6787 &9/19 \\ 20\times 20 \times 20 &KPIK & -- &0.8533 &7.0838 &9/19 \\ &RKSM &\bf{0.0265} &0.6610 &5.5919 &9/10 \\ &RKSM(AMG) &0.2496 &0.0491 &0.9061 &9/10 \\ &ERKSM &\bf{0.0265} &0.7303 &4.8876 &7/15 \\ &ERKSM(AMG) &0.2496 &0.0541 &0.9108 &6/13 \\ \hline &ALR & -- &9.2135 &53.2061 &7/15 \\ &ALR(AMG) &0.9827 &0.1987 &\bf{2.3924} &7/15 \\ &KPIK(LU) &10.1743 &\bf{0.0497} &11.0392 &9/19 \\ &KPIK(AMG) &0.9827 &0.2055 &3.0282 &9/19 \\ 30\times 30\times 30 &KPIK & -- &12.7750 &105.1641 &9/19 \\ &RKSM &\bf{0.1215} &9.3640 &85.8903 &10/11 \\ &RKSM(AMG) &1.1042 &0.1792 &3.2822 &10/11 \\ &ERKSM &\bf{0.1215} &9.5372 &58.2637 &7/15 \\ &ERKSM(AMG) &1.1042 &0.1609 &3.2703 &8/17 \\ \end{array} $$

Shifts distribution

The shifts are much less spread out!

Summary and part-2

  • We have a new functional for the Lyapunov equation, together with a very simple, parameter-free shift-generating procedure.
  • Now we go from low rank in "space-time" to low rank in the spatial variables (dynamical low-rank approximation).

Dynamical low-rank approximation

Dynamical low-rank approximation is a low-rank approximation of a time-varying matrix:

$$A(t) \approx Y(t) = U(t) S(t) V^{\top}(t), $$

where $U(t) \in \mathbb{R}^{n \times r}, V(t) \in \mathbb{R}^{m \times r}, S(t) \in \mathbb{R}^{r \times r}$

Simple DLR

You can use the pointwise best rank-$r$ approximation (SVD):

$$\min_{\mathrm{rank}(X(t)) = r}\Vert A(t) - X(t) \Vert $$

but this has some drawbacks:

  • The dependence is non-smooth in time
  • The energy is not conserved
  • No history of $A(t)$ is taken into account!

Instead, use Dirac-Frenkel principle

The Dirac-Frenkel principle can be used instead:

$$ \langle \frac{dA}{dt} - \frac{dX}{dt}, Z \rangle_F = 0, $$

for all $Z$ in the tangent space of the manifold $\mathcal{M}_r$ of all matrices of rank $r$,
where tangent vectors are parametrized as

$$ Z = \delta U S V^{\top} + U \delta S V^{\top} + U S \delta V^{\top} $$

Equations of motion in 2D case

(Lubich, Koch, Meyer,...)

$$\begin{split} &\frac{dU}{dt} S = \frac{dA}{dt} V,\\ &\frac{dV}{dt} S^{\top} = \frac{dA^{\top}}{dt} U, \\ &\frac{dS}{dt} = U^{\top} \frac{dA}{dt} V. \end{split}$$

Suppose you start from rank $1$ and evolve into rank $10$: along the way, the matrix $S$ will be singular!

Equations of motion, 2D case (2)

The equations of motion can be rewritten for the full matrix

$$ \frac{dX}{dt} = P_X(\frac{dA}{dt}), $$

where $P_X$ is the projector onto the tangent space.

Projector-splitting

Theorem: $P_X$ can be represented as

$$P_X(Z) = Z - (I - UU^{\top}) Z (I - VV^{\top}) = UU^{\top} Z + Z VV^{\top} - UU^{\top} Z VV^{\top},$$

that is, $$P_X(Z) = P_1(Z) + P_2(Z) - P_3(Z).$$
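A quick numerical sanity check of the two expressions in the theorem (a minimal sketch; function name and toy sizes are ours):

In [ ]:
import numpy as np

def project_tangent(U, V, Z):
    # P_X(Z) = U U^T Z + Z V V^T - U U^T Z V V^T
    PU, PV = U @ U.T, V @ V.T
    return PU @ Z + Z @ PV - PU @ Z @ PV

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
Z = rng.standard_normal((n, m))
# Agreement with P_X(Z) = Z - (I - U U^T) Z (I - V V^T)
lhs = Z - (np.eye(n) - U @ U.T) @ Z @ (np.eye(m) - V @ V.T)
assert np.allclose(lhs, project_tangent(U, V, Z))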

Integration of the substeps

Theorem (O., Lubich, 2014): The equations with $P_1$, $P_2$, $P_3$ can be integrated exactly:

  • Step "L": $$\frac{dX}{dt} = UU^{\top}\frac{dA}{dt} \rightarrow \frac{d(VS^{\top})}{dt} = \frac{dA^{\top}}{dt} U, \frac{dU}{dt}=0,$$

    Thus $L_1 = L_0 + (A_1 - A_0)^{\top} U_0, U_1 = U_0$.

  • Step "K": $$\frac{dX}{dt} = \frac{dA}{dt} VV^{\top} \rightarrow \frac{d(US)}{dt} = \frac{dA}{dt} V, \frac{dV}{dt}=0,$$

    Thus $K_1 = K_0 + (A_1 - A_0) V_0, V_1 = V_0$

  • Step "S": $$\frac{dX}{dt} = UU^{\top}\frac{dA}{dt} VV^{\top} \rightarrow \frac{d(S)}{dt} = U\frac{dA}{dt} V, \frac{dV}{dt}=0, \frac{dU}{dt} = 0,$$

    Thus $S_1 = S_0 \mathbf{-} U^{\top}_0(A_1 - A_0) V_0, V_1 = V_0, U_1 = U_0$
    Here $A_0 = A(t), A_1 = A(t + \tau)$

Splitting integrator

The final splitting steps (in the KSL order) are

  • Step "K": $$\begin{split} K_0 &= U_0 S_0, \\ K_1 &= K_0 + (A_1 - A_0) V_0, \\ [U_1, \widehat{S}_1] &= QR(K_1). \end{split}$$
  • Step "S": $$\widetilde{S}_1 = \widehat{S}_1 - U^{\top}_1 (A_1 - A_0) V_0$$ (note the minus!)
  • Step "L": $$\begin{split} L_0 &= V_0 \widetilde{S}^{\top}_1, \\ L_1 &= L_0 + (A_1 - A_0)^{\top} U_1, \\ [V_1, S^{\top}_1] &= QR(L_1). \end{split}$$

The order of the steps matters.

A note on minus

(O., Lubich, Tübingen 2012)

Exactness result

If $A_0 = U_0 S_0 V^{\top}_0$ and $A_1$ has rank $r$, then the projector-splitting scheme in the KSL order is exact.

Generalizing to dD

  • How to generalize to many dimensions?

  • Then we have to replace "low-rank matrix approximation" by a suitable low-rank tensor format.

  • High-dimensional problems are actually the main application of DLR.

Canonical format: bad idea

The simplest (but naive) way to generalize low rank to many dimensions is to use the sum-of-products definition of the matrix rank:

$$ A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha) $$

This is called the canonical format, and $r$ is called the canonical rank.

The canonical format can be unstable and in general cannot be efficiently computed!

SVD-based formats: good idea

In 2009, active development of stable tensor formats began.

Two "similar" formats were proposed:

  • Tree Tucker (O., Tyrtyshnikov) -> Tensor Train (O.),
    later recognized as Matrix Product States (MPS) from physics
    (White, Schollwöck, Vidal, Verstraete)
  • Hierarchical Tucker (Hackbusch, Grasedyck), later recognized as ML-MCTDH (Meyer, Manthe) from chemistry

Tensor-train format

A tensor is said to be in the TT-format if

$$X(i_1, \ldots, i_d) = C_1(i_1) \ldots C_d(i_d),$$

where $C_k(i_k)$ is an $r_{k-1} \times r_k$ matrix for each fixed $i_k$ and $r_0 = r_d = 1$.
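A TT representation can be computed by sequential truncated SVDs of unfoldings (TT-SVD). Below is a minimal sketch of the decomposition and of entry evaluation (our implementation, not the reference code of the ttpy package):

In [ ]:
import numpy as np

def tt_svd(X, eps=1e-10):
    # Cores C[k] have shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1
    dims = X.shape
    cores, r_prev = [], 1
    M = np.asarray(X)
    for n in dims[:-1]:
        M = M.reshape(r_prev * n, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))  # truncation rank
        cores.append(U[:, :r].reshape(r_prev, n, r))
        M = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

def tt_entry(cores, idx):
    # X(i_1, ..., i_d) = C_1(i_1) ... C_d(i_d)
    v = np.ones((1, 1))
    for C, i in zip(cores, idx):
        v = v @ C[:, i, :]
    return v[0, 0]

X = np.random.default_rng(0).standard_normal((4, 5, 6, 7))
cores = tt_svd(X)
assert np.isclose(tt_entry(cores, (1, 2, 3, 4)), X[1, 2, 3, 4])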

Some notations for the TT-format

$\def\matX {{\boldsymbol{\mathrm{X}}}} \def\matY {{\boldsymbol{\mathrm{Y}}}} \def\matZ {{\boldsymbol{\mathrm{Z}}}} \def\matC {{\boldsymbol{\mathrm{C}}}} \def\matI {{\boldsymbol{\mathrm{I}}}} \def\matP {{\boldsymbol{\mathrm{P}}}} \def\matQ {{\boldsymbol{\mathrm{Q}}}} \def\R {\mathbb{R}} $

Notation is hard; we use the one from (Lubich, Vandereycken, O., 2014).

We can introduce the left partial product of cores $\matX_{\le k} \in \R^{n_1 \times \cdots \times n_k \times r_k}$ as

$$ \matX_{\le k} (i_1,\dots,i_k,:) = \matC_1(i_1) \cdots \matC_k(i_k) $$

and the right partial product $\matX_{\ge k+1} \in \mathbb{R}^{n_{k+1} \times \cdots \times n_d \times r_k}$ as

$$ \matX_{\ge k+1} (i_{k+1},\dots,i_d,:) = \matC_{k+1}(i_{k+1}) \cdots \matC_d(i_d). $$

Some notations (2)

The $k$-th unfolding satisfies

$$\matX^{\langle k \rangle} = \matX_{\leq k} \matX_{\geq k+1}^{\top},$$

and there is a recursive connection

$$\matX_{\ge k} = ( \matX_{\ge k+1} \otimes \matI_{n_k} ) \matC_k^{>},$$ where $\matC_k^{>}$ is an unfolding of the core $\matC_k$.

This forms the basis for the orthogonalization:

given a QR decomposition of $\matX_{\ge k+1}$, it is very simple
to orthogonalize $\matX_{\ge k}$: just orthogonalize the next core, moving from right to left.
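A sketch of such a right-to-left orthogonalization sweep over all cores (our implementation):

In [ ]:
import numpy as np

def right_orthogonalize(cores):
    # After the sweep, every core except the first is right-orthogonal
    cores = [C.copy() for C in cores]
    for k in range(len(cores) - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        # QR of the transposed right unfolding of core k
        Q, R = np.linalg.qr(cores[k].reshape(r0, n * r1).T)
        cores[k] = Q.T.reshape(-1, n, r1)
        # Absorb the triangular factor into the previous core
        cores[k - 1] = np.einsum('anb,bc->anc', cores[k - 1], R.T)
    return cores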

Dynamical low-rank in the TT-format

The DLR problem for the TT-format is formulated in the same way!

Projector-splitting in the dD-case

$$ \frac{dY(t)}{dt} = P_{Y(t)}\left(\frac{dA(t)}{dt}\right), \qquad Y(t_0)=Y_0\in \mathcal{M}, $$

$$ P_X =\sum_{k=1}^{d-1} (P_{\le k-1} P_{\ge k+1} - P_{\le k} P_{\ge k+1}) + P_{\le d-1} P_{\ge d+1}, $$

$$ \begin{split} &P_{\le k}\colon \R^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_k ( \matP_{\le k} \matZ^{\langle k \rangle}),\\ &P_{\ge k}\colon \R^{n_1 \times \cdots \times n_d} \to T_X \mathcal{M}, \quad Z \mapsto \mathrm{Ten}_{k-1} ( \matZ^{\langle k-1 \rangle} \matP_{\ge k} ). \end{split} $$

where $ \matP_{\le k} = \matQ_{\le k}\matQ_{\le k}^\top$ and $\matP_{\ge k} = \matQ_{\ge k}\matQ_{\ge k}^\top, $ with $\matQ_{\le k}$ and $\matQ_{\ge k}$ the orthogonal factors of $\matX_{\le k}$ and $\matX_{\ge k}$.

The scheme of the algorithm

The algorithm is organized through sweeps over the cores.

Solving equations

It is very easy to use KSL for solving equations!

$$ \frac{dY}{dt} = F(Y), $$

you just replace $\frac{dA}{dt}$ by $F(Y)$.

In the case of linear $F(Y)$, the local systems for the cores will also be linear,
which is a huge advantage over other methods!
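For intuition, here is the matrix-case version of a KSL step for $\frac{dY}{dt} = F(Y)$, using one explicit Euler substep inside each split equation (our simplification; any inner integrator can be used):

In [ ]:
import numpy as np

def ksl_ode_step(U0, S0, V0, F, tau):
    # Step "K": dK/dt = F(K V^T) V, with V = V0 frozen
    K = U0 @ S0 + tau * F(U0 @ S0 @ V0.T) @ V0
    U1, S_hat = np.linalg.qr(K)
    # Step "S": dS/dt = -U^T F(U S V^T) V (note the minus)
    S = S_hat - tau * U1.T @ F(U1 @ S_hat @ V0.T) @ V0
    # Step "L": dL/dt = F(U L^T)^T U, with U = U1 frozen
    L = V0 @ S.T + tau * F(U1 @ S @ V0.T).T @ U1
    V1, S1t = np.linalg.qr(L)
    return U1, S1t.T, V1

# If F is linear, e.g. F(Y) = M @ Y + Y @ M.T, each substep is a linear ODE
# for the factor being updated.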

Numerical experiments

Now we can go to numerical experiments

Application: Vibrational spectra

We compare the TT-KSL scheme with the MCTDH package on a benchmark problem with the Hénon-Heiles potential \begin{equation} \frac{d\psi}{dt} = i H \psi, \quad \psi(0) = \psi_0, \end{equation} where $H$ has the form \begin{equation} \def\Hlap{-\frac12 \Delta} \def\Hhar{\frac12 \sum_{k=1}^f q^2_k} \def\Hanh{\sum_{k=1}^{f-1}\left(q^2_k q_{k+1} - \frac13 q^3_{k+1} \right)} H = \Hlap + \underbrace{\Hhar + \overbrace{\lambda \Hanh}^{\textrm{anharmonic part}}}_{\textrm{Henon-Heiles potential}~V(q_1,\ldots,q_f)}. \end{equation}

Spectra comparison

Spectra for a 10D problem


Zoomed spectra


Computational time: 54354 sec (MCTDH) vs 4425 sec (TT-KSL)

Conclusions

  • A stable explicit scheme for DLR based on the QR decomposition, with a very cheap step
  • Direct applications (quantum spin systems, chemical master equation)
  • Indirect applications (projection of the update, optimization, ...)

Projector-splitting for optimization

The projector-splitting scheme has high potential for optimization, since it operates directly on discrete increments.

Given an iterative process $x_{k+1} = x_k + F_k$, we create a manifold version

$$x_{k+1} = I(x_k, F_k).$$

What about the convergence?
Typical estimates use the curvature of the manifold; for low-rank matrices the curvature is governed by the condition number, i.e. it can be arbitrarily high. In practice, however, it makes no difference.

Manifold iteration is better

$$x = Qx + f, \quad \Vert Q \Vert \leq \delta.$$

The solution $x^*$ has rank $5$: three large singular values, one at $10^{-3}$ of the largest, and one at $10^{-6}$ of the largest.

Legend: blue: original gradient method; green: manifold-projected gradient; cyan: the normal component of the error.
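A toy reproduction sketch of this experiment (our setup and data, not the authors'; here the manifold step uses a truncated-SVD retraction as a simple stand-in for the projector-splitting step):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
n, r, delta = 30, 5, 0.5

# Rank-5 solution with strongly graded singular values
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
X_star = U @ np.diag([1.0, 1.0, 1.0, 1e-3, 1e-6]) @ V.T

# Contraction Q with ||Q|| <= delta, acting on vectorized matrices
Q = rng.standard_normal((n * n, n * n))
Q *= delta / np.linalg.norm(Q, 2)
f = X_star.ravel() - Q @ X_star.ravel()   # fixed point is vec(X_star)

def truncate(X, r):
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

x = np.zeros(n * n)          # original iteration
Xm = np.zeros((n, n))        # manifold-projected iteration
for k in range(25):
    x = Q @ x + f
    Xm = truncate((Q @ Xm.ravel() + f).reshape(n, n), r)
print(np.linalg.norm(x - X_star.ravel()), np.linalg.norm(Xm - X_star))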


In [2]:
from IPython.core.display import HTML
def css_styling():
    styles = open("custom.css", "r").read()
    return HTML(styles)
css_styling()


Out[2]:
