In [2]:

from IPython.html.services.config import ConfigManager
from IPython.paths import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
'theme': 'sky',
'transition': 'zoom',
'start_slideshow_at': 'selected',
})




Out[2]:

{u'scroll': True,
u'start_slideshow_at': 'selected',
u'theme': 'sky',
u'transition': 'zoom'}




In [1]:

#import
import tt
import numpy as np
import scipy.linalg as la
import scipy.sparse as ssp
import scipy.sparse.linalg as sla
import math

#Graphics
import matplotlib.pyplot as plt
#import prettyplotlib as ppl
%matplotlib inline



# Tensor decompositions: new results and open problems

## Tensor decompositions

The idea of tensor decompositions is based on the idea of separation of variables.

Rank-$1$ tensor: $$A(i_1, \ldots, i_d) \approx U_1(i_1) \ldots U_d(i_d),$$

Canonical rank-$r$ tensor: $$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha),$$

Tensor-train (MPS): $$A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d),$$ where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix.

Tree-tensor networks (include H-Tucker, TT) are the most general stable representation class.

## (Well-known) properties of the tensor formats

• Canonical format is in general an unstable representation; the set of tensors with bounded canonical rank does not form a manifold
• SVD-based format with fixed ranks form a Riemannian manifold, and come with a bunch of stable algorithms

## Tensor-train format properties:

• Fast linear algebra
• Generalization of the SVD (TT-SVD, Vidal algorithm)
• Rounding proceduring in $\mathcal{O}(dnr^3)$ complexity
• Exact recovery from samples of low-rank tensors (cross approximation)
• Software available for MATLAB & Python for testing and development purposes.

## Challenges

There are several challenges in tensor decompositions:

1. Make it work for the particular problem: right parametrization/reshaping into a tensor (i.e. similar
2. Develop black-box optimization over low-rank manifolds, when you know that the solution is in the right format
3. Investigate theoretical convergence properties: the current analysis is insufficient for the convergence of ALS-type methods.

## New result: tensors for the multiscale problems

One of the most interesting applications of the tensor methods is the direct resolution in multiscale problems with very fine meshes.

[Nice picture]

## Model problem

As a model problem, consider a diffusion problem with a highly oscillating coefficient

$$\nabla \cdot k \nabla u = f,$$

in $[0, 1], u_{\partial \Omega} = 0.$

for simplicity, let us focus on the 1D case, and suppose two-scale problem:

$$k = K_1(x) K_2\left(\frac{x}{\epsilon}\right),$$

where $K_2(y)$ is a 1-periodic function, $K_1(x)$ is smooth, $\epsilon \ll 1$.

## The scheme:

The ideal tensor scheme:

Introduce a very fine mesh with $n = 2^d$ points, use the simplest low-order finite-element method

take a vector of value of

## The theory

The theory in our recent paper Kazeev, Schwab, Rakbuba, Oseledets shows that the solution can be approximated in with QTT-FEM format with optimal complexity.

## The solver

We put the matrix into the QTT-format, the right-hand side into the QTT-format,

and have a linear system

$$Ax = f$$

where we know that the solution can be well-approximated in the QTT-format.

A standard way is to reformulate the problem as a minimization problem of the functional $F(x)$

$$F(x) = <Ax, x> - 2<f, x>,$$

and minimize over the set of all tensors with bounded ranks. There is a very nice black-box software for doing that (ALS, DMRG, AMEN method by Dolgov & Savostyanov).

## Practice:

Does not work even for $k = 1$!



In [25]:

import numpy as np
import tt
from tt.amen import amen_solve
import matplotlib.pyplot as plt
%matplotlib inline

d = 10
n = 2 ** d b
a = tt.qlaplace_dd([d]) * (n + 1) ** 2
rhs = tt.ones(2, d)

sol = amen_solve(a, rhs, rhs, 1e-6, verb=1)
#print (tt.matvec(a, sol) - rhs).norm()/rhs.norm()
t = np.linspace(0, 1, n+2)[1:-1]
sol_analyt = 0.5 * t * (1 - t)
#plt.plot(t, sol.full().flatten("F"))
plt.plot(t, sol_analyt - sol.full().flatten("F"), 'r')
plt.title('Solution for d={0: d}'.format(d))




Out[25]:

<matplotlib.text.Text at 0x7fe865097450>



## What is the source of the problem:

We have the discretization error and the floating point error:

$$\frac{C\epsilon}{h^2} + h^2$$

and this is the minimum for

$$h \sim \epsilon^{1/4} \approx 10^{-4}$$

for $\epsilon = 10^{-16}$ (btw. this is why you never use single precision in FEM).

Standard FEM can not allow to go beyond $h = 10^{-4}$, but we need in tensor methods to go to astronomically large sizes $n$.

## Discretization for astronomically large $n$.

We have come up with a simple'' scheme that solves the problem.

In 1D, we have new variable:

$$\frac{d}{dx} k \frac{d}{dx} u = f,$$

we introduce

$$u_x = \frac{du}{dx}, \quad u = \int^x_0 u_x(t) dt.$$

and replace this by a discretization

$$u = B u_x,$$

where $B$ is a lower triangular Toeplitz matrix with all elements equal to $h$.

$$k u_x = B f + \alpha, \quad e^{\top} u_x = 0.$$

## 2D: the idea

In two dimensions, we introduce in a similar way unknowns $u_x$, $u_y$,

which are connected:

$$u = (B \otimes I) u_x, \quad u = (I \otimes B) u_y, \quad \mbox{+ b.c.}.$$

## Final equation

The final equation for $\nabla \cdot k \nabla$ has the form $$\begin{split} &M_x D^{-1}_x M_x^{\top} \mu + M_y D^{-1}_y M_y^{\top} \mu - M_x D^{-1}_x S^{\top}_x Q^{-1}_x S_x D_x^{-1} M_x^{\top} \mu \ &- M_y D^{-1}_y S^{\top}_y Q^{-1}_y S_y D^{-1}_y M_y^{\top} \mu = M_x D^{-1}_x M_x^{\top} f  - M_x D^{-1}_x S^{\top}_x Q^{-1}_x S_x D_x^{- 1} M_x^{\top} f,  \end{split}$$ where $D_x$, $D_y$, $Q_x$, $Q_y$ are auxiliary diagonal matrices,

and that is the system solved by AMEN method.

## Typical convergence

The QTT-ranks stay constant, solution time is of order of ten seconds ($d = 30$ total number of virtual unknowns $2^{60}$.)

<img src='simple_ref.png' /img>

## Summary

Special discretization are needed for QTT-tools to be efficient in the PDE setting.

Theory is available for 2D elliptic PDEs in polygonal domain (Kazeev & Schwab) and for 1D multiscale (O., Rakhuba, Kazeev, Schwab).

Multiscale experiments also done in the paper by Khoromskij & Repin

## High-dimensional eigenvalue problem

In vibrational computations, it is needed to find the minimal eigenvalues of the molecular Schrodinger operator,

$$H \Psi_k = E_k \Psi_k, \quad H = \frac{1}{2} \Delta + V,$$

where potential is already given in the sum-of-product form (typically, polynomial expansion).

Discretization on a tensor-product grid yield an $N$-dimensional tensor, where $N$ is the number of molecular degrees of freedom.

## New approach

Run Locally Optimal Block Conjugate Method (LOBPCG) by Knyazev with iterates stored separatedly in the TT-format.

Need:

• Block Matvec: fast approximate available in the TT-Toolbox
• Orthogonalization: done via Gram matrix, can be done inexactly.

Key idea:

Fine-tune with inexact inverse iteration

## Fine-tuning with inexact inverse iteration

Given a guess $\lambda_i$ to the eigenvalue we solve

$$(A - \lambda_i I) x_i^{(k+1)} = x^{(k)}_i$$

by running one sweep of ALS.

It is very cheap, but it works extremely well.

## Key idea explained again

Typically, in the AMEN approach the gradient direction is used to enhance the optimization method;

In the new approach, we use ALS to precondition the outer tensor-structured iteration.

## Results

<img src='comparison_smolyak.png' /img>

## Results (2)

<img src='lobpcg.png' /img>

## Open problems

One of the most interesting open problem is the convergence analysis of the optimization algorithm on the manifold of low-rank matrices and tensors.

Even for matrices it is not yet understood, and requires experts from topology and differential geometry.

## Optimization problems examples

• Linear systems: $F(X) = \langle A(X), X \rangle - 2 \langle F, X\rangle.$
• Eigenvalue problems $F(X) = \langle A(X), X \rangle, \mbox{s.t.} \Vert X \Vert = 1.$
• Tensor completion: $F(X) = \Vert W \circ (A - X)\Vert.$

My claim is that we need better understanding of how these methods converge!

## General scheme

The general scheme is that we have some (convergent method) in a full space:

$$X_{k+1} = \Phi(X_k),$$

and we know that the solution $X_*$ is on the manifold $\mathcal{M}$.

Then we introduce a Riemannian projected method

$$X_{k+1} = R(X_k + P_{\mathcal{T}} \Phi(X_k)),$$

where $P_{\mathcal{T}}$ is a projection on the tangent space, and $R$ is a retraction.

## Riemannian optimization is easy

The projection onto the tangent space is trivial for low-rank (and is not difficult for TT)

$$P_{T}(X) = X - (I - UU^{\top}) X (I - VV^{\top}).$$

For the retraction, there many choices (see the review Absil, O., 2014).

## Projector-splitting scheme

One of the simplest second-order retractions is the projector-splitting (or KSL) scheme

Initially proposed as a time integrator for the dynamical low-rank approximation (C. Lubich, I. Oseledets, A projector-splitting... 2014) for matrices and (C. Lubich, I. Oseledets, B. Vandreycken, Time integration of tensor trains, SINUM, 2015).

Reformulated as a retraction in (Absil, Oseledets)

Has a trivial form: a half-step (or full) step of the Alternating least squares (ALS).

## The simplest retraction possible:

The projector-splitting scheme can be implemented in a very simple "half-ALS" step.

$$U_1, S_1 = \mathrm{QR}(A V_0), \quad V_1, S_2^{\top} = \mathrm{QR}(A^{\top} U_1).$$

## Projection onto the tangent space not needed!

$$X_{k+1} = I(X_k, F),$$

where $I$ the is the projector-splitting integrator, and $F$ is the step for the full method.

Now, instead of $$X_{k+1} = \Phi(X_k), \quad X_* = \Phi(X_*),$$

and $X_*$ is on the manifold, we have the manifold-projected process

$$Y_{k+1} = I(Y_k, \Phi(Y_k) - Y_k).$$

## Linear manifold

Suppose the linear case, $$\Phi(X) = X + Q(X), \quad \Vert Q \Vert < 1.$$

and $M$ is a linear subspace. Then the projected method is always not slower.

On a curved manifold, the curvature of the manifold plays crucial role in the convergence estimates.

## Curvature of fixed-rank manifold

The curvature of the manifold of matrices with rank $r$ at point $X$ is equal to the $$\Vert X^{-1} \Vert_2,$$

i.e. the inverse of the minimal singular value.

In practice we know, that zero singular values for the best rank-r approximation do not harm:

Approximation of a rank-$1$ matrix with a rank-$2$ matrix is ok (block power method.)

## Experimental convergence study

Our numerical experiments confirm that the low-rank matrix manifold behaves typically like a linear manifold, i. e.

projected gradient is almost always faster, independent of the curvature.



In [26]:

#2d case functions
#u, s, v = x
#u0, s0, v0 = x0
#u_new = np.linalg.qr(np.hstack((u, u0)))[0]
#v_new = np.linalg.qr(np.hstack((v, v0)))[0]
#s_new = u_new.T.dot(u).dot(s).dot(v.T.dot(v_new)) - u_new.T.dot(u0).dot(s0).dot(v0.T.dot(v_new))
return x0 - A.dot(full(x).flatten()).reshape(x0.shape)
#return (u_new, s_new, v_new)

#it is Frobenius norm
def get_norm(x):
u, s, v = x
return la.norm(s)
#return math.sqrt(np.trace((u.T.dot(u)).T.dot(v.T.dot(v))))

def check_orthogonality(u):
if la.norm(u.T.dot(u) - np.eye(u.shape[1])) / math.sqrt(u.shape[1]) < 1e-12:
return True
else:
return False

def orthogonalize(x):
u, s, v = x
u_new, ru = np.linalg.qr(u)
v_new, lv = np.linalg.qr(u)
s_new = ru.dot(s.dot(lv))
return (u_new, s_new, v_new)

def diagonalize_core(x):
u, s, v = x
ls, s_diag, rs = la.svd(s)
return (u.dot(ls), np.diag(s_diag), v.dot(rs))

def func(x, x0):

def full(dx):
return dx[0].dot(dx[1].dot(dx[2].T))

def projector_splitting_2d(x, dx, flag_dual=False):
n, r = x[0].shape
u, s, v = x[0].copy(), x[1].copy(), x[2].copy()
if not flag_dual:
u, s = np.linalg.qr(u.dot(s) + dx.dot(v))
s = s - u.T.dot(dx).dot(v)
v, s = np.linalg.qr(v.dot(s.T) + dx.T.dot(u))
s = s.T
else:
v, s = np.linalg.qr(v.dot(s.T) + dx.T.dot(u))
s = s.T
s = s - u.T.dot(dx).dot(v)
u, s = np.linalg.qr(u.dot(s) + dx.dot(v))

return u, s, v

def inter_point(x, dx):
u, s, v = x
u, s = np.linalg.qr(u.dot(s) + dx.dot(v))
#dx - (I - uu') dx (I - vv') = uu' dx + dx * vv' - uu' dx vv'
dx_tangent = u.dot(u.T.dot(dx)) + (dx.dot(v)).dot(v.T) - u.dot(u.T.dot(dx).dot(v)).dot(v.T)

return u.copy(), v.copy(), dx_tangent

def minus(x1, x2):
u1, s1, v1 = x1
u2, s2, v2 = x2
u_new = np.linalg.qr(np.hstack((u1, u2)))[0]
v_new = np.linalg.qr(np.hstack((v1, v2)))[0]
s_new = u_new.T.dot(u1).dot(s1).dot(v1.T.dot(v_new)) - u_new.T.dot(u2).dot(s2).dot(v2.T.dot(v_new))
return u_new, s_new, v_new

def ps_proj(x, dx):
return full(projector_splitting_2d(x, dx)) - full(x)

def rotation(u):
return la.qr(u)[0]



## Linear test case

$$\Phi (X) = X + A(X) - f,$$

where $A$ is a linear operator on matrix space, $\Vert A - I\Vert = \delta$,

$f$ is known right-hand side and $X_*$ is the solution of linear equation $A(X_*) = f$.

This problem is equivalent to the minimization problem of the quadratic functional $F(X) = \langle A(X) - f, X \rangle.$

## Setting up the experiment

Generate some random matrices...



In [27]:

#Init sizes
n, r, r0 = 40, 7, 7

M = n * n
Q = np.random.randn(M, M)
Q = Q + Q.T
Q = (Q/np.linalg.norm(Q, 2)) * 0.8 # contraction coefficient
A = np.eye(M) + Q



## Generic case

In the generic case, the convergence is as follows.



In [29]:

# Case 1: Projector-Splitting versus Gradient descent
#Random initialization
x_orig = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start, x_orig = orthogonalize(x_start), orthogonalize(x_orig)
x_orig = diagonalize_core(x_orig)
print 'The singular values of fixed point matrix are \n', np.diag(x_orig[1])
f = full(x_orig)
f = A.dot(f.flatten()).reshape(f.shape)
grad_dist, orth_proj_norm, tangent_proj_norm = [], [], []
k = 50
x = full(x_start)
for i in xrange(k):
dx = f - (A.dot(x.flatten())).reshape(x.shape)
x = x + dx
# Projector Splitting Convergence
x = x_start
dx_orig = full(x)-full(x_orig)
for i in xrange(k):
u1, v, dx_tangent = inter_point(x, dx_orig)
dx_orig = full(x)-full(x_orig)
dx_orig_tangent = u1.dot(u1.T.dot(dx_orig)) + (dx_orig.dot(v)).dot(v.T) - u1.dot(u1.T.dot(dx_orig).dot(v)).dot(v.T)
orth_proj_norm.append(la.norm(dx_orig - dx_orig_tangent))
tangent_proj_norm.append(la.norm(dx_orig_tangent))
x = projector_splitting_2d(x, dx)
# Plotting
plt.semilogy(orth_proj_norm, marker='o', label="Orthogonal projection")
plt.semilogy(tangent_proj_norm, marker='o', label="Tangent projection")
plt.legend(bbox_to_anchor=(0.40, 1), loc=2)#(1.05, 1), loc=2
plt.xlabel('Iteration')
plt.ylabel('Error')




The singular values of fixed point matrix are
[ 147.80354057  140.06760243  111.44163613   87.57279904   63.16382999
29.52707096    1.3109238 ]

Out[29]:

<matplotlib.text.Text at 0x7fe864fc2ed0>



Now let us design bad singular values for the solution,

$$\sigma_k = 10^{2-2k}.$$


In [30]:

# Case 2: Stair convergence
x_orig = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start, x_orig = orthogonalize(x_start), orthogonalize(x_orig)
x_orig = diagonalize_core(x_orig)
u, s, v = x_orig
s_diag = [10**(2-2*i) for i in xrange(r)]
x_orig = (u, np.diag(s_diag), v)
print 'The singular values of fixed point matrix are \n', s_diag
f = full(x_orig)
f = A.dot(f.flatten()).reshape(f.shape)
grad_dist, orth_proj_norm, tangent_proj_norm = [], [], []
k = 50
x = full(x_start)
for i in xrange(k):
dx = f - (A.dot(x.flatten())).reshape(x.shape)
x = x + dx
# Projector Splitting Convergence
x = x_start
for i in xrange(k):
u1, v, dx_tangent = inter_point(x, dx_orig)
dx_orig = full(x)-full(x_orig)
dx_orig_tangent = u1.dot(u1.T.dot(dx_orig)) + (dx_orig.dot(v)).dot(v.T) - u1.dot(u1.T.dot(dx_orig).dot(v)).dot(v.T)
orth_proj_norm.append(la.norm(dx_orig - dx_orig_tangent))
tangent_proj_norm.append(la.norm(dx_orig_tangent))
x = projector_splitting_2d(x, dx)
# Plotting
plt.semilogy(orth_proj_norm, marker='o', label="Orthogonal projection")
plt.semilogy(tangent_proj_norm, marker='o', label="Tangent projection")
plt.legend(bbox_to_anchor=(0.40, 1), loc=2)#(1.05, 1), loc=2
plt.xlabel('Iteration')
plt.ylabel('Error')




The singular values of fixed point matrix are
[100, 1, 0.01, 0.0001, 1e-06, 1e-08, 1e-10]

Out[30]:

<matplotlib.text.Text at 0x7fe864d55ed0>



## Consequences

The typical convergence:

• Tangent component convergence linearly.
• Normal components decays quadratically until it hits the next singular values, and then waits for the tangent component to catch.
• The convergence is monotone: First the first singular vector converge, then the second and so on.

Adversal examples are still possible (similar to the convergence of the power method).

## Case when Riemannian is worse

You can make up an example, when on the first iteration it is worse: it is basically the angle between $V_*$ and $V_0$.



In [11]:

# Case 3: Stair convergence
x_orig = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start = np.random.randn(n, r0), np.random.randn(r0, r0), np.random.randn(n, r0)
x_start, x_orig = orthogonalize(x_start), orthogonalize(x_orig)
eps = 1e-12
u, s, v = x_orig
u1, s1, v1 = x_start
u1 = u1 - u.dot(u.T.dot(u1))
v1 = v1 - v.dot(v.T.dot(v1))
x_start = u1, s1, v1
x_orig = diagonalize_core(x_orig)
print 'The singular values of fixed point matrix are \n', s_diag
f = full(x_orig)
f = A.dot(f.flatten()).reshape(f.shape)
k = 10
x = full(x_start)
for i in xrange(k):
dx = f - (A.dot(x.flatten())).reshape(x.shape)
x = x + dx
# Projector Splitting Convergence
x = x_start
for i in xrange(k):
u1, v, dx_tangent = inter_point(x, dx_orig)
dx_orig = full(x)-full(x_orig)
proj_dist.append(la.norm(dx_orig))
x = projector_splitting_2d(x, dx)
# Plotting
plt.semilogy(proj_dist, marker='o', label="Projector Splitting")
plt.legend(bbox_to_anchor=(0.40, 1), loc=2)#(1.05, 1), loc=2
plt.xlabel('Iteration')
plt.ylabel('Error')




The singular values of fixed point matrix are
[100, 1, 0.01, 0.0001, 1e-06, 1e-08, 1e-10]

Out[11]:

<matplotlib.text.Text at 0x111e52250>



## Convergence proof

In a recent paper with Denis Kolesnikov we managed to prove the estimates on the convergence.

http://arxiv.org/abs/1604.02111

## Summary

We do not understand the topology of the manifold (i.e. it behaves almost always like a manifold without singular points!)

## Other open problems

Important problems include solving non-symmetric systems, when minimization with residual squares the condition number.

Many applications (include machine learning, recommender systems, ..) are in progress.

## Papers and software

Software

oseledets.github.io



In [31]:

from IPython.core.display import HTML
def css_styling():
return HTML(styles)
css_styling()




Out[31]:

@font-face {
font-family: "Computer Modern";
src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
}
div.cell{
/*width:80%;*/
/*margin-left:auto !important;
margin-right:auto;*/
}
h1 {
font-family: 'Alegreya Sans', sans-serif;
}
h2 {
font-family: 'Fenix', serif;
}
h3{
font-family: 'Fenix', serif;
margin-top:12px;
margin-bottom: 3px;
}
h4{
font-family: 'Fenix', serif;
}
h5 {
font-family: 'Alegreya Sans', sans-serif;
}
div.text_cell_render{
font-family: 'Alegreya Sans',Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
line-height: 1.2;
font-size: 120%;
/*width:70%;*/
/*margin-left:auto;*/
margin-right:auto;
}
.CodeMirror{
font-family: "Source Code Pro";
font-size: 90%;
}
/*    .prompt{
display: None;
}*/
.text_cell_render h1 {
font-weight: 200;
font-size: 50pt;
line-height: 110%;
color:#CD2305;
margin-bottom: 0.5em;
margin-top: 0.5em;
display: block;
}
.text_cell_render h5 {
font-weight: 300;
font-size: 16pt;
color: #CD2305;
font-style: italic;
margin-bottom: .5em;
margin-top: 0.5em;
display: block;
}

li {
line-height: 110%;
}
.warning{
color: rgb( 240, 20, 20 )
}

MathJax.Hub.Config({
TeX: {
extensions: ["AMSmath.js"]
},
tex2jax: {
inlineMath: [ ['$','$'], ["\$","\$"] ],
displayMath: [ ['$$','$$'], ["\$","\$"] ]
},
displayAlign: 'center', // Change this to 'center' to center equations.
"HTML-CSS": {
styles: {'.MathJax_Display': {"margin": 4}}
}
});