In [1]:
from traitlets.config.manager import BaseJSONConfigManager
#path = "/home/damian/miniconda3/envs/rise_latest/etc/jupyter/nbconfig"
cm = BaseJSONConfigManager()
cm.update('livereveal', {
              'theme': 'sky',
              'transition': 'zoom',
              'start_slideshow_at': 'selected',
              'scroll': True
})


Out[1]:
{'scroll': True,
 'start_slideshow_at': 'selected',
 'theme': 'sky',
 'transition': 'zoom'}

Tensor decompositions

Overall plan

  1. Multivariate function approximation: curse of dimensionality, polynomial chaos, optimal experiment design, connection to linear algebra.

  2. Tensor decompositions.

  3. Deep learning methods.

Forward problem

In the forward propagation of uncertainty, we have a known model F for a system of interest.

We model its inputs $X$ as a random variable and wish to understand the output random variable

$$Y = F(X)$$

(also denoted $Y \vert X$, which reads "$Y$ given $X$").

We want to infer statistical properties of $Y$ given statistical properties of $X$.

In the simplest case, this corresponds to the computation of high-dimensional integrals:

$$E Y = \int F(X) \rho(X) dX.$$

Approximation

Computing such high-dimensional integrals directly is typically infeasible, so we approximate $F$ by a cheaper surrogate model.

We select the model $G(X, \theta)$ and fit $\theta$ such that

$$F(X_i) \approx G(X_i, \theta), \quad i =1, \ldots, N_s.$$
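
As a minimal sketch, the cell below fits a surrogate that is linear in $\theta$ by least squares; the toy model F, the polynomial feature basis, and the sample sizes are illustrative assumptions.

In [ ]:
import numpy as np

# Illustrative example: fit a surrogate G(X, theta) = features(X) @ theta
# to N_s samples of a toy model F by linear least squares.
rng = np.random.default_rng(0)

def F(X):                       # toy "expensive" model, one sample per row
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def features(X):                # assumed polynomial basis in two variables
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

X_train = rng.uniform(-1, 1, size=(50, 2))                 # N_s = 50 samples
theta, *_ = np.linalg.lstsq(features(X_train), F(X_train), rcond=None)

X_test = rng.uniform(-1, 1, size=(1000, 2))
err = np.linalg.norm(features(X_test) @ theta - F(X_test)) / np.linalg.norm(F(X_test))
print('relative error of the surrogate:', err)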

Multivariate function interpolation

There are not too many efficient methods for working with multivariate functions:

  • Radial basis functions
  • Sparse grids
  • Best $N$-term approximation (wavelets?)
  • Different machine learning techniques (decision trees, neural networks)

Many successful methods are based on the idea of separation of variables.

Separation of variables

A function is called separable if $$ f(x_1, \ldots, x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d).$$

Not many functions can be represented in this form

(or even approximated well; exact separability essentially means independence of the variables).

A more general and efficient class is the sum of separable functions (also known as the proper generalized decomposition):

$$f(x_1,\ldots, x_d) \approx \sum_{\alpha=1}^r f^{(\alpha)}_1(x_1) \ldots f^{(\alpha)}_d(x_d)$$

Fitting by sums of separable functions

How can we fit a given function $f$ by a sum of separable functions?

First, we can always consider $x_1, \ldots, x_d$ to be discrete, i.e.,

$$A(i_1, \ldots, i_d) = f(x_1(i_1), \ldots, x_d(i_d))$$

Second, we approximate the resulting tensor from a small number of its entries (sampling).

Why is this possible at all?

Number of parameters

In the discrete case, the representation $$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$$ is called the canonical format, or canonical polyadic (CP) format.

The number of parameters is $dnr$, so if $r$ is small, there is hope that these parameters can be recovered from a small number of samples.
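
As a small illustration, the sketch below assembles a rank-$r$ CP tensor from random factor matrices and compares the number of parameters $dnr$ with the full size $n^d$ (sizes are illustrative).

In [ ]:
import numpy as np

# Build a rank-r CP tensor A(i_1,...,i_d) = sum_alpha U_1(i_1,alpha)...U_d(i_d,alpha)
d, n, r = 4, 10, 3
rng = np.random.default_rng(0)
U = [rng.standard_normal((n, r)) for _ in range(d)]

A = np.zeros((n,) * d)
for alpha in range(r):                       # sum of r separable (rank-1) terms
    term = U[0][:, alpha]
    for k in range(1, d):
        term = np.multiply.outer(term, U[k][:, alpha])
    A += term

print('full tensor entries :', n ** d)       # 10**4 = 10000
print('CP parameters (dnr) :', d * n * r)    # 4 * 10 * 3 = 120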

Matrix case

In two dimensions separation boils down to low-rank approximation: $$A \approx UV^{\top}$$
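
A quick numpy sketch (with made-up sizes): the best rank-$r$ approximation $A \approx UV^{\top}$ is given by the truncated SVD, and its spectral-norm error equals the first discarded singular value.

In [ ]:
import numpy as np

# Truncated SVD as the best low-rank approximation (Eckart-Young)
n, r = 100, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, r + 2)) @ rng.standard_normal((r + 2, n))   # a rank-(r+2) matrix

u, s, vt = np.linalg.svd(A, full_matrices=False)
U = u[:, :r] * s[:r]            # absorb the singular values into the left factor
V = vt[:r, :].T
print('rank-r error:', np.linalg.norm(A - U @ V.T, 2))
print('sigma_{r+1} :', s[r])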

Skeleton decomposition

How can we approximate a low-rank matrix without computing all of its elements?
The skeleton decomposition gives an answer: if a matrix has rank $r$, it can be exactly recovered from its $r$ linearly independent columns and $r$ linearly independent rows.
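
A minimal sketch of the skeleton decomposition with randomly chosen rows and columns (for an exactly rank-$r$ random matrix they are linearly independent with probability one):

In [ ]:
import numpy as np

# Skeleton decomposition: A = C * Ahat^{-1} * R, built from r columns C, r rows R
# and the r x r submatrix Ahat at their intersection.
n, r = 50, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # exact rank r

I = rng.choice(n, r, replace=False)      # r row indices
J = rng.choice(n, r, replace=False)      # r column indices
C, R, Ahat = A[:, J], A[I, :], A[np.ix_(I, J)]

A_skel = C @ np.linalg.solve(Ahat, R)
print('recovery error:', np.linalg.norm(A - A_skel) / np.linalg.norm(A))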

Cross approximation

Cross approximation is a heuristic sampling technique for the selection of "good" rows and columns in a low-rank matrix.

The goal is to find rows and columns that maximize the volume of the submatrix at their intersection.

The maximal volume principle (Goreinov, Tyrtyshnikov) states that

$$ \Vert A - A_{skel} \Vert_C \leq (r + 1) \sigma_{r+1},$$

where $\Vert \cdot \Vert_C$ denotes the maximal-in-modulus element of a matrix and $\sigma_{r+1}$ is the $(r+1)$-st singular value of $A$.
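
The sketch below is a simplified cross approximation with full pivoting (it inspects the whole residual to pick a pivot, which a practical method avoids by working with a few rows and columns only); it only illustrates the idea of building the approximation from crosses.

In [ ]:
import numpy as np

# Cross approximation with full pivoting: repeatedly pick the largest element
# of the residual and subtract the corresponding rank-1 cross.
def cross_approx(A, r):
    R = A.copy()
    approx = np.zeros_like(A)
    for _ in range(r):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)   # pivot element
        cross = np.outer(R[:, j], R[i, :]) / R[i, j]             # rank-1 cross through (i, j)
        approx += cross
        R -= cross
    return approx

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))  # a rank-5 matrix
print('error after 5 crosses:', np.linalg.norm(A - cross_approx(A, 5)) / np.linalg.norm(A))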

Multivariate case: problems

There is a big problem in many dimensions:

it is not possible to do the exact interpolation trick for the sum-of-products (canonical format).

The problem is not with the algorithms but with the format itself:

the best approximation may not even exist!

Bad example

$$f(x_1, \ldots, x_d) = x_1 + \ldots + x_d$$

Consider

$$g(t) = (1 + x_1 t) (1 + x_2 t) \ldots (1 + x_d t).$$

Then

$$f = g'(0) = \underbrace{\frac{g(h) - g(0)}{h}}_{\mathrm{rank~2}} + \mathcal{O}(h),$$

so $f$ can be approximated by rank-$2$ tensors to any accuracy by letting $h \to 0$, but no exact rank-$2$ representation exists!

Another example: Tucker

Another attempt to avoid this problem is the Tucker format

(Tucker, 1966; De Lathauwer et al., 2000+), first used in statistics, where the data was given as a data cube.

$$A(i,j,k) \approx \sum_{\alpha, \beta, \gamma} G(\alpha,\beta,\gamma)\, U(i,\alpha) V(j,\beta) W(k,\gamma)$$

You can compute Tucker by means of the SVD:

  • Compute the unfoldings: $A_1, A_2, A_3$
  • Compute left SVD factors: $A_i \approx U_i \Phi_i$
  • Compute the core: $G = A \times_1 U^{\top}_1 \times_2 U^{\top}_2 \times_3 U^{\top}_3$.
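
The three steps above can be sketched in a few lines of numpy for a third-order tensor (sizes and ranks below are illustrative):

In [ ]:
import numpy as np

# Tucker decomposition via the higher-order SVD (HOSVD) for a 3D array
def unfold(A, k):                        # mode-k unfolding: n_k x (product of the other sizes)
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

rng = np.random.default_rng(0)
n, r = 20, 4
G0 = rng.standard_normal((r, r, r))      # generate a tensor with exact Tucker ranks (r, r, r)
Us = [rng.standard_normal((n, r)) for _ in range(3)]
A = np.einsum('abc,ia,jb,kc->ijk', G0, *Us)

U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r] for k in range(3)]   # left SVD factors
G = np.einsum('ijk,ia,jb,kc->abc', A, *U)          # core: A x_1 U1^T x_2 U2^T x_3 U3^T
A_rec = np.einsum('abc,ia,jb,kc->ijk', G, *U)
print('HOSVD reconstruction error:', np.linalg.norm(A - A_rec) / np.linalg.norm(A))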

Main problem with Tucker format

The main problem with the Tucker format is that it cannot be used to represent high-dimensional functions:

We reduce the $n^d$ complexity to $r^d$, which makes $d = 10, \ldots, 20$ tractable, but not more.

Are there other ideas?

Main motivation

Matrices are good, so reduce everything to matrices!

Splitting into halves

Split the index set into two halves, get an $n^{d/2} \times n^{d/2}$ matrix, and compute its SVD:

$$A(I, J) = \sum_{\alpha=1}^r U_{I, \alpha} V_{J, \alpha},$$

then the factors $U$ and $V$ can be treated as $(d_1 + 1)$- and $(d_2 + 1)$-dimensional tensors, where $d_1 + d_2 = d$. Do it recursively!

Another construction

Another construction, the so-called Hierarchical Tucker format, is built as follows:

  1. Compute Tucker decomposition, get a $r \times r \times \ldots \times r$ array;
  2. Merge the indices in pairs and compute transfer tensors of size $r \times r \times r$;
  3. Go up the tree.

Simplest construction

The simplest construction separates one index at a time and gives the tensor-train (or matrix-product state) decomposition

$$A(i_1, \ldots, i_d) \approx G_1(i_1) \ldots G_d(i_d), $$

where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix for every fixed $i_k$, with $r_0 = r_d = 1$.

Similar to Hidden Markov models
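
A tiny sketch of how a single entry of a TT tensor is evaluated (random cores, illustrative sizes): the entry is just a product of $d$ small matrices.

In [ ]:
import numpy as np

# Evaluate A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d) for TT cores of
# shape (r_{k-1}, n, r_k), with r_0 = r_d = 1.
d, n = 6, 4
ranks = [1, 3, 3, 3, 3, 3, 1]                 # r_0, ..., r_d
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_entry(cores, idx):
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]                    # multiply by the slice G_k(i_k)
    return v[0, 0]

print('A(0, 1, 2, 3, 0, 1) =', tt_entry(cores, (0, 1, 2, 3, 0, 1)))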

TT-format in a nutshell

  • If there is a canonical representation with rank $R$, then $r_k \leq R$
  • TT-ranks are computable via SVD of auxiliary matrices
  • Basic linear algebra operations
  • Solve optimization problems with solution in the TT-format
  • There is an exact interpolation formula with $\mathcal{O}(dnr^2)$ parameters (TT-cross method)

TT-ranks are matrix ranks

Define unfoldings:

$A_k = A(i_1 \ldots i_k; i_{k+1} \ldots i_d)$, an $n^k \times n^{d-k}$ matrix.

Theorem: there exists a TT-decomposition with TT-ranks

$$r_k = \mathrm{rank} A_k$$
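
The theorem can be checked numerically on the "bad example" $f = x_1 + \ldots + x_d$ from above: although it has no exact rank-$2$ canonical representation, every unfolding has rank $2$, so its TT-ranks are all equal to $2$ (small sizes below are illustrative).

In [ ]:
import numpy as np

# TT-ranks of f(x_1,...,x_d) = x_1 + ... + x_d are the ranks of the unfoldings
d, n = 5, 6
x = np.linspace(0, 1, n)
grids = np.meshgrid(*([x] * d), indexing='ij')
A = sum(grids)                                   # A(i_1,...,i_d) = x(i_1) + ... + x(i_d)

for k in range(1, d):
    Ak = A.reshape(n ** k, n ** (d - k))         # unfolding A_k
    print('rank(A_%d) = %d' % (k, np.linalg.matrix_rank(Ak)))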

Stability estimate

No exact ranks in practice -- stability estimate!

If $A_k = R_k + E_k$, where $\mathrm{rank}\, R_k = r_k$ and $\Vert E_k \Vert_F = \varepsilon_k$, then $$\Vert\mathbf{A}-\mathbf{TT}\Vert_F \leq \sqrt{\sum_{k=1}^{d-1} \varepsilon^2_k}.$$

Fast and trivial linear algebra

Addition, Hadamard product, scalar product, convolution

All of them scale linearly in $d$.

For example, for the Hadamard (elementwise) product

$$C(i_1,\ldots,i_d) = A(i_1,\ldots,i_d)\, B(i_1,\ldots,i_d)$$

the cores satisfy

$$C_k(i_k) = A_k(i_k) \otimes B_k(i_k),$$ so the ranks are multiplied.
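
A small numpy check (illustrative sizes) of this core formula: building the cores of the Hadamard product as Kronecker products of the original cores reproduces the elementwise product of the full tensors, and the TT-ranks indeed multiply.

In [ ]:
import numpy as np

def tt_full(cores):
    """Assemble the full tensor from TT cores of shape (r_{k-1}, n, r_k)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))     # contract over the shared rank index
    return T.squeeze(axis=0).squeeze(axis=-1)    # drop the boundary ranks r_0 = r_d = 1

d, n = 4, 3
rng = np.random.default_rng(0)
ra, rb = [1, 2, 2, 2, 1], [1, 3, 3, 3, 1]
A_cores = [rng.standard_normal((ra[k], n, ra[k + 1])) for k in range(d)]
B_cores = [rng.standard_normal((rb[k], n, rb[k + 1])) for k in range(d)]

# C_k(i_k) = A_k(i_k) kron B_k(i_k) for every fixed i_k -> ranks ra[k] * rb[k]
C_cores = [np.stack([np.kron(Ga[:, i, :], Gb[:, i, :]) for i in range(n)], axis=1)
           for Ga, Gb in zip(A_cores, B_cores)]

err = np.linalg.norm(tt_full(C_cores) - tt_full(A_cores) * tt_full(B_cores))
print('difference between the core formula and the elementwise product:', err)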

Rounding

Rounding procedure solves the following problem: given a TT-representation

$$A(i_1, \ldots, i_d) = G_1(i_1) G_2(i_2) \ldots G_d(i_d)$$

find another one with smaller memory that approximates it with threshold $\epsilon$.

It can be done in $\mathcal{O}(dnr^3)$ operations using the linear structure of the format:

We can do orthogonalization of the cores!
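
As a quick sketch with the ttpy package (imported as tt in the cells below), assuming its standard overloaded arithmetic for TT tensors: adding a tensor to itself doubles the TT-ranks, and rounding compresses them back.

In [ ]:
import tt

# Rounding in practice (assumes ttpy's overloaded + and - together with the
# round/erank/norm API used elsewhere in this notebook).
a = tt.rand(10, 8, 3)      # random 8-dimensional TT tensor with mode size 10 and ranks 3
b = a + a                  # the TT-ranks (roughly) double
c = b.round(1e-8)          # recompress with relative accuracy 1e-8
print('erank before rounding:', b.erank)
print('erank after rounding :', c.erank)
print('relative difference  :', (c - b).norm() / b.norm())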

Cross approximation in $d$-dimensions

The TT-cross method (Oseledets and Tyrtyshnikov, "TT-cross approximation of multidimensional arrays") generalizes the matrix result.

Theorem: a tensor with TT-ranks bounded by $r$ can be recovered from $\mathcal{O}(dnr^2)$ of its elements.

Packages

We have two basic packages:

  • TT-Toolbox (MATLAB, https://github.com/oseledets/TT-Toolbox)
  • ttpy (Python, https://github.com/oseledets/ttpy), imported as tt below

Many basic and advanced algorithms are implemented in them and can be used as building blocks.


In [15]:
import numpy as np
import tt
from tt.cross.rectcross import cross
d = 20
n = 10
# Note: judging by the printed output below, the first argument of tt.rand is the
# mode size and the second the dimension, so x0 is a 10-dimensional tensor with
# mode size 20.
x0 = tt.rand(d, n, 3)   # random TT tensor used as the starting guess for the cross method
h = 1e-2                # step size used only by the commented-out alternative function
def myfun1(x):
    # x is a batch of multi-indices (one per row); the target tensor is
    # A(i_1, ..., i_d) = i_1 + ... + i_d
    return np.sum(x, axis=1)
    #return np.sqrt((((h * x) ** 2).sum(axis=1)))

x1 = cross(myfun1, x0)   # TT-cross approximation from a small number of samples
x1 = x1.round(1e-8)      # recompress the result
print(x1)


swp: 0/9 er_rel = 1.9e+00 er_abs = 5.8e+08 erank = 7.0 fun_eval: 8400
swp: 1/9 er_rel = 6.8e-16 er_abs = 2.1e-07 erank = 11.9 fun_eval: 36700
This is a 10-dimensional tensor 
r(0)=1, n(0)=20 
r(1)=2, n(1)=20 
r(2)=2, n(2)=20 
r(3)=2, n(3)=20 
r(4)=2, n(4)=20 
r(5)=2, n(5)=20 
r(6)=2, n(6)=20 
r(7)=2, n(7)=20 
r(8)=2, n(8)=20 
r(9)=2, n(9)=20 
r(10)=1 

Advanced methods

Sometimes the function is given only implicitly. How do we construct an approximation?

General strategy: formulate the problem as a minimization problem

$$ F(X) \rightarrow \min, \quad X \in \mathcal{M}. $$
  • Linear systems: $F(x) = (Ax, x) - 2(f, x)$, where $x$ can be mapped to a tensor
  • Eigenvalue problems: $F(x) = (Ax, x)/(x, x)$
  • Dynamic problems: $F(x) = \Vert \frac{dx}{dt} - \frac{dA}{dt} \Vert$

Minimization techniques

  • Idea 1: Use ordinary optimization method + projection: $X_{k+1} = P(X_k + G(X_k))$
  • Idea 2: Use alternating least squares (polylinear structure)
  • Best: combine 1+2 (the AMEn method): instead of optimizing $A \approx UV^{\top}$ only, we enrich the basis with new elements.

All of these are implemented in the Toolboxes (both Python and MATLAB).

In [18]:
import numpy as np
import tt
from tt.amen import amen_solve

d = 8  # 2**d grid points per physical dimension (QTT format)
f = 4  # number of physical dimensions
mat = tt.qlaplace_dd([d] * f)          # f-dimensional discrete Laplace operator in QTT format
rhs = tt.ones(2, d * f)                # all-ones right-hand side with 2**(d*f) entries
sol = amen_solve(mat, rhs, rhs, 1e-6)  # solve mat * sol = rhs by AMEn (initial guess rhs, tol 1e-6)
print('Error:', (tt.matvec(mat, sol) - rhs).norm()/rhs.norm())
print('Efficient rank:', sol.erank)


Error: 5.405330241887626e-07
Efficient rank: 35.31683561172893

Applications

  • There are many applications of tensor decompositions to UQ (Dolgov, Scheichl, Kazeev, Zhang, L. Daniel, Ballani, Grasedyck, Schwab, Khoromskij, ...)