In [1]:
from traitlets.config.manager import BaseJSONConfigManager
#path = "/home/damian/miniconda3/envs/rise_latest/etc/jupyter/nbconfig"
cm = BaseJSONConfigManager()
cm.update('livereveal', {
              'theme': 'sky',
              'transition': 'zoom',
              'start_slideshow_at': 'selected',
              'scroll': True
})


Out[1]:
{'scroll': True,
 'start_slideshow_at': 'selected',
 'theme': 'sky',
 'transition': 'zoom'}

Tensor decompositions

Overall plan

  1. Multivariate function approximation: curse of dimensionality, polynomial chaos, optimal experiment design, connection to linear algebra.

  2. Tensor decompositions.

  3. Deep learning methods.

Forward problem

In the forward propagation of uncertainty, we have a known model F for a system of interest.

We model its inputs $X$ as a random variable and wish to understand the output random variable

$$Y = F(X)$$

(also denoted $Y \vert X$, which reads "$Y$ given $X$").

We want to infer statistical properties of $Y$ given statistical properties of $X$.

In the simplest case, this corresponds to the computation of high-dimensional integrals:

$$E Y = \int F(X) \rho(X) dX.$$

Approximation

Computing such high-dimensional integrals directly is typically infeasible, so we approximate $F$ by a cheaper surrogate model.

We select the model $G(X, \theta)$ and fit $\theta$ such that

$$F(X_i) \approx G(X_i, \theta), \quad i =1, \ldots, N_s.$$
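
As a minimal sketch, the cell below fits a surrogate that is linear in $\theta$ by least squares; the toy model F, the polynomial feature basis, and the sample sizes are illustrative assumptions.

In [ ]:
import numpy as np

# Illustrative example: fit a surrogate G(X, theta) = features(X) @ theta
# to N_s samples of a toy model F by linear least squares.
rng = np.random.default_rng(0)

def F(X):                       # toy "expensive" model, one sample per row
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def features(X):                # assumed polynomial basis in two variables
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

X_train = rng.uniform(-1, 1, size=(50, 2))                 # N_s = 50 samples
theta, *_ = np.linalg.lstsq(features(X_train), F(X_train), rcond=None)

X_test = rng.uniform(-1, 1, size=(1000, 2))
err = np.linalg.norm(features(X_test) @ theta - F(X_test)) / np.linalg.norm(F(X_test))
print('relative error of the surrogate:', err)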

Multivariate function interpolation

There are not too many efficient methods for working with multivariate functions:

  • Radial basis functions
  • Sparse grids
  • Best $N$-term approximation (wavelets?)
  • Different machine learning techniques (decision trees, neural networks)

Many successful methods are based on the idea of separation of variables.

Separation of variables

A function is called separable if $$ f(x_1, \ldots, x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d).$$

Not many functions can be represented in this form

(or even approximated well; exact separability essentially means independence of the variables).

A more general and efficient class is the sum of separable functions (also known as the proper generalized decomposition):

$$f(x_1,\ldots, x_d) \approx \sum_{\alpha=1}^r f^{(\alpha)}_1(x_1) \ldots f^{(\alpha)}_d(x_d)$$

Fitting by sums of separable functions

How can we fit a given function $f$ by a sum of separable functions?

First, we can always consider $x_1, \ldots, x_d$ to be discrete, i.e.,

$$A(i_1, \ldots, i_d) = f(x_1(i_1), \ldots, x_d(i_d))$$

Second, we approximate the resulting tensor from a small number of its entries (sampling).

Why is this possible at all?

Number of parameters

In the discrete case, the representation $$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$$ is called the canonical format, or canonical polyadic (CP) format.

The number of parameters is $dnr$, so if $r$ is small, there is hope that these parameters can be recovered from a small number of samples.
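
As a small illustration, the sketch below assembles a rank-$r$ CP tensor from random factor matrices and compares the number of parameters $dnr$ with the full size $n^d$ (sizes are illustrative).

In [ ]:
import numpy as np

# Build a rank-r CP tensor A(i_1,...,i_d) = sum_alpha U_1(i_1,alpha)...U_d(i_d,alpha)
d, n, r = 4, 10, 3
rng = np.random.default_rng(0)
U = [rng.standard_normal((n, r)) for _ in range(d)]

A = np.zeros((n,) * d)
for alpha in range(r):                       # sum of r separable (rank-1) terms
    term = U[0][:, alpha]
    for k in range(1, d):
        term = np.multiply.outer(term, U[k][:, alpha])
    A += term

print('full tensor entries :', n ** d)       # 10**4 = 10000
print('CP parameters (dnr) :', d * n * r)    # 4 * 10 * 3 = 120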

Matrix case

In two dimensions separation boils down to low-rank approximation: $$A \approx UV^{\top}$$
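
A quick numpy sketch (with made-up sizes): the best rank-$r$ approximation $A \approx UV^{\top}$ is given by the truncated SVD, and its spectral-norm error equals the first discarded singular value.

In [ ]:
import numpy as np

# Truncated SVD as the best low-rank approximation (Eckart-Young)
n, r = 100, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, r + 2)) @ rng.standard_normal((r + 2, n))   # a rank-(r+2) matrix

u, s, vt = np.linalg.svd(A, full_matrices=False)
U = u[:, :r] * s[:r]            # absorb the singular values into the left factor
V = vt[:r, :].T
print('rank-r error:', np.linalg.norm(A - U @ V.T, 2))
print('sigma_{r+1} :', s[r])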

Skeleton decomposition

How can we approximate a low-rank matrix without computing all of its elements?
The skeleton decomposition gives an answer: if a matrix has rank $r$, it can be exactly recovered from its $r$ linearly independent columns and $r$ linearly independent rows.
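
A minimal sketch of the skeleton decomposition with randomly chosen rows and columns (for an exactly rank-$r$ random matrix they are linearly independent with probability one):

In [ ]:
import numpy as np

# Skeleton decomposition: A = C * Ahat^{-1} * R, built from r columns C, r rows R
# and the r x r submatrix Ahat at their intersection.
n, r = 50, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # exact rank r

I = rng.choice(n, r, replace=False)      # r row indices
J = rng.choice(n, r, replace=False)      # r column indices
C, R, Ahat = A[:, J], A[I, :], A[np.ix_(I, J)]

A_skel = C @ np.linalg.solve(Ahat, R)
print('recovery error:', np.linalg.norm(A - A_skel) / np.linalg.norm(A))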

Cross approximation

Cross approximation is a heuristic sampling technique for the selection of "good" rows and columns in a low-rank matrix.

The goal is to find rows and columns that maximize the volume of the submatrix at their intersection.

The maximal volume principle (Goreinov, Tyrtyshnikov) states that

$$ \Vert A - A_{skel} \Vert_C \leq (r + 1) \sigma_{r+1},$$

where $\Vert \cdot \Vert_C$ denotes the maximal-in-modulus element of a matrix and $\sigma_{r+1}$ is the $(r+1)$-st singular value of $A$.
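
The sketch below is a simplified cross approximation with full pivoting (it inspects the whole residual to pick a pivot, which a practical method avoids by working with a few rows and columns only); it only illustrates the idea of building the approximation from crosses.

In [ ]:
import numpy as np

# Cross approximation with full pivoting: repeatedly pick the largest element
# of the residual and subtract the corresponding rank-1 cross.
def cross_approx(A, r):
    R = A.copy()
    approx = np.zeros_like(A)
    for _ in range(r):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)   # pivot element
        cross = np.outer(R[:, j], R[i, :]) / R[i, j]             # rank-1 cross through (i, j)
        approx += cross
        R -= cross
    return approx

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 60))  # a rank-5 matrix
print('error after 5 crosses:', np.linalg.norm(A - cross_approx(A, 5)) / np.linalg.norm(A))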

Multivariate case: problems

There is a big problem in many dimensions:

it is not possible to do the exact interpolation trick for the sum-of-products (canonical format).

The problem is not with the algorithms but with the format itself:

the best approximation may not even exist!

Bad example

$$f(x_1, \ldots, x_d) = x_1 + \ldots + x_d$$

Consider

$$g(t) = (1 + x_1 t) (1 + x_2 t) \ldots (1 + x_d t).$$

Then

$$f = g'(0) = \underbrace{\frac{g(h) - g(0)}{h}}_{\mathrm{rank~2}} + \mathcal{O}(h),$$

so $f$ can be approximated by rank-$2$ tensors to any accuracy by letting $h \to 0$, but no exact rank-$2$ representation exists!

Another example: Tucker

Another attempt to avoid this problem is the Tucker format

(Tucker, 1966; De Lathauwer et al., 2000+), first used in statistics, where the data was given as a data cube.

$$A(i,j,k) \approx \sum_{\alpha, \beta, \gamma} G(\alpha,\beta,\gamma)\, U(i,\alpha) V(j,\beta) W(k,\gamma)$$

You can compute Tucker by means of the SVD:

  • Compute the unfoldings: $A_1, A_2, A_3$
  • Compute left SVD factors: $A_i \approx U_i \Phi_i$
  • Compute the core: $G = A \times_1 U^{\top}_1 \times_2 U^{\top}_2 \times_3 U^{\top}_3$.
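
The three steps above can be sketched in a few lines of numpy for a third-order tensor (sizes and ranks below are illustrative):

In [ ]:
import numpy as np

# Tucker decomposition via the higher-order SVD (HOSVD) for a 3D array
def unfold(A, k):                        # mode-k unfolding: n_k x (product of the other sizes)
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

rng = np.random.default_rng(0)
n, r = 20, 4
G0 = rng.standard_normal((r, r, r))      # generate a tensor with exact Tucker ranks (r, r, r)
Us = [rng.standard_normal((n, r)) for _ in range(3)]
A = np.einsum('abc,ia,jb,kc->ijk', G0, *Us)

U = [np.linalg.svd(unfold(A, k), full_matrices=False)[0][:, :r] for k in range(3)]   # left SVD factors
G = np.einsum('ijk,ia,jb,kc->abc', A, *U)          # core: A x_1 U1^T x_2 U2^T x_3 U3^T
A_rec = np.einsum('abc,ia,jb,kc->ijk', G, *U)
print('HOSVD reconstruction error:', np.linalg.norm(A - A_rec) / np.linalg.norm(A))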

Main problem with Tucker format

The main problem with the Tucker format is that it cannot be used to represent high-dimensional functions:

We reduce the $n^d$ complexity to $r^d$, which makes $d = 10, \ldots, 20$ tractable, but not more.

Are there other ideas?

Main motivation

Matrices are good, so reduce everything to matrices!

Splitting into halves

Split the index set into two halves, get an $n^{d/2} \times n^{d/2}$ matrix, and compute its SVD:

$$A(I, J) = \sum_{\alpha=1}^r U_{I, \alpha} V_{J, \alpha},$$

then the factors $U$ and $V$ can be treated as $(d_1 + 1)$- and $(d_2 + 1)$-dimensional tensors, where $d_1 + d_2 = d$. Do it recursively!

Another construction

Another construction, the so-called Hierarchical Tucker format, is built as follows:

  1. Compute Tucker decomposition, get a $r \times r \times \ldots \times r$ array;
  2. Merge the indices in pairs and compute transfer tensors of size $r \times r \times r$;
  3. Go up the tree.

Simplest construction

The simplest construction separates one index at a time and gives the tensor-train (or matrix-product state) decomposition

$$A(i_1, \ldots, i_d) \approx G_1(i_1) \ldots G_d(i_d), $$

where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix for every fixed $i_k$, with $r_0 = r_d = 1$.

Similar to Hidden Markov models
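
A tiny sketch of how a single entry of a TT tensor is evaluated (random cores, illustrative sizes): the entry is just a product of $d$ small matrices.

In [ ]:
import numpy as np

# Evaluate A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d) for TT cores of
# shape (r_{k-1}, n, r_k), with r_0 = r_d = 1.
d, n = 6, 4
ranks = [1, 3, 3, 3, 3, 3, 1]                 # r_0, ..., r_d
rng = np.random.default_rng(0)
cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

def tt_entry(cores, idx):
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]                    # multiply by the slice G_k(i_k)
    return v[0, 0]

print('A(0, 1, 2, 3, 0, 1) =', tt_entry(cores, (0, 1, 2, 3, 0, 1)))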

TT-format in a nutshell

  • If there is a canonical representation with rank $R$, then $r_k \leq R$
  • TT-ranks are computable via SVD of auxiliary matrices
  • Basic linear algebra operations
  • Solve optimization problems with solution in the TT-format
  • There is an exact interpolation formula with $\mathcal{O}(dnr^2)$ parameters (TT-cross method)

TT-ranks are matrix ranks

Define unfoldings:

$A_k = A(i_1 \ldots i_k; i_{k+1} \ldots i_d)$, an $n^k \times n^{d-k}$ matrix.

Theorem: there exists a TT-decomposition with TT-ranks

$$r_k = \mathrm{rank} A_k$$
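
The theorem can be checked numerically on the "bad example" $f = x_1 + \ldots + x_d$ from above: although it has no exact rank-$2$ canonical representation, every unfolding has rank $2$, so its TT-ranks are all equal to $2$ (small sizes below are illustrative).

In [ ]:
import numpy as np

# TT-ranks of f(x_1,...,x_d) = x_1 + ... + x_d are the ranks of the unfoldings
d, n = 5, 6
x = np.linspace(0, 1, n)
grids = np.meshgrid(*([x] * d), indexing='ij')
A = sum(grids)                                   # A(i_1,...,i_d) = x(i_1) + ... + x(i_d)

for k in range(1, d):
    Ak = A.reshape(n ** k, n ** (d - k))         # unfolding A_k
    print('rank(A_%d) = %d' % (k, np.linalg.matrix_rank(Ak)))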

Stability estimate

No exact ranks in practice -- stability estimate!

If $A_k = R_k + E_k$, where $\mathrm{rank}\, R_k = r_k$ and $\Vert E_k \Vert_F = \varepsilon_k$, then $$\Vert\mathbf{A}-\mathbf{TT}\Vert_F \leq \sqrt{\sum_{k=1}^{d-1} \varepsilon^2_k}.$$

Fast and trivial linear algebra

Addition, Hadamard product, scalar product, convolution

All of them scale linearly in $d$.

For example, for the Hadamard (elementwise) product

$$C(i_1,\ldots,i_d) = A(i_1,\ldots,i_d)\, B(i_1,\ldots,i_d)$$

the cores satisfy

$$C_k(i_k) = A_k(i_k) \otimes B_k(i_k),$$ so the ranks are multiplied.
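
A small numpy check (illustrative sizes) of this core formula: building the cores of the Hadamard product as Kronecker products of the original cores reproduces the elementwise product of the full tensors, and the TT-ranks indeed multiply.

In [ ]:
import numpy as np

def tt_full(cores):
    """Assemble the full tensor from TT cores of shape (r_{k-1}, n, r_k)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))     # contract over the shared rank index
    return T.squeeze(axis=0).squeeze(axis=-1)    # drop the boundary ranks r_0 = r_d = 1

d, n = 4, 3
rng = np.random.default_rng(0)
ra, rb = [1, 2, 2, 2, 1], [1, 3, 3, 3, 1]
A_cores = [rng.standard_normal((ra[k], n, ra[k + 1])) for k in range(d)]
B_cores = [rng.standard_normal((rb[k], n, rb[k + 1])) for k in range(d)]

# C_k(i_k) = A_k(i_k) kron B_k(i_k) for every fixed i_k -> ranks ra[k] * rb[k]
C_cores = [np.stack([np.kron(Ga[:, i, :], Gb[:, i, :]) for i in range(n)], axis=1)
           for Ga, Gb in zip(A_cores, B_cores)]

err = np.linalg.norm(tt_full(C_cores) - tt_full(A_cores) * tt_full(B_cores))
print('difference between the core formula and the elementwise product:', err)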

Rounding

Rounding procedure solves the following problem: given a TT-representation

$$A(i_1, \ldots, i_d) = G_1(i_1) G_2(i_2) \ldots G_d(i_d)$$

find another one with smaller memory that approximates it with threshold $\epsilon$.

It can be done in $\mathcal{O}(dnr^3)$ operations using the linear structure of the format:

We can do orthogonalization of the cores!
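
As a quick sketch with the ttpy package (imported as tt in the cells below), assuming its standard overloaded arithmetic for TT tensors: adding a tensor to itself doubles the TT-ranks, and rounding compresses them back.

In [ ]:
import tt

# Rounding in practice (assumes ttpy's overloaded + and - together with the
# round/erank/norm API used elsewhere in this notebook).
a = tt.rand(10, 8, 3)      # random 8-dimensional TT tensor with mode size 10 and ranks 3
b = a + a                  # the TT-ranks (roughly) double
c = b.round(1e-8)          # recompress with relative accuracy 1e-8
print('erank before rounding:', b.erank)
print('erank after rounding :', c.erank)
print('relative difference  :', (c - b).norm() / b.norm())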

Cross approximation in $d$-dimensions

The TT-cross method (Oseledets and Tyrtyshnikov, "TT-cross approximation of multidimensional arrays") generalizes the matrix result.

Theorem: a tensor with TT-ranks bounded by $r$ can be recovered from $\mathcal{O}(dnr^2)$ of its elements.

Packages

We have two basic packages:

  • TT-Toolbox (MATLAB, https://github.com/oseledets/TT-Toolbox)
  • ttpy (Python, https://github.com/oseledets/ttpy), imported as tt below

Many basic and advanced algorithms are implemented in them and can be used as building blocks.


In [15]:
import numpy as np
import tt
from tt.cross.rectcross import cross
d = 20
n = 10
# Note: judging by the printed output below, the first argument of tt.rand is the
# mode size and the second the dimension, so x0 is a 10-dimensional tensor with
# mode size 20.
x0 = tt.rand(d, n, 3)   # random TT tensor used as the starting guess for the cross method
h = 1e-2                # step size used only by the commented-out alternative function
def myfun1(x):
    # x is a batch of multi-indices (one per row); the target tensor is
    # A(i_1, ..., i_d) = i_1 + ... + i_d
    return np.sum(x, axis=1)
    #return np.sqrt((((h * x) ** 2).sum(axis=1)))

x1 = cross(myfun1, x0)   # TT-cross approximation from a small number of samples
x1 = x1.round(1e-8)      # recompress the result
print(x1)


swp: 0/9 er_rel = 1.9e+00 er_abs = 5.8e+08 erank = 7.0 fun_eval: 8400
swp: 1/9 er_rel = 6.8e-16 er_abs = 2.1e-07 erank = 11.9 fun_eval: 36700
This is a 10-dimensional tensor 
r(0)=1, n(0)=20 
r(1)=2, n(1)=20 
r(2)=2, n(2)=20 
r(3)=2, n(3)=20 
r(4)=2, n(4)=20 
r(5)=2, n(5)=20 
r(6)=2, n(6)=20 
r(7)=2, n(7)=20 
r(8)=2, n(8)=20 
r(9)=2, n(9)=20 
r(10)=1 

Advanced methods

Sometimes the function is given only implicitly. How do we construct an approximation?

General strategy: formulate the problem as a minimization problem

$$ F(X) \rightarrow \min, \quad X \in \mathcal{M}. $$
  • Linear systems: $F(x) = (Ax, x) - 2(f, x)$, where $x$ can be mapped to a tensor
  • Eigenvalue problems: $F(x) = (Ax, x)/(x, x)$
  • Dynamic problems: $F(x) = \Vert \frac{dx}{dt} - \frac{dA}{dt} \Vert$

Minimization techniques

  • Idea 1: Use ordinary optimization method + projection: $X_{k+1} = P(X_k + G(X_k))$
  • Idea 2: Use alternating least squares (polylinear structure)
  • Best: combine 1+2 (the AMEn method): instead of optimizing $A \approx UV^{\top}$ only, we enrich the basis with new elements.

All of these are implemented in the Toolboxes (both Python and MATLAB).

In [18]:
import numpy as np
import tt
from tt.amen import amen_solve

d = 8  # 2**d grid points per physical dimension (QTT format)
f = 4  # number of physical dimensions
mat = tt.qlaplace_dd([d] * f)          # f-dimensional discrete Laplace operator in QTT format
rhs = tt.ones(2, d * f)                # all-ones right-hand side with 2**(d*f) entries
sol = amen_solve(mat, rhs, rhs, 1e-6)  # solve mat * sol = rhs by AMEn (initial guess rhs, tol 1e-6)
print('Error:', (tt.matvec(mat, sol) - rhs).norm()/rhs.norm())
print('Efficient rank:', sol.erank)


Error: 5.405330241887626e-07
Efficient rank: 35.31683561172893

Applications

  • There are many applications of tensor decompositions to UQ (Dolgov, Scheichl, Kazeev, Zhang, L. Daniel, Ballani, Grasedyck, Schwab, Khoromskij, ...)