Link to this talk: http://bit.ly/1xsnRqg

A tensor is a $d$-way array, $A(i_1, \ldots, i_d)$. For $d=2$ we get matrices.

A **tensor decomposition** is a (compact) representation of a tensor using fewer number of parameters.

Review: T. Kolda, B. Bader, Tensor decompositions

They appear in many applications:

multivariate functions approximations, quantum mechanics, multiparametric problems, hypergraphs, multidimensional data...

There are **not too many** efficient methods for working with multivariate functions:

- Radial basis functions
- Sparse grids
- Best $N$-term approximation (wavelets?)
- Different machine learning techniques (decision trees, neural networks)

Many successful methods are based on the idea of **separation of variables**

Function is called **separable**, if
$$ f(x_1, \ldots, x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d).$$

Not many functions can be represented in this form

(or even approximated, it means **independence** of variables in some sense.)

A more general and efficient class of functions is the **sum of separable functions**:

In the discrete case, representation
$$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$$
is called **canonical format**, or canonical-polyadic (CP-format).

The number of parameters is $dnr$, so if $r$ is small, a good idea is that it is possible to fit those parameters.

Cross approximation is a heuristic sampling technique for the selection of "good" rows and columns in a low-rank matrix.

The goal is to find rows and column that **maximize the volume** of the submatrix.

The maximal volume principle (Goreinov, Tyrtyshnikov) states that

$$ \Vert A - A_{skel} \Vert_C \leq (r + 1) \sigma_{r+1},$$and $\Vert \cdot \Vert$ is the maximal in modulus element of a matrix.

We can do a short demo (and compare with the SVD)

```
In [26]:
```import numpy as np
from rect_cross2d import rect_cross2d
from scipy.linalg import hilbert
a = hilbert(1000)
%timeit U, S, V = rect_cross2d(a, 1e-5)
%timeit U, S, V = np.linalg.svd(a)
U, S, V = rect_cross2d(a, 1e-5)
print 'True error:', np.linalg.norm(U.dot(np.diag(S).dot(V)) - a)

```
```

It motivated the development of new tensor formats, that are stable as the singular value decomposition.

The **Tensor Train (TT)** format is the one that I use the most: it is simple and efficient

where $G_k(i_k)$ has size $r_{k-1} \times r_k$, $r_0 = r_d = 1$.

- If there is a canonical representation with rank $R$, then $r_k \leq R$
- TT-ranks are computable via SVD of auxiliary matrices
- Basic linear algebra operations
- Solve optimization problems with solution in the TT-format
**There is an exact interpolation formula**with $\mathcal{O}(dnr^2)$ parameters (TT-cross method)

We have two basic packages:

- TT-Toolbox (http://github.com/oseledets/TT-Toolbox) which is a MATLAB Toolbox that implements all the basic algorithms
- ttpy (http://github.com/oseledets/ttpy) - its Python counterpart

Many basic and advanced algorithms are implemented, and can be used as building blocks.

Demo on the cross approximation of tensors

```
In [27]:
```import tt
from tt.cross import rect_cross
d = 20
n = 10
x0 = tt.rand(d, n, 3)
h = 1e-2
def myfun1(x):
return np.sum(x, axis=1)
#return np.sqrt((((h * x) ** 2).sum(axis=1)))
x1 = rect_cross(myfun1, x0)
x1 = x1.round(1e-8)
print x1

```
```

General strategy: formulate the problem as a minimization problem

$$ F(X) \rightarrow \min, \quad X \in \mathcal{M}. $$- Linear systems: $F(x) = (Ax, x) - 2(f, x)$, where $x$ can be mapped to a tensor
- Eigenvalue problems: $F(x) = (Ax, x)/(x, x)$
- Dynamic problem: $f(x) = \Vert \frac{dx}{dt} - \frac{dA}{dt} \Vert$,

- Idea 1: Use ordinary optimization method + projection: $X_{k+1} = P(X_k + G(X_k))$
- Idea 2: Use alternating least squares (polylinear structure)
**Best: combine 1+2**(AMEN method). Instead of optimizing $A \approx UV^{\top}$ we enrich the basis with new elements.

All of them are implemented in the Toolboxes (both Python and MATLAB)

Linear system demo

```
In [28]:
```import numpy as np
import tt
from tt.amen import amen_solve
d = 8
f = 4
mat = tt.qlaplace_dd([d] * f)
rhs = tt.ones(2, d * f)
sol = amen_solve(mat, rhs, rhs, 1e-6)
print 'Error:', (tt.matvec(mat, sol) - rhs).norm()/rhs.norm()
print 'Efficient rank:', sol.erank

```
```

We define $p$-volume of a matrix $K \times r$ matrix $A$ with rows $a_i, i = 1, \ldots, K$, as the volume of the set

$$G = \{ x: x = \sum_i c_i a_i, \quad \Vert c \Vert_p \leq 1 \}$$The 2-volume:

$$\mathrm{vol}(G) = \gamma \det A^* A$$The 2-volume can be efficiently optimized, leading to the **rectmaxvol** algorithm.

Typically, $K \approx 2r$.

https://bitbucket.org/muxas/rect_maxvol (Alexander Mikhalev)

(Lubich, Koch, H.-D. Meyer,...)

$$\begin{split} &\frac{dU}{dt} S = \frac{dA}{dt} V,\\ &\frac{dV}{dt} S^{\top} = \frac{dA^{\top}}{dt} U, \\ &\frac{dS}{dt} = U^{\top} \frac{dA}{dt} V. \end{split}$$Suppose, you start from rank-$1$ and evolve into rank-$10$ case: the matrix $S$ will be singular!

The final splitting steps are

**Step "L":**$$\begin{split}L_0 = V_0 S^{\top}_0 \ L_1 = L_0 + (A_1 - A_0)^{\top} U_0 \`[V_1, S^{\top}_1] = QR(L_1)\end{split}$$`

**Step "K":**$$\begin{split} K_0 = U_0 S^{\top}_0 \ K_1 = K_0 + (A_1 - A_0) V_0 \`[U_1, S_1] = QR(K_1)\end{split}$$`

**Step "L":**$$S_1 = S_0 - U^{\top}_0 (A_1 - A_0) V_0$$ (note the minus!) The order of the steps matters

Can be used for updating!

We used tensors to speed up CNN with a factor of 10. \begin{equation} V(x,y,t) = \sum_{i=x-\delta}^{x+\delta} \; \sum_{j=y-\delta}^{y+\delta} \; \sum_{s=1}^S K(i,j,s,t)\, U(i,j,s) \end{equation}

We approximate $K$ by a CP-decomposition and get an excellent initial condition for the fine-tuning!

```
In [12]:
```from IPython.core.display import HTML
def css_styling():
styles = open("../common/custom.css", "r").read()
return HTML(styles)
css_styling()

```
Out[12]:
```