Link to this talk: http://bit.ly/1xsnRqg
A tensor is a $d$-way array, $A(i_1, \ldots, i_d)$. For $d=2$ we get matrices.
A tensor decomposition is a (compact) representation of a tensor using a small number of parameters.
Review: T. Kolda, B. Bader, Tensor Decompositions and Applications, SIAM Review, 2009.
They appear in many applications:
approximation of multivariate functions, quantum mechanics, multiparametric problems, hypergraphs, multidimensional data...
There are few efficient methods for working with multivariate functions:
Many successful methods are based on the idea of separation of variables
A function is called separable if $$ f(x_1, \ldots, x_d) = f_1(x_1) f_2(x_2) \ldots f_d(x_d).$$
Few functions can be represented in this form (or even well approximated by it: separability means independence of the variables in some sense).
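For instance (a tiny numpy sketch of my own, not from the talk), sampling a separable function on a grid gives a rank-$1$ tensor that is stored by its factors alone:

import numpy as np

# f(x, y, z) = sin(x) exp(y) z is separable: on a grid it becomes an outer product
x = np.linspace(0, 1, 10)
u, v, w = np.sin(x), np.exp(x), x
A = np.einsum('i,j,k->ijk', u, v, w)  # full 10 x 10 x 10 tensor
# 1000 entries of A are described by just 30 parameters
print A.shape, u.size + v.size + w.size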
A more general and efficient class of functions is the sum of separable functions:
$$f(x_1,\ldots, x_d) \approx \sum_{\alpha=1}^r f^{(\alpha)}_1(x_1) \ldots f^{(\alpha)}_d(x_d).$$
In the discrete case, the representation
$$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$$
is called the canonical format, or canonical polyadic (CP) format.
The number of parameters is $dnr$, so if $r$ is small, it should be possible to fit those parameters from a small number of tensor entries.
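As a quick illustration (a minimal numpy sketch, mine rather than the talk's), here is a random CP representation and its parameter count compared with the full tensor:

import numpy as np

d, n, r = 4, 10, 3
# Random CP factors U_k, each of size n x r
U = [np.random.randn(n, r) for _ in range(d)]
# Assemble the full tensor: A(i1,...,i4) = sum_alpha U1(i1,a) ... U4(i4,a)
A = np.einsum('ia,ja,ka,la->ijkl', *U)
print 'Full tensor:', n ** d, 'parameters'
print 'CP format: ', d * n * r, 'parameters'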
Cross approximation is a heuristic sampling technique for the selection of "good" rows and columns in a low-rank matrix.
The goal is to find rows and columns that maximize the volume of the corresponding submatrix.
The maximal volume principle (Goreinov, Tyrtyshnikov) states that
$$ \Vert A - A_{skel} \Vert_C \leq (r + 1) \sigma_{r+1},$$
where $\Vert \cdot \Vert_C$ is the maximal-in-modulus element of a matrix and $A_{skel}$ is the skeleton (cross) approximation built from the selected rows and columns.
We can do a short demo (and compare with the SVD)
In [26]:
import numpy as np
from rect_cross2d import rect_cross2d
from scipy.linalg import hilbert

# A 1000 x 1000 Hilbert matrix: its singular values decay rapidly,
# so it is well approximated by a low-rank matrix
a = hilbert(1000)
# Cross approximation samples only a few rows and columns...
%timeit U, S, V = rect_cross2d(a, 1e-5)
# ...while the full SVD works with all n^2 entries
%timeit U, S, V = np.linalg.svd(a)
U, S, V = rect_cross2d(a, 1e-5)
print 'True error:', np.linalg.norm(U.dot(np.diag(S).dot(V)) - a)
The canonical format, however, can be unstable in higher dimensions (a best rank-$r$ approximation may not even exist), and this motivated the development of new tensor formats that are as stable as the singular value decomposition.
The Tensor Train (TT) format is the one that I use the most: it is simple and efficient
$$A(i_1, \ldots, i_d) \approx G_1(i_1) \ldots G_d(i_d),$$
where $G_k(i_k)$ has size $r_{k-1} \times r_k$ and $r_0 = r_d = 1$.
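To make the definition concrete, here is a minimal numpy sketch (my notation, not from the toolbox) that evaluates a single entry of a TT tensor by multiplying the cores:

import numpy as np

d, n, r = 5, 4, 3
# TT cores G_k of shape r_{k-1} x n x r_k, with r_0 = r_d = 1
ranks = [1] + [r] * (d - 1) + [1]
G = [np.random.randn(ranks[k], n, ranks[k + 1]) for k in range(d)]

def tt_entry(G, index):
    # A(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d):
    # a chain of small matrix products ending in a 1 x 1 matrix
    res = G[0][:, index[0], :]
    for k in range(1, len(G)):
        res = res.dot(G[k][:, index[k], :])
    return res[0, 0]

print tt_entry(G, [0, 1, 2, 3, 0])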
We have two basic packages: the TT-Toolbox (MATLAB) and ttpy (Python).
Many basic and advanced algorithms are implemented, and can be used as building blocks.
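For example, a full array can be compressed into the TT format directly (a small sketch; I assume here ttpy's tt.tensor(array, eps) constructor, which approximates a full numpy array to a given accuracy):

import numpy as np
import tt

d = 10
# sin of a linear index, reshaped into a 2 x 2 x ... x 2 array (QTT style)
x = np.arange(2 ** d, dtype=np.float64).reshape([2] * d)
t = tt.tensor(np.sin(x), 1e-8)  # TT approximation with accuracy 1e-8
print t  # shows the mode sizes and the TT ranks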
Demo on the cross approximation of tensors
In [27]:
import tt
from tt.cross import rect_cross
d = 20  # number of dimensions
n = 10  # mode size
x0 = tt.rand(n, d, 3)  # random rank-3 TT tensor used as the starting guess
h = 1e-2
def myfun1(x):
    # evaluate the target tensor at a batch of index tuples x (batch x d)
    return np.sum(x, axis=1)
    #return np.sqrt((((h * x) ** 2).sum(axis=1)))
x1 = rect_cross(myfun1, x0)
x1 = x1.round(1e-8)  # recompress: cut the TT ranks down to accuracy 1e-8
print x1
General strategy: formulate the problem as a minimization problem
$$ F(X) \rightarrow \min, \quad X \in \mathcal{M}, $$
where $\mathcal{M}$ is the set of tensors with bounded TT ranks; for a linear system $AX = f$ one can minimize, e.g., the residual $F(X) = \Vert AX - f \Vert^2$.
Linear system demo
In [28]:
import numpy as np
import tt
from tt.amen import amen_solve

d = 8  # 2^d grid points in each direction
f = 4  # number of dimensions
# Discrete Laplacian in the QTT format and an all-ones right-hand side
mat = tt.qlaplace_dd([d] * f)
rhs = tt.ones(2, d * f)
# AMEn solver: matrix, right-hand side, initial guess, accuracy
sol = amen_solve(mat, rhs, rhs, 1e-6)
print 'Error:', (tt.matvec(mat, sol) - rhs).norm() / rhs.norm()
print 'Efficient rank:', sol.erank
We define the $p$-volume of a $K \times r$ matrix $A$ with rows $a_i$, $i = 1, \ldots, K$, as the volume of the set
$$G = \{ x: x = \sum_i c_i a_i, \quad \Vert c \Vert_p \leq 1 \}.$$
For $p = 2$,
$$\mathrm{vol}(G) = \gamma \sqrt{\det A^* A},$$
and the 2-volume can be efficiently optimized, leading to the rectmaxvol algorithm.
Typically, $K \approx 2r$.
https://bitbucket.org/muxas/rect_maxvol (Alexander Mikhalev)
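A minimal numpy sketch of the greedy idea (my illustration, not the actual rect_maxvol code): repeatedly add the row that maximizes the 2-volume, using the rank-one determinant update $\det(G + a a^{\top}) = \det(G)\,(1 + a^{\top} G^{-1} a)$:

import numpy as np

def greedy_rect_rows(A, K):
    # Greedily pick K rows of an N x r matrix A so that the Gram
    # matrix G = A[rows]^T A[rows] has (nearly) maximal determinant
    N, r = A.shape
    rows = []
    G_inv = 1e8 * np.eye(r)  # inverse of an "almost empty" (regularized) Gram matrix
    for _ in range(K):
        # adding row a multiplies det(G) by the factor 1 + a^T G^{-1} a
        gains = 1.0 + np.einsum('ij,jk,ik->i', A, G_inv, A)
        gains[rows] = -np.inf  # never pick the same row twice
        i = int(np.argmax(gains))
        rows.append(i)
        a = A[i]
        # Sherman-Morrison update of G^{-1} after G += a a^T
        Ga = G_inv.dot(a)
        G_inv -= np.outer(Ga, Ga) / (1.0 + a.dot(Ga))
    return rows

A = np.random.randn(100, 5)
print greedy_rect_rows(A, 10)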
Dynamical low-rank approximation (Lubich, Koch, H.-D. Meyer, ...): approximate a time-dependent matrix as $A(t) \approx U(t) S(t) V^{\top}(t)$, with the factors evolving according to
$$\begin{split} &\frac{dU}{dt} S = \frac{dA}{dt} V,\\ &\frac{dV}{dt} S^{\top} = \frac{dA^{\top}}{dt} U, \\ &\frac{dS}{dt} = U^{\top} \frac{dA}{dt} V. \end{split}$$
Suppose you start from rank $1$ and evolve into a rank-$10$ solution: the matrix $S$ will be singular!
The resulting splitting steps are:
Step "K": $$\begin{split} &K_0 = U_0 S_0, \\ &K_1 = K_0 + (A_1 - A_0) V_0, \\ &[U_1, \widehat{S}_1] = \mathrm{QR}(K_1). \end{split}$$
Step "S": $$\widetilde{S}_0 = \widehat{S}_1 - U^{\top}_1 (A_1 - A_0) V_0$$ (note the minus!)
Step "L": $$\begin{split} &L_0 = V_0 \widetilde{S}^{\top}_0, \\ &L_1 = L_0 + (A_1 - A_0)^{\top} U_1, \\ &[V_1, S^{\top}_1] = \mathrm{QR}(L_1). \end{split}$$
The order of the steps matters.
Can be used for updating!
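Here is a minimal numpy transcription of one K-S-L step (my sketch of the scheme above: A0, A1 are two snapshots of the matrix, U0, S0, V0 the current rank-r factors):

import numpy as np

def ksl_step(A0, A1, U0, S0, V0):
    # One projector-splitting step moving U0 S0 V0^T from A0 towards A1
    dA = A1 - A0
    # Step "K": update the column space
    U1, S_hat = np.linalg.qr(U0.dot(S0) + dA.dot(V0))
    # Step "S": note the minus sign!
    S_tld = S_hat - U1.T.dot(dA).dot(V0)
    # Step "L": update the row space
    V1, S1t = np.linalg.qr(V0.dot(S_tld.T) + dA.T.dot(U1))
    return U1, S1t.T, V1

# Usage: track a slowly changing matrix of (numerical) rank 2
n, r = 50, 2
A0 = np.random.randn(n, r).dot(np.random.randn(r, n))
u, s, vt = np.linalg.svd(A0)
U0, S0, V0 = u[:, :r], np.diag(s[:r]), vt[:r].T
A1 = A0 + 1e-2 * np.random.randn(n, r).dot(np.random.randn(r, n))
U1, S1, V1 = ksl_step(A0, A1, U0, S0, V0)
print 'Update error:', np.linalg.norm(U1.dot(S1).dot(V1.T) - A1)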
We used tensor decompositions to speed up convolutional neural networks (CNNs) by a factor of 10. A convolutional layer computes
\begin{equation} V(x,y,t) = \sum_{i=x-\delta}^{x+\delta} \; \sum_{j=y-\delta}^{y+\delta} \; \sum_{s=1}^S K(i,j,s,t)\, U(i,j,s). \end{equation}
We approximate the kernel $K$ by a CP decomposition, which gives an excellent initialization for the fine-tuning!
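To see where the speedup comes from, substitute a rank-$r$ CP kernel $K(i,j,s,t) \approx \sum_{\alpha=1}^r K^x(i,\alpha) K^y(j,\alpha) K^s(s,\alpha) K^t(t,\alpha)$ into the convolution (a sketch of the idea, not the exact scheme from the paper):
$$V(x,y,t) \approx \sum_{\alpha=1}^{r} K^t(t,\alpha) \sum_{i=x-\delta}^{x+\delta} K^x(i,\alpha) \sum_{j=y-\delta}^{y+\delta} K^y(j,\alpha) \sum_{s=1}^{S} K^s(s,\alpha)\, U(i,j,s),$$
i.e. one $1 \times 1$ convolution over the input channels, two cheap one-dimensional convolutions in $x$ and $y$, and a final $1 \times 1$ convolution to the output channels.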