In [4]:
from IPython.html.services.config import ConfigManager
from IPython.paths import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
    'theme': 'sky',
    'transition': 'zoom',
    'start_slideshow_at': 'selected',
    'scroll': True,
})
Out[4]:
In [ ]:
%reload_ext tikzmagic
I have been working on low-rank approximation of matrices, where the singular value decomposition (SVD) gives the best rank-$r$ approximation, which can be written as
$$A_{ij} \approx \sum_{\alpha=1}^r U_{i \alpha} V_{j \alpha}.$$
The best rank-$r$ approximation always exists, efficient algorithms are available in LAPACK, and efficient sampling via the skeleton (cross) decomposition is possible from $\mathcal{O}(nr)$ entries.
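As a minimal sketch (numpy only; the sizes and the synthetic test matrix are arbitrary choices for illustration), the best rank-$r$ approximation via the truncated SVD:
In [ ]:
import numpy as np

n, m, r = 200, 150, 10
# Synthetic near-low-rank matrix: a rank-r signal plus small noise
A = np.random.randn(n, r) @ np.random.randn(r, m) + 1e-6 * np.random.randn(n, m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Keeping the r leading singular triplets gives the best rank-r approximation
# in both the spectral and the Frobenius norm (Eckart-Young theorem)
Ar = (U[:, :r] * s[:r]) @ Vt[:r, :]

print(np.linalg.norm(A - Ar) / np.linalg.norm(A))  # small relative error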
The first example of a tensor that is bad for the canonical (CP) format is the following one: $$A(i_1, \ldots, i_d) = i_1 + \ldots + i_d.$$
Define $$P(i_1, \ldots, i_d, t) = (1 + i_1 t)(1 + i_2 t) \ldots (1 + i_d t);$$ then
$$A = P'(0) = \frac{P(h) - P(0)}{h} + \mathcal{O}(h).$$
The CP-rank of $A$ is $d$, yet $A$ can be approximated to any accuracy with rank $2$: both $P(h)$ and $P(0)$ are rank-$1$ tensors.
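A quick numerical check of this for small $d$ (a sketch; the sizes $d = 4$, $n = 5$, $h = 10^{-5}$ are arbitrary):
In [ ]:
import numpy as np
from functools import reduce

d, n, h = 4, 5, 1e-5
i = np.arange(n, dtype=float)

def rank1(t):
    # P(t) = prod_k (1 + i_k t): an outer product of d vectors, i.e. a rank-1 tensor
    return reduce(np.multiply.outer, [1.0 + i * t] * d)

A = reduce(np.add.outer, [i] * d)   # exact sum tensor, CP-rank d
A2 = (rank1(h) - rank1(0.0)) / h    # difference of two rank-1 tensors: rank 2

print(np.abs(A - A2).max())         # O(h) error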
Tucker decomposition is an old-time alternative to CP, but it suffers from the curse of dimensionality:
$$A(i_1, \ldots, i_d) \approx \sum_{\alpha_1, \ldots, \alpha_d} G(\alpha_1, \ldots, \alpha_d) U_1(i_1, \alpha_1) \ldots U_d(i_d, \alpha_d).$$
The number of parameters is $dnr + r^d$: perfectly fine for compression purposes and for 3D problems, but not for 100D.
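To make the count concrete, a back-of-the-envelope comparison (hypothetical sizes $n = 32$, $r = 8$):
In [ ]:
n, r = 32, 8
for d in (3, 10, 100):
    print(d, d * n * r + r ** d)  # dnr + r^d: fine for d = 3, hopeless for d = 100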
Good news: a quasioptimal approximation can be computed by the SVD, and the best approximation always exists (the set of tensors with bounded Tucker ranks is closed).
A tensor is said to be in the TT-format if
$$A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d), $$where $G_k(i_k)$ for a fixed $i_k$ is a matrix of size $r_{k-1} \times r_k$ (with $r_0 = r_d = 1$, so the product is a scalar).
Index form: $$A(i_1, \ldots, i_d) = \sum_{\alpha_1, \ldots, \alpha_{d-1}} G_1(i_1, \alpha_1) G_2(\alpha_1, i_2, \alpha_2) \ldots G_d(\alpha_{d-1}, i_d).$$
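For illustration, a minimal sketch of evaluating a single element from the cores (assuming they are stored as numpy arrays of shape $r_{k-1} \times n_k \times r_k$):
In [ ]:
import numpy as np

def tt_element(cores, index):
    # Multiply the matrices G_k(i_k) from left to right; since r_0 = r_d = 1,
    # the result is a 1 x 1 matrix
    v = cores[0][:, index[0], :]
    for G, i in zip(cores[1:], index[1:]):
        v = v @ G[:, i, :]
    return v[0, 0]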
The quasioptimal approximation can be computed via sequential SVD of the auxiliary matrices:
$$A(i_1; i_2 \ldots i_d) \approx \sum_{\alpha_1=1}^{r_1} G_1(i_1, \alpha_1) G_2(\alpha_1 i_2; i_3 \ldots i_d),$$ then we separate the combined index $\alpha_1 i_2$ from the rest by another SVD, and so on.
TT-SVD can be coded in $\sim$ 50 lines of MATLAB / Python code.
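Here is one such sketch (a simplified version with a prescribed relative accuracy $\varepsilon$, not the reference TT-Toolbox implementation):
In [ ]:
import numpy as np

def tt_svd(A, eps=1e-10):
    # Sequential SVDs of the unfoldings, truncated so that the total
    # relative error stays bounded by eps
    d, n = A.ndim, A.shape
    delta = eps / np.sqrt(d - 1) * np.linalg.norm(A)  # per-unfolding threshold
    cores, r_prev = [], 1
    C = A.reshape(n[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        # Smallest rank whose discarded tail stays below the threshold
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        ok = np.nonzero(tail <= delta)[0]
        r = max(1, ok[0]) if ok.size else len(s)
        cores.append(U[:, :r].reshape(r_prev, n[k], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * n[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, n[-1], 1))
    return cores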
Suppose the tensor $\mathcal{A}$ is given as a black-box subroutine that computes any prescribed element, and suppose we have a priori knowledge that $r_k \leq r$.
Then the tensor can be exactly recovered from $\mathcal{O}(dnr^2)$ samples,
and we have a fairly robust deterministic algorithm (TT-cross) to select and compute those entries.
The TT-cross result is a direct generalization of the matrix skeleton decomposition (and also comes with several adaptive algorithms).
Experience: randomization is needed to improve the robustness of the algorithms, but a good algorithm cannot rely on random sampling of entries alone; it should be adaptive.
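TT-cross itself is beyond a few lines, but the matrix case it generalizes fits in a short sketch: adaptive cross approximation with greedy residual pivoting (the function name and tolerance are hypothetical).
In [ ]:
import numpy as np

def aca(A, n, m, r):
    # Cross approximation of an n x m matrix given as a callable A(i, j);
    # returns C, R with A ~= C @ R from O(r(n + m)) evaluated entries
    C, R = np.zeros((n, 0)), np.zeros((0, m))
    i = 0
    for _ in range(r):
        row = np.array([A(i, j) for j in range(m)]) - C[i, :] @ R  # residual row
        j = int(np.argmax(np.abs(row)))                            # pivot column
        if abs(row[j]) < 1e-14:
            break                                                  # residual vanished
        col = np.array([A(k, j) for k in range(n)]) - C @ R[:, j]  # residual column
        C = np.column_stack([C, col / row[j]])
        R = np.vstack([R, row])
        col[i] = 0.0                      # next pivot row: largest residual entry
        i = int(np.argmax(np.abs(col)))   # of the column, excluding the current row
    return C, R

C, R = aca(lambda i, j: 1.0 / (i + j + 1), 100, 100, 10)  # Hilbert matrix test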
The software implements basic operations, but also quite advanced algorithms, like
solving optimization problems with a TT-rank constraint:
$$F(X) \rightarrow \min, \quad \text{where } X \text{ has low TT-ranks}.$$
This includes high-dimensional linear systems, eigenvalue problems, and dynamical low-rank approximation.
Optimization over TT-manifolds is a separate story.
The structure is polylinear, so ALS (alternating least squares) is the simplest choice, and its convergence here is tremendously better than one might expect:
it is typical that the method converges in $5$-$20$ iterations to very high accuracy.
This is due to the right parametrization and a stable representation (i.e., ALS in 2D converges very fast).
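A minimal sketch of ALS in the 2D (matrix) case (the random initialization and the fixed iteration count are assumptions):
In [ ]:
import numpy as np

def als_matrix(A, r, iters=20):
    # Rank-r model A ~= U @ V.T; each half-sweep fixes one factor and solves
    # an exact linear least-squares problem for the other
    n, m = A.shape
    V = np.random.randn(m, r)
    for _ in range(iters):
        U = np.linalg.lstsq(V, A.T, rcond=None)[0].T  # fix V, solve for U
        V = np.linalg.lstsq(U, A, rcond=None)[0].T    # fix U, solve for V
    return U, V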
Given a model $$y = f(x),$$ where $x$ are now random variables, the goal is to represent in a tensor-structured way the distribution of the output variables $$\rho(y_1, \ldots, y_k).$$
That requires algebraic manipulations over multivariate functions, and all of this can be implemented through the TT-formalism (including the non-linear operations).
The software is robust, so let us find high dimensions where there are no high dimensions.
Consider a 1D signal $$f(x) = \sin x$$ sampled on a uniform grid with $2^d$ points.
Reshape it into a $2 \times 2 \times \ldots \times 2$ $d$-dimensional tensor.
This turns out to be a TT-tensor of rank $2$, giving $\mathcal{O}(2dr^2) = \mathcal{O}(\log N)$ parameters.
This leads to the QTT concept: take the simplest representation, but on a huge virtual grid with $2^d$ points.
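A quick check of the $\sin$ example, reusing the tt_svd sketch above (the grid size and interval are arbitrary choices):
In [ ]:
import numpy as np

d = 20
x = np.linspace(0, 1, 2 ** d)
cores = tt_svd(np.sin(x).reshape([2] * d), eps=1e-12)
print([G.shape[2] for G in cores])  # internal TT-ranks equal 2 (the last is 1)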
Kazeev & Schwab (2015) have proved that for a wide class of 2D elliptic PDEs the solution has low-rank QTT structure.
We have recently implemented a solver that allows one to "solve" problems with $2^{60}$ unknowns on a laptop in seconds.
In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../custom.css", "r").read()
    return HTML(styles)
css_styling()
Out[1]: