The tensor-train (known in physics as matrix-product-state) representation has the form
$$ A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d), $$where $G_k(i_k)$ has size $r_{k-1} \times r_k$ and $r_0 = r_d = 1$.
Now we have $d-1$ ranks $r_1, \ldots, r_{d-1}$ (we call them TT-ranks). The parametrization is defined by the TT-cores $G_k$.
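A small numpy sketch (all sizes made up) of how a single entry of $A$ is evaluated from the cores:

```python
import numpy as np

# Cores are stored as r_{k-1} x n_k x r_k arrays; an entry of A is the
# product of the corresponding matrix slices G_1(i_1) ... G_d(i_d).
d, n = 4, 5
ranks = [1, 3, 4, 3, 1]
cores = [np.random.randn(ranks[k], n, ranks[k + 1]) for k in range(d)]

def tt_entry(cores, idx):
    G = cores[0][:, idx[0], :]            # 1 x r_1
    for k in range(1, len(cores)):
        G = G @ cores[k][:, idx[k], :]    # chain of small matrix products
    return G[0, 0]                        # the final product is 1 x 1

print(tt_entry(cores, (0, 1, 2, 3)))
```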
Why tensor train?
Theorem: there exists a TT-decomposition with TT-ranks
$$r_k = \mathrm{rank}\, A_k,$$
where $A_k = A(i_1 \ldots i_k;\, i_{k+1} \ldots i_d)$ is the $k$-th unfolding matrix of the tensor (the first $k$ indices enumerate its rows, the remaining $d-k$ indices its columns).
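The proof is constructive; here is a minimal numpy sketch of the TT-SVD algorithm (successive SVDs of the unfoldings, with a relative truncation tolerance chosen here purely for illustration):

```python
import numpy as np

# TT-SVD: build the cores by successive SVDs of the unfolding matrices.
def tt_svd(A, eps=1e-12):
    d, n = A.ndim, A.shape
    cores, r_prev = [], 1
    C = A.reshape(r_prev * n[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))       # truncated rank
        cores.append(U[:, :r].reshape(r_prev, n[k], r))
        C = s[:r, None] * Vt[:r, :]                   # carry the remainder forward
        r_prev = r
        C = C.reshape(r_prev * n[k + 1], -1)
    cores.append(C.reshape(r_prev, n[d - 1], 1))
    return cores

cores = tt_svd(np.random.randn(4, 5, 6, 7))
print([c.shape for c in cores])
```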
Rounding procedure solves the following problem: given a TT-representation
$$A(i_1, \ldots, i_d) = G_1(i_1) G_2(i_2) \ldots G_d(i_d)$$find another one with smaller memory that approximates it with threshold $\epsilon$.
It can be done in $\mathcal{O}(dnr^3)$ operations using the linear structure of the format:
we can orthogonalize the cores! (whiteboard again).
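A small ttpy sketch (made-up sizes) of why rounding is needed: summing TT tensors adds their ranks, and `round` compresses them back by orthogonalizing the cores and truncating SVDs:

```python
import tt

a = tt.rand(4, 10, 3)     # random 10-dimensional TT tensor with mode size 4, ranks 3
b = a + a                 # ranks add up when TT tensors are summed
print(b.r)                # roughly doubled TT-ranks
c = b.round(1e-12)        # orthogonalize the cores, then truncate the SVDs
print(c.r)                # back to the original ranks, since b = 2a exactly
```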
To use tensor decompositions, we have to approximate a continuous problem by a discrete one, where the vector of unknowns can be indexed by a $d$-index.
It is a mapping from 1D functions to $d$-dimensional arrays, using the following representation:
$$v_k = f(x_k), \quad k = 1, \ldots, 2^d,$$
$$V(i_1, \ldots, i_d) = v(a + i h), \quad i = i_1 + 2 i_2 + \ldots + 2^{d-1} i_d, \quad i_k \in \{0, 1\}.$$
It gives $\mathcal{O}(2dr^2) = \mathcal{O}(\log N)$ parameters.
Let us do a demo.
In [17]:
import numpy as np
import tt

# Approximate f(x) = sin(20 x) on a uniform grid with 2^70 points,
# stored in the QTT format and built by cross approximation.
fun = lambda x: np.sin(20*x)
a = 0
b = 20
d = 70
N = 2 ** d
h = (b - a)*1.0/(N + 1)
t = tt.xfun(2, d) * h                # QTT vector of grid points k*h, k = 0, ..., N-1 (a = 0)
eps = 1e-12
s1 = tt.multifuncrs2([t], fun, eps)  # cross approximation of fun sampled on the grid
s1 = s1.round(eps)                   # compress the result to smaller TT-ranks
In [20]:
import math

# Compute the integral of sin(x)/x over [0, 10^6] with a rectangle rule
# on a grid with 2^60 points, all in the QTT format; the exact value of
# the full integral int_0^infty sin(x)/x dx is pi/2.
fun = lambda x: np.sin(x)/x
a = 0
b = 1e6
d = 60
N = 2 ** d
h = (b - a)*1.0/(N - 1)
e = tt.ones(2, d)
t = (tt.xfun(2, d) + e)* h           # grid starts at h, avoiding the singularity at x = 0
eps = 1e-12
s1 = tt.multifuncrs2([t], fun, eps)  # cross approximation of sin(x)/x on the grid
tt.dot(s1, e) * h - math.pi/2        # quadrature result minus pi/2
Out[20]:
In this paper we focus on the one-dimensional reaction-diffusion equation with initial distribution $f(x): \mathbb{R} \to \mathbb{R}^{+}$ and a constant diffusion coefficient~$\sigma$
\begin{equation} \left\{ \begin{aligned} & \frac{\partial}{\partial t} u(x,t) = \sigma \frac{\partial^2}{\partial x^2} u(x,t) - V(x,t) u(x,t),\\ & u(x,0)=f(x) \end{aligned} \right. \qquad t \in [0, T], \quad x \in \mathbb{R}. \end{equation}The solution can be expressed by the Feynman-Kac formula \begin{equation} u_{f}(x,T)=\int_{\mathcal C\{x,0; T \}} f(\xi(T)) e^{-\int_{0}^{T}\! V(\xi(\tau),T-\tau) d\tau } \mathcal{D}_{\xi}, \end{equation} where the integration is done over a set of all continuous paths $\xi(T): [0,T]\to \mathbb{R}$ from the Banach space $\Xi([0,T], \mathbb{R})$ starting at $\xi(0)=x$ and stopping at arbitrary endpoints at time $T$. $\mathcal{D}_{\xi}$ is the Wiener measure, and $\xi(t)$ is the Wiener process.
A standard way to discretize the path integral is to break the time range $[0,T]$ into $n$ intervals \begin{equation} \tau_k=k \cdot \delta t, \qquad 0\le k\le n, \qquad n\!: \tau_n=T, \end{equation} which gives \begin{equation} u^{(n)}(x,t)=\int_{-\infty}^{\infty} \mathcal{D}^{(n)}_{\xi} \, f\!\left( \xi^{(n)}\right) \! \prod_{i=0}^{n} e^{ -w_{i} V^{(n)}_{i} \delta t }. \end{equation} The integration sign here denotes an $n$-dimensional integration over the particle steps $\xi_k$, and $w_i$ are the weights of the quadrature rule used for the time integral $\int_0^T V \, d\tau$.
We rewrite the process as a sequence of convolutions \begin{equation} F^{(n)}_k(x)= \sqrt{\lambda/\pi} \int_{-\infty}^{\infty} \Phi^{(n)}_{k+1}(x + \xi) \, e^{-\lambda \xi^2 } d\xi, \qquad x \in \mathbb{R}, \qquad 1\le k \le n, \end{equation} where \begin{equation} \Phi^{(n)}_{k+1}(x)= F^{(n)}_{k+1}(x) \, e^{ -w_k V (x, \tau_{n - k}) \delta t }. \end{equation} With the boundary condition \begin{equation} F^{(n)}_{n+1}(x)=f(x), \end{equation} the solution is expressed as \begin{equation} u^{(n)}_{f}(x,T) = F^{(n)}_{1}(x) \, e^{-w_0 V(x, T) \, \delta t}. \end{equation} The iteration starts from $k=n$ and goes down to $k=1$.
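A hedged numpy sketch of this backward sweep, with assumed ingredients: $w_k = 1$ (rectangle rule in time), $\lambda = 1/(4\sigma\,\delta t)$ for the Wiener transition kernel, and toy choices of $f$ and $V$:

```python
import numpy as np

# Backward convolution sweep (sketch only; parameters and f, V are made up).
sigma, T, n = 1.0, 1.0, 50
dt = T / n
lam = 1.0 / (4.0 * sigma * dt)         # assumed Gaussian kernel parameter

M = 2**12 + 1                          # odd number of points keeps the kernel centered
x = np.linspace(-10.0, 10.0, M)
hx = x[1] - x[0]
f = lambda y: np.exp(-y**2)            # initial distribution (toy example)
V = lambda y, t: 0.5 * y**2            # potential (toy example)

kernel = np.sqrt(lam / np.pi) * np.exp(-lam * x**2) * hx   # discretized Gaussian kernel
F = f(x)                               # F_{n+1} = f
for k in range(n, 0, -1):
    Phi = F * np.exp(-V(x, (n - k) * dt) * dt)             # Phi_{k+1}
    F = np.convolve(Phi, kernel, mode='same')              # F_k: convolution with the kernel
u = F * np.exp(-V(x, T) * dt)          # u^{(n)}(x, T)
```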
We start from a large domain with $L \sim M^2$ points and then, after $M$ steps, find the solution on the required domain.
The complexity is then $\mathcal{O}(M^2 \log M)$ (due to the FFT).
To reduce the complexity, we reshape the vector of length $M^2$ into an $M \times M$ matrix and do a low-rank approximation of it!
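A sketch of this compression step on synthetic data (for the actual $F_k$ the rank is small; for random data it is not, so the printed rank here is only illustrative):

```python
import numpy as np

# View the length-M^2 vector as an M x M matrix and replace it by a
# truncated SVD; the convolution can then act on the thin factors.
M, eps = 256, 1e-8
v = np.random.randn(M * M)                 # stand-in for F_k on the large grid
A = v.reshape(M, M)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > eps * s[0]))            # numerical rank at relative tolerance eps
A_r = U[:, :r] * s[:r] @ Vt[:r, :]         # rank-r approximation
print(r, np.linalg.norm(A - A_r) / np.linalg.norm(A))
```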
We compute the PV (present discounted value), which takes into account that the value of money changes in time due to basic economic parameters. \begin{equation} PV=\left( \frac{\alpha}{\pi} \right)^{d/2} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} v(\xi_1,\ldots, \xi_d) \, e^{-\alpha \xi_{1}^2}\ldots e^{-\alpha \xi_{d}^2} d\xi_d \ldots d\xi_1 \qquad v(\xi_1,\ldots, \xi_d) = \sum_{k=1}^{d} u_k m_k \end{equation} \begin{equation*} u_k=\prod_{j=0}^{k-1} \frac{1}{1+i_j}, \qquad m_k=cr_k(1-w_k +w_k c_k), \qquad r_k=\prod_{j=1}^{k-1}(1-w_j), \end{equation*} \begin{equation} c_k = \sum^{d-k}_{j=0}\frac{1}{(1+i_0)^j} \qquad w_k=K_1 + K_2 \arctan(K_3 i_k + K_4), \qquad i_k = K^{k}_0 e^{\xi_1 + \ldots + \xi_k} i_0 \end{equation}
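A hedged ttpy sketch of how such a Gaussian-weighted integral can be computed with TT cross approximation on a tensor-product Gauss-Hermite grid; the integrand here is a simple stand-in, not the full mortgage-backed model, whose timings are reported in the table below:

```python
import numpy as np
import tt

d, nq, alpha, eps = 10, 20, 1.0, 1e-10
nodes, weights = np.polynomial.hermite.hermgauss(nq)    # quadrature for weight e^{-t^2}
xi = nodes / np.sqrt(alpha)                             # rescale so the weight becomes e^{-alpha xi^2}

one, node_t, w_t = tt.tensor(np.ones(nq)), tt.tensor(xi), tt.tensor(weights)

# X[k]: the k-th coordinate on the full grid; W: product of the 1D weights.
# (The mode ordering of tt.kron does not matter for this symmetric toy integrand.)
X, W = [], None
for k in range(d):
    Xk = None
    for j in range(d):
        f = node_t if j == k else one
        Xk = f if Xk is None else tt.kron(Xk, f)
    X.append(Xk)
    W = w_t if W is None else tt.kron(W, w_t)

def v_toy(pts):                       # pts has shape (npoints, d)
    return np.exp(-np.abs(pts).sum(axis=1))

V = tt.multifuncrs2(X, v_toy, eps)    # cross approximation of the integrand
integral = tt.dot(V, W) * np.pi ** (-d / 2.0)   # includes the (alpha/pi)^{d/2} prefactor
print(integral)
```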
Accuracy of the cross approximation is $\varepsilon = 10^{-10}$. The dimension of the integral is labeled by $n$; it corresponds to the number of months in the mortgage-backed security model. Its parameters are $(i_0,K_1,K_2,K_3, K_4, \sigma^2)=(0.007, 0.01, -0.005, 10, 0.5, 0.0004)$. The ranks are shown in the column labeled $r$.
n | r | $T_{cross}$ (min.) | $T_{MC}$ (min.) |
---|---|---|---|
36 | 7 | 0.1 | 7.7 |
48 | 10 | 0.16 | 11.6 |
90 | 10 | 0.3 | 21.4 |
180 | 10 | 0.5 | 41.5 |
360 | 10 | 0.8 | 81.6 |
720 | 9 | 1.56 | 178 |
1440 | 9 | 2.81 | 354 |
The idea of QTT can be used to solve PDEs on very fine grids.
Kazeev & Schwab (2015) have proved that for all second-order elliptic/parabolic PDEs with analytic right-hand side the solution has the QTT-structure.
There are hp-methods that achieve optimal complexity, but they are very difficult to use.
The QTT-representation is very simple.
Let us do a demo in another window.
Say we want to describe a collection of $1000 \times 1000$ grayscale images.
To do it, we may build a probabilistic model of our images:
$$ P: \{1, \ldots, 256\}^{1000 \times 1000} \to [0, 1]. $$ Now we can assign a probability to every image.
Random variable $\vec{x} = (x_1, x_2, x_3)$, $x_j \in \{1, 2\}$:
Estimate all 8 parameters from the data:
$$ P(\vec{x}) := \frac{1}{N} \sum_{i=1}^N [\vec{x} = \vec{x}^i] $$
Just store it explicitly: $P \in \mathbb{R}^{2 \times 2 \times 2}$.
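A tiny numpy sketch of this estimator on synthetic data (the data itself is made up):

```python
import numpy as np

# Empirical probability tensor for (x_1, x_2, x_3) with x_j in {1, 2}:
# count how often each of the 8 configurations occurs in the sample.
N = 1000
data = np.random.randint(1, 3, size=(N, 3))   # synthetic observations
P = np.zeros((2, 2, 2))
for x in data:
    P[x[0] - 1, x[1] - 1, x[2] - 1] += 1.0 / N
print(P.sum())    # the estimated probabilities sum to 1
```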
A million dimensions.
How to estimate and store $256^{1\,000\,000}$ parameters (probabilities)?
Even if the potentials are known, $Z(X, W)$ is difficult to compute (sum of a huge array).
One example from the paper "Putting MRFs on a Tensor Train" by A. Novikov, A. Rodomanov, A. Osokin, and D. Vetrov, ICML 2014.
Spin glass models, $T = 1$, $c_{ij} \sim U[-f, f]$.
The set of tensors with bounded TT-ranks has good approximation properties, i.e. it is a good idea to use it as a regularizer for machine learning problems.
This leads to problems of the form
$$F(X) \rightarrow \min,$$
where $X$ belongs to the manifold of tensors with bounded TT-ranks.
This can be solved using Riemannian optimization techniques, which seem very difficult at first, but are in fact quite simple and efficient (see Manopt).
In fact, the nonlinear manifold of TT-tensors with bounded ranks has properties similar to the linear subspace!
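A deliberately simplified ttpy sketch (not a full Riemannian method: the gradient step is simply followed by TT-rounding back to bounded ranks, which plays the role of a retraction); the target tensor, step size, and ranks are all made up:

```python
import tt

# Minimize F(X) = 0.5 * ||X - A||^2 over tensors with TT-ranks <= r_max
# by a gradient step plus TT-rounding (used here as a crude retraction).
d, n, r_max, alpha = 8, 4, 3, 0.5
A = tt.rand(n, d, 5)                    # synthetic target tensor
X = tt.rand(n, d, r_max)                # initial guess with bounded ranks

for it in range(50):
    grad = X - A                        # gradient of F in the TT format
    X = (X - alpha * grad).round(1e-10, r_max)   # step, then truncate back to r_max

print((X - A).norm() / A.norm())        # remaining best-rank-r_max approximation error
```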