Before going into multidimensional integrals, a remark: people now talk about machine learning, but many of the concepts have long been known in statistics (see the table below).
Machine learning | Statistics |
---|---|
network, graphs | model |
weights | parameters |
learning | fitting |
generalization | test set performance |
supervised learning | regression/classification |
unsupervised learning | density estimation, clustering |
large grant = \$1,000,000 | large grant = \$50,000 |
Paskov and Traub, *Faster valuation of financial derivatives* (1995), considered a 360-dimensional collateralized mortgage obligation (CMO) problem.
A typical way is to use Monte Carlo sampling: sample $N$ points and compute the average; the error decays as $\mathcal{O}(N^{-1/2})$, independently of the dimension, but slowly.
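A minimal sketch of such an estimator (the Gaussian sampling density and the stand-in integrand are illustrative choices, not the ones from the paper):

```python
import numpy as np

def mc_integrate(v, d, n_samples=10_000, s=1.0, seed=0):
    """Monte Carlo estimate of E[v(xi)] with xi ~ N(0, s^2 I) in d dimensions."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, s, size=(n_samples, d))
    vals = np.apply_along_axis(v, 1, xi)
    # the standard error decays as O(N^{-1/2}), whatever the dimension d
    return vals.mean(), vals.std() / np.sqrt(n_samples)

est, err = mc_integrate(lambda x: np.exp(-np.sum(x ** 2)), d=360)
```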
The key idea is separation of variables:
$$A(i_1, \ldots, i_d) = U_1(i_1) \cdots U_d(i_d).$$
In probability theory, it corresponds to the idea of independence of variables.
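In terms of arrays, a separable function is just an outer product of vectors; a small numpy illustration:

```python
import numpy as np

# Separation of variables = a rank-1 tensor: the outer product of d vectors.
u1, u2, u3 = np.random.rand(4), np.random.rand(5), np.random.rand(6)
A = np.einsum('i,j,k->ijk', u1, u2, u3)  # A[i,j,k] = u1[i] * u2[j] * u3[k]
```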
This class is too narrow to represent useful functions (or probability distributions).
For $d > 3$, the canonical decomposition of the form
$$A(i_1, \ldots, i_d) = \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha)$$
is very difficult to compute numerically; there is even a $9 \times 9 \times 9$ tensor for which the tensor rank is not known!
But this is not the only tensor decomposition we can use.
Sometimes it is better to learn by playing with the code and trying your own applications.
The maximal-volume principle is present in many areas; one of the earliest related references I know of is D-optimal design (Fedorov, 1972). It is related to the selection of optimal interpolation points for a given basis set:
$$f(x) \approx \sum_{\alpha=1}^r c_{\alpha} g_{\alpha}(x):$$
instead of doing linear regression, just do interpolation at $r$ points.
Given an $n \times r$ matrix, find the $r \times r$ submatrix of largest volume (modulus of the determinant) in it.
Use a greedy algorithm (now we know that it is D-optimality from at least 1972); a sketch follows.
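A minimal sketch of the greedy swap iteration (the function name and the stopping tolerance are mine; it assumes the first $r$ rows form a nonsingular submatrix):

```python
import numpy as np

def maxvol(A, tol=1.05, max_iters=100):
    """Greedy search for an r x r submatrix of (locally) maximal volume
    in a tall n x r matrix A; returns the selected row indices."""
    n, r = A.shape
    idx = np.arange(r)                     # initial guess: the first r rows
    for _ in range(max_iters):
        B = A @ np.linalg.inv(A[idx])      # n x r coefficients; B[idx] = I
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) < tol:             # no swap can grow the volume much
            break
        idx[j] = i                         # swap: volume grows by |B[i, j]|
    return idx
```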
Bad things about the canonical format: the canonical rank is NP-hard to compute, and the best rank-$r$ approximation may not even exist.
The Tucker decomposition, by contrast, can be computed in a stable way by means of the SVD.
The simplest construction separates one index at a time and gives the tensor-train decomposition (known in physics as the matrix-product state, MPS):
$$A(i_1, \ldots, i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d),$$
where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix for each fixed value of $i_k$, with $r_0 = r_d = 1$. The structure is similar to hidden Markov models.
Now we have $d-1$ ranks $r_1, \ldots, r_{d-1}$ (we call them TT-ranks); the parametrization is defined by the TT-cores $G_k$, with $\mathcal{O}(d n r^2)$ parameters in total.
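Evaluating a single entry is then just a chain of small matrix products; a minimal sketch (the core layout `(r_{k-1}, n_k, r_k)` is an implementation convention):

```python
import numpy as np

def tt_element(cores, idx):
    """Evaluate A(i_1, ..., i_d) from TT-cores of shape (r_{k-1}, n_k, r_k)."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]    # multiply by the r_{k-1} x r_k matrix G_k(i_k)
    return v[0, 0]            # r_0 = r_d = 1, so the result is a scalar
```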
Why tensor train?
Theorem: there exists an exact TT-decomposition with TT-ranks
$$r_k = \mathrm{rank}\, A_k,$$
where $A_k = A(i_1 \ldots i_k;\, i_{k+1} \ldots i_d)$ is the $k$-th unfolding matrix of the tensor.
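The proof is constructive: sweep over the dimensions and take an SVD of the current unfolding at each step. A minimal numpy sketch (the helper name and the simple relative truncation rule are mine; with a tiny `eps` it recovers the exact ranks up to rounding errors):

```python
import numpy as np

def tt_svd(T, eps=1e-10):
    """Compress a full tensor T into TT-cores by sequential truncated SVDs."""
    dims, d = T.shape, T.ndim
    cores, r_prev = [], 1
    C = T.reshape(dims[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))       # simple truncation rule
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        C = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores
```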
The rounding procedure solves the following problem: given a TT-representation
$$A(i_1, \ldots, i_d) = G_1(i_1) G_2(i_2) \ldots G_d(i_d),$$
find another one with smaller memory (smaller TT-ranks) that approximates it within a threshold $\epsilon$.
It can be done in $\mathcal{O}(dnr^3)$ operations using the linear structure of the format:
we can orthogonalize the cores! (whiteboard again; a sketch follows).
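A minimal sketch of the two sweeps (it assumes the core layout of the `tt_svd` example above; the truncation rule is again a simplification of the exact $\epsilon$-control):

```python
import numpy as np

def tt_round(cores, eps=1e-10):
    """TT rounding: right-to-left QR orthogonalization of the cores,
    then a left-to-right sweep of truncated SVDs."""
    cores, d = [G.copy() for G in cores], len(cores)
    # right-to-left sweep: make cores 1..d-1 right-orthogonal
    for k in range(d - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        Q, R = np.linalg.qr(cores[k].reshape(r0, n * r1).T)
        cores[k] = Q.T.reshape(-1, n, r1)
        cores[k - 1] = np.einsum('anb,cb->anc', cores[k - 1], R)  # absorb R^T
    # left-to-right sweep: truncate each core with an SVD
    for k in range(d - 1):
        r0, n, r1 = cores[k].shape
        U, s, Vt = np.linalg.svd(cores[k].reshape(r0 * n, r1),
                                 full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))
        cores[k] = U[:, :r].reshape(r0, n, r)
        cores[k + 1] = np.einsum('ab,bnc->anc',
                                 s[:r, None] * Vt[:r], cores[k + 1])
    return cores
```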
To use tensor decompositions, we have to approximate a continuous problem by a discrete one, where the vector of unknowns can be indexed by a $d$-index.
It is a mapping from 1D functions to $d$-dimensional arrays, using the following representation:
$$v_k = f(x_k), \quad k = 1, \ldots, 2^d, \qquad V(i_1, \ldots, i_d) = v(a + i h), \quad i = i_1 + 2 i_2 + \ldots + 2^{d-1} i_d,$$
where $i_k \in \{0, 1\}$ are the binary digits of the index $i$.
It gives $\mathcal{O}(2 d r^2) = \mathcal{O}(r^2 \log N)$ parameters instead of $N = 2^d$.
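A small illustration using the `tt_svd` sketch above (note that numpy's C-order reshape makes $i_1$ the most significant bit, the opposite of the formula above; that only reverses the sequence of ranks):

```python
import numpy as np

# Quantized TT: sample f on 2^d points, fold the vector into a
# 2 x 2 x ... x 2 tensor, and compress it with tt_svd (defined above).
d = 16
x = np.arange(2 ** d) / 2 ** d            # uniform grid on [0, 1)
v = np.sin(100.0 * x)                     # oscillatory, but structured
cores = tt_svd(v.reshape([2] * d), eps=1e-10)
print([G.shape[2] for G in cores[:-1]])   # the QTT-ranks of sine are all 2
```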
One of the examples is uncertainty quantification:
We have a (hierarchical) model with random input.
Each output becomes a random variable as well.
Zheng Zhang, Xiu Yang, Ivan V. Oseledets, George Em Karniadakis, and Luca Daniel, *Enabling High-Dimensional Hierarchical Uncertainty Quantification by ANOVA and Tensor-Train Decomposition*, http://arxiv.org/pdf/1407.3023.pdf
For the stochastic circuit simulator it gives over 200x speedup for a 45-parameter system!
Integral over Brownian motion: consider the one-dimensional reaction-diffusion equation with initial distribution $f(x)\colon \mathbb{R} \to \mathbb{R}^{+}$ and a constant diffusion coefficient $\sigma$:
\begin{equation} \left\{ \begin{aligned} & \frac{\partial}{\partial t} u(x,t) = \sigma \frac{\partial^2}{\partial x^2} u(x,t) - V(x,t) u(x,t),\\ & u(x,0)=f(x) \end{aligned} \right. \qquad t \in [0, T], \quad x \in \mathbb{R}. \end{equation}
The solution can be expressed by the Feynman-Kac formula
\begin{equation} u_{f}(x,T)=\int_{\mathcal C\{x,0; T \}} f(\xi(T)) e^{-\int_{0}^{T}\! V(\xi(\tau),T-\tau) d\tau } \mathcal{D}_{\xi}, \end{equation}
where the integration is done over the set of all continuous paths $\xi(T): [0,T]\to \mathbb{R}$ from the Banach space $\Xi([0,T], \mathbb{R})$ starting at $\xi(0)=x$ and stopping at arbitrary endpoints at time $T$. $\mathcal{D}_{\xi}$ is the Wiener measure, and $\xi(t)$ is the Wiener process.
A standard way to discretize the path integral is to break the time range $[0,T]$ into $n$ intervals
\begin{equation} \tau_k = k \cdot \delta t, \qquad 0 \le k \le n, \qquad \delta t = T/n, \quad \tau_n = T. \end{equation}
\begin{equation} u^{(n)}(x,t)=\int_{-\infty}^{\infty} \mathcal{D}^{(n)}_{\xi} \, f\!\left( \xi^{(n)}\right) \! \prod_{i=0}^{n} e^{ -w_{i} V^{(n)}_{i} \delta t }. \end{equation}
The integration sign here denotes an $n$-dimensional integration over the particle steps $\xi_k$; the $w_i$ are the weights of the quadrature rule for the integral in the exponent, and $V^{(n)}_i$ are the values of the potential along the discretized path.
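A minimal Monte Carlo check of this discretization (the particular $f$ and $V$ below are illustrative choices of mine, and the rectangle rule $w_i = 1$ is used for the exponent):

```python
import numpy as np

def feynman_kac_mc(x0, T, n=100, n_paths=100_000, sigma=1.0, seed=0):
    """Estimate u(x0, T) for u_t = sigma * u_xx - V u by sampling paths.
    Illustrative choices: f(x) = exp(-x^2), V(x, t) = x^2 / 2."""
    rng = np.random.default_rng(seed)
    dt = T / n
    # increments with variance 2 * sigma * dt match u_t = sigma * u_xx
    steps = rng.normal(0.0, np.sqrt(2.0 * sigma * dt), size=(n_paths, n))
    paths = x0 + np.cumsum(steps, axis=1)         # xi(tau_1), ..., xi(tau_n)
    V = 0.5 * paths ** 2
    weights = np.exp(-np.sum(V, axis=1) * dt)     # rectangle rule, w_i = 1
    return np.mean(np.exp(-paths[:, -1] ** 2) * weights)
```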
We compute PV (present discounted value), which takes into account that the value of money changes in time due to basic economic parameters.
\begin{equation} PV=\left( \frac{\alpha}{\pi} \right)^{d/2} \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} v(\xi_1,\ldots, \xi_d) \, e^{-\alpha \xi_{1}^2}\ldots e^{-\alpha \xi_{d}^2} d\xi_d \ldots d\xi_1, \qquad v(\xi_1,\ldots, \xi_d) = \sum_{k=1}^{d} u_k m_k, \end{equation}
\begin{equation*} u_k=\prod_{j=0}^{k-1} \frac{1}{1+i_j}, \qquad m_k=c\, r_k(1-w_k +w_k c_k), \qquad r_k=\prod_{j=1}^{k-1}(1-w_j), \end{equation*}
\begin{equation} c_k = \sum^{d-k}_{j=0}\frac{1}{(1+i_0)^j}, \qquad w_k=K_1 + K_2 \arctan(K_3 i_k + K_4), \qquad i_k = K^{k}_0 e^{\xi_1 + \ldots + \xi_k} i_0. \end{equation}
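The integrand transcribes directly into code; a sketch (the monthly payment constant $c$ and the constant $K_0$ are not fixed in the text, so they stay as parameters here):

```python
import numpy as np

def cmo_integrand(xi, i0, K0, K1, K2, K3, K4, c=1.0):
    """v(xi_1, ..., xi_d) for the mortgage-backed security model above."""
    d = len(xi)
    i = i0 * K0 ** np.arange(1, d + 1) * np.exp(np.cumsum(xi))  # i_k, k=1..d
    w = K1 + K2 * np.arctan(K3 * i + K4)                        # w_k
    rates = np.concatenate(([i0], i[:-1]))                      # i_0..i_{d-1}
    u = np.cumprod(1.0 / (1.0 + rates))                         # u_k
    r = np.concatenate(([1.0], np.cumprod(1.0 - w[:-1])))       # r_k
    ck = np.array([np.sum((1.0 + i0) ** -np.arange(d - k + 1))
                   for k in range(1, d + 1)])                   # c_k
    m = c * r * (1.0 - w + w * ck)                              # m_k
    return np.sum(u * m)
```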
Accuracy of the cross approximation is $\varepsilon = 10^{-10}$. The dimension of the integral is labeled by $n$; it corresponds to the number of months in the mortgage-backed security model. Its parameters are $(i_0, K_1, K_2, K_3, K_4, \sigma^2) = (0.007, 0.01, -0.005, 10, 0.5, 0.0004)$. The ranks are presented in the column labeled $r$.
n | r | $T_{cross}$ (min.) | $T_{MC}$ (min.) |
---|---|---|---|
36 | 7 | 0.1 | 7.7 |
48 | 10 | 0.16 | 11.6 |
90 | 10 | 0.3 | 21.4 |
180 | 10 | 0.5 | 41.5 |
360 | 10 | 0.8 | 81.6 |
720 | 9 | 1.56 | 178 |
1440 | 9 | 2.81 | 354 |