# Low-rank approximation of matrices and tensors

## Skoltech

• Founded 2011 with 7 people and a fax machine;
• English-speaking, international research university
• Located just outside of Moscow
• Joint initiative with MIT
• 180 students as of fall 2014 (40 PhD)
• 30+ faculty, no departments
• Main campus to be built in 2015

## My group

My group is 1yr old and is focused on different aspects of Scientific computing. The main idea is to develop broadly applicable methods based on separation of variables and low-rank approximation of matrices and tensors.

## Talk plan

• Concept of low-rank approximation of matrices
(and where it is applied for integral equations)
• Adaptive nested cross approximation of non-local operators and the h2tools package
• Efficient method for the solution of large dense linear system
with hierarchical structure
• Low-rank approximation of tensors and what we can do
(ttpy and TT-Toolbox packages)

## The interest in IE

The group in Moscow, lead by Prof. Eugene Tyrtyshnikov has been working on efficient methods for solving integral equations for decades, with electromagnetics as one of the main applications.

A model example, scalar boundary integral equation of the form $$\int_{\Omega} G(x, y) q(y) dy = u(x), \quad G(x, y) = \frac{1}{r}, \quad \mbox{or} \quad G(x,y) = \frac{e^{ikr}}{r},$$ and $\Omega$ is a surface in 3D.
A typical discretization using a mesh with triangular/rectangular elements may require $N \sim 10^3-10^5$ elements depending on the accuracy required, so the number of matrix elements would be $10^6 - 10^{10}$.
Even the computational of all matrix elements may require a lot of memory.

Evaluation of a single matrix element can also be time-consuming, since typically it reduces to evaluation of the integrals of the form
$$a_{ij} = \int_{T_i} \int_{T_j} K(x, y) \phi_i(x) \phi_j(y) dx dy,$$ where $\phi_i(x)$ and $\phi_j(y)$ are the selected basis functions.

## Fast methods

Fast methods have been in development for IE and non-local operators for decades.
The main focus is on the fast evaluation of the operator: i.e., given an input vector $x$, compute $y \approx Ax$ with
contollable accuracy $\varepsilon$ an hopefully in $\mathcal{O}(N)$ time. There are several classes of approaches.

• Fast multipole method (FMM, Greengard, Rokhlin): it is based on the analytical expansion of the kernel into multipole expansion:
If two sets of sources $y_j$ and receivers $x_i$ are well-separated, then the kernel can be expanded as
$$K(x, y) \approx \sum_{\alpha=1}^r f_{\alpha}(x) f_{\alpha}(y)$$ (this is an oversimplified view on the FMM, which is actually a 3-step algorithm with M2M, M2L and L2L steps)
• Precorrected FFT approach (J. White): use the shift-invariance of the kernel, interpolate onto the uniform grid,
evaluate using Fast Fourier Transform and correct in the near field
• Linear algebra via (block) low-rank approximation
(first in the works by W. Hackbusch, E. Tyrtyshnikov, B. Khoromskij).
The matrix is treated as a grey box: the input is a geometry of the problem (location of sources $y_j \in \mathbb{R}^d$ and receivers $x_i \in \mathbb{R}^d$, and a function that evaluates a matrix element $a_{ij}$.
Main goal: Approximate a dense matrix by sampling only $\mathcal{O}(N \log^{\alpha} n \log^{\beta} \varepsilon)$ elements

## Mosaic partioning

Why $\mathcal{O}(n)$ sampling is possible?
We hierarchically partition the matrix into large blocks, most of which corresponding to the far zone interaction.
Assumption: a block corresponding to the well-separated sources and receivers can be well approximated by a low-rank matrix.

Now we have to approximate a given matrix (block) of the full matrix by a low-rank matrix.
How can we do that without computing all the entries?
But before that I will do a short demo on a 1D example, illustrating the (block) low-rank idea.

First, consider a Hilbert matrix with the elements
$$a_{ij} = \frac{1}{i + j + 1/2}$$ It is a famous example of an ill-conditioned matrix, and it can be well-approximated by a low-rank matrix.
The best approximation can be computed using the singular value decomposition (SVD).



In [108]:

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()

n = 500
a = [[1.0 / ( i + j + 0.5) for i in xrange(n)] for j in xrange(n) ]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.semilogy(s/s[0])
plt.title('Decay of singular values for a Hilbert matrix')




Out[108]:

<matplotlib.text.Text at 0x151b2a750>



If we consider a matrix with the elements $$a_{ij} = \frac{1}{i - j + 1/2},$$ which correspond to the interaction of a "charges" on an interval with themselves, the low-rank property is lost, but it is not lost for the blocks!



In [109]:

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()

n = 500
a = [[1.0 / ( i - j + 0.5) for i in xrange(n)] for j in xrange(n) ]
a = np.array(a)
u, s, v = np.linalg.svd(a)
u1, s1, v1 = np.linalg.svd(a[:n/2,n/2:])
fig, ax = plt.subplots(1,  2, figsize = (10, 5))
ax[0].plot(s)
ax[0].set_title('Full matrix')
ax[1].semilogy(s1)
ax[1].set_title('12 block')
fig.tight_layout()






## Skeleton decomposition

How can we approximation a low-rank matrix without computing all its elements?
The skeleton decomposition gives an answer: if a matrix has rank $r$, it can be exactly recovered from its $r$ linearly independent columns and $r$ linearly independent rows.

## Cross approximation

Cross approximation is a heuristic sampling technique for the selection of "good" rows and columns in a low-rank matrix.
The goal is to find rows and column that maximize the volume of the submatrix.
The maximal volume principle (Goreinov, Tyrtyshnikov) states that $$\Vert A - A_{skel} \Vert_C \leq (r + 1) \sigma_{r+1},$$ and $\Vert \cdot \Vert$ is the maximal in modulus element of a matrix.

We can do a short demo (and compare with the SVD)



In [110]:

from rect_cross2d import rect_cross2d
from scipy.linalg import hilbert
a = hilbert(1000)
%timeit U, S, V = rect_cross2d(a, 1e-5)
%timeit U, S, V = np.linalg.svd(a)




10 loops, best of 3: 29.9 ms per loop
1 loops, best of 3: 720 ms per loop



## Hierachical matrices

The idea behind $\mathcal{H}$-matrices is now simple:

1. Split the matrix into small (dense) blocks corresponding to the near-field interaction, and large low-rank blocks corresponding to the far field interaction.
2. Approximate the far field blocks by low-rank

Mosaic partitioning corresponding to one-dimensional and two-dimensional distribution of particles.

Algebraical, but not optimal! There is a relation between the far field blocks as well.
This leads to so-called $\mathcal{H}^2$-matrices: an algebraic version of the FMM.
Algebraic property of an "FMM-matrix": the block row has small rank.

## H2tools

H2tools package (http://bitbucket.org/muxas/h2tools) is the main package we develop for working with block low-rank matrices.
Developers: Alexander Mikhalev, Daria Sushnikova, Ivan Oseledets
It is written in the Python language with parts accelerated with Cython.

Current functionality:

1. Gray-box approximation of a given matrix
2. Efficient method for the resulting linear systems

I will give a short demo for a problem with randomly generated particles in 3D that interact with a potential $G(x, y) = \frac{1}{\Vert x - y \Vert}$



In [1]:

import numpy as np
import h2py
from h2py.data.particles import inv_distance, Data
from h2py.main import Problem




In [2]:

import time
# parameters:
#   N -- number of particles
#   block_size -- no more, than block_size particles in close interactions
#                    between nodes of trees
N = 20000
block_size = 20

# Generating particles randomly in cube [0;1]^3
particles = np.random.rand(3, N)

# Saving particles in special data class and setting parameter k
data = Data(particles)
#data.check_far = check_far
# Building tree
t = -time.time()
problem = Problem(inv_distance, data, block_size = block_size,)
t += time.time()
print "Time for tree generation", t
# Generating symmetric computational queue
t = -time.time()
problem.gen_queue(symmetric = 1)
t += time.time()
print "Time for the far/close blocks marking", t
t = time.time()
problem.factorize('h2', tau = 1e-3, iters = 1, onfly = 0, verbose = 0, rect_maxvol_tol = 3)
t = time.time()-t
print 'Time for factorization:', t




Time for tree generation 16.3673560619
Time for the far/close blocks marking 0.00257802009583
Time for factorization: 16.203772068



## Solving linear systems with block low-rank matrices

Approximation of the matrix is not enough: we have to solve the resulting linear system!
(and that might cause problems).

The H2tools package has a black-box preconditioner built-in,
which is based on the sparse-extended (SE-form) of the matrix:
The classical three-step FMM-like procedure the multiplication
of an $\mathcal{H}^2$ matrix can be rewritten as a solution of a larger sparse linear system.

A typical SE-form looks as follows:

## Timings for a model surface integral equation

We have tested 5 different methods how we can you the SE-form on a model surface problem with $G(x, y) = \frac{1}{\Vert x - y\Vert}$.

Size of matrix Dir sol (block)+(SE) (full)+(SE) (Full)+(H2) (SVD)+(H2)
3928 2.585 3.215 2.11 2.16 0.87
13640 37.76 10.83 9.69 8.95 4.65
28080 234 22.33 28.08 25.47 16.92
59428 - 53.14 78.15 70.37 42.01
98936 - 122,41 144.31 127.95 89.94

Now I will talk about another direction, which is the main in my research: tensors.

## Multivariate functions

The main research direction is the approximation of multidimensional arrays (tensors).
Such tensors can appear, for example, in multiparametric/stochastic problems, when the approximated quantity depends on the deterministic or random parameters.
$$f(x_1, \ldots, x_d).$$ After the discretization of $x_k$ this gives a $d$-dimensional array (tensor) of the form
$$A(i_1, \ldots, i_d).$$ How to approximate it?
A very fruitful approach is based on the separation of variables
$$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha).$$

## Canonical format

$$A(i_1, \ldots, i_d) \approx \sum_{\alpha=1}^r U_1(i_1, \alpha) \ldots U_d(i_d, \alpha).$$

This is called canonical format. Its main disadvantage that the approximation problem is not convex and may not even have a solution.

## Tensor train

A tensor train is an alternative to the canonical format that can be computed in a robust way: $$A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d),$$ where $G_k(i_k)$ is a matrix of size $r_{k-1} \times r_k$ and $r_0 = r_d$.

## Typical problems

• Approximate a given tensor by a low-rank one: (solved, can apply to Green function compression), TT-SVD
• Do basic linear algebra
• We have generalized the skeleton decomposition to many dimensions (TT-cross)
• Solve optimization problems where the solution is sought in the low-rank format
• Use low-rank sampling to do global optimizations

## Packages

We have two basic packages:

Many basic and advanced algorithms are implemented, and can be used as building blocks.

Demo on the cross approximation of tensors



In [3]:

import tt
from tt.cross import rect_cross

d = 20
n = 10
x0 = tt.rand(d, n, 3)
h = 1e-2
def myfun1(x):
return np.sum(x, axis=1)
#return np.sqrt((((h * x) ** 2).sum(axis=1)))

x1 = rect_cross(myfun1, x0)
x1 = x1.round(1e-8)
print x1




swp: 0/9 er = 5.9e+08 erank = 7.0 fun_eval: 8400
swp: 1/9 er = 2.3e-07 erank = 11.9 fun_eval: 36320
This is a 10-dimensional tensor
r(0)=1, n(0)=20
r(1)=2, n(1)=20
r(2)=2, n(2)=20
r(3)=2, n(3)=20
r(4)=2, n(4)=20
r(5)=2, n(5)=20
r(6)=2, n(6)=20
r(7)=2, n(7)=20
r(8)=2, n(8)=20
r(9)=2, n(9)=20
r(10)=1



## Sample applications

The TT-Toolbox has been succesfully applied to different applications.

Hierarchical uncertainty quantification (Z. Zhang, MIT)

</div>

Dynamics of quantum spin Hamiltonians

</div>

• Multidimensional integrals in theoretical physics (M. Litsarev)
• Chemical master equation in biology (work by V. Kazeev and S. Dolgov)

## Publications, software

##### Questions?


In [107]:

from IPython.core.display import HTML
def css_styling():
return HTML(styles)
css_styling()




Out[107]:

@font-face {
font-family: "Computer Modern";
src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
}
div.cell{
/*width:80%;*/
/*margin-left:auto !important;
margin-right:auto;*/
}
h1 {
font-family: 'Alegreya Sans', sans-serif;
}
h2 {
font-family: 'Fenix', serif;
}
h3{
font-family: 'Fenix', serif;
margin-top:12px;
margin-bottom: 3px;
}
h4{
font-family: 'Fenix', serif;
}
h5 {
font-family: 'Alegreya Sans', sans-serif;
}
div.text_cell_render{
font-family: 'Alegreya Sans',Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
font-size: 160%;
line-height: 121%;
/*width:70%;*/
/*margin-left:auto;*/
margin-right:auto;
}
li {
line-height: 121%;
}
.CodeMirror{
font-family: "Source Code Pro";
font-size: 90%;
}
/*    .prompt{
display: None;
}*/
.text_cell_render h1 {
font-weight: 200;
font-size: 50pt;
line-height: 110%;
color:#CD2305;
margin-bottom: 0.5em;
margin-top: 0.5em;
display: block;
}
.text_cell_render h5 {
font-weight: 300;
font-size: 16pt;
color: #CD2305;
font-style: italic;
margin-bottom: .5em;
margin-top: 0.5em;
display: block;
}

.warning{
color: rgb( 240, 20, 20 )
}
.floated_img
{
float: left;
}

MathJax.Hub.Config({
TeX: {
extensions: ["AMSmath.js"]
},
tex2jax: {
inlineMath: [ ['$','$'], ["\$","\$"] ],
displayMath: [ ['$$','$$'], ["\$","\$"] ]
},
displayAlign: 'center', // Change this to 'center' to center equations.
"HTML-CSS": {
styles: {'.MathJax_Display': {"margin": 4}}
}
});