In [4]:
from IPython.html.services.config import ConfigManager
from IPython.utils.path import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
              'theme': 'sky',
              'transition': 'zoom',
              'start_slideshow_at': 'selected',
})


Out[4]:
{u'start_slideshow_at': 'selected', u'theme': 'sky', u'transition': 'zoom'}

Lecture 12. Algebraic nature of the FMM

Previous lecture

  • Barnes-Hut algorithm, single tree
  • Barnes-Hut algorithm, double tree
  • Types of trees (octtree, KD-tree, recursive inertial bisection)
  • The bottom-top-left-right-top-bottom idea of the FMM
  • The concept of the multipole expansion

Todays lecture

  • Complexity analysis for the FMM
  • M2M, M2L, L2L operators via Taylor expansions
  • Separability criteria and low-rank approximation
  • What is the algebraic interpretation of the Barnes-Hut method
  • What is the algebraic interpretation of the FMM
  • The concept of equivalent charges
  • $\mathcal{H}$, $\mathcal{H}^2$-matrices
  • A brief recap on the uniform grid computations

N-body problem

Write down again the basic N-body problem.

$$V_i = \sum_{j} q_j F(x_i, y_j).$$
  • We construct trees for the sources $y_j$ and receivers $x_i$.
  • We construct M2M, M2L, L2L operators
  • We fill the "multipoles" in the sources tree using M2M operators (upward pass)
  • We fill the "local" coefficients in the receivers tree using M2L operators (translation operator)
  • We fill the "local" coefficients using L2L operators (downward pass)

Tree notation

We will denote the tree as $\mathcal{T}$, its nodes as $t$.

The set children(t) is the set of all children of a given node (is empty for the leaf)

For each $t$ we will denote all pairs $(t, s)$ for all nodes $s$ that belong to the interaction list.

Definition: For box $X$ the white squares are the interaction list: the fathers do not fullfil the separability criteria, but the children are separable)

M2M operator

M2M operator has a very simple form for the Barnes-Hut algorithm.

Each node stores exactly 1 number (total charge/mass),

Thus for a binary tree

$$q_{father} = \underbrace{\begin{bmatrix} 1 & 1 \end{bmatrix}}_{\mbox{M2M operator}} \begin{bmatrix} q_{leaf1} \\ q_{leaf2}\end{bmatrix}$$

In general,

$$ M_{father} = M2M \begin{bmatrix} M_{leaf1} \\ M_{leaf2}\end{bmatrix} $$

M2L operator

For a given node $s$ of the source cluster in the interaction list we store a matrix $\mbox{M2L(t, s)}$:

$$L(s) = \mbox{M2L}(t, s) M(t).$$

For the Barnes-Hut example, $M2L$ is a $1 \times 1$ matrix with the element $\frac{1}{\Vert R_t - R_s \Vert}$

L2L operator

L2L operator is also a $1 \times 1$ matrix for the BH!

Complexity estimate of the FMM

A simple complexity estimate for the FMM in 2D gives

$$\underbrace{N p}_{\mbox{upward pass}} + \underbrace{29 \frac{N}{s} p^2}_{\mbox{translation}} + \underbrace{N p}_{\mbox{downward pass}} + \underbrace{9 N s}_{\mbox{close interaction}}$$

A good choice is $s \approx p$ and that gives the complexity $40 N p$ ($p$ is the number of multipole coefficients involved)

The multipole method

In the multipole method, we approximate the interaction between two separated boxes with the Taylor series, or in other words, the function $f(x, y)$ is approximated in the strip

$$\Vert x - y \Vert \geq \delta.$$

Taylor series is not optimal

Separated representations

What we have from the Taylor series?

We have the separated representation, where the separation is of "sources" and "receivers"

$$f(x, y) \approx \sum_{\alpha=1}^r u_{\alpha}(x) v_{\alpha}(y),$$

where $x, y$ are in fact two-dimensional (or three-dimensional) vectors.

Separated representation and fast methods

If the interaction between two boxes is separable, then

$$V(x) = \sum_{j} f(x, y_j) q_j = \sum_{\alpha=1}^r u_{\alpha}(x) c_{\alpha},$$

where the coefficients $c_{\alpha}$ are given by the formula

$$c_{\alpha} = \sum_{j} v_{\alpha}(y_j) q_j.$$

Questions:

  • Is this enough for BH-type complexity?

  • Is this enough for the FMM-type complexity?

Answers

  • It is enough for the BH-type complexity on the matvec stage (when the decomposition is known)
  • It is not enough for the FMM-type complexity, since we have to compute the coefficients and then add to all the points inside the box.

Algebraic interpretation

Algebraic interpretation of the separated representation is simple.

Given two nodes $t, s$ of the cluster trees of sources and receivers, the interaction between boxes containing $t$ and $s$ is in fact the interaction between all particles in $t$ and all particles in $s$.

This is then a (sub)matrix of the total interaction matrix $A$.

It is convenient to denote it by $A(t, s)$. The separability property is then reduced on the discrete level to

$$A(t, s) \approx U(t, s) V(t, s)^{\top}, \quad U(t, s) \in \mathbb{R}^{M_t \times r}, \quad V(t, s) \in \mathbb{R}^{N_s \times r}$$

Algebraic interpretation (cont).

The low-rank factorization of the separated blocks leads to the hierarchical matrices ($\mathcal{H}$-matrices, which were actually introduced by Tyrtyshnikov in 1993, and later (re)introduced by Hackbusch, Khoromskij in 1999.

We split the matrix into blocks, and approximate blocks $A(t, s)$ that satisfy the admissability condition by low-rank matrices.

This is not enough!

The interpretation by "block low-rank structure" is viable, but not enough;

A closer step to the FMM algorithm is to consider the decomposition

$$A(t, s) = U(t) \Psi(t, s) V^{\top}(s),$$

where we have common factor matrices $U(t)$ and $V(s)$ for each node, and coefficient matrices $\Psi(t, s)$ which are small.

Still not an FMM: will have to store $n \times r$ matrices (there are nowhere in the FMM).

The final step: basis nestedness

To get the matrix structure, that is algebraically equivalent to the FMM structure, we have to make one additional assumption: the bases $U(t)$ are nested

In order to figure out what it means, we will have to define the block row of the matrix.

Block columns

All the blocks from the interaction list form "own far zone". If we add the far zone of the father of the node, and of grandfather, then it will be the block row of the full far zone (i.e. everything except for the close zone)

Algebraic characterization

The algebraic characterization is that the rank of the block row is equal to $r$.

Then the bases are nested, i.e. from $U(t_1)$ and $U(t_2)$ we can recompute the basis for the father:

$$U(t) = \begin{bmatrix} U_{t_1} & 0 \\ 0 & U_{t_2} \end{bmatrix} \Phi((t_1,t_2)\rightarrow t).$$

The transfer matrix $\Phi$ is of size $(r_{t_1} + r_{t_2}) \times r_t \approx 2r \times r$.

$H^2$-matrices

  • Thus, we only need the matrices to recompute the basises from the sons to the father.
  • The same holds for the columns.
  • The transfer matrices $\Phi$ are the M2M operators, the matrix $\Psi(t, s)$ are the M2L operators, and the same column matrices are the $L2L$ operators!

Let us do some whiteboard magic to show that it is indeed the case.

Speeding up translation operators

Now when we know the algebraic interpretation of the FMM, we can easily describe how to speedup the computation of translation operators.

The translation is just the multiplication by the matrices $\Psi(t, s)$.

Thus, if we select such basis $U'(t)$ and $V'(s)$ (maybe even larger, that the minimal one!) so $\Psi(t, s)$ is diagonal, we are fine.

Diagonal translation operators

Idea: the basis functions are created by charges located in the uniform mesh inside the box.

Then, the interaction becomes the interaction between two uniform grids in two boxes, and it can be computed with the help of fast Toeplitz matrix-by-vector product.

Final interpretation of $H^2$-matrices

If hierarchical matrices is the representation of a matrix as a set of non-intersecting low-rank blocks, the $\mathcal{H}^2$-matrix structure is the representation as the set of intesecting blocks (block rows and columns) which are of low-rank.

Other types of splittings

The splitting can be quite interesting, for example we can consider the simplest "interaction"

$$F(x, y) = 1, \quad \Vert x - y \Vert > \delta, \quad 0, \quad \mbox{otherwise}.$$

Then in 1D this is the excluded sum.

$$V_i = \sum_{j \ne i} q_j.$$

The most efficient way is to compute the suffix sum and the prefix sum.

This also can be interpreted as the splitting of the matrix into blocks!

Important questions

  • How you compute the factorization?
  • When such approximation exist?

H-matrix software

There is a well-known code HLIB for working with H-matrices. There is also kernel-independent variant.

Several "in-house" codes, we also develop one (http://bitbucket.org/muxas/h2tools)

Overview of todays lecture

  • Algebraic interpretation of the FMM as block-low rank matrices
  • The nestedness property

Next lecture

  • The computation of the H^2 factorization
  • Algebra on hierarchical matrices
  • Inversion of hierarchical matrices
  • Fast low-rank direct solvers

Bonus track

What happens, if the particles are on the uniform grid?


In [3]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/alex.css", "r").read()
    return HTML(styles)
css_styling()


Out[3]: