Lecture 5. FMM and its algebraic natrue

Previous lecture

Barnes-Hut algorithm, single tree
Barnes-Hut algorithm, double tree
Types of trees (octtree, KD-tree, recursive inertial bisection)
The bottom-top-left-right-top-bottom idea of the FMM
The concept of the multipole expansion

Todays lecture

Complexity analysis for the FMM
M2M, M2L, L2L operators via Taylor expansions
Separability criteria and low-rank approximation
What is the algebraic interpretation of the Barnes-Hut method
What is the algebraic interpretation of the FMM
The concept of equivalent charges
$\mathcal{H}$, $\mathcal{H}^2$-matrices
A brief recap on the uniform grid computations

N-body problem

Write down again the basic N-body problem.

$$V_i = \sum_{j} q_j F(x_i, y_j).$$

We construct trees for the sources $y_j$ and receivers $x_i$.
We construct S2M, M2M, M2L, L2L, L2T operators
We fill the "multipoles" in the sources tree using M2M operators (upward pass)
We fill the "local" coefficients in the receivers tree using M2L operators (translation operator)
We fill the "local" coefficients using L2L operators (downward pass)

Tree notation

We will denote the tree as $\mathcal{T}$, its nodes as $t$.

The set children(t) is the set of all children of a given node (is empty for the leaf)

For each $t$ we will denote all pairs $(t, s)$ for all nodes $s$ that belong to the interaction list.

Definition: For box $X$ the white squares are the interaction list: the fathers do not fullfil the separability criteria, but the children are separable)

M2M operator

M2M operator has a very simple form for the Barnes-Hut algorithm.

Each node stores exactly 1 number (total charge/mass).

Thus for a binary tree

$$q_{father} = \underbrace{\begin{bmatrix} 1 & 1 \end{bmatrix}}_{\mbox{M2M operator}} \begin{bmatrix} q_{leaf1} \\ q_{leaf2}\end{bmatrix}$$

In general,

$$ M_{father} = M2M \begin{bmatrix} M_{leaf1} \\ M_{leaf2}\end{bmatrix} $$

M2L operator

For a given node $s$ of the source cluster in the interaction list we store a matrix $\mbox{M2L(t, s)}$:

$$L(s) = \mbox{M2L}(t, s) M(t).$$

For the Barnes-Hut example, $M2L$ is a $1 \times 1$ matrix with the element $\frac{1}{\Vert R_t - R_s \Vert}$

L2L operator

L2L operator is also a $1 \times 1$ matrix for the BH!

M2M, M2L, L2L expansions in real life

In real life, these expansions are quite complicated and are based on so called addition theorems.

I do not even want to show them here (based on spherical harmonics).

Complexity estimate of the FMM

A simple complexity estimate for the FMM in 2D gives

$$\underbrace{N p}_{\mbox{upward pass}} + \underbrace{29 \frac{N}{s} p^2}_{\mbox{translation}} + \underbrace{N p}_{\mbox{downward pass}} + \underbrace{9 N s}_{\mbox{close interaction}}$$

A good choice is $s \approx p$ and that gives the complexity $40 N p$ ($p$ is the number of multipole coefficients involved)

In 3D the complexity is

$$189 \frac{N}{s} p^4 + 2 N p^3 + 27 N s,$$

i.e. for $s = 2p^2$ we have

$$150 N p^2.$$

This can be too much for $p$. Many $\mathcal{O}(p)$ schemes have been developed.

The multipole method

In the multipole method, we approximate the interaction between two separated boxes with the Taylor series, or in other words, the function $f(x, y)$ is approximated in the strip

$$\Vert x - y \Vert \geq \delta.$$

Taylor series is not optimal

Separated representations

What we have from the Taylor series?

We have the separated representation, where the separation is of "sources" and "receivers"

$$f(x, y) \approx \sum_{\alpha=1}^r u_{\alpha}(x) v_{\alpha}(y),$$

where $x, y$ are in fact two-dimensional (or three-dimensional) vectors.

Separated representation and fast methods

If the interaction between two boxes is separable, then

$$V(x) = \sum_{j} f(x, y_j) q_j = \sum_{\alpha=1}^r u_{\alpha}(x) c_{\alpha},$$

where the coefficients $c_{\alpha}$ are given by the formula

$$c_{\alpha} = \sum_{j} v_{\alpha}(y_j) q_j.$$

Questions:

Is this enough for BH-type complexity?
Is this enough for the FMM-type complexity?

Answers

It is enough for the BH-type complexity on the matvec stage (when the decomposition is known)
It is not enough for the FMM-type complexity, since we have to compute the coefficients and then add to all the points inside the box.

Algebraic interpretation

Algebraic interpretation of the separated representation is simple.

Given two nodes $s, t$ of the cluster trees of sources and receivers, the interaction between boxes containing $s$ and $t$ is in fact the interaction between all particles in $t$ and all particles in $s$.

This is then a (sub)matrix of the total interaction matrix $A$.

It is convenient to denote it by $A(t, s)$. The separability property is then reduced on the discrete level to

$$A(t, s) \approx U(t, s) V(t, s)^{\top}, \quad U(t, s) \in \mathbb{R}^{M_t \times r}, \quad V(t, s) \in \mathbb{R}^{N_s \times r},$$

where $N_s$ is number of sources and $M_t$ is number of receivers in corresponding boxes $s$ and $t$. $r$ is rank of $A(t, s)$

Algebraic interpretation (cont).

The low-rank factorization of the separated blocks leads to the hierarchical matrices ($\mathcal{H}$-matrices, which were actually introduced by Tyrtyshnikov in 1993, and later (re)introduced by Hackbusch, Khoromskij in 1999).

We split the matrix into blocks, and approximate blocks $A(t, s)$ that satisfy the admissability condition by low-rank matrices.

This is not enough!

The interpretation by "block low-rank structure" is viable, but not enough;

A closer step to the FMM algorithm is to consider the decomposition

$$A(t, s) = U(t) \Psi(t, s) V^{\top}(s),$$

where we have common factor matrices $U(t)$ and $V(s)$ for each node, and coefficient matrices $\Psi(t, s)$ which are small.

Still not an FMM: will have to store $n \times r$ matrices (there are nowhere in the FMM).

The final step: basis nestedness

To get the matrix structure, that is algebraically equivalent to the FMM structure, we have to make one additional assumption: the bases $U(t)$ are nested

In order to figure out what it means, we will have to define the block row of the matrix.

Block columns

All the blocks from the interaction list form "own far zone". If we add the far zone of the father of the node, and of grandfather, then it will be the block row of the full far zone (i.e. everything except for the close zone)

Algebraic characterization

The algebraic characterization is that the rank of the block row is equal to $r$.

Then the bases are nested, i.e. from $U(t_1)$ and $U(t_2)$ we can recompute the basis for the father:

$$U(t) = \begin{bmatrix} U_{t_1} & 0 \\ 0 & U_{t_2} \end{bmatrix} \Phi((t_1,t_2)\rightarrow t).$$

The transfer matrix $\Phi$ is of size $(r_{t_1} + r_{t_2}) \times r_t \approx 2r \times r$.

$\mathcal{H}^2$-matrices

Thus, we only need the matrices to recompute the basises from the sons to the father.
The same holds for the columns.
The transfer matrices $\Phi$ are the M2M operators, the matrix $\Psi(t, s)$ are the M2L operators, and the same column matrices are the $L2L$ operators!

Let us do some whiteboard magic to show that it is indeed the case.

Speeding up translation operators

Now when we know the algebraic interpretation of the FMM, we can easily describe how to speedup the computation of translation operators.

The translation is just the multiplication by the matrices $\Psi(t, s)$.

Thus, if we select such basis $U'(t)$ and $V'(s)$ (maybe even larger, that the minimal one!) so $\Psi(t, s)$ is diagonal, we are fine.

Diagonal translation operators

Idea: the basis functions are created by charges located in the uniform mesh inside the box.

Then, the interaction becomes the interaction between two uniform grids in two boxes, and it can be computed with the help of fast Toeplitz matrix-by-vector product.

Final interpretation of $H^2$-matrices

If hierarchical matrices is the representation of a matrix as a set of non-intersecting low-rank blocks, the $\mathcal{H}^2$-matrix structure is the representation as the set of intesecting blocks (block rows and columns) which are of low-rank.

Other types of splittings

The splitting can be quite interesting, for example we can consider the simplest "interaction"

$$F(x, y) = 1, \quad \Vert x - y \Vert > \delta, \quad 0, \quad \mbox{otherwise}.$$

Then in 1D this is the excluded sum.

$$V_i = \sum_{j \ne i} q_j.$$

The most efficient way is to compute the suffix sum and the prefix sum.

This also can be interpreted as the splitting of the matrix into blocks!

Important questions

How you compute the factorization?
When such approximation exist?

H-matrix software

There is a well-known code HLIB for working with H-matrices. There is also kernel-independent variant.

Several "in-house" codes, we also develop one (http://bitbucket.org/muxas/h2tools)

Going back to FMM

We can also try to work on the concept of kernel-independent FMM.

The idea is to use, instead of multipole expansion, the generalization of Barnes-Hut idea.

In Barnes-Hut the far field is represented by $1$ charge.

In KIFMM, the far field is represented by several charges.

The idea

We put some charges inside the box, and what to assign the equivalent charges in such a way,

that the far field at a certain test surface is the same as from the original charges,

The scheme

Select the position of charges inside the box (in the simplest scheme, fixed)
Select the test surface (for 1/r the sphere of fixed radius is ok).
Solve a rectangular least squares problem, from it determine the M2M, M2L and L2L coefficients.

What we have to compute

We can make use of translation invariance:

Use uniform subdivision
Only one computation for each level
Small precomputation

Selection of equivalent charges

The interaction between two boxes (clusters) is defined as a submatrix of the original matrix and "far field" loosely correponds to the low-rank of this matrix.

The equivalent charges representation follows from the skeleton decomposition of the form

$$ A(t, s)\approx \sum_{\widehat{t}, \widehat{s}} A(t, \widehat{s}) \widehat{A}^{-1}(\widehat{t}, \widehat{s}) A(\widehat{t}, s), $$

Summary

More on FMM
Idea of KIFMM



In [4]:

    
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()









    Out[4]: