In [1]:
from IPython.html.services.config import ConfigManager
from IPython.utils.path import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
              'theme': 'sky',
              'transition': 'zoom',
              'start_slideshow_at': 'selected',
})


Out[1]:
{u'start_slideshow_at': 'selected', u'theme': 'sky', u'transition': 'zoom'}

Lecture 11. Going fast: the Barnes-Hut algorithm

Previous lecture

  • Discretization of the integral equations, Galerkin methods
  • Computation of singular integrals
  • Idea of the Barnes-Hut method

Todays lecture

  • Barnes-Hut in details
  • The road to the FMM
  • Algebraic versions of the FMM/Fast Multipole

The discretization of the integral equation leads to dense matrices.

The main question is how to compute the matrix-by-vector product,

i.e. the summation of the form:

$$\sum_{j=1}^M A_{ij} q_j = V_i, \quad i = 1, \ldots, N.$$

The matrix $A$ is dense, i.e. its element can not be omitted. The complexity is $\mathcal{O}(N^2)$.

Can we make it faster?

The simplest case is the computation of the potentials from the system of charges

$$V_i = \sum_{j} \frac{q_j}{\Vert r_i - r_j \Vert}$$

This summation appears in:

  • Modelling of large systems of charges
  • Astronomy (where instead of $q_j$ we have masses, i.e. start..)

It is called the N-body problem .

There is no problem with memory, since you only have two cycles.


In [34]:
import numpy as np
import math
from numba import jit
N = 10000
x = np.random.randn(N, 2);
y = np.random.randn(N, 2);
charges = np.ones(N)
res = np.zeros(N)


@jit
def compute_nbody_direct(N, x, y, charges, res):
    for i in xrange(N):
        res[i] = 0.0
        for j in xrange(N):
            dist = (x[i, 0] - y[i, 0]) ** 2 + (x[i, 1] - y[i, 1]) ** 2
            dist = math.sqrt(dist)
            res[i] += charges[j] / dist

In [35]:
%timeit compute_nbody_direct(N, x, y, charges, res)


1 loops, best of 3: 976 ms per loop

Question

What is the typical size of particle system?

Millenium run

One of the most famous N-body computations is the Millenium run

  • More than 10 billions particles ($2000^3$)
  • $>$ 1 month of computations, 25 Terabytes of storage
  • Each "particle" represents approximately a billion solar masses of dark matter
  • Study, how the matter is distributed through the Universy (cosmology)

In [23]:
from IPython.display import YouTubeVideo

YouTubeVideo('UC5pDPY5Nz4')


Out[23]:

Smoothed particle hydrodynamics

The particle systems can be used to model a lot of things.

For nice examples, see the website of Ron Fedkiw


In [22]:
from IPython.display import YouTubeVideo

YouTubeVideo('6bdIHFTfTdU')


Out[22]:

Applications

Where the N-body problem arises in different problems with long-range interactions`

  • Cosmology (interacting masses)
  • Electrostatics (interacting charges)
  • Molecular dynamics (more complicated interactions, maybe even 3-4 body terms).
  • Particle modelling (smoothed particle hydrodynamics)

Fast computation

$$ V_i = \sum_{j} \frac{q_j}{\Vert x_i - y_j \Vert} $$

Direct computation takes $\mathcal{O}(N^2)$ operations.

How to compute it fast?

The core idea: Barnes, Hut (Nature, 1986)

Use clustering of particles!

Idea on one slide

The idea was simple: If a charge is far from a cluster of sources, it they are seen as one big "particle".

Barnes-Hut

$$\sum_j q_j F(x, y_j) \approx Q F(x, y_C)$$$$Q = \sum_j q_j, \quad y_C = \frac{1}{J} \sum_{j} y_j$$

To compute the interaction, it is sufficient to replace by the a center-of-mass and a total mass!

The idea of Barnes and Hut was to split the sources into big blocks using the cluster tree

The algorithm is recursive. Let $\mathcal{T}$ be the tree, and $x$ is the point where we need to compute the potential.

  • Set $N$ to the root node
  • If $x$ and $N$ are separated , then set $V(x) = Q V(y_{\mathrm{center}})$
  • If $x$ and $N$ are not separated, compute $V(x) = \sum_{C \in \mathrm{sons}(N)} V(C, x)$ recursion

The complexity is $\mathcal{O}(\log N)$ for 1 point!

Trees

There are many options for the tree construction.

  • Quadtree/Octree
  • KD-tree
  • Recursive intertial bisection

Octtree

The simplest one: **quadtree/ octtree, when you split the square into 4 squares and do that until the number of points is less that a parameter.

It leads to the unbalanced tree, adding points is simple (but can unbalance it more).

KD-tree

Another popular choice of the tree is the KD-tree

The construction is simple as well:

Split along x-axis, then y-axis in such a way that the tree is balanced (i.e. the number of points in the left child/right child is similar).

The tree is always balanced, but biased towards the coordinate axis.

Recursive inertial bisection

Compute the center-of-mass and select a hyperplane such that sum of squares of distances to it is minimal.

$$\sum_{j} \rho^2(x_j, \Pi) \rightarrow \min.$$

Often gives best complexity, but adding/removing points can be difficult.

The scheme

You can actually code it from this description!

  • Construct the cluster tree
  • Fill the tree with charges
  • For any point we now can compute the potential in $\mathcal{O}(\log N)$ flops (instead of $\mathcal{O}(N)$).

Notes on the complexity

For each node of the tree, we need to compute its total mass and the center-of-mass. If we do it in a cycle, then the complexity will be $\mathcal{O}(N^2)$ for the tree constuction.

However, it is easy to construct it in a smarter way.

  • Start from the children (which contain only few particles) and fill then
  • Bottom-to-top graph traversal: if we know the charges for the children, we can cheaply compute the total charge/center of massfor the father

Now you can actually code this (minor things remaining are the bounding box and separation criteria).

Problems with Barnes-Hut

What are the problems with Barnes-Hut?

Well, several

  • The logarithmic term
  • Low accuracy $\varepsilon = 10^{-2}$ is ok, but if we want $\varepsilon=10^{-5}$ we have to take larger separation criteria

Solving problems with Barnes-Hut

  • Complexity: To avoid the logarithmic term, we need to store two trees, for the sources, and for receivers
  • Accuracy: instead of the piecewise-constant approximation which is inheritant in the BH algorithm, use more accurate representations.

Double tree Barnes-Hut

Principal scheme of the Double-tree BH:

  • Construct two trees for sources & receivers
  • Fill the tree for sources with charges (bottom-to-top)
  • Compute the interaction between nodes of the treess
  • Fill the tree for receivers with potentials (top-to-bottom)

The original BH method has low accuracy, and is based on the expansion

$$f(x, y) \approx f(x_0, y_0)$$

What to do?

Answer: Use higher-order expansions!

$$f(x + \delta x, y + \delta y) \approx f(x, y) + \sum_{k, l=0}^p (D^{k} D^{l} f) \delta ^k \delta y^l \frac{1}{k!} \frac{1}{l!} + \ldots $$

For the Coloumb interaction $\frac{1}{r}$ we have the multipole expansion

$$ v(R) = \frac{Q}{R} + \frac{1}{R^3} \sum_{\alpha} P_{\alpha} R_{\alpha} + \frac{1}{6R^5} \sum_{\alpha, \beta} Q_{\alpha \beta} (3R_{\alpha \beta} - \delta_{\alpha \beta}R^2) + \ldots, $$

where $P_{\alpha}$ is the dipole moment, $Q_{\alpha \beta}$ is the quadrupole moment (by actually, nothing more than the Taylor series expansion).

Fast multipole method

This combination is very powerful, and

Double tree + multipole expansion $\approx$ the Fast Multipole Method (FMM).

FMM

We will talk about the exact implementation and the complexity issues in the next lecture.

Problems with FMM

FMM has problems:

  • It relies on analytic expansions; maybe difficult to obtain for the integral equations
  • the higher is the order of the expansion, the larger is the complexity.
  • That is why the algebraic interpretation (or kernel-independent FMM) is of great importance.

FMM hardware

For cosmology this problem is so important, so that they have released a special hardware Gravity Pipe for solving the N-body problem

FMM software

Sidenote, When you Google for "FMM", you will also encounter fast marching method (even in the scikit).

Everyone uses its own in-house software, so a good Python open-source software is yet to be written.

This is also a perfect test for the GPU programming (you can try to take such project in the App Period, by the way).

Overview of todays lecture

  • The cluster tree
  • Barnes-Hut and its problems
  • Double tree / fast multipole method
  • Important difference: element evaluation is fast. In integral equations, it is slow.

Next lecture

  • More detailed overview of the FMM algorithm, along with complexity estimates.
  • Algebraic interpretation of the FMM
  • Application of the FMM to the solution of integral equations

In [2]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/alex.css", "r").read()
    return HTML(styles)
css_styling()


Out[2]: