Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

In [ ]:
%matplotlib inline
import numpy
import matplotlib.pyplot as plt

Multigrid Methods

Jacobi Revisited

Consider the Poisson problem with $$ f(x) = -20 + \frac{1}{2}\left ( \phi''(x) \cos \phi(x) - (\phi'(x))^2 \sin \phi(x) \right ) $$ with $$ \phi = 20 \pi x^3, $$ and boundary conditions $u(0) = 1$ and $u(1) = 3$.

Integrating twice we can find that the solution of this problem is $$ u(x) = 1 + 12 x - 10 x^2 + \frac{1}{2} \sin \phi(x). $$

Discretizing this problem in the standard way with second-order, centered finite differences leads to the following code.


In [ ]:
def jacobi_update(x, U, f, delta_x):
    """Update U with a single Jacobi iteration"""
    U_new = U.copy()
    for i in range(1, x.shape[0] - 1):
        U_new[i] = 0.5 * (U[i+1] + U[i-1]) - f(x[i]) * delta_x**2 / 2.0
    step_size = numpy.linalg.norm(U_new - U, ord=2)
    del(U)
    return U_new, step_size


# Problem setup
a = 0.0
b = 1.0
alpha = 1.0
beta = 3.0
phi = lambda x: 20.0 * numpy.pi * x**3
phi_prime = lambda x: 60.0 * numpy.pi * x**2
phi_dbl_prime = lambda x: 120.0 * numpy.pi * x
f = lambda x: -20.0 + 0.5 * (phi_dbl_prime(x) * numpy.cos(phi(x)) - (phi_prime(x))**2 * numpy.sin(phi(x)))
u_true = lambda x: 1.0 + 12.0 * x - 10.0 * x**2 + 0.5 * numpy.sin(phi(x))

# Discretization
m = 100
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# Expected iterations needed
iterations_J = int(2.0 * numpy.log(delta_x) / numpy.log(1.0 - 0.5 * numpy.pi**2 * delta_x**2))
# iterations_J = 100

# Solve system
# Initial guess for iterations
U = 1.0 + 2.0 * x_bc
U[0] = alpha
U[-1] = beta
convergence_J = numpy.zeros(iterations_J)
step_size = numpy.zeros(iterations_J)
for k in range(iterations_J):
    U, step_size[k] = jacobi_update(x_bc, U, f, delta_x)
    convergence_J[k] = numpy.linalg.norm(delta_x * (u_true(x_bc) - U), ord=2)

# Plot result
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 3)

axes = fig.add_subplot(1, 3, 1)
axes.plot(x_bc, U, 'o', label="Computed")
axes.plot(x_bc, u_true(x_bc), 'k', label="True")
axes.set_title("Solution to $u_{xx} = f(x)$")
axes.set_xlabel("x")
axes.set_ylabel("u(x)")

axes = fig.add_subplot(1, 3, 2)
axes.semilogy(range(iterations_J), convergence_J, 'o')
axes.set_title("Error")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||U^{(k)} - u(x)||_2$")

axes = fig.add_subplot(1, 3, 3)
axes.semilogy(range(iterations_J), step_size, 'o')
axes.set_title("Change Each Step")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||U^{(k)} - U^{(k-1)}||_2$")

plt.show()

We eventually converge, although there is a lower limit to the effectiveness of the Jacobi iterations, and we again observe the extremely slow convergence we expect. What if we could still take advantage of Jacobi though?


In [ ]:
# Problem setup
a = 0.0
b = 1.0
alpha = 1.0
beta = 3.0
phi = lambda x: 20.0 * numpy.pi * x**3
phi_prime = lambda x: 60.0 * numpy.pi * x**2
phi_dbl_prime = lambda x: 120.0 * numpy.pi * x
f = lambda x: -20.0 + 0.5 * (phi_dbl_prime(x) * numpy.cos(phi(x)) - (phi_prime(x))**2 * numpy.sin(phi(x)))
u_true = lambda x: 1.0 + 12.0 * x - 10.0 * x**2 + 0.5 * numpy.sin(phi(x))

# Discretization
m = 255
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

U = 1.0 + 2.0 * x_bc
U[0] = alpha
U[-1] = beta

num_steps = 1000
plot_frequency = 200

# Plot initial error
fig = plt.figure()
fig.set_figwidth(fig.get_figwidth() * 2)

axes = fig.add_subplot(1, 2, 1)
axes.plot(x_bc, U, 'o', label="Computed")
axes.plot(x_bc, u_true(x_bc), 'k', label="True")
axes.set_title("Solution to $u_{xx} = f(x)$, iterations = %s" % 0)
axes.set_xlabel("x")
axes.set_ylabel("u(x)")

axes = fig.add_subplot(1, 2, 2)
axes.plot(x_bc, U - u_true(x_bc), 'r-o')
axes.set_title("Error, iterations = %s" % 0)
axes.set_xlabel("x")
axes.set_ylabel("U - u")

# Start Jacobi iterations
for k in range(num_steps):
    U, step_size = jacobi_update(x_bc, U, f, delta_x)

    if (k+1)%plot_frequency == 0:
        print(step_size)
        fig = plt.figure()
        fig.set_figwidth(fig.get_figwidth() * 2)

        axes = fig.add_subplot(1, 2, 1)
        axes.plot(x_bc, U, 'o', label="Computed")
        axes.plot(x_bc, u_true(x_bc), 'k', label="True")
        axes.set_title("Solution to $u_{xx} = f(x)$, iterations = %s" % (k + 1))
        axes.set_xlabel("x")
        axes.set_ylabel("u(x)")

        axes = fig.add_subplot(1, 2, 2)
        axes.plot(x_bc, U - u_true(x_bc), 'r-o')
        axes.set_title("Error, iterations = %s" % (k + 1))
        axes.set_xlabel("x")
        axes.set_ylabel("U - u")

plt.show()

Note that the higher-frequency components of the error are removed first! Why might this be?

Recall that we found in general that the error $e^{(k)}$ from a matrix splitting iterative approach involves the matrix $G$ where $$ U^{(k+1)} = M^{-1} N U^{(k)} + M^{-1} b = G U^{(k)} + c. $$ We then know that $$ e^{(k)} = G e^{(k-1)}. $$

In the case of Jacobi the matrix $G$ can be written as $$ G = I + \frac{\Delta x^2}{2} A = \begin{bmatrix} 0 & 1/2 & \\ 1/2 & 0 & 1/2 \\ & 1/2 & 0 & 1/2 \\ & & \ddots & \ddots & \ddots \\ & & & 1/2 & 0 & 1/2 \\ & & & & 1/2 & 0 \end{bmatrix}. $$ Note that applying $G$ amounts to averaging the neighboring values $U_{i+1}$ and $U_{i-1}$. Averaging has the effect of smoothing, i.e. it damps out higher frequencies more quickly.

Recall that the eigenvectors of $A$ and $G$ are the same and are given by $$ u^p_j = \sin(\pi p x_j) ~~~ \text{with} ~~~ x_j = j \Delta x, ~~~ j = 1, 2, 3, \ldots, m, $$ with corresponding eigenvalues of $G$ $$ \lambda_p = \cos(p \pi \Delta x). $$
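
As a quick check we can form $G$ directly for a small $m$ (here $m = 10$, an arbitrary choice) and compare its eigenvalues against this formula:


In [ ]:
# Verify the eigenvalue formula for G on a small problem (m = 10 chosen arbitrarily)
m_check = 10
delta_x_check = 1.0 / (m_check + 1)

# G is tridiagonal with zeros on the diagonal and 1/2 off the diagonal
G = 0.5 * (numpy.diag(numpy.ones(m_check - 1), 1) +
           numpy.diag(numpy.ones(m_check - 1), -1))

lambda_computed = numpy.linalg.eigvalsh(G)
lambda_formula = numpy.sort(numpy.cos(numpy.arange(1, m_check + 1) * numpy.pi * delta_x_check))
print(numpy.allclose(lambda_computed, lambda_formula))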

We can project the initial error $e^{(0)}$ onto the eigenspace such that $$ e^{(0)} = c_1 u^1 + c_2 u^2 + \cdots + c_m u^m $$ and therefore $$ e^{(k)} = c_1 (\lambda_1)^k u^1 + c_2 (\lambda_2)^k u^2 + \cdots + c_m (\lambda_m)^k u^m. $$ This implies that the $p$th component of the error $e^{(k)}$ decays like $(\lambda_p)^k$.

Examining the eigenvalues we know that the 1st and $m$th eigenvalues will be closest to 1 in magnitude, so the terms $c_1 (\lambda_1)^k u^1$ and $c_m (\lambda_m)^k u^m$ will dominate the error since $$ \lambda_1 = -\lambda_m \approx 1 - \frac{1}{2} \pi^2 \Delta x^2. $$ We saw this before as this determined the overall convergence rate for Jacobi.

For other components of the error we can see approximately how fast they will decay. If $m / 4 \leq p \leq 3m / 4$ then $$ |\lambda_p| \leq \frac{1}{\sqrt{2}} \approx 0.7, $$ implying that after 20 iterations we would have $|\lambda_p|^{20} \leq 2^{-10} < 10^{-3}$.

Connecting this back to our original observation that the higher-frequency components of the error are damped more quickly, look at the form of the eigenvectors.

The original error was projected onto the eigenvectors so that $$ e^{(0)} = c_1 u^1 + c_2 u^2 + \cdots + c_m u^m. $$ Plugging in the eigenvectors themselves we find $$ e^{(0)}_j = c_1 \sin(\pi x_j) + c_2 \sin(2 \pi x_j) + \cdots + c_m \sin(m \pi x_j), $$ so we have effectively broken down the original error into a Fourier sine series.
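
As an illustration we can compute these coefficients directly for the linear initial guess used in the example above, using the discrete orthogonality of the sine modes, $\sum_{j=1}^{m} \sin(p \pi x_j) \sin(q \pi x_j) = \frac{m+1}{2} \delta_{pq}$ (a standard identity, not derived here):


In [ ]:
# Project the initial error e^(0) = U^(0) - u onto the sine modes (eigenvectors)
m = 255
x = numpy.linspace(0.0, 1.0, m + 2)[1:-1]
e_0 = (1.0 + 2.0 * x) - u_true(x)

p = numpy.arange(1, m + 1)
modes = numpy.sin(numpy.outer(p, numpy.pi * x))      # modes[p-1, j] = sin(p * pi * x_j)
c = 2.0 / (m + 1) * numpy.dot(modes, e_0)            # coefficients c_p

fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.semilogy(p, numpy.abs(c), 'o')
axes.set_title("Sine-mode content of the initial error")
axes.set_xlabel("$p$")
axes.set_ylabel("$|c_p|$")
plt.show()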

Now, considering our analysis of the eigenvalues, we see that it is in fact the middle range of frequencies that decays the most quickly. The reason we did not see this in our example is that the error did not contain frequencies near the top of the range (close to $p = m$) relative to our choice of $m$. If we picked an $m$ that was too small we would have been in trouble (for multiple reasons). Try this out and see what you observe.

It turns out that having only the middle range of frequencies decay quickly is suboptimal in the context we are considering. Instead we will use underrelaxed Jacobi, similar to SOR from before, where $$ U^{(k+1)} = (1 - \omega) U^{(k)} + \omega G U^{(k)} $$ with $\omega = 2/3$ (where $G$ is Jacobi's iteration matrix). The new iteration matrix is $$ G_\omega = (1-\omega) I + \omega G, $$ which has eigenvalues $$ \lambda_p = (1-\omega) + \omega \cos (p \pi \Delta x). $$

This choice of $\omega$ minimizes the maximum of $|\lambda_p|$ over the range $m/2 \leq p \leq m$; in fact $$ |\lambda_p| \leq 1/3 $$ for this range. As a standalone method this is actually worse than Jacobi, as the lower-frequency components of the error decay even more slowly than before, but this behavior is perfect for a multigrid approach.
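
As a sketch of what this looks like in code, mirroring `jacobi_update` above (the function name and the default $\omega = 2/3$ are choices made here):


In [ ]:
def weighted_jacobi_update(x, U, f, delta_x, omega=2.0/3.0):
    """Update U with a single underrelaxed (weighted) Jacobi iteration"""
    U_new = U.copy()
    for i in range(1, x.shape[0] - 1):
        jacobi_value = 0.5 * (U[i+1] + U[i-1]) - f(x[i]) * delta_x**2 / 2.0
        U_new[i] = (1.0 - omega) * U[i] + omega * jacobi_value
    step_size = numpy.linalg.norm(U_new - U, ord=2)
    return U_new, step_size

This can be dropped into the iteration loops above in place of `jacobi_update`.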

Multigrid Approach

The basic approach for multigrid is to do the following:

  1. Use a few iterations of underrelaxed Jacobi to damp high-frequency components of the error.
  2. Since the error is now smoother than before, it can be represented accurately on a coarser grid. We then transfer the problem to a coarser grid.
  3. On this coarser grid we then again apply a few iterations of underrelaxed Jacobi which now quickly removes a set of lower-frequency components of the error since we are on a coarser grid.

Consider the $p = m/4$ component, which on the original grid will not be damped much since it is in the lower-frequency range. If we transfer the problem to a grid with half as many points, this same frequency now sits at the midpoint of the frequency range and is therefore damped much more quickly!
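
A quick computation illustrates this for the $m = 255$, $\omega = 2/3$ case: the damping factor of mode $p$ is $(1-\omega) + \omega \cos(p \pi \Delta x)$, and doubling $\Delta x$ moves the $p = m/4$ mode from the slowly damped range into the well damped one.


In [ ]:
# Damping factor of the p = m/4 mode under underrelaxed Jacobi on the fine grid
# versus a grid with twice the spacing
m = 255
omega = 2.0 / 3.0
delta_x = 1.0 / (m + 1)
p = m // 4

damping_fine = (1.0 - omega) + omega * numpy.cos(p * numpy.pi * delta_x)
damping_coarse = (1.0 - omega) + omega * numpy.cos(p * numpy.pi * 2.0 * delta_x)
print("Fine grid:   %f" % damping_fine)     # about 0.81 per sweep
print("Coarse grid: %f" % damping_coarse)   # about 0.35 per sweep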

One important point here is that instead of transferring the solution to the coarser grid, we transfer only the residual problem for the error.

If we have taken $n$ iterations on the original grid we now have the error $$ e^{(n)} = U^{(n)} - u $$ and the residual vector $$ r^{(n)} = f - A U^{(n)}, $$ which are related by $$ A e^{(n)} = -r^{(n)}. $$ If we could solve this system for $e^{(n)}$ then we could simply subtract it from $U^{(n)}$ to recover $u$. The system of equations relating the error and residual is the one we are interested in coarsening.

Basic Algorithm:

  1. Take $n$ iterations (where $n$ is fixed) of a simple iterative method on the original problem $A u = f$. This gives the approximate solution $U^{(n)} \in \mathbb R^m$.
  2. Compute the residual $r^{(n)} = f - A U^{(n)} \in \mathbb R^m$.
  3. Coarsen the residual problem, taking $r^{(n)} \in \mathbb R^m$ to $\hat{r} \in \mathbb R^{m_c}$ where $m_c = (m - 1) / 2$.
  4. Approximately solve the new problem $\hat{A} \hat{e} = -\hat{r}$ where $\hat{A}$ is the appropriately scaled matrix $A$.
  5. We now have an approximation to the error $e^{(n)}$ in $\hat{e}$. To get back to an approximation of $e^{(n)}$ on the original grid we use an appropriate interpolation method to return to $\mathbb R^m$. Subtracting this interpolated approximate error from $U^{(n)}$ gives a new approximation $U$ to $u$.
  6. Using this new value of $U$ as an initial guess, repeat the process. A code sketch of one such two-grid cycle is given below.
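
The following cell sketches one such cycle for the model problem $u'' = f$ on $[0, 1]$ with $m$ odd, so that every other point (including both boundaries) lands on the coarse grid. It is only a sketch under specific choices the algorithm above leaves open: underrelaxed Jacobi as the smoother, restriction of the residual by simply taking every other point, and linear interpolation back to the fine grid (the helper names are also choices made here).


In [ ]:
def smooth(U, F, delta_x, num_sweeps, omega=2.0/3.0):
    """Apply underrelaxed Jacobi sweeps to the system A U = F"""
    for sweep in range(num_sweeps):
        U_new = U.copy()
        U_new[1:-1] = ((1.0 - omega) * U[1:-1]
                       + omega * (0.5 * (U[2:] + U[:-2]) - F[1:-1] * delta_x**2 / 2.0))
        U = U_new
    return U


def residual(U, F, delta_x):
    """Compute r = F - A U at the interior points (zero at the boundaries)"""
    r = numpy.zeros(U.shape)
    r[1:-1] = F[1:-1] - (U[:-2] - 2.0 * U[1:-1] + U[2:]) / delta_x**2
    return r


def two_grid_cycle(U, F, delta_x, num_sweeps=3):
    """One fine-coarse-fine cycle following the algorithm above"""
    # 1. Take a few smoothing iterations on the original grid
    U = smooth(U, F, delta_x, num_sweeps)

    # 2. Compute the residual and restrict it to the coarse grid (every other point)
    r = residual(U, F, delta_x)
    r_coarse = r[::2]

    # 3. Approximately solve A_c e_c = -r_c on the coarse grid (spacing 2 * delta_x)
    e_coarse = smooth(numpy.zeros(r_coarse.shape), -r_coarse, 2.0 * delta_x, num_sweeps)

    # 4. Interpolate the error estimate back to the fine grid and correct
    #    (assumes the grid spans [0, 1] as in the example above)
    x_fine = numpy.linspace(0.0, 1.0, U.shape[0])
    x_coarse = numpy.linspace(0.0, 1.0, e_coarse.shape[0])
    U = U - numpy.interp(x_fine, x_coarse, e_coarse)

    # 5. Smooth again on the fine grid before the next cycle
    return smooth(U, F, delta_x, num_sweeps)

With the example problem set up as before, one cycle might be applied as `U = two_grid_cycle(U, f(x_bc), delta_x)` and repeated until the change per cycle is small; recursing inside step 3 rather than only smoothing there gives the V-cycles described next.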

There are many variations on this scheme, most notably that we do not need to stop at a single coarsening; we can continue to coarsen to damp additional lower frequencies, depending mostly on the number of grid points $m$ we started out with. The specification of the levels of coarsening, combined with the interpolation back up to the original problem, is called a V-cycle.

Also note that although we are solving multiple linear systems, each coarsened system decreases the number of points used by half (this is also adjustable).

Work Estimates

So how much better is multigrid?

As a concrete example consider again the example from before with $m = 2^8 - 1 = 255$, now using $n = 3$ and allowing recursive coarsening down to 3 grid points (7 levels of grids). On each level we apply 3 iterations of Jacobi. If we apply these Jacobi iterations on the way "down" the V-cycle and on the way up (not necessary in theory) we do 6 iterations of Jacobi per level, leading to a total of 42 Jacobi iterations on variously coarsened grids. The total number of updated values is then $$ 6 \sum^8_{j=2} 2^j \approx 3072, $$ which is about the same amount of work as 12 iterations on the original fine grid. The big difference is that, due to the sweeping over multiple grids, we have damped error frequencies in a much larger range than the same number of Jacobi iterations on the fine grid alone would have accomplished.

Now consider the more general case with $m + 1 = 2^J$ points, recursing all the way down to (about) one point and taking $n$ iterations of Jacobi at each level. The total work is $$ 2 n \sum^J_{j=2} 2^j \approx 4 n 2^J \approx 4 n m = \mathcal{O}(m) $$ assuming $n \ll m$. The work required for one V-cycle, noting that the number of grids grows as $\log_2 m$, is therefore still $\mathcal{O}(m)$. It can also be shown for the Poisson problem that the number of V-cycles needed with this simple approach grows like $\log m$, so $\mathcal{O}(m \log m)$ work is required to reach a level of error determined by the original $\Delta x$.
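
The arithmetic in the concrete example above is easy to check directly:


In [ ]:
# Work estimate for the m = 255, n = 3 example: 6 sweeps per level over 7 levels
updates = 6 * sum(2**j for j in range(2, 9))
print(updates)          # 3048, approximately 6 * 2^9 = 3072
print(updates / 255.0)  # about 12 sweeps' worth of work on the fine grid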

We can of course play with all sorts of types of V-cycles and iteration counts and these variations lead to a multitude of approaches.

Full Multigrid

Instead of starting with the original problem at the finest level we can also start at the coarsest. We do a few iterations there, or solve the problem directly since the cost is low on the coarsest grid, then interpolate to the next finer level and solve the problem there. We can cycle back down to the coarsest level or continue upwards until we reach the finest level, where we wanted to be in the first place. At that point we switch back to solving for the error rather than the original problem and proceed as before. This approach is usually called full multigrid (FMG).
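
A rough sketch of this idea, reusing the `smooth` and `two_grid_cycle` helpers sketched earlier (the restriction by injection, the use of a single two-grid cycle as the solve at each level, and the handling of the boundary values are all choices made here, not prescribed above):


In [ ]:
def full_multigrid(F, delta_x, alpha, beta, num_levels, num_sweeps=3):
    """Sketch of FMG: solve on the coarsest grid first, then interpolate each
    result up one level and use it as the initial guess there"""
    # Right-hand sides and spacings on each level, finest first
    rhs = [F]
    spacing = [delta_x]
    for level in range(1, num_levels):
        rhs.append(rhs[-1][::2])
        spacing.append(2.0 * spacing[-1])

    # Approximately solve on the coarsest grid, boundary values included
    U = numpy.zeros(rhs[-1].shape)
    U[0] = alpha
    U[-1] = beta
    U = smooth(U, rhs[-1], spacing[-1], num_sweeps)

    # Interpolate up one level at a time and solve there (assumes the grid spans [0, 1])
    for level in range(num_levels - 2, -1, -1):
        x_fine = numpy.linspace(0.0, 1.0, rhs[level].shape[0])
        x_coarse = numpy.linspace(0.0, 1.0, U.shape[0])
        U = numpy.interp(x_fine, x_coarse, U)
        U = two_grid_cycle(U, rhs[level], spacing[level], num_sweeps)
    return U

For the example above this might be called as `U = full_multigrid(f(x_bc), delta_x, alpha, beta, 7)`, after which one would continue with V-cycles on the finest grid as described.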

It turns out that the cost of this "startup" phase is mostly negligible and that it greatly reduces the error by the time we reach the finest level. In fact FMG takes about $\mathcal{O}(m)$ work, which is optimal given that we have $m$ unknowns to solve for in the one-dimensional problem.

Take-Away

We have been discussing one-dimensional implementations of multigrid but these methods extend to higher-dimensional problems and continue to be optimal. For instance a two-dimensional Poisson problem can be solved in $\mathcal{O}(m^2)$ work, again optimal due to the number of unknowns. A Fourier transform approach would require $\mathcal{O}(m^2 \log m)$ and a direct method $\mathcal{O}(m^3)$.

This all being said, multigrid is hard. There are as many ways to do it as there are problems to be solved (at least). Luckily there are a number of areas where research has been done to determine (in some sense) optimal methods for a given problem and discretization. There are also variations that include more complex ways to "coarsen" the error and residual; in general these are called algebraic multigrid methods (AMG). These are especially useful when it is not obvious how to coarsen the residual problem.

Further Reading

  1. W. L. Briggs, V. Emden Henson, and S. F. McCormick. A Multigrid Tutorial, 2nd ed. SIAM, Philadelphia, 2000.
  2. D. C. Jesperson. Multigrid methods for partial differential equations. In Studies in Numerical Analysis, G.H. Golub, ed. MAA Studies in Mathematics, Vol. 24, 1984, pages 270-317.
  3. W. Hackbusch. Multigrid Methods and Applications. Springer-Verlag, Berlin, 1985.
  4. P. Wesseling. An Introduction to Multigrid Methods. John Wiley, New York, 1992.