For illustration we set $k = 1$ and solve the equation
$$-\Delta u = f, \quad u|_{\partial \Omega} = 0, \quad \Omega=[0,1]^d.$$
In 1D we get the $3$-point stencil, which is typically written as
$$A = \frac{1}{h^2} \begin{bmatrix} -1 & 2 & -1 \end{bmatrix}.$$
Stencil notation is very convenient for structured grids.
At each grid point we store $3$ numbers with indices $-1, 0, 1$, and the action of the linear operator is defined as
$$y = Ax \quad \Longleftrightarrow \quad y_j = \sum_{k=-1}^1 A(j)_k \, x_{j+k},$$
where out-of-bound accesses evaluate to $0$.
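A minimal NumPy sketch of this stencil action for the constant-coefficient case (the function name and grid size here are illustrative, not prescribed by the text):

In [ ]:
import numpy as np

def apply_stencil_1d(stencil, x):
    # y_j = sum_{k=-1}^{1} stencil_k * x_{j+k}; out-of-bound entries of x are 0
    n, y = len(x), np.zeros_like(x)
    for k, s in zip((-1, 0, 1), stencil):
        lo, hi = max(0, -k), min(n, n - k)
        y[lo:hi] += s * x[lo + k:hi + k]
    return y

h = 1.0 / 5
stencil = np.array([-1.0, 2.0, -1.0]) / h**2   # the 1D Laplace stencil above
x = np.sin(np.pi * h * np.arange(1, 5))        # values at the interior points
print(apply_stencil_1d(stencil, x))            # approximates pi^2 * sin(pi x)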
The notion of a stencil extends naturally to the 2D case,
where at each point we store a $3 \times 3$ matrix, and the whole operator is stored as an
$$n \times n \times 3 \times 3$$
array.
The 2D Poisson operator in stencil notation is written as
$$A = \frac{1}{h^2} \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1\\ 0 & -1 & 0 \end{bmatrix}.$$
In the same way we can write down operators with non-constant coefficients.
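As a sanity check (a sketch; the grid size and the use of scipy.ndimage.convolve are my choices), we can apply this stencil to a 2D array of grid values and compare with the equivalent sparse matrix built from Kronecker products:

In [ ]:
import numpy as np
import scipy.sparse as sp
from scipy.ndimage import convolve

n = 7
h = 1.0 / (n + 1)
stencil = np.array([[ 0., -1.,  0.],
                    [-1.,  4., -1.],
                    [ 0., -1.,  0.]]) / h**2
u = np.random.rand(n, n)                 # values at the interior grid points
# For a symmetric stencil, correlation equals convolution;
# mode='constant', cval=0 is the zero out-of-bound convention.
y = convolve(u, stencil, mode='constant', cval=0.0)
# The same operator as a sparse matrix: kron(I, A1) + kron(A1, I)
A1 = sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(n, n)) / h**2
A2d = sp.kron(sp.eye(n), A1) + sp.kron(A1, sp.eye(n))
print(np.allclose(A2d @ u.ravel(), y.ravel()))   # True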
The idea of multigrid was proposed by R. Fedorenko in "A relaxation method for solving elliptic difference equations" ("Релаксационный метод решения разностных эллиптических уравнений"), 1961, http://mi.mathnet.ru/rus/zvmmf/v1/i5/p922.
The theory was provided a little later by N. Bakhvalov.
However, he handed the implementation to his PhD student, the student failed, and Bakhvalov decided that the method "does not work". More than 5 years later, Brandt rediscovered multigrid in the West, it worked, and it became popular.
Consider a discretized 1D diffusion equation with zero boundary conditions
$$A_h u_h = f_h,$$where $$ A_h = \frac{1}{h^2} \begin{pmatrix} 2 & -1 \\ -1 & 2 & -1 \\ & -1 & \ddots & \ddots \\ & & \ddots & \ddots & -1 \\ & & & -1 & 2 \end{pmatrix}\quad \text{of size $N\times N$, $\ $ $ h = N^{-1}$} $$ As we know $$\lambda_k(A_h) = \frac{4}{h^2} \sin^2 \frac{\pi kh}{2}>0, \quad \text{cond}(A_h) = \frac{\lambda_\max}{\lambda_\min} = \mathcal{O}\left(\frac{1}{h^2}\right)$$
It is known (see the derivation below) that the optimal $\tau$ is $\tau_{\text{opt}} = \frac{2}{\lambda_{\min} + \lambda_{\max}} \approx \frac{h^2}{2}$.
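Indeed, the Richardson iteration $x^{(k+1)} = x^{(k)} - \tau (A_h x^{(k)} - f_h)$ has error-propagation matrix $I - \tau A_h$ with eigenvalues $1 - \tau \lambda_k$, so we minimize
$$\max_k |1 - \tau \lambda_k| = \max\left(|1 - \tau\lambda_{\min}|,\ |1 - \tau\lambda_{\max}|\right)$$
over $\tau$. The maximum is smallest when the two endpoints are balanced, $1 - \tau\lambda_{\min} = \tau\lambda_{\max} - 1$, which gives $\tau_{\text{opt}} = \frac{2}{\lambda_{\min}+\lambda_{\max}}$; since $\lambda_{\min} + \lambda_{\max} \approx \frac{4}{h^2}$ for small $h$, we get $\tau_{\text{opt}} \approx \frac{h^2}{2}$.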
Therefore, $ \|e^{(k)}\|_2 \leqslant \left(\frac{1 - \lambda_\min/\lambda_\max}{1 + \lambda_\min/\lambda_\max}\right)^{k}\|e^{(0)}\|_2 = \left(\frac{1 - \mathcal{O}(N^{-2})}{1 + \mathcal{O}(N^{-2})}\right)^{k}\|e^{(0)}\|_2\leqslant \epsilon \quad \Longrightarrow$ $\quad k = \mathcal{O}\left(N^2 \ln \frac{1}{\epsilon}\right)$.
In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 20
ex = np.ones(n)
# n^2 * tridiag(1, -2, 1): the 1D difference operator (negative definite)
lp1 = n**2 * sp.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
rhs = np.ones(n)
# extreme eigenvalues ('LR' = largest real part, 'SR' = smallest real part)
ev1, vec = spla.eigs(lp1, k=2, which='LR')
ev2, vec = spla.eigs(lp1, k=2, which='SR')
lam_max = ev1[0]
lam_min = ev2[0]
tau_opt = np.real(2.0 / (lam_max + lam_min))  # eigs returns complex values
niters = 1000
x = np.zeros(n)
res_all = []
for i in range(niters):          # Richardson iteration with optimal parameter
    rr = lp1.dot(x) - rhs
    x = x - tau_opt * rr
    res_all.append(np.linalg.norm(rr))
# Convergence of the ordinary Richardson iteration (with optimal parameter)
plt.semilogy(res_all)
plt.xlabel('Iteration number')
plt.ylabel('Residual norm')
Let $$T \psi_i = \mu_i \psi_i,$$ where $\psi_i$ and $\mu_i$, $i=1,\dots,N$, are the eigenvectors and eigenvalues of $T$. Since $\{\psi_i\}$ is an orthonormal basis ($A_h$ is symmetric, so $T = I - \frac{h^2}{4} A_h$ is symmetric and its eigenvectors can be chosen orthonormal), we have $$e^{(0)} = \sum_{i=1}^N c_i \psi_i$$ with some coefficients $c_i$. Therefore, $$ e^{(k)} = T^{k} e^{(0)} = \sum_{i} c_i\mu_i^k \psi_i. $$
Now the goal is to show that $\mu_i^k$ decays fast with respect to $k$ when, for instance, $i>N/2$ (the high-frequency part of the spectrum). Indeed, $$ | \mu_i(T)| = \left|1 - \frac{h^2}{4}\lambda_i(A_h)\right| < \frac{1}{2}, \quad \text{when}\quad i>N/2 $$ (see the one-line computation below). So the high-frequency part of the error ($i>N/2$) becomes at least $2$ times smaller on each iteration.
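The computation behind the $\frac{1}{2}$ bound: using $\lambda_i(A_h) = \frac{4}{h^2}\sin^2 \frac{\pi i h}{2}$,
$$\left|1 - \frac{h^2}{4}\lambda_i(A_h)\right| = 1 - \sin^2\frac{\pi i h}{2} = \cos^2\frac{\pi i h}{2} < \cos^2\frac{\pi}{4} = \frac{1}{2} \quad \text{when } ih > \frac{1}{2},$$
i.e. for $i > N/2$ (recall that $h = N^{-1}$).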
In [4]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 80
ex = np.ones(n)
lp1 = n**2 * sp.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
rhs = np.ones(n)
ev1, vec = spla.eigs(lp1, k=4, which='LR')
ev2, vec = spla.eigs(lp1, k=4, which='SR')
lam_max = ev1[0]
lam_min = ev2[0]
tau_opt = np.real(2.0 / (lam_max + lam_min))
sol = spla.spsolve(lp1, rhs)
niters = 1000
x = 0.2 * np.random.random(n)    # random x0 to add high frequencies to the error
err_all = []
grid = np.linspace(0, 1, n)
for i in range(niters):
    rr = lp1.dot(x) - rhs
    x = x - tau_opt * rr
    err = spla.spsolve(lp1, rr)  # the true error: e = A^{-1} r = x - sol
    err_all.append(err)
init, = plt.plot(grid, err_all[0])
smooth, = plt.plot(grid, err_all[200])
plt.legend([init, smooth], ['No smoothing', '200 iterations'])
plt.xlabel('space')
plt.ylabel('error')
plt.title('Error smoothing')
# Interactive version (requires ipywidgets):
# from ipywidgets import interact
# def set_cursor(i):
#     plt.ylim(0, 1.1 * np.max(err_all[0]))
#     plt.plot(grid, err_all[i])
# interact(set_cursor, i=(0, niters - 1, 5))
Out[4]: [figure: the rough initial error vs. the much smoother error after 200 iterations]
Now we know that in an iterative process the high-frequency components of the error are damped quickly, while the smooth components decay very slowly.
But we can approximate a smooth error vector well on a coarser grid!
Equation for the error: $A_h e_{h} = r_h$, where $r_h = f_h - A_h u$ is the residual of the current approximation $u$. Given $e_h$, we simply recover $u_h = u + e_h$.
Here $P$ denotes the prolongation operator (interpolation from the coarse grid to the fine one) and $R$ the restriction operator (from fine to coarse).
For the 1D case, N = 7: $$ P = \frac{1}{2} \begin{pmatrix} 1& \\ 2& \\ 1& 1& \\ & 2& \\ & 1& 1 \\ & & 2 \\ & & 1 \end{pmatrix}, \quad R = \frac{1}{2} P^{T} $$
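A sketch in NumPy/SciPy (the helper name prolongation_1d is mine): build this $P$, set $R = \frac{1}{2} P^T$, and check that the Galerkin product $R A_h P$ reproduces the standard discretization on the $2h$ grid, as discussed below.

In [ ]:
import numpy as np
import scipy.sparse as sp

def prolongation_1d(nc):
    # linear interpolation from nc coarse points to nf = 2*nc + 1 fine points
    nf = 2 * nc + 1
    P = sp.lil_matrix((nf, nc))
    for j in range(nc):
        i = 2 * j + 1                      # fine index of the j-th coarse point
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    return P.tocsr()

nc = 3                                     # N = 7 fine points, as above
P = prolongation_1d(nc)
R = 0.5 * P.T                              # R = P^T / 2
nf = 2 * nc + 1
h = 1.0 / (nf + 1)
A = sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(nf, nf)) / h**2
print(2 * P.toarray())                     # the integer pattern of P shown above
print((R @ A @ P).toarray() * (2 * h)**2)  # tridiag(-1, 2, -1): the 2h operator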
In the 2D case, which is analogous, we can use stencil notation and bilinear interpolation: $$ P_{2D} = P\otimes P, \quad R_{2D} = R\otimes R\equiv \frac{1}{4}P_{2D}^{T}, $$ which corresponds to the following stencils
$$ \text{Restriction:}\quad \begin{bmatrix} \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \\ \frac{1}{8} & \frac{1}{4} & \frac{1}{8} \\ \frac{1}{16} & \frac{1}{8} & \frac{1}{16} \end{bmatrix}. \quad \text{Prolongation:} \quad \begin{bmatrix} \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{bmatrix}. $$
$A_{2h}$ can be chosen either as an operator on the coarser grid (from the differential formulation) or as the Galerkin projection $R A_h P$.
Multigrid need not stop at two grids!
To find the error on the $2h$ grid we can descend to $4h$, and so on. However, there are several different ways to traverse the grid levels.
A W-cycle is typically superior to a V-cycle; at the same time, the F-cycle is asymptotically better than the V- or W-cycle.
Let $d$ be the dimension of the problem. Each coarsening reduces the number of unknowns by a factor of $2^d$, so the computational cost of one V-cycle with $N$ unknowns on the finest grid is
$$N\left(1 + 2^{-d} + 2^{-2d} + \dots\right) \leq \frac{N}{1 - 2^{-d}} = \mathcal{O}(N)$$
(for example, at most $\frac{4}{3}N$ in 2D). A W-cycle visits level $l$ $2^l$ times, so its cost is $N \sum_{l \geq 0} \left(\frac{2}{2^d}\right)^l$, which is still $\mathcal{O}(N)$ for $d \geq 2$.
What do you need to know to implement multigrid?
First of all, you need to solve
$$A_h u_h = f_h.$$
You also need $P_h$, $R_h$, $A_{2h}$: the prolongation, the restriction, and the coarse-grid operator; a minimal sketch putting them together follows below.
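Here is a minimal sketch of a geometric V-cycle for the 1D model problem (the helper names, the damped-Jacobi smoother with $\omega = 1/2$, and $\nu = 3$ smoothing steps are illustrative choices, not prescribed by the text):

In [ ]:
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplace_1d(n):
    h = 1.0 / (n + 1)
    return sp.diags([-1., 2., -1.], [-1, 0, 1], shape=(n, n), format='csr') / h**2

def prolongation_1d(nc):           # linear interpolation, as in the sketch above
    nf = 2 * nc + 1
    P = sp.lil_matrix((nf, nc))
    for j in range(nc):
        i = 2 * j + 1
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    return P.tocsr()

def vcycle(A_list, P_list, f, x, level=0, nu=3, omega=0.5):
    A = A_list[level]
    if level == len(A_list) - 1:             # coarsest grid: solve directly
        return spla.spsolve(A.tocsc(), f)
    Dinv = 1.0 / A.diagonal()
    for _ in range(nu):                      # pre-smoothing: damped Jacobi
        x = x + omega * Dinv * (f - A @ x)
    r_c = 0.5 * (P_list[level].T @ (f - A @ x))   # restrict residual, R = P^T / 2
    e_c = vcycle(A_list, P_list, r_c, np.zeros_like(r_c), level + 1, nu, omega)
    x = x + P_list[level] @ e_c              # coarse-grid correction
    for _ in range(nu):                      # post-smoothing
        x = x + omega * Dinv * (f - A @ x)
    return x

n = 2**7 - 1                                 # 127 fine-grid points
A_list, P_list = [laplace_1d(n)], []
while A_list[-1].shape[0] > 3:               # build the grid hierarchy
    P = prolongation_1d((A_list[-1].shape[0] - 1) // 2)
    P_list.append(P)
    A_list.append(0.5 * (P.T @ A_list[-1] @ P))   # Galerkin: R A_h P
f, x = np.ones(n), np.zeros(n)
for it in range(10):                         # residual drops by ~an order per cycle
    x = vcycle(A_list, P_list, f, x)
    print(it, np.linalg.norm(f - A_list[0] @ x))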
In a geometric setting you know the grids and the coarse-grid discretization, and the prolongation/restriction operators are known as well (the latter is typically bilinear interpolation).
First of all, if we know the restriction operator $R_h$ (from the fine to the coarse mesh),
we can select
$$P_h = R^{\top}_h.$$
Moreover, the coarse-mesh operator can be written down as the Galerkin projection
$$A_{2h} = R_h A_h P_h = R_h A_h R^{\top}_h.$$
The only requirement is that the sparsity pattern of $A_{2h}$ does not grow.
Thus, the crucial choice is the choice of the restriction operator.
Note that if we use a uniform grid in 2D and a $3 \times 3$ stencil, a $3 \times 3$ stencil for the restriction operator gives a Galerkin projection that is also a $3 \times 3$ stencil:
$$P = \frac{1}{4} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}. $$
However, this scheme stops working if we have the problem
$$A u = \nabla \cdot k \nabla u = f,$$
with piecewise-constant $k$ that has large jumps.
Another issue is convection-diffusion problems, which lead to non-symmetric matrices.
Scheme: take the coefficient $k$ and use it to construct $P$ (a matrix-dependent prolongation).
One of the efficient schemes was proposed by de Zeeuw in
"Matrix-dependent prolongations and restrictions in a blackbox multigrid solver".
It can be formulated as a representation of the prolongation operator in Kronecker-product form, which in stencil notation reads
$$P \approx p q^{\top}.$$
For a two-grid method we have the following ($S$ is the smoothing iteration matrix, and $g_h, g', g'', g''', \widehat{g}$ below denote vectors that do not depend on $u_h$).
$$u^{(\mathrm{new})}_h = S^{\nu_1} u_h + g_h,$$
$$r_h = g' - A_h S^{\nu_1} u_h, \quad r_{2h} = g'' - R A_h S^{\nu_1} u_h, \quad e_{2h} = g''' - A^{-1}_{2h} R A_h S^{\nu_1} u_h,$$
and the new iterate is
$$u^{(\mathrm{next})}_h = u^{(\mathrm{new})}_h + P e_{2h} = \left(I - P A^{-1}_{2h} R A_h\right) S^{\nu_1} u_h + \widehat{g},$$
i.e. the iteration matrix has the form
$$M(\nu_1) = \left(I - P A^{-1}_{2h} R A_h\right) S^{\nu_1}.$$
If this matrix has a norm bounded uniformly in $h$, the whole process can be tuned to get grid-independent convergence.
Two properties are needed:
- the smoothing property: $\Vert A_h S^{\nu_1} \Vert \leq \eta(\nu_1) \Vert A_h \Vert$, with $\eta(\nu_1) \to 0$ as $\nu_1 \to \infty$;
- the approximation property: $\Vert A^{-1}_h - P A^{-1}_{2h} R \Vert \leq \dfrac{C_a}{\Vert A_h \Vert}$.
Then we have $\Vert M(\nu_1) \Vert \leq C_a \eta(\nu_1) \leq C.$
The proof is trivial: $\left(I - P A^{-1}_{2h} R A_h\right) S^{\nu_1} = \left(A^{-1}_h - P A^{-1}_{2h} R \right) A_h S^{\nu_1},$ and the two factors are bounded by the approximation and smoothing properties, respectively.
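A quick numerical illustration (a dense-linear-algebra sketch; the smoother choice of $\nu_1 = 2$ damped-Jacobi steps with $\omega = 1/2$ is mine): the spectral norm of the two-grid iteration matrix $M(\nu_1)$ stays roughly constant as the grid is refined.

In [ ]:
import numpy as np

def twogrid_norm(n, nu=2, omega=0.5):
    h = 1.0 / (n + 1)
    A = (np.diag(2. * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    nc = (n - 1) // 2
    P = np.zeros((n, nc))                      # linear interpolation
    for j in range(nc):
        i = 2 * j + 1
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    R = 0.5 * P.T
    A2 = R @ A @ P                             # Galerkin coarse-grid operator
    S = np.eye(n) - omega * (h**2 / 2) * A     # damped Jacobi: I - omega*D^{-1}*A
    M = (np.eye(n) - P @ np.linalg.solve(A2, R @ A)) @ np.linalg.matrix_power(S, nu)
    return np.linalg.norm(M, 2)

for n in [15, 31, 63, 127]:
    print(n, twogrid_norm(n))                  # the norms do not grow with n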
The concept of geometric multigrid requires additional knowledge about the coarse spaces and the interpolation.
The concept of algebraic multigrid (AMG), first introduced by Ruge and Stüben in 1987, is to work only with the matrix entries and the sparsity pattern, in the spirit of sparse direct solvers.
The main question of AMG is how to construct coarse meshes and interpolation operators, since everything else is already there.
The operator $S$ is a smoother if
$$\Vert S v \Vert_A \leq \Vert v \Vert_A - \sigma \Vert v \Vert_2.$$
That is, it reduces the error whenever $\Vert v \Vert_2$ is relatively large with respect to $\Vert v \Vert_A$.
The error is called algebraically smooth, if $\Vert v \Vert_2 \ll \Vert v \Vert_A.$
A damped Jacobi method satisfies the smoothing property.
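A quick numerical check (a sketch, not a proof: we probe the inequality only on the eigenvectors of $A_h$, for the 1D Laplacian and damped Jacobi with $\omega = 1/2$; the value of $\sigma$ obtained this way depends on the grid):

In [ ]:
import numpy as np

n = 63
h = 1.0 / (n + 1)
A = (np.diag(2. * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
S = np.eye(n) - 0.5 * (h**2 / 2) * A          # damped Jacobi: I - omega*D^{-1}*A

def norm_A(v):                                # energy norm ||v||_A = sqrt(v^T A v)
    return np.sqrt(v @ (A @ v))

lam, psi = np.linalg.eigh(A)                  # orthonormal eigenvectors of A
# largest sigma with ||S psi||_A <= ||psi||_A - sigma * ||psi||_2 on this family
sigma = min(norm_A(psi[:, k]) - norm_A(S @ psi[:, k]) for k in range(n))
print(sigma)                                  # positive: S damps every mode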