Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Kyle T. Mandli

In [ ]:
%matplotlib inline
import numpy
import matplotlib.pyplot as plt

Iterative Methods

In this lecture we will consider a number of classical and more modern methods for solving sparse linear systems like those we found from our consideration of boundary value problems.

Ways to Solve $A u = f$

We have proposed solving the linear system $A u = f$, which we previously implemented naively with the numpy.linalg.solve command, but given the special structure of $A$ perhaps we can do better.

Direct Methods (Gaussian Elimination)

We could use Gaussian elimination (or some factorization) to solve the system, which leads to a solution in a finite number of steps. For large, sparse systems, however, these direct solvers are in general much more expensive than iterative solvers. As was discussed for eigenproblems, iterative solvers start with an initial guess and try to successively improve upon it.

Consider a 3D problem discretized with $m = 100$ points on an edge. This leads to $N = 100 \times 100 \times 100 = 10^6$ unknowns.

Gaussian elimination requires $\mathcal{O}(N^3)$ operations to solve the system, giving $(10^6)^3 = 10^{18}$ operations.

Suppose you have a machine that can perform 100 gigaflops (floating point operations per second): $$ \frac{10^{18}~ [\text{flop}]}{10^{11}~ [\text{flop / s}]} = 10^7~\text{s} \approx 115~\text{days}. $$

What about memory?

We require $N^2$ entries to store the entire (dense) array. In double precision floating point each entry requires 8 bytes, leading to $$ (10^6)^2 ~[\text{entries}] \times 8 ~[\text{bytes / entry}] = 8 \times 10^{12} ~[\text{bytes}] = 8 ~[\text{terabytes}]. $$

The situation really is not as bad as we are making it out to be as long as we take advantage of the sparse structure of the matrices. In fact, for one-dimensional problems direct methods can be reduced to $\mathcal{O}(N)$ operations in the case of a tridiagonal system. The situation is not as favorable for higher-dimensional problems, however, unless more structure can be leveraged. Examples of solvers that do this include fast Fourier transform based methods such as fast Poisson solvers.
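
To make the tridiagonal claim concrete, here is a minimal sketch (not from the original notes) of the Thomas algorithm, i.e. Gaussian elimination specialized to a tridiagonal system, which runs in $\mathcal{O}(N)$ operations; the function name solve_tridiagonal and the small test system are purely illustrative.

In [ ]:
# A minimal sketch of the Thomas algorithm: Gaussian elimination specialized to
# a tridiagonal system, requiring only O(N) operations and O(N) storage.
# lower[0] and upper[-1] are not used.  (Function name and test are illustrative.)
def solve_tridiagonal(lower, diag, upper, rhs):
    N = len(diag)
    c = numpy.empty(N)   # modified super-diagonal
    d = numpy.empty(N)   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination
    for i in range(1, N):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution
    u = numpy.empty(N)
    u[-1] = d[-1]
    for i in range(N - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

# Quick check against a dense solve on a small system
N = 5
lower = -numpy.ones(N)
diag = 2.0 * numpy.ones(N)
upper = -numpy.ones(N)
rhs = numpy.ones(N)
A_check = numpy.diag(diag) + numpy.diag(upper[:-1], 1) + numpy.diag(lower[1:], -1)
print(numpy.allclose(solve_tridiagonal(lower, diag, upper, rhs),
                     numpy.linalg.solve(A_check, rhs)))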

Iterative Methods

Iterative methods take another tack than direct methods. If we have the system $A x = b$ we form an iterative procedure that applies a function, say $L$, such that $$ \hat{x}^{(k)} = L(\hat{x}^{(k-1)}) $$ where we want the error between the true solution $x$ and $\hat{x}^{(k)}$ to go to zero as $k \rightarrow \infty$. We explore such methods in the rest of this lecture.

Jacobi and Gauss-Seidel

The Jacobi and Gauss-Seidel methods are simple approaches to introducing an iterative means for solving the problem $Ax = b$ when $A$ is sparse. Consider again the Poisson problem $u_{xx} = f(x)$ and the finite difference approximation at the point $x_i$ $$ \frac{U_{i-1} - 2 U_i + U_{i+1}}{\Delta x^2} = f(x_i). $$

If we rearrange this expression to solve for $U_i$ we have $$ U_i = \frac{1}{2} (U_{i+1} + U_{i-1}) - f(x_i) \frac{\Delta x^2}{2}. $$

For a direct method we would simultaneously find the values of $U_i$, $U_{i+1}$ and $U_{i-1}$. Instead, consider the iterative scheme that computes an update from the equation above using only the previous iterate (values we already know) $$ U_i^{(k+1)} = \frac{1}{2} (U_{i+1}^{(k)} + U_{i-1}^{(k)}) - f(x_i) \frac{\Delta x^2}{2}. $$

Since this allows us to evaluate $U_i^{(k + 1)}$ using only the known values $U_{i+1}^{(k)}$ and $U_{i-1}^{(k)}$ from the previous iterate, we can evaluate this expression directly! This process is called Jacobi iteration. It can be shown that for this particular problem Jacobi iteration will converge from any initial guess $U^{(0)}$, although slowly.

Advantages

  • Matrix $A$ is never stored or created
  • Storage is optimal
  • Only $\mathcal{O}(m)$ operations are required per iteration for the one-dimensional problem above ($\mathcal{O}(m^2)$ for the corresponding two-dimensional problem)

Example

Let's try to solve the problem from the BVP section again, but using Jacobi iteration in place of the direct solve: $$ u_{xx} = e^x, ~~~~ x \in [0, 1] ~~~~ \text{with} ~~~~ u(0) = 0, \text{ and } u(1) = 3. $$


In [ ]:
# Problem setup
a = 0.0
b = 1.0
u_a = 0.0
u_b = 3.0
f = lambda x: numpy.exp(x)
u_true = lambda x: (4.0 - numpy.exp(1.0)) * x - 1.0 + numpy.exp(x)

# Discretization
m = 100
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# Expected iterations needed
iterations_J = int(2.0 * numpy.log(delta_x) / numpy.log(1.0 - 0.5 * numpy.pi**2 * delta_x**2))

# Solve system
# Initial guess for iterations
U_new = numpy.zeros(m + 2)
U_new[0] = u_a
U_new[-1] = u_b
convergence_J = numpy.zeros(iterations_J)
for k in range(iterations_J):
    U = U_new.copy()
    for i in range(1, m + 1):
        U_new[i] = 0.5 * (U[i+1] + U[i-1]) - f(x_bc[i]) * delta_x**2 / 2.0

    convergence_J[k] = numpy.linalg.norm(u_true(x_bc) - U_new, ord=2)
        
# Plot result
fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.plot(x_bc, U, 'o', label="Computed")
axes.plot(x_bc, u_true(x_bc), 'k', label="True")
axes.set_title("Solution to $u_{xx} = e^x$")
axes.set_xlabel("x")
axes.set_ylabel("u(x)")

fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.semilogy(range(iterations_J), convergence_J, 'o')
axes.semilogy(range(iterations_J), numpy.ones(iterations_J) * delta_x**2, 'r--')
axes.set_title("Convergence of Jacobi Iterations for Poisson Problem")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||u(x) - U^{(k)}||_2$")

plt.show()

A slight modification to the above also leads to the Gauss-Seidel method. Programmatically the modification is easy to see; in the iteration above we now have $$ U_i^{(k+1)} = \frac{1}{2} (U_{i+1}^{(k)} + U_{i-1}^{(k+1)}) - f(x_i) \frac{\Delta x^2}{2}. $$


In [ ]:
# Problem setup
a = 0.0
b = 1.0
u_a = 0.0
u_b = 3.0
f = lambda x: numpy.exp(x)
u_true = lambda x: (4.0 - numpy.exp(1.0)) * x - 1.0 + numpy.exp(x)

# Discretization
m = 100
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# Expected iterations needed
iterations_GS = int(2.0 * numpy.log(delta_x) / numpy.log(1.0 - numpy.pi**2 * delta_x**2))

# Solve system
# Initial guess for iterations
U = numpy.zeros(m + 2)
U[0] = u_a
U[-1] = u_b
convergence_GS = numpy.zeros(iterations_GS)
for k in range(iterations_GS):
    for i in range(1, m + 1):
        U[i] = 0.5 * (U[i+1] + U[i-1]) - f(x_bc[i]) * delta_x**2 / 2.0

    convergence_GS[k] = numpy.linalg.norm(u_true(x_bc) - U, ord=2)

# Plot result
fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.plot(x_bc, U, 'o', label="Computed")
axes.plot(x_bc, u_true(x_bc), 'k', label="True")
axes.set_title("Solution to $u_{xx} = e^x$")
axes.set_xlabel("x")
axes.set_ylabel("u(x)")

fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.semilogy(range(iterations_GS), convergence_GS, 'o')
axes.semilogy(range(iterations_GS), numpy.ones(iterations_GS) * delta_x**2, 'r--')
axes.set_title("Convergence of Gauss-Seidel Iterations for Poisson Problem")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||u(x) - U^{(k)}||_2$")

plt.show()

Matrix Splitting Methods

One way to view Jacobi and Gauss-Seidel is as a splitting of the matrix $A$ so that $$ A = M - N. $$

Then the system $A U = b$ can be viewed as $$ M U - N U = b \Rightarrow MU = NU + b. $$

Viewing this instead as an iteration we have then $$ M U^{(k+1)} = N U^{(k)} + b. $$ The goal then would be to pick $M$ and $N$ such that $M$ contains as much of $A$ as possible while remaining easier to solve than the original system.

The resulting update for each of these then becomes $$ U^{(k+1)} = M^{-1} N U^{(k)} + M^{-1} b = G U^{(k)} + c $$ where $G$ is called the iteration matrix and $c = M^{-1} b$. We also want $$ u = G u + c $$ where $u$ is the true solution of the original $A u = b$, in other words $u$ is the fixed point of the iteration. Is this fixed point stable though? If the spectral radius $\rho(G) < 1$ we can show that in fact the iteration is stable.

Note the similarity between our earlier stability analysis, which dealt with $||A^{-1}||$, and the role played here by $G = M^{-1} N$; the two are related but not identical.

For Jacobi the splitting is $$ M = -\frac{2}{\Delta x^2} I, ~~\text{and}~~N = -\frac{1}{\Delta x^2} \begin{bmatrix} 0 & 1 & \\ 1 & 0 & 1 \\ & \ddots & \ddots & \ddots \\ & & 1 & 0 & 1 \\ & & & 1 & 0 \end{bmatrix} $$ (sticking to the Poisson problem). $M$ is now a diagonal matrix and easy to solve.

For Gauss-Seidel we have $$ M = \frac{1}{\Delta x^2} \begin{bmatrix} -2 & & \\ 1 & -2 & \\ & \ddots & \ddots \\ & & 1 & -2 & \\ & & & 1 & -2 \end{bmatrix} ~~~ \text{and} ~~~ N = -\frac{1}{\Delta x^2} \begin{bmatrix} 0 & 1 & \\ & 0 & 1 \\ & & \ddots & \ddots \\ & & & 0 & 1\\ & & & & 0 \end{bmatrix} $$
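
As a quick numerical sanity check (a sketch added for illustration, not part of the original notes), we can assemble these splittings directly for a small Poisson matrix and verify that the spectral radius of the iteration matrix $G = M^{-1} N$ is less than one for both Jacobi and Gauss-Seidel.

In [ ]:
# Sanity check (illustrative): assemble the Jacobi and Gauss-Seidel splittings
# for the 1D Poisson matrix and verify that the spectral radius of the
# iteration matrix G = M^{-1} N is less than one, so both iterations converge.
m = 20
delta_x = 1.0 / (m + 1)
A = (numpy.diag(-2.0 * numpy.ones(m)) +
     numpy.diag(numpy.ones(m - 1), 1) +
     numpy.diag(numpy.ones(m - 1), -1)) / delta_x**2

D = numpy.diag(numpy.diag(A))     # diagonal part of A
L_part = numpy.tril(A, -1)        # strictly lower triangular part of A
U_part = numpy.triu(A, 1)         # strictly upper triangular part of A

# Jacobi: M = D; Gauss-Seidel: M = D + L_part.  In both cases N = M - A.
for name, M in [("Jacobi", D), ("Gauss-Seidel", D + L_part)]:
    N = M - A
    G = numpy.linalg.solve(M, N)
    rho = numpy.max(numpy.abs(numpy.linalg.eigvals(G)))
    print("%s: spectral radius = %1.4f" % (name, rho))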

Stopping Criteria

How many iterations should we perform? Let $E^{(k)}$ represent the error at step $k$. If we want to reduce the initial error $E^{(0)}$ by a factor of order $\epsilon$ then we require $$ ||E^{(k)}|| \approx \epsilon ||E^{(0)}||. $$

Under suitable assumptions we can bound the error in the 2-norm as $$ ||E^{(k)}|| \leq \rho(G)^k ||E^{(0)}||. $$

Moving back to our estimate of the number of iterations, we can combine the two expressions for the error above: setting $\rho(G)^k ||E^{(0)}|| \approx \epsilon ||E^{(0)}||$ and solving for $k$ gives $$ k \approx \frac{\log \epsilon}{\log \rho(G)}. $$

Picking $\epsilon$ is a bit tricky, but one natural criterion is $\epsilon = \mathcal{O}(\Delta x^2)$ since our original discretization was second-order accurate. This leads to $$ k = \frac{2 \log \Delta x}{\log \rho}. $$ This also allows us to estimate the total number of operations required.

For Jacobi the spectral radius of $G$ is $$ \rho_J \approx 1 - \frac{1}{2} \pi^2 \Delta x^2 $$ so that $$ k = \mathcal{O}(m^2 \log m) ~~~\text{as}~~~ m \rightarrow \infty, $$ where $m$ is the number of grid points.

Combining this with the previous operation count per iteration we find that Jacobi would lead to $\mathcal{O}(m^3 \log m)$ work which is not very promising.

For two dimensions we have $\mathcal{O}(m^4 \log m)$ so even compared to Gaussian elimination this approach is not ideal.

What about Gauss-Seidel? Here the spectral radius is approximately $$ \rho_{GS} \approx 1 - \pi^2 \Delta x^2 $$ so that $$ k = \frac{2 \log \Delta x}{\log (1 - \pi^2 \Delta x^2)}, $$ which still does not lead to any advantage over direct solvers. It does show that Gauss-Seidel converges roughly twice as fast as Jacobi due to the factor of 2 difference between $\rho_J$ and $\rho_{GS}$.

Successive Overrelaxation (SOR)

Well that's a bit disappointing, isn't it? These iterative schemes do not seem to be worth much, but it turns out we can do better with a slight modification to Gauss-Seidel.

If you look at the Gauss-Seidel iteration, it turns out to move $U$ in the correct direction toward $u$ but to be very conservative about the amount. If instead we do \begin{align*} U^{GS}_i &= \frac{1}{2} \left(U^{(k+1)}_{i-1} + U^{(k)}_{i+1} - \Delta x^2 f_i\right) \\ U^{(k+1)}_i &= U_i^{(k)} + \omega \left( U_i^{GS} - U_i^{(k)}\right ) \end{align*} where we get to pick $\omega$, we can do much better.

If $\omega = 1$ then we get back Gauss-Seidel.

If $\omega < 1$ we move even less and the iteration converges even more slowly (although this is sometimes used in multigrid under the name underrelaxation).

If $\omega > 1$ then we move further than Gauss-Seidel suggests and any method where $\omega > 1$ is known as successive overrelaxation (SOR).

We can write this as a matrix splitting method as well. We can combine the two-step formula above to find $$ U^{(k+1)}_i = \frac{\omega}{2} \left( U^{(k+1)}_{i-1} + U^{(k)}_{i+1} - \Delta x^2 f_i \right ) + (1 - \omega) U_i^{(k)} $$ corresponding to a matrix splitting of $$ M = \frac{1}{\omega} (D - \omega L) ~~~\text{and}~~~ N = \frac{1}{\omega} ((1-\omega) D + \omega U) $$ where $D$ is the diagonal of the matrix $A$, and $-L$ and $-U$ are the strictly lower and upper triangular parts of $A$ (so that $A = D - L - U$).

It can be shown that picking an $\omega$ such that $0 < \omega < 2$ the SOR method converges.

It turns out we can also find an optimal $\omega$ for a wide class of problems. For Poisson problems in any number of space dimensions, for instance, it can be shown that the SOR method converges optimally if $$ \omega_{opt} = \frac{2}{1 + \sin(\pi \Delta x)} \approx 2 - 2 \pi \Delta x. $$

What about the number of iterations? We can follow the same tactic as before with the spectral radius of $G_{SOR}$ now $$ \rho = \omega_{opt} - 1 \approx 1 - 2 \pi \Delta x. $$

This leads to an iteration count of $$ k = \mathcal{O}(m \log m) $$ an order of magnitude better than Gauss-Seidel alone!
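
Before running the full SOR solve below, here is a short illustrative check (not from the original notes) that the spectral radius of the SOR iteration matrix at $\omega_{opt}$ matches the estimate $\rho \approx \omega_{opt} - 1$ and is much smaller than the Gauss-Seidel value ($\omega = 1$).

In [ ]:
# Illustrative check: spectral radius of the SOR iteration matrix G = M^{-1} N
# for the 1D Poisson matrix, at omega = 1 (Gauss-Seidel) and at omega_opt,
# compared with the estimate rho ~ omega_opt - 1.
m = 50
delta_x = 1.0 / (m + 1)
A = (numpy.diag(-2.0 * numpy.ones(m)) +
     numpy.diag(numpy.ones(m - 1), 1) +
     numpy.diag(numpy.ones(m - 1), -1)) / delta_x**2
D = numpy.diag(numpy.diag(A))
L_mat = -numpy.tril(A, -1)      # A = D - L - U
U_mat = -numpy.triu(A, 1)

omega_opt = 2.0 / (1.0 + numpy.sin(numpy.pi * delta_x))
for omega in [1.0, omega_opt]:
    M = (D - omega * L_mat) / omega
    N = ((1.0 - omega) * D + omega * U_mat) / omega
    rho = numpy.max(numpy.abs(numpy.linalg.eigvals(numpy.linalg.solve(M, N))))
    print("omega = %1.4f: spectral radius = %1.4f" % (omega, rho))
print("Estimate omega_opt - 1 = %1.4f" % (omega_opt - 1.0))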


In [ ]:
# Problem setup
a = 0.0
b = 1.0
u_a = 0.0
u_b = 3.0
f = lambda x: numpy.exp(x)
u_true = lambda x: (4.0 - numpy.exp(1.0)) * x - 1.0 + numpy.exp(x)

# Discretization
m = 50
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# SOR parameter
omega = 2.0 / (1.0 + numpy.sin(numpy.pi * delta_x))

# Expected iterations needed
iterations_SOR = int(2.0 * numpy.log(delta_x) / numpy.log(omega - 1.0))

# Solve system
# Initial guess for iterations
U = numpy.zeros(m + 2)
U[0] = u_a
U[-1] = u_b
convergence_SOR = numpy.zeros(iterations_SOR)
for k in range(iterations_SOR):
    for i in range(1, m + 1):
        U_gs = 0.5 * (U[i-1] + U[i+1] - delta_x**2 * f(x_bc[i]))
        U[i] += omega * (U_gs - U[i])

    convergence_SOR[k] = numpy.linalg.norm(u_true(x_bc) - U, ord=2)
        
# Plot result
fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.plot(x_bc, U, 'o', label="Computed")
axes.plot(x_bc, u_true(x_bc), 'k', label="True")
axes.set_title("Solution to $u_{xx} = e^x$")
axes.set_xlabel("x")
axes.set_ylabel("u(x)")

fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.semilogy(range(iterations_SOR), convergence_SOR, 'o')
axes.semilogy(range(iterations_SOR), numpy.ones(iterations_SOR) * delta_x**2, 'r--')
axes.set_title("Convergence of SOR Iterations for Poisson Problem")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||u(x) - U^{(k)}||_2$")
plt.show()

In [ ]:
# Plotting all the convergence rates
fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.semilogy(range(iterations_SOR), convergence_J[:iterations_SOR], 'r', label="Jacobi")
axes.semilogy(range(iterations_SOR), convergence_GS[:iterations_SOR], 'b', label="Gauss-Seidel")
axes.semilogy(range(iterations_SOR), convergence_SOR, 'k', label="SOR")
axes.semilogy(range(iterations_SOR), numpy.ones(iterations_SOR) * delta_x**2, 'r--')
axes.legend(loc=6)
axes.set_title("Comparison of Convergence Rates")
axes.set_xlabel("Iteration")
axes.set_ylabel("$||u(x) - U^{(k)}||_2$")
plt.show()

Descent Methods

One special class of matrices is amenable to another powerful way to iterate toward the solution. A matrix is said to be symmetric positive definite (SPD) if it is symmetric, $A^T = A$, and $$ x^T A x > 0 ~~~ \forall ~~~ x \neq 0. $$

Check to see if $$ A = \begin{bmatrix} 2 &-1 &0 &0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} $$ is symmetric positive definite.
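
A quick way to check this numerically (an illustrative sketch, not part of the original notes) is to verify symmetry and that all eigenvalues are positive, or equivalently that a Cholesky factorization succeeds.

In [ ]:
# Check whether the matrix above is symmetric positive definite by verifying
# symmetry and that all eigenvalues are positive (equivalently, a Cholesky
# factorization only exists for an SPD matrix).
A = numpy.array([[ 2.0, -1.0,  0.0,  0.0],
                 [-1.0,  2.0, -1.0,  0.0],
                 [ 0.0, -1.0,  2.0, -1.0],
                 [ 0.0,  0.0, -1.0,  2.0]])
print("Symmetric:", numpy.allclose(A, A.T))
print("Eigenvalues:", numpy.linalg.eigvalsh(A))
print("All positive:", numpy.all(numpy.linalg.eigvalsh(A) > 0.0))
numpy.linalg.cholesky(A)   # would raise LinAlgError if A were not SPD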

Now define a function $\phi: \mathbb R^m \rightarrow \mathbb R$ such that $$ \phi(u) = \frac{1}{2} u^T A u - u^T f. $$ This is a quadratic function in the variables $u_i$ and in the case where $m = 2$ forms a parabolic bowl. Since $A$ is SPD this quadratic function has a unique minimum, $u^\ast$.

Let's see how approaching the problem this way helps us:

For the $m = 2$ case write the function $\phi(u)$ out.

$$ \phi(u) = \frac{1}{2} (A_{11} u_1^2 + A_{12} u_1 u_2 + A_{21} u_1 u_2 + A_{22} u^2_2) - u_1 f_1 - u_2 f_2 $$

What property of the matrix $A$ simplifies the expression above?

Symmetry! This implies that $A_{21} = A_{12}$ and the expression above simplifies to $$ \phi(u) = \frac{1}{2} (A_{11} u_1^2 + 2 A_{12} u_1 u_2 + A_{22} u^2_2) - u_1 f_1 - u_2 f_2 $$

Now write two expressions that when evaluated at $u^\ast$ are identically 0 that express that $u^\ast$ minimizes $\phi(u)$.

Since $u^\ast$ minimizes $\phi(u)$ we know that the first derivatives should be zero at the minimum: $$\begin{aligned} \frac{\partial \phi}{\partial u_1} &= A_{11} u_1 + A_{12} u_2 - f_1 = 0 \\ \frac{\partial \phi}{\partial u_2} &= A_{21} u_1 + A_{22} u_2 - f_2 = 0 \end{aligned}$$ Note that these equations can be rewritten as $$ A u = f. $$

Therefore $\min \phi$ is equivalent to solving $A u = f$!

This is a common type of reformulation for many problems where it may be easier to treat a given equation as a minimization problem rather than directly solve it.

Note that the matrix we have been using for our Poisson problem so far is actually symmetric negative definite, although the same methods still apply. In that case we want to find the maximum of $\phi$ instead; other than that everything is the same (equivalently, we can negate the system, as is done in the code below).

Also note that if $A$ is indefinite then the eigenvalues of $A$ have mixed signs, and instead of a minimum or maximum $\phi$ has a saddle point, which is much more difficult to handle (methods such as GMRES can deal with this case, for instance).

Method of Steepest Descent

So now we turn to finding the $u^\ast$ that minimizes the function $\phi(u)$. The simplest approach is called the method of steepest descent, which moves in the direction of steepest decrease of $\phi(u)$, that is, along the negative gradient.

Mathematically we then have $$ u^{(k+1)} = u^{(k)} - \alpha^{(k)} \nabla \phi(u^{(k)}) $$ where $\alpha^{(k)}$ will be the step size chosen in the direction we want to go.

We can find $\alpha$ by $$ \alpha^{(k)} = \underset{\alpha \in \mathbb R}{\text{argmin}} ~ \phi\left(u^{(k)} - \alpha \nabla \phi(u^{(k)})\right), $$ i.e. the $\alpha$ that takes us just far enough so that if we went any further $\phi$ would increase.

This implies that $\alpha^{(k)} \geq 0$ and $\alpha^{(k)} = 0$ only if we are at the minimum of $\phi$. We can compute the gradient of $\phi$ as $$ \nabla \phi(u^{(k)}) = A u^{(k)} - f \equiv -r^{(k)} $$ where $r^{(k)}$ is the residual defined as $$ r^{(k)} = f - A u^{(k)}. $$

Looking back at the definition of $\alpha^{(k)}$ then leads to the conclusion that the $\alpha$ that would minimize the expression would be the one that satisfies $$ \frac{\text{d} \phi(\alpha)}{\text{d} \alpha} = 0. $$

To find this note that $$ \phi(u + \alpha r) = \left(\frac{1}{2} u^T A u - u^T f \right) + \alpha(r^T A u - r^T f) + \frac{1}{2} \alpha^2 r^T A r $$ so that the derivative becomes $$ \frac{\text{d} \phi(\alpha)}{\text{d} \alpha} = r^T A u - r^T f + \alpha r^T A r $$

Setting this to zero then leads to $$ \alpha = \frac{r^T r}{r^T A r}. $$


In [ ]:
# Problem setup
a = 0.0
b = 1.0
u_a = 0.0
u_b = 3.0
f = lambda x: numpy.exp(x)
u_true = lambda x: (4.0 - numpy.exp(1.0)) * x - 1.0 + numpy.exp(x)

# Discretization
m = 50
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# Construct matrix A
A = numpy.zeros((m, m))
diagonal = numpy.ones(m) / delta_x**2
A += numpy.diag(diagonal * 2.0, 0)
A += numpy.diag(-diagonal[:-1], 1)
A += numpy.diag(-diagonal[:-1], -1)

# Construct right hand side
b = -f(x)
b[0] += u_a / delta_x**2
b[-1] += u_b / delta_x**2

# Algorithm parameters
MAX_ITERATIONS = 10000
tolerance = 1e-3

# Solve system
U = numpy.zeros(m)   # initial guess (empty() would leave U uninitialized)
convergence_SD = numpy.zeros(MAX_ITERATIONS)
success = False
for k in range(MAX_ITERATIONS):
    r = b - numpy.dot(A, U)
    if numpy.linalg.norm(r, ord=2) < tolerance:
        success = True
        break
        
    alpha = numpy.dot(r, r) / numpy.dot(r, numpy.dot(A, r))
    U = U + alpha * r

    convergence_SD[k] = numpy.linalg.norm(u_true(x) - U, ord=2)
        
if not success:
    print "Iteration failed to converge!"
    print convergence_SD[-1]
else:
    # Plot result
    fig = plt.figure()
    axes = fig.add_subplot(1, 1, 1)
    axes.plot(x, U, 'o', label="Computed")
    axes.plot(x_bc, u_true(x_bc), 'k', label="True")
    axes.set_title("Solution to $u_{xx} = e^x$")
    axes.set_xlabel("x")
    axes.set_ylabel("u(x)")

    fig = plt.figure()
    axes = fig.add_subplot(1, 1, 1)
    axes.semilogy(range(k), convergence_SD[:k], 'o')
    axes.semilogy(range(k), numpy.ones(k) * delta_x**2, 'r--')
    axes.set_title("Convergence of Steepest Descent for Poisson Problem")
    axes.set_xlabel("Iteration")
    axes.set_ylabel("$||u(x) - U^{(k)}||_2$")
    plt.show()

Convergence of Steepest Descent

What controls the convergence of steepest descent? It turns out that the shape of the parabolic bowl formed by $\phi$ is the major factor determining the convergence of steepest descent.

For example, if $A$ is a scalar multiple of the identity then these ellipses are actually circles and steepest descent converges in a single step. If $A$ does not lead to circles, the convergence depends on the ratio between the semi-major and semi-minor axes of the resulting $m$-dimensional ellipsoids.

This ratio is controlled by the smallest and largest eigenvalues of the matrix $A$, which is why steepest descent becomes increasingly slow as $m$ increases for the Poisson problem. Note that this also relates to the condition number of the matrix in the $\ell_2$ norm.

Each of these ellipses is related to the eigenstructure of $A$ in the sense that $$ A v_j - f = \lambda_j (v_j - u^\ast) $$ for some $\lambda_j$. Knowing that $A u^\ast = f$ leads to $$ A (v_j - u^\ast) = \lambda_j (v_j - u^\ast), $$ therefore the vectors $v_j - u^\ast$ are eigenvectors of the matrix $A$ with corresponding eigenvalues $\lambda_j$.

If a particular set of the $\lambda_j$ are not distinct then the corresponding cross-section of the ellipsoid is in fact a circle, so any vector pointing toward $u^\ast$ within that subspace is an eigenvector (the eigenvectors are non-unique in this subspace).

We can also relate this eigenstructure and these geometric arguments to the matrix's condition number $\kappa$. Let $v_1$ and $v_2$ be vectors that lie on the level set $\phi(u) = 1$ along the eigen-directions corresponding to $\lambda_1$ and $\lambda_2$ (the extreme eigenvalues); we then have $$ \frac{1}{2} v^T_j A v_j - v_j^T A u^\ast = 1. $$ Combining this expression with our previous eigenvector expression and taking the inner product with the eigenvector $v_j - u^\ast$ leads to $$ ||v_j - u^\ast||^2_2 = \frac{2 + (u^\ast)^T A u^\ast}{\lambda_j}. $$ Turning back to $v_1$ and $v_2$ we have the ratio $$ \frac{||v_1 - u^\ast||_2}{||v_2 - u^\ast||_2} = \sqrt{\frac{\lambda_2}{\lambda_1}} = \sqrt{\kappa_2(A)}. $$ This last expression tells us that the more elongated these ellipsoids are (the larger the condition number), the more difficult it will be to solve $A u = f$ with steepest descent.
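
To make this concrete, here is a small illustrative check (not from the original notes) that the 2-norm condition number of the 1D Poisson matrix grows like $\mathcal{O}(m^2)$ (approximately $4 (m+1)^2 / \pi^2$).

In [ ]:
# Illustrative check: the 2-norm condition number of the 1D Poisson matrix
# grows like O(m^2) (~ 4 (m+1)^2 / pi^2), which is what makes steepest
# descent increasingly slow as the grid is refined.
for m in [10, 20, 40, 80]:
    delta_x = 1.0 / (m + 1)
    A = (numpy.diag(2.0 * numpy.ones(m)) +
         numpy.diag(-numpy.ones(m - 1), 1) +
         numpy.diag(-numpy.ones(m - 1), -1)) / delta_x**2
    kappa = numpy.linalg.cond(A, 2)
    print("m = %3d: condition number = %9.1f, 4 (m+1)^2 / pi^2 = %9.1f"
          % (m, kappa, 4.0 * (m + 1)**2 / numpy.pi**2))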

$A$-Conjugate Search Directions and Conjugate Gradient

An alternative to steepest descent is to choose a slightly different direction in which to descend. Generalizing our step from above, let the iterative scheme be
$$ u^{(k+1)} = u^{(k)} + \alpha^{(k)} p^{(k)} $$ and as before we want to pick an $\alpha$ that minimizes $\phi(u^{(k)} + \alpha p^{(k)})$, leading to the choice $$ \alpha^{(k)} = \frac{(p^{(k)})^T r^{(k)}}{(p^{(k)})^T A p^{(k)}}, $$ except now we are also allowed to pick the search direction $p^{(k)}$ (note that with $p^{(k)} = r^{(k)}$ this reduces to steepest descent).

Ways to choose $p^{(k)}$:

  • One bad choice for $p$ would be a direction orthogonal to $r$, since this would be tangent to the level set (ellipse) of $\phi$ at $u^{(k)}$ and $\phi$ could only increase along it, so we want to make sure that the inner product $p^T r \neq 0$.
  • We also want to still move downwards so require that $\phi(u^{(k+1)}) < \phi(u^{(k)})$.

We know that $r^{(k)}$ is not always the best direction to go in but what might be better?

We could head directly for the minimum but how do we do that without first knowing $u^\ast$?

It turns out that when $m=2$ we can do this: from any initial guess $u^{(0)}$ and any initial direction $p^{(0)}$ we will arrive at the minimum in 2 steps if we choose the second search direction so that $$ (p^{(1)})^T A p^{(0)} = 0. $$ In general, two vectors that satisfy this property are said to be $A$-conjugate.

Note that if $A = I$ then these two vectors would be orthogonal to each other, so in this sense $A$-conjugacy is a natural extension of orthogonality from the simple case where the ellipses are circles to the case where the ellipses can be very distorted.

In fact the vector $p^{(0)}$ is tangent to the level set that $u^{(1)}$ lies on and therefore choosing $p^{(1)}$ so that it is $A$-conjugate to $p^{(0)}$ then always heads to the center of the ellipse.

To show this, take the initial direction $p^{(0)}$ and note that it is tangent to the level set of $\phi$ at $u^{(1)}$, so $p^{(0)}$ is orthogonal to the residual $$ r^{(1)} = f - A u^{(1)} = A(u^\ast - u^{(1)}) \Rightarrow (p^{(0)})^T A (u^\ast - u^{(1)}) = 0. $$ The fact that each iterate minimizes $\phi$ along its search direction is an important idea we will see again.

In other words, once we know a tangent to one of the ellipses we can always choose a direction that minimizes in one of the dimensions of the search space. Choosing the $p^{(k)}$ iteratively this way forms the basis of the conjugate gradient method.
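
Here is a small numerical illustration (a sketch added here, not from the original notes) of the $m = 2$ claim: one exact line search along an arbitrary direction $p^{(0)}$, followed by one step along a direction chosen to be $A$-conjugate to $p^{(0)}$, lands exactly on $u^\ast$.

In [ ]:
# Illustration of A-conjugate directions for m = 2: one exact line search in
# an arbitrary direction p0, then one step in a direction p1 that is
# A-conjugate to p0, lands exactly on the minimizer u* of phi.
A = numpy.array([[3.0, 1.0],
                 [1.0, 2.0]])           # an SPD example matrix
f = numpy.array([1.0, -1.0])
u_star = numpy.linalg.solve(A, f)

u = numpy.array([2.0, 2.0])             # arbitrary initial guess
p0 = numpy.array([1.0, 0.0])            # arbitrary initial direction

# Exact line search along p0
r = f - numpy.dot(A, u)
alpha0 = numpy.dot(p0, r) / numpy.dot(p0, numpy.dot(A, p0))
u = u + alpha0 * p0

# Choose p1 A-conjugate to p0 by projecting the new residual
r = f - numpy.dot(A, u)
beta = numpy.dot(r, numpy.dot(A, p0)) / numpy.dot(p0, numpy.dot(A, p0))
p1 = r - beta * p0
print("A-conjugacy check p1^T A p0 =", numpy.dot(p1, numpy.dot(A, p0)))

# Exact line search along p1 lands on u*
alpha1 = numpy.dot(p1, r) / numpy.dot(p1, numpy.dot(A, p1))
u = u + alpha1 * p1
print("u after two steps:", u, " u* =", u_star)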

Now to generalize beyond $m = 2$ consider the $m=3$ case. As stated before we are now in a three-dimensional space where the level-sets are concentric ellipsoids. Taking a slice through this space will lead to an ellipse on the slice.

  1. Start with an initial guess $u^{(0)}$ and choose a search direction $p^{(0)}$.
  2. Minimize $\phi(u)$ along the direction $u^{(0)} + \alpha p^{(0)}$ resulting in the choice $$ \alpha^{(0)} = \frac{(p^{(0)})^T r^{(0)}}{(p^{(0)})^T A p^{(0)}} $$ as we saw before. Now set $u^{(1)} = u^{(0)} + \alpha^{(0)} p^{(0)}$.
  3. Choose $p^{(1)}$ to be $A$-conjugate to $p^{(0)}$. In this case there is an infinite set of vectors that satisfy $(p^{(1)})^T A p^{(0)} = 0$. Beyond requiring $p^{(1)}$ to be $A$-conjugate we also want it to be linearly independent of $p^{(0)}$.
  4. Again choose an $\alpha^{(1)}$ that minimizes $\phi$ in the direction $p^{(1)}$ (bringing us tangent to a level set of $\phi$ again) and repeat the process.

So why does this work so much better than steepest descent?

Since we have chosen $p^{(0)}$ and $p^{(1)}$ to be linearly independent they span a two-dimensional subspace of $\mathbb R^m$. We then travel through this subspace so as to minimize $\phi$ within it. Geometrically, we have cut our space with this plane, leading to a two-dimensional cross-section with concentric ellipses representing the level sets of $\phi$; we then choose $\alpha$ so that we arrive at the center of these ellipses.


In [ ]:
# Problem setup
a = 0.0
b = 1.0
alpha = 0.0
beta = 3.0
f = lambda x: numpy.exp(x)
u_true = lambda x: (4.0 - numpy.exp(1.0)) * x - 1.0 + numpy.exp(x)

# Discretization
m = 100
x_bc = numpy.linspace(a, b, m + 2)
x = x_bc[1:-1]
delta_x = (b - a) / (m + 1)

# Construct matrix A
A = numpy.zeros((m, m))
diagonal = numpy.ones(m) / delta_x**2
A += numpy.diag(diagonal * 2.0, 0)
A += numpy.diag(-diagonal[:-1], 1)
A += numpy.diag(-diagonal[:-1], -1)

# Construct right hand side
b = -f(x)
b[0] += alpha / delta_x**2
b[-1] += beta / delta_x**2

# Algorithm parameters
MAX_ITERATIONS = 2 * m
tolerance = 1e-1

# Solve system
U = numpy.zeros(m)
convergence_CG = numpy.zeros(MAX_ITERATIONS)
residual_norm = numpy.zeros(MAX_ITERATIONS)
r = b - numpy.dot(A, U)
p = r.copy()   # copy so that in-place updates of r do not alias p
for k in range(MAX_ITERATIONS):
    w = numpy.dot(A, p)
    r_dot = numpy.dot(r, r)
    alpha =  r_dot / numpy.dot(p, w)
    U += alpha * p
    r -= alpha * w
    beta = numpy.dot(r, r) / r_dot
    p = r + beta * p

    convergence_CG[k] = numpy.linalg.norm(delta_x * (u_true(x) - U), ord=1)
    residual_norm[k] = numpy.linalg.norm(r)

# Declare success if the final residual fell below the tolerance
success = residual_norm[-1] < tolerance
if not success:
    print("Iteration failed to converge!")
    print(residual_norm[-10:])
    print(convergence_CG[-10:])
else:
    # Plot result
    fig = plt.figure()
    axes = fig.add_subplot(1, 1, 1)
    axes.plot(x, U, 'o', label="Computed")
    axes.plot(x_bc, u_true(x_bc), 'k', label="True")
    axes.set_title("Solution to $u_{xx} = e^x$")
    axes.set_xlabel("x")
    axes.set_ylabel("u(x)")

    fig = plt.figure()
    axes = fig.add_subplot(1, 1, 1)
    axes.semilogy(numpy.arange(1, MAX_ITERATIONS + 1), convergence_CG, 'o')
    axes.semilogy(numpy.arange(1, MAX_ITERATIONS + 1), numpy.ones(MAX_ITERATIONS) * delta_x**2, 'r--')
    axes.set_ylim((1e-6, 1.0))
    axes.set_title("Convergence of Conjugate Gradient for Poisson Problem")
    axes.set_xlabel("Iteration")
    axes.set_ylabel("$||u(x) - U^{(k)}||_1$")

    fig = plt.figure()
    axes = fig.add_subplot(1, 1, 1)
    axes.semilogy(numpy.arange(1, MAX_ITERATIONS + 1), residual_norm, 'o')
#     axes.set_ylim((1e-6, 1.0))
    axes.set_title("Residual")
    axes.set_xlabel("Iteration")
    axes.set_ylabel("$||r||_2$")
    
    plt.show()

The conjugate gradient algorithm requires only a matrix-vector product and a few inner products per iteration ($\mathcal{O}(m)$ work for our sparse $A$) and in theory at most $m$ iterations to reach the exact solution, i.e. $\mathcal{O}(m^2)$ operations in total. In practice, however, it generally converges to an acceptable answer much faster, especially when preconditioned; this is particularly true for the two-dimensional 5-point Laplacian stencil we have already seen.

Krylov Subspaces

We can say a lot about the set of vectors $p^{(k)}$ that span the space in which we search for the solution, starting with the following theorem.

Theorem The vectors generated in the CG algorithm have the following properties, provided $r^{(k)} \neq 0$:

  1. $p^{(k)}$ is $A$-conjugate to all the previous search directions
  2. The residual $r^{(k)}$ is orthogonal to all previous residuals: $(r^{(k)})^T r^{(j)} = 0$ for $j = 0, 1, \ldots, k-1$
  3. The following three subspaces of $\mathbb R^m$ are identical $$\begin{aligned} &\text{span} (p^{(0)}, p^{(1)}, \ldots, p^{(k-1)}) \\ &\text{span} (r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \ldots, A^{k-1} r^{(0)}) \\ &\text{span} (A e^{(0)}, A^2 e^{(0)}, A^3 e^{(0)}, \ldots, A^{k} e^{(0)}) \end{aligned}$$

The subspace $\mathcal{K}_k = \text{span} (r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \ldots, A^{k-1} r^{(0)})$, spanned by the vector $r^{(0)}$ and the first $k-1$ powers of $A$ applied to $r^{(0)}$, is called a Krylov space of dimension $k$ associated with $r^{(0)}$.

The new update $u^{(k)}$ is formed by adding multiples of the $p^{(j)}$ to the initial guess $u^{(0)}$ and therefore lies in the subspace $u^{(0)} + \mathcal{K}_k$. Another way to write this would be to say $u^{(k)} - u^{(0)} \in \mathcal{K}_k$.
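
As a numerical sanity check of the theorem above (an illustrative sketch, not from the original notes), we can run a few CG iterations on a small SPD system and verify that the residuals are mutually orthogonal and the search directions are $A$-conjugate.

In [ ]:
# Numerical check of the CG properties above: after a few iterations on a
# small SPD system the residuals are (nearly) mutually orthogonal and the
# search directions are (nearly) A-conjugate.
m = 8
A = (numpy.diag(2.0 * numpy.ones(m)) +
     numpy.diag(-numpy.ones(m - 1), 1) +
     numpy.diag(-numpy.ones(m - 1), -1))
b = numpy.ones(m)

U = numpy.zeros(m)
r = b - numpy.dot(A, U)
p = r.copy()
residuals = [r.copy()]
directions = [p.copy()]
for k in range(4):
    w = numpy.dot(A, p)
    alpha = numpy.dot(r, r) / numpy.dot(p, w)
    U = U + alpha * p
    r_new = r - alpha * w
    beta = numpy.dot(r_new, r_new) / numpy.dot(r, r)
    p = r_new + beta * p
    r = r_new
    residuals.append(r.copy())
    directions.append(p.copy())

print("max |r_i^T r_j| (i != j):",
      max(abs(numpy.dot(residuals[i], residuals[j]))
          for i in range(len(residuals)) for j in range(i)))
print("max |p_i^T A p_j| (i != j):",
      max(abs(numpy.dot(directions[i], numpy.dot(A, directions[j])))
          for i in range(len(directions)) for j in range(i)))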

Convergence of Conjugate Gradient

We now turn to the convergence of CG and to deriving estimates about the size of the error at a given $k$ and the rate of convergence.

First define the $A$-norm s.t. $$ ||e||_A = \sqrt{e^T A e} $$ relying on the fact that $A$ is SPD.

This leads to $$\begin{aligned} ||e||^2_A &= (u - u^\ast)^T A (u - u^\ast) \\ &=u^T A u - 2 u^T A u^\ast + (u^\ast)^T A u^\ast \\ &= 2 \phi(u) + (u^\ast)^T A u^\ast \end{aligned}$$

Given that $(u^\ast)^T A u^\ast$ is a constant, minimizing the error $||e||_A$ is equivalent to minimizing $\phi(u)$.

We know we can expand $u^{(k)}$ in the subspace $u^{(0)} + \mathcal{K}_k$ so that $$ u^{(k)} = u^{(0)} + \alpha^{(0)} p^{(0)} + \alpha^{(1)} p^{(1)} + \cdots + \alpha^{(k-1)} p^{(k-1)} $$ and subtracting $u^\ast$ we find $$ u^{(k)} - u^\ast = e^{(k)} = e^{(0)} + \alpha^{(0)} p^{(0)} + \alpha^{(1)} p^{(1)} + \cdots + \alpha^{(k-1)} p^{(k-1)}, $$ which relates $e^{(k)}$ to $e^{(0)}$ and implies that $e^{(k)} - e^{(0)} \in \mathcal{K}_k$.

By our theorem above we also know that $e^{(k)} - e^{(0)}$ lies in $$ \text{span} (A e^{(0)}, A^2 e^{(0)}, A^3 e^{(0)}, \ldots, A^{k} e^{(0)}) $$ so that $$ e^{(k)} = e^{(0)} + c_1 A e^{(0)} + c_2 A^2 e^{(0)} + \cdots + c_k A^k e^{(0)} $$ where $c_j$ are some coefficients (found by taking the projection of $e^{(k)}$ onto the span).

This then leads to $$ e^{(k)} = P_k(A) e^{(0)} $$ where $$ P_k(A) = I + c_1 A + c_2 A^2 + \cdots + c_k A^k. $$ Note that $P_k(A)$ is a polynomial in $A$. We can relate this to a scalar polynomial by letting $$ P_k(x) = 1 + c_1 x + c_2 x^2 + \cdots + c_k x^k $$ so that $P_k \in \mathcal{P}_k$, where $\mathcal{P}_k$ is the set of polynomials of degree at most $k$ with $P(0) = 1$.

CG implicitly constructs the polynomial $P_k$ that solves the minimization problem $$ \min_{P \in \mathcal{P}_k} ||P(A) e^{(0)}||_A. $$

What is a polynomial of matrices?

If we know our matrix is diagonalizable we can write this diagonalization as $$ A = V \Lambda V^{-1}. $$

If we take powers of $A$ then we find $$ A^k = (V \Lambda V^{-1})^k = V \Lambda V^{-1} V \Lambda V^{-1} \cdots V \Lambda V^{-1} = V \Lambda^k V^{-1}. $$

Polynomials in $A$ then can be written as $$ P_k(A) = V P_k(\Lambda) V^{-1} $$ so that $$ P_k(\Lambda) = \begin{bmatrix} P_k(\lambda_1) \\ & P_k(\lambda_2) \\ & & P_k(\lambda_3) \\ & & & \ddots \\ & & & & P_k(\lambda_m) \end{bmatrix}. $$
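
Here is a small check (illustrative, not from the original notes) that a polynomial in $A$ acts through the eigenvalues exactly as claimed, using the sample polynomial $P(x) = 1 + 2x + 3x^2$.

In [ ]:
# Check that P(A) = V P(Lambda) V^{-1} for a symmetric matrix A and the
# sample polynomial P(x) = 1 + 2x + 3x^2 (for symmetric A, V^{-1} = V^T).
A = numpy.array([[ 2.0, -1.0,  0.0],
                 [-1.0,  2.0, -1.0],
                 [ 0.0, -1.0,  2.0]])
lam, V = numpy.linalg.eigh(A)

P_A = numpy.eye(3) + 2.0 * A + 3.0 * numpy.dot(A, A)
P_A_eig = numpy.dot(V, numpy.dot(numpy.diag(1.0 + 2.0 * lam + 3.0 * lam**2), V.T))
print(numpy.allclose(P_A, P_A_eig))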

Now going back to our expression for the error we have $$ e^{(k)} = P_k(A) e^{(0)}. $$ If the polynomial $P_k$ has a root at each eigenvalue $\lambda_j$ then $P_k(\Lambda)$ is the zero matrix and the expression above leads us to the conclusion that $e^{(k)} = 0$. If the eigenvalues are repeated (say there are only $n < m$ distinct eigenvalues) then there is a polynomial $P_n \in \mathcal{P}_n$ that has these roots and we would expect $e^{(n)} = 0$; we converge in fewer iterations than the dimension of $A$!

What if we want to know how large the error is before the iteration that guarantees convergence? We then want to know how $||e^{(k)}||_A / ||e^{(0)}||_A$ behaves. It turns out that for any $P \in \mathcal{P}_k$ we have $$ \frac{||P(A)e^{(0)}||_A}{||e^{(0)}||_A} \leq \max_{1 \leq j \leq m} |P(\lambda_j)|. $$ We then need to find one particular polynomial $\hat{P}_k \in \mathcal{P}_k$ for which we can obtain a useful bound on $||e^{(k)}||_A / ||e^{(0)}||_A$.

It turns out shifted and scaled versions of the Chebyshev polynomials $T_k(x)$ will work for this! Setting $$ \hat{P}_k(x) = \frac{T_k\left(\frac{\lambda_m + \lambda_1 - 2x}{\lambda_m - \lambda_1} \right )}{T_k\left(\frac{\lambda_m + \lambda_1}{\lambda_m - \lambda_1}\right ) } $$

and compute $\max_{1\leq j \leq m} |\hat{P}_k(\lambda_j)|$.

With some analysis (refer to LeVeque for a full derivation) we can show that $$ T_k\left(\frac{\lambda_m + \lambda_1}{\lambda_m - \lambda_1}\right) = \frac{1}{2} \left[ \left ( \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1} \right)^k + \left ( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \right ], $$ where $\kappa = \lambda_m / \lambda_1$ is the 2-norm condition number, so that

$$ \frac{||P(A) e^{(0)}||_A}{||e^{(0)}||_A} \leq 2 \left[ \left ( \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1} \right)^k + \left ( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \right ]^{-1} \leq 2 \left ( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k $$

If the condition number of the matrix $\kappa$ is large we can also simplify our bound to $$ 2 \left ( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \approx 2 \left ( 1 - \frac{2}{\sqrt{\kappa}} \right)^k \approx 2 e^{-2 k / \sqrt{\kappa}}. $$ This implies that the expected number of iterations $k$ to reach a desired tolerance will be $k = \mathcal{O}(\sqrt{\kappa})$.
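
To get a feel for this bound (an illustrative calculation, not from the original notes), we can evaluate $2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k$ using the condition number estimate $\kappa \approx 4(m+1)^2/\pi^2$ for the 1D Poisson matrix and ask how many iterations are needed to push the bound below a tolerance.

In [ ]:
# Evaluate the CG error bound 2 ((sqrt(kappa)-1)/(sqrt(kappa)+1))^k for the
# 1D Poisson matrix (kappa ~ 4 (m+1)^2 / pi^2) and estimate the number of
# iterations needed to reduce the bound below a tolerance.
tolerance = 1e-6
for m in [50, 100, 200]:
    kappa = 4.0 * (m + 1)**2 / numpy.pi**2
    ratio = (numpy.sqrt(kappa) - 1.0) / (numpy.sqrt(kappa) + 1.0)
    k = int(numpy.ceil(numpy.log(tolerance / 2.0) / numpy.log(ratio)))
    print("m = %3d: kappa = %8.1f, iterations for bound < %g: %d"
          % (m, kappa, tolerance, k))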

Preconditioning

One way to get around the difficulties with these types of methods due to the distortion of the ellipses (and consequently the conditioning of the matrix) is to precondition the matrix. The basic idea is that we take our original problem $A u = f$ and instead solve $$ M^{-1} A u = M^{-1} f. $$

Note that since we need to apply the inverse of $M$, this matrix should be easy to solve with. A couple of examples may help illustrate why this might be a good idea:

  • If $M = A$ then the preconditioned system is trivial, but applying $M^{-1}$ is just as hard as solving the original problem, so this does not help us much
  • If $M = \text{diag}(A)$, then $M^{-1}$ is easily computed and it turns out that for some problems this can decrease the condition number of $M^{-1} A$ significantly. Note though that this is not actually helpful in the case of the Poisson problem.
  • If $M$ is based on another iterative method applied to $A$, for instance Gauss-Seidel, it can be an effective general-purpose preconditioner for many problems.

So the next question then becomes how to choose a preconditioner. This is usually very problem specific and a number of papers suggest strategies for particular problems. In general we want to move the eigenvalues around so that the matrix is not as ill-conditioned. Combining a preconditioner with CG leads to the popular PCG method.
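
As a simple illustration of why a diagonal (Jacobi) preconditioner can help when the diagonal of $A$ varies widely (a sketch with a made-up test matrix, not from the original notes), compare the condition number of $A$ with that of the symmetrically scaled matrix $D^{-1/2} A D^{-1/2}$, which has the same eigenvalues as $M^{-1} A$ with $M = \text{diag}(A)$. As noted above, this particular preconditioner does not help for the Poisson matrix, whose diagonal is constant.

In [ ]:
# Illustration: Jacobi (diagonal) preconditioning on a made-up matrix whose
# diagonal entries vary over several orders of magnitude.  Scaling by D^{-1/2}
# on both sides has the same spectrum as M^{-1} A with M = diag(A) and can
# reduce the condition number dramatically.
m = 50
scales = numpy.logspace(0, 4, m)               # widely varying scales
T = (numpy.diag(2.0 * numpy.ones(m)) +
     numpy.diag(-numpy.ones(m - 1), 1) +
     numpy.diag(-numpy.ones(m - 1), -1))
S = numpy.diag(scales)
A = numpy.dot(S, numpy.dot(T, S))              # SPD with badly scaled diagonal

D_inv_sqrt = numpy.diag(1.0 / numpy.sqrt(numpy.diag(A)))
A_prec = numpy.dot(D_inv_sqrt, numpy.dot(A, D_inv_sqrt))
print("condition number of A:               %8.2e" % numpy.linalg.cond(A, 2))
print("condition number of D^-1/2 A D^-1/2: %8.2e" % numpy.linalg.cond(A_prec, 2))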

Generalized Minimum Residual (GMRES) Algorithm

What if our system is not symmetric positive (negative) definite?

Panic!

Not really. GMRES is one approach to solving this problem for us. Since we can no longer depend on the structure of the related scalar minimization problem we instead will minimize the residual in the same subspaces we had before using a least-squares approach.

In the $k$th iteration GMRES solves a least squares problem in a particular subspace of the full problem. In this case we will use the spaces we saw before, $u^{(0)} + \mathcal{K}_k$, where $\mathcal{K}_k$ is defined as $$ \mathcal{K}_k = \text{span} (r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \ldots, A^{k-1} r^{(0)}). $$

To do this we construct a matrix $Q$ that contains in its columns a set of orthonormal vectors that span the space $\mathcal{K}_k$. Since this is iterative, each time we increment $k$ we only need to find one additional vector for $Q$ to span the next space.

Starting with the $k$th iterate we have $$ Q = [q_1~q_2~q_3~\cdots~q_k] \in \mathbb R^{m \times k}. $$ Pick a vector $v_j \notin \mathcal{K}_k$ and use Gram-Schmidt (or something similar) to orthogonalize this vector against all the rest of the $q_i$.

What should we choose for $v_j$?

There are some bad choices, such as $v_j = A^j r^{(0)}$ (this works in exact arithmetic but the basis vectors become nearly linearly dependent, leading to very small singular values).

Instead we will choose $v_j = A q_j$. In effect we are building up a factorization of the matrix $A$. This process is called the Arnoldi process.

Before diving into the specifics of the Arnoldi process let's show the general algorithm for Arnoldi and where GMRES sits.


In [ ]:
"""
k: iterations
m: size of A
"""

k = 10
m = 15
x = numpy.linspace(0, 1, m)
f = numpy.sin(x)

A = numpy.eye(m)
H = numpy.zeros((k + 1, k))
Q = numpy.zeros((m, k + 1))
U = numpy.ones(m)

r = f - numpy.dot(A, U)
Q[:, 0] = r / numpy.linalg.norm(r, ord=2)

for j in xrange(k):
    v = numpy.dot(A, Q[:, j])
    for i in xrange(j):
        H[i, j] = numpy.dot(Q[:, i], v)
        v = v - numpy.dot(H[i, j], Q[:, i])
    H[j + 1, j] = numpy.linalg.norm(v, ord=2)
    Q[:, j + 1] = v / H[j + 1, j]

for q in Q.T:
    print numpy.linalg.norm(q, ord=2)
    
    # GMRES - Check residual R[:, k] to see if it is small enough yet
    #         If it is then halt and compute U

Arnoldi Process

After $k$ iterations we have $$ Q_k = [q_1~q_2~\cdots~q_k] \in \mathbb R^{m \times k}, ~~~\text{and}~~~Q_{k+1} = [Q_k ~ q_{k+1}] \in \mathbb R^{m \times (k+1)} $$ which form bases for $\mathcal{K}_k$ and $\mathcal{K}_{k+1}$ respectively.

Consider the upper Hessenberg matrix consisting of the values $h_{i j} = q_i^T v_j$ $$ H_k = \begin{bmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1,k-1} & h_{1k} \\ h_{21} & h_{22} & h_{23} & \cdots & h_{2,k-1} & h_{2k} \\ & h_{32} & h_{33} & \cdots & h_{3,k-1} & h_{3k} \\ & & \ddots & \ddots & & \vdots \\ & & & & h_{k,k-1} & h_{kk} \\ \end{bmatrix}. $$ Similarly we need the matrix $\hat{H}_k \in \mathbb R^{k+1\times k}$ which is $H_k$ with an additional row that is all zeros except for the $h_{k+1,k}$ value.
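
To complete the picture, here is a compact sketch (illustrative, not from the original notes) of how GMRES uses the Arnoldi output: at step $k$ it solves the small least squares problem $\min_y || \, ||r^{(0)}||_2 e_1 - \hat{H}_k y ||_2$ and sets $u^{(k)} = u^{(0)} + Q_k y$. The test matrix below is a made-up nonsymmetric example.

In [ ]:
# Sketch of the GMRES step built on the Arnoldi process: solve the small
# least squares problem  min_y || ||r0|| e_1 - H_hat y ||_2  and set
# U = U0 + Q_k y.  (A small nonsymmetric test matrix is used for illustration.)
m = 15
k = 10
A = (numpy.diag(2.0 * numpy.ones(m)) +
     numpy.diag(-numpy.ones(m - 1), 1) +
     numpy.diag(0.5 * numpy.ones(m - 1), -1))
f = numpy.sin(numpy.linspace(0, 1, m))

U0 = numpy.zeros(m)
r0 = f - numpy.dot(A, U0)
beta = numpy.linalg.norm(r0, ord=2)

# Arnoldi process builds Q (orthonormal basis) and the (k+1) x k matrix H_hat
H = numpy.zeros((k + 1, k))
Q = numpy.zeros((m, k + 1))
Q[:, 0] = r0 / beta
for j in range(k):
    v = numpy.dot(A, Q[:, j])
    for i in range(j + 1):
        H[i, j] = numpy.dot(Q[:, i], v)
        v = v - H[i, j] * Q[:, i]
    H[j + 1, j] = numpy.linalg.norm(v, ord=2)
    Q[:, j + 1] = v / H[j + 1, j]

# Least squares problem in the Krylov space
e1 = numpy.zeros(k + 1)
e1[0] = beta
y, _, _, _ = numpy.linalg.lstsq(H, e1, rcond=None)
U = U0 + numpy.dot(Q[:, :k], y)
print("GMRES residual after %d iterations: %e"
      % (k, numpy.linalg.norm(f - numpy.dot(A, U))))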

Multigrid