Iterative methods

In this notebook we explore some iterative techniques for linear algebra problems. The implementations are written to expose how the methods work and are not optimised; production-level implementations typically involve specialised optimisations.

This notebook illustrates:

  • Power iteration for eigenvalue problems
  • Stationary iterative methods
  • Conjugate gradient method

In [1]:
import numpy as np

Power iteration with the Rayleigh quotient

Define a complex matrix:


In [2]:
A = np.array([[3 + 1j, 4 - 2j, 1], [0, 0, 2], [1, 2, 1]])
print(A)


[[3.+1.j 4.-2.j 1.+0.j]
 [0.+0.j 0.+0.j 2.+0.j]
 [1.+0.j 2.+0.j 1.+0.j]]

Compute eigenvalues directly:


In [3]:
evals = np.linalg.eigvals(A)
print("Exact evals: {}".format(evals))


Exact evals: [ 4.16616079+0.30616886j  1.1450385 +1.07245406j -1.31119928-0.37862293j]

Perform power iterations for the non-Hermitian case:


In [4]:
x0 = np.ones(3) + 1j*np.ones(3) 
print(x0)
for i in range(10): 
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0, 1)  # normalise (the Rayleigh quotient estimate is scale-invariant)
    
    eig_est = np.vdot(x0, A.dot(x0)) / np.vdot(x0, x0)
    print("Eigen estimate: {}".format(eig_est))
    print("  Error: {}".format(np.abs(eig_est - evals[0])))


[1.+1.j 1.+1.j 1.+1.j]
Eigen estimate: (4.411764705882353+0.4823529411764706j)
  Error: 0.30226166782032027
Eigen estimate: (4.274985215848611+0.44115907746895333j)
  Error: 0.17339295024565596
Eigen estimate: (4.149734671696183+0.36011124608400996j)
  Error: 0.05638792535742764
Eigen estimate: (4.154693679802965+0.32321630219933056j)
  Error: 0.020545310054164014
Eigen estimate: (4.157275770957634+0.3066270836230514j)
  Error: 0.00889682253063115
Eigen estimate: (4.163817844426+0.30491147522930784j)
  Error: 0.0026590208161287095
Eigen estimate: (4.165704995467151+0.30499886001330195j)
  Error: 0.0012556459594032234
Eigen estimate: (4.166308905876026+0.3058039481081458j)
  Error: 0.0003938287015878552
Eigen estimate: (4.166294270115598+0.30606988902622023j)
  Error: 0.00016617337982425485
Eigen estimate: (4.1662199892726335+0.30617990019925745j)
  Error: 6.0224190780134954e-05

Now make the matrix Hermitian and perform power iterations:


In [5]:
A = 0.5*(A + A.T.conjugate())
print(A)

evals = np.linalg.eigvals(A)
print("Exact evals: {}".format(evals))
x0 = np.ones(3) + 1j*np.ones(3) 
for i in range(10): 
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0, 1)  # normalise (the Rayleigh quotient estimate is scale-invariant)
    
    eig_est = np.vdot(x0, A.dot(x0)) / np.vdot(x0, x0)
    print("Eigen estimate: {}".format(eig_est))
    print("  Error: {}".format(np.abs(eig_est - evals[0])))


[[3.+0.j 2.-1.j 1.+0.j]
 [2.+1.j 0.+0.j 2.+0.j]
 [1.+0.j 2.+0.j 1.+0.j]]
Exact evals: [ 5.03522526-7.84639221e-17j  0.91602974+1.05371511e-16j
 -1.951255  -8.24187401e-17j]
Eigen estimate: (5.014285714285714+0j)
  Error: 0.020939545834771423
Eigen estimate: (5.033371040723982+0j)
  Error: 0.0018542193965034315
Eigen estimate: (5.034989846696271+0j)
  Error: 0.00023541342421484757
Eigen estimate: (5.035191333401401+0j)
  Error: 3.392671908475364e-05
Eigen estimate: (5.035220212468097+0j)
  Error: 5.047652388512347e-06
Eigen estimate: (5.035224503664774+0j)
  Error: 7.564557114037029e-07
Eigen estimate: (5.035225146573403+0j)
  Error: 1.1354708284727622e-07
Eigen estimate: (5.035225243070554+0j)
  Error: 1.704993124462817e-08
Eigen estimate: (5.035225257560112+0j)
  Error: 2.560373246751625e-09
Eigen estimate: (5.03522525973599+0j)
  Error: 3.8449599060187103e-10

Matrix generators

We will test some methods with two common matrices - the stiffness matrix and the mass matrix. Both are symmetric positive definite (SPD), but otherwise have very different properties. One will be easy to solve using iterative methods, and the other is much more challenging. To work with these matrices, we implement functions to generate matrices of a given size.

Stiffness matrix

The stiffness matrix (this matrix comes up in the module 3D7) tends to arise in elliptic differential equations. We will start with a function to build an $n \times n$ stiffness matrix:


In [6]:
def create_stiffness_matrix(n):
    "Create a stiffness matrix of size n x n"

    # Create a zero matrix
    A = np.zeros((n + 1, n + 1))

    # Add components of small matrix k to A
    k = np.array([[1.0, -1.0], [-1.0, 1.0]])
    for i in range(len(A) - 1):
        A[i:i+2, i:i+2] += k

    # We will remove the last row and column of the matrix. This will
    # make the matrix invertible (SPD)
    A = np.delete(A, -1, 0)    
    A = np.delete(A, -1, 1)   
    return A

Mass matrix

The second matrix we will look at is the mass matrix. It appears commonly in unsteady problems and when performing projections (least-squares fits of functions).


In [7]:
def create_mass_matrix(n):
    "Create a mass matrix of size n x n"
    A = np.zeros((n, n))
    m = np.array([[1.0/3.0, 1.0/6.0], [1.0/6.0, 1.0/3.0]])
    for i in range(len(A) - 1):
        A[i:i+2, i:i+2] += m
    return A

Power iteration for the maximum eigenvalue

A very simple algorithm that you have seen before in Part IB is power iteration for estimating the maximum (absolute) eigenvalue and corresponding eigenvector.

To test power iteration, we will create a small stiffness matrix. The stiffness matrix has closely spaced eigenvalues, which get closer as $n$ increases. To measure the error in the estimated eigenvalues and eigenvectors, we first add a function to compute the maximum eigenpair of a matrix directly:


In [8]:
def max_eigenpair(A, print_values=False):
    "Compute the eigenpair for the largest eigenvalue. Matrix A is assumed symmetric"
    
    # Compute eigenpairs
    evals, evecs = np.linalg.eig(A)

    # Get index of largest absolute eigenvalue
    index = np.argmax(np.abs(evals))

    # Get largest eigenvalue and corresponding eigenvector
    eval_max = evals[index]
    evec_max = evecs[:, index]/np.linalg.norm(evecs[:, index])

    if print_values:
        print("  Largest eigenvalue:        {}".format(eval_max))

        # Get second largest eigenvalue to compare to largest
        eval1 = evals[np.argsort(abs(evals))[-2]]
        print("  Second largest eigenvalue: {}".format(eval1))
    
    return eval_max, evec_max

Computing the largest and second largest eigenvalues for $5 \times 5$ and $20 \times 20$ matrices, we see that they get closer as the matrix size increases:


In [9]:
# Check largest and second eigenvalues of 5x5 stiffness matrix
print("Stiffness matrix 5 x 5:")
max_eigenpair(create_stiffness_matrix(5), print_values=True);

# Check largest and second eigenvalues of 20x20 stiffness matrix
print("Stiffness matrix 20 x 20:")
max_eigenpair(create_stiffness_matrix(20), print_values=True);


Stiffness matrix 5 x 5:
  Largest eigenvalue:        3.682507065662362
  Second largest eigenvalue: 2.830830026003774
Stiffness matrix 20 x 20:
  Largest eigenvalue:        3.976560847560696
  Second largest eigenvalue: 3.9067927841098586

We will now apply power iteration to estimate the largest eigenvalue and the corresponding eigenvector. We know that convergence will be poor as the matrix size increases because the largest and second largest eigenvalues become close, so we will test with the small $10 \times 10$ matrix.

We perform a power iteration with a random starting vector $\boldsymbol{u}_{0}$ and 10 iterations. At each iteration we compare the error in the eigenvalue as estimated by:

  • Scaling of the solution from one iterate to the next; and
  • The Rayleigh quotient $R = \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}/(\boldsymbol{x}^{T} \boldsymbol{x})$

We also compute the error in eigenvector as $\| \boldsymbol{u}_{\text{exact}} - \boldsymbol{u}_{k} \|_{2}$, where $\boldsymbol{u}_{k}$ is the estimate of the eigenvector at the $k$th iterate.


In [10]:
# Create 10x10 matrix
A = create_stiffness_matrix(10)

# Get exact eigenpair to compute errors
lambda_max, u_max = max_eigenpair(A)

# Create random starting vector and normalise
np.random.seed(3)
u0 = np.random.rand(A.shape[1])
u0 = u0/np.linalg.norm(u0)

# Perform power iteration
for k in range(10):
    print("Step: {}".format(k))

    # Compute u_{k+1} = A u_{k}
    u1 = A.dot(u0)

    # Estimate eigenvalue (from scaling of each entry and by Rayleigh quotient)
    lambda_est   = np.divide(u1, u0)
    rayleigh_est = u1.dot(A.dot(u1))/(u1.dot(u1))

    # Normalise estimated eigenvector and assign to u0
    u0 = u1/np.linalg.norm(u1)

    # Print errors in eigenvalue
    print("  Relative errors")    
    print("    lambda (scaling):  {}".format(np.abs(lambda_max - np.average(lambda_est))/lambda_max))    
    print("    lambda (Rayleigh): {}".format(np.abs(lambda_max - rayleigh_est)/lambda_max))    

    # Get signs on eigenvectors (could be pointing in opposite directions) and print error
    dir0, dir_est = abs(u_max[0])/u_max[0], abs(u0[0])/u0[0]
    print("    u (l2):            {}".format(np.linalg.norm(dir0*u_max - dir_est*u0)))


Step: 0
  Relative errors
    lambda (scaling):  1.3846461170031097
    lambda (Rayleigh): 0.24789749264192812
    u (l2):            0.8938636048884423
Step: 1
  Relative errors
    lambda (scaling):  0.18763883877592
    lambda (Rayleigh): 0.11066172928531573
    u (l2):            0.6957399551984818
Step: 2
  Relative errors
    lambda (scaling):  1.117183868329417
    lambda (Rayleigh): 0.05589080104444408
    u (l2):            0.5718890915883991
Step: 3
  Relative errors
    lambda (scaling):  0.08024176314636662
    lambda (Rayleigh): 0.033830427008141324
    u (l2):            0.49019067158072127
Step: 4
  Relative errors
    lambda (scaling):  0.0015328144400005845
    lambda (Rayleigh): 0.022882330375570328
    u (l2):            0.42960224198554464
Step: 5
  Relative errors
    lambda (scaling):  0.016645293524981022
    lambda (Rayleigh): 0.016410042012238304
    u (l2):            0.38139828601289943
Step: 6
  Relative errors
    lambda (scaling):  0.019597524593252195
    lambda (Rayleigh): 0.012185475274593358
    u (l2):            0.3416763590432398
Step: 7
  Relative errors
    lambda (scaling):  0.018849461794163157
    lambda (Rayleigh): 0.009274660485906536
    u (l2):            0.3082813158620984
Step: 8
  Relative errors
    lambda (scaling):  0.016774926604069294
    lambda (Rayleigh): 0.007200475054899693
    u (l2):            0.27980558082852414
Step: 9
  Relative errors
    lambda (scaling):  0.014296221565264394
    lambda (Rayleigh): 0.005685894387366269
    u (l2):            0.2552395728531014

Some observations:

  • The method is slow to converge: it will take close to 100 iterations to get a good estimate of the eigenvector (see the quick check below).
  • The error in the estimate of the largest eigenvalue via the Rayleigh quotient decreases monotonically.
  • For a given error in the estimated eigenvector, the error in the estimated eigenvalue via the Rayleigh quotient is much smaller.
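
As a quick check on the first point, we can run the same power iteration for 100 steps (a sketch reusing the functions defined above; only the final eigenvector error is printed):

# Quick check (sketch): ~100 power iterations on the 10x10 stiffness matrix
A_check = create_stiffness_matrix(10)
lambda_max_check, u_max_check = max_eigenpair(A_check)

np.random.seed(3)
u = np.random.rand(A_check.shape[1])
u = u/np.linalg.norm(u)
for k in range(100):
    u = A_check.dot(u)
    u = u/np.linalg.norm(u)

# Remove the sign ambiguity before comparing eigenvectors
sign = np.sign(u.dot(u_max_check))
print("Eigenvector error after 100 iterations (l2): {}".format(np.linalg.norm(u_max_check - sign*u)))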

Stationary iterative methods for $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$

We now look at methods for the very important problem of finding (approximate) solutions to $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$. We look first at simple stationary methods, and then at the more elaborate conjugate gradient method.

Stationary methods

We consider methods that involve splitting $\boldsymbol{A}$ such that $\boldsymbol{A} = \boldsymbol{N} - \boldsymbol{P}$, and then solving

$$ \boldsymbol{x}_{k+1} = \boldsymbol{N}^{-1} \left(\boldsymbol{b} + \boldsymbol{P}\boldsymbol{x}_{k} \right) $$

We assume that $\boldsymbol{A}$ is hard/expensive to solve, and it is split such that $\boldsymbol{N}$ is easy/cheap to solve. We will experiment with the Jacobi and Gauss-Seidel methods.
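
The structure is the same for every splitting; only the choice of $\boldsymbol{N}$ changes. A minimal generic sketch (here $\boldsymbol{N}$ is solved with np.linalg.solve for simplicity; in practice $\boldsymbol{N}$ is chosen so that this step is cheap):

def stationary_solve(N, P, b, x0, num_iterations=15):
    "Generic stationary iteration x_{k+1} = N^{-1} (b + P x_k) (illustrative sketch)"
    x = x0.copy()
    for k in range(num_iterations):
        x = np.linalg.solve(N, b + P.dot(x))
    return x

The Jacobi and Gauss-Seidel methods below are simply particular choices of $\boldsymbol{N}$: the diagonal of $\boldsymbol{A}$ and the lower triangular part of $\boldsymbol{A}$, respectively.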

Jacobi method

A very simple method is the Jacobi method, in which $\boldsymbol{N} = \text{diag}(\boldsymbol{A})$ and which is trivial to invert. The method becomes:

$$ \boldsymbol{x}_{k+1} = \text{diag}(\boldsymbol{A})^{-1} \boldsymbol{b} + (\boldsymbol{I} - \text{diag}(\boldsymbol{A})^{-1} \boldsymbol{A}) \boldsymbol{x}_{k} $$

We will test first with a $50 \times 50$ stiffness matrix.


In [11]:
A = create_stiffness_matrix(50)

To implement the Jacobi method, we first create the $\boldsymbol{N} = \boldsymbol{D} = \text{diag}(\boldsymbol{A})$ matrix and its inverse:


In [12]:
D = np.diag(np.diag(A))
Dinv = np.diag(1.0/np.diag(A))

We have seen in the lectures that stationary methods will only converge if $\rho(\boldsymbol{N}^{-1} \boldsymbol{P}) < 1$, where $\rho$ is the spectral radius (recall that the spectral radius of a matrix is its largest eigenvalue in absolute value). Let's compute the spectral radius for this problem.
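
Checking the condition is a one-line computation; a small reusable sketch (fine for the small dense matrices used here):

def spectral_radius(M):
    "Return the spectral radius (largest absolute eigenvalue) of M"
    return np.max(np.abs(np.linalg.eigvals(M)))

The cell below computes the same quantity explicitly for the Jacobi splitting: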


In [13]:
N = D
P = N - A
evals = np.linalg.eigvals(Dinv.dot(P))
print("Spectral radius (rho) is: {}".format(np.max(abs(evals))))


Spectral radius (rho) is: 0.9995065603657343

The largest eigenvalue is less than one (only just!), so we can expect the Jacobi method to converge. However, it is so close to one that we expect the convergence to be slow.

Experiment: compare the spectral radius for different size matrices.

Let's try solving $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$ with Jacobi's method using 15 iterations and with $\boldsymbol{b} = \boldsymbol{1}$:


In [14]:
I = np.identity(A.shape[0])
b = np.ones(A.shape[1])
x = np.zeros(A.shape[1])
r0_norm = np.linalg.norm(b - A.dot(x), 2)
for k in range(15):
    x = Dinv.dot(b) + (I - Dinv.dot(A)).dot(x)
    r = b - A.dot(x)
    print("Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))


Step: 1, relative norm of residual (l2): 0.9974968671630001
Step: 2, relative norm of residual (l2): 0.9893179468704688
Step: 3, relative norm of residual (l2): 0.9886859966642594
Step: 4, relative norm of residual (l2): 0.983258295159517
Step: 5, relative norm of residual (l2): 0.982741700550048
Step: 6, relative norm of residual (l2): 0.9783895409038262
Step: 7, relative norm of residual (l2): 0.9779302921220918
Step: 8, relative norm of residual (l2): 0.9741919666025627
Step: 9, relative norm of residual (l2): 0.9737713236793906
Step: 10, relative norm of residual (l2): 0.9704418741121353
Step: 11, relative norm of residual (l2): 0.9700501214675351
Step: 12, relative norm of residual (l2): 0.9670176688180705
Step: 13, relative norm of residual (l2): 0.9666487837375989
Step: 14, relative norm of residual (l2): 0.9638444533367624
Step: 15, relative norm of residual (l2): 0.9634943337370382

We can see that the residual is decreasing, but extremely slowly. This is to be expected because $\rho(\boldsymbol{N}^{-1} \boldsymbol{P})$ is very close to one.

We now try with the mass matrix. For this we collect the above steps in a function, and then call the function for a $50 \times 50$ mass matrix.


In [15]:
def jacobi_solver(A):
    N = np.diag(np.diag(A))
    Dinv = np.diag(1.0/np.diag(A))    
    P = N - A
    evals = np.linalg.eigvals(Dinv.dot(P))
    print("Spectral radius (rho) is: {}".format(np.max(abs(evals))))
    
    I = np.identity(A.shape[0])

    b = np.ones(A.shape[1])
    x = np.zeros(A.shape[1])
    r0_norm = np.linalg.norm(b - A.dot(x), 2)
    for k in range(15):
        x = Dinv.dot(b) + (I - Dinv.dot(A)).dot(x)
        r = b - A.dot(x)
        print("  Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))
        
jacobi_solver(create_mass_matrix(50) )


Spectral radius (rho) is: 0.49999999999999944
  Step: 1, relative norm of residual (l2): 0.5049752469181039
  Step: 2, relative norm of residual (l2): 0.25062422069704277
  Step: 3, relative norm of residual (l2): 0.1260115322103497
  Step: 4, relative norm of residual (l2): 0.06270474276819099
  Step: 5, relative norm of residual (l2): 0.03148229095149664
  Step: 6, relative norm of residual (l2): 0.015681811999896492
  Step: 7, relative norm of residual (l2): 0.007867732312336664
  Step: 8, relative norm of residual (l2): 0.003921281978712691
  Step: 9, relative norm of residual (l2): 0.001966467523415862
  Step: 10, relative norm of residual (l2): 0.0009804592395103694
  Step: 11, relative norm of residual (l2): 0.0004915330591380955
  Step: 12, relative norm of residual (l2): 0.00024513992219384056
  Step: 13, relative norm of residual (l2): 0.00012286724847633643
  Step: 14, relative norm of residual (l2): 6.128977282684386e-05
  Step: 15, relative norm of residual (l2): 3.0713623368082544e-05

The spectral radius is considerably smaller than one and much smaller than it was for the stiffness matrix, which is evident in the much faster convergence.

Gauss-Seidel

For the Gauss-Seidel method, $\boldsymbol{N}$ is the lower triangular part of $\boldsymbol{A}$ (including the diagonal) and $\boldsymbol{P} = \boldsymbol{N} - \boldsymbol{A}$ is the negative of the strictly upper triangular part of $\boldsymbol{A}$ ($\boldsymbol{P}$ is zero on and below the diagonal). Solving $\boldsymbol{N} \boldsymbol{y} = \boldsymbol{f}$ is then analogous to the forward substitution step in an LU solver.

From the general expression for a stationary method, we multiply both sides by $\boldsymbol{N}$:

$$ \boldsymbol{N} \boldsymbol{x}_{k+1} = \boldsymbol{b} + \boldsymbol{P}\boldsymbol{x}_{k} $$
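
Written out component-wise, the forward-substitution structure is explicit: each updated entry immediately uses the entries updated before it in the same sweep. A sketch of a single sweep (not used below, where we instead call a triangular solver):

def gauss_seidel_sweep(A, b, x):
    "Perform one in-place Gauss-Seidel sweep (illustrative sketch)"
    n = len(b)
    for i in range(n):
        # x[:i] holds entries already updated in this sweep, x[i+1:] the old entries
        sigma = A[i, :i].dot(x[:i]) + A[i, i+1:].dot(x[i+1:])
        x[i] = (b[i] - sigma)/A[i, i]
    return x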

We will solve our problem using this formulation. We create a $50 \times 50$ stiffness matrix, and then form the $\boldsymbol{N}$ and $\boldsymbol{P}$ matrices:


In [16]:
# Create stiffness matrix
A = create_stiffness_matrix(50)

# Form lower-triangular part of A
N = np.tril(A)

# Form P
P = N - A

We now check the spectral radius for stiffness matrix and the Gauss-Seidel method:


In [17]:
evals = np.linalg.eigvals(np.linalg.inv(N).dot(P))
print("Largest eigenvalue (rho) is: {}".format(np.max(abs(evals))))


Largest eigenvalue (rho) is: 0.9990133642141358

The spectral radius is very slightly smaller than for the Jacobi case, but still very close to one, so we cannot expect good convergence.

Now the solver. We write a function to perform 15 Gauss-Seidel iterations, using SciPy for the forward substitution step involving $\boldsymbol{N}$:


In [18]:
import scipy.linalg as LA

def gauss_seidel_solver(A):
    "Compute spectral radius and perform 15 Gauss-Seidel iterations"
    N = np.tril(A)
    P = N - A

    evals = np.linalg.eigvals(np.linalg.inv(N).dot(P))
    print("Largest eigenvalue (rho) is: {}".format(np.max(abs(evals))))
    
    b = np.ones(A.shape[1])
    x = np.zeros(A.shape[1])
    r0_norm = np.linalg.norm(b - A.dot(x), 2)
    for k in range(15):
        c = b + P.dot(x)
        x = LA.solve_triangular(N, c, lower=True)

        # Compute residual to monitor convergence
        r = b - A.dot(x)
        print("Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))

Calling the function for a $50 \times 50$ stiffness matrix:


In [19]:
gauss_seidel_solver(create_stiffness_matrix(50))


Largest eigenvalue (rho) is: 0.9990133642141358
Step: 1, relative norm of residual (l2): 0.9899494936611665
Step: 2, relative norm of residual (l2): 0.982344135219425
Step: 3, relative norm of residual (l2): 0.9767612297793151
Step: 4, relative norm of residual (l2): 0.9721512870947607
Step: 5, relative norm of residual (l2): 0.9681229069134895
Step: 6, relative norm of residual (l2): 0.9644930726589129
Step: 7, relative norm of residual (l2): 0.9611590370425309
Step: 8, relative norm of residual (l2): 0.9580560147508117
Step: 9, relative norm of residual (l2): 0.9551399401005148
Step: 10, relative norm of residual (l2): 0.9523791996623238
Step: 11, relative norm of residual (l2): 0.949750191687984
Step: 12, relative norm of residual (l2): 0.9472347395577747
Step: 13, relative norm of residual (l2): 0.9448184904859701
Step: 14, relative norm of residual (l2): 0.9424898755792793
Step: 15, relative norm of residual (l2): 0.9402394077309429

We can see from the decrease in the residual that the Gauss-Seidel method converges somewhat faster than the Jacobi method, but it is still far too slow to be of any practical use on its own.

As for the Jacobi method, we collect the above steps inside a function and test it on the $50\times 50$ mass matrix.


In [20]:
def gauss_seidel_solver(A):
    "Compute spectral radius and perform 15 Gauss-Seidel iterations"
    N = np.tril(A)
    P = N - A

    evals = np.linalg.eigvals(np.linalg.inv(N).dot(P))
    print("Largest eigenvalue (rho) is: {}".format(np.max(abs(evals))))
    
    b = np.ones(A.shape[1])
    x = np.zeros(A.shape[1])
    r0_norm = np.linalg.norm(b - A.dot(x), 2)
    for k in range(15):
        c = b + P.dot(x)
        x = LA.solve_triangular(N, c, lower=True)

        # Compute residual to monitor convergence
        r = b - A.dot(x)
        print("  Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))

gauss_seidel_solver(create_mass_matrix(50))


Largest eigenvalue (rho) is: 0.24999999999999994
  Step: 1, relative norm of residual (l2): 0.20307634032550426
  Step: 2, relative norm of residual (l2): 0.04128125078854402
  Step: 3, relative norm of residual (l2): 0.00856896620705433
  Step: 4, relative norm of residual (l2): 0.001894343129323628
  Step: 5, relative norm of residual (l2): 0.0004564612202658296
  Step: 6, relative norm of residual (l2): 0.0001230203863175529
  Step: 7, relative norm of residual (l2): 3.606056649858689e-05
  Step: 8, relative norm of residual (l2): 1.1153287455621036e-05
  Step: 9, relative norm of residual (l2): 3.538752507725012e-06
  Step: 10, relative norm of residual (l2): 1.1378279383620941e-06
  Step: 11, relative norm of residual (l2): 3.682601606194006e-07
  Step: 12, relative norm of residual (l2): 1.1967638560760684e-07
  Step: 13, relative norm of residual (l2): 3.8996251724102795e-08
  Step: 14, relative norm of residual (l2): 1.2732978973882517e-08
  Step: 15, relative norm of residual (l2): 4.164335777779275e-09

The spectral radius is only half that for the Jacobi method, and this is manifest in the faster observed convergence.

Applications of stationary methods

Stationary methods are of limited use on their own as they tend to converge very slowly. For matrices for which they do work well, e.g. the mass matrix, there are other iterative methods that are faster.

Stationary methods are however very useful in combination with other methods, e.g. as preconditioners in more sophisticated iterative methods and as 'smoothers' in multigrid methods.

Conjugate gradient method for $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$

We now look at one of the most important algorithms for solving $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$, the conjugate gradient (CG) method. The CG method is applicable to symmetric positive-definite (SPD) matrices.

The CG method is technically a direct method since in exact arithmetic it solves an $n \times n$ system in $k$ iterations, where $k$ is the number of distinct eigenvalues of $\boldsymbol{A}$. In the worst case (all eigenvalues distinct) it requires $n$ iterations.

In practice, the CG method is used as an iterative method to find approximate solutions. If solving an $n \times n$ system required $n$ steps it would generally not be competitive with other methods, and round-off error can spoil the party.
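
To make the 'number of distinct eigenvalues' statement concrete, we can build an SPD matrix with only three distinct eigenvalues and count how many CG iterations are needed for the residual to (numerically) vanish. A sketch, assuming a test matrix built from a random orthogonal factor:

# Sketch: CG on an SPD matrix with 3 distinct eigenvalues should converge
# (in exact arithmetic) in 3 iterations
np.random.seed(1)
n = 50
Q, _ = np.linalg.qr(np.random.rand(n, n))      # random orthogonal matrix
d = np.array([1.0]*20 + [2.0]*20 + [3.0]*10)   # only 3 distinct eigenvalues
A_test = Q.dot(np.diag(d)).dot(Q.T)

# Plain CG iteration on A_test x = b with b = 1
b = np.ones(n)
x, r = np.zeros(n), b.copy()
p, rs_old = r.copy(), r.dot(r)
for k in range(n):
    Ap = A_test.dot(p)
    alpha = rs_old/p.dot(Ap)
    x, r = x + alpha*p, r - alpha*Ap
    rs_new = r.dot(r)
    if np.sqrt(rs_new) < 1e-10:
        break
    p, rs_old = r + (rs_new/rs_old)*p, rs_new
print("CG iterations used: {}".format(k + 1))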

The error in the CG method decreases monotonically in the energy norm, i.e. if $\boldsymbol{x}$ is the exact solution then:

$$ (\boldsymbol{x} - \boldsymbol{x}_{k+1})^{T} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{x}_{k+1}) < (\boldsymbol{x} - \boldsymbol{x}_{k})^{T} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{x}_{k}) $$

This implies that if we stop iterating before the residual is (almost) zero, we will have a better solution than we started with.
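
As an aside, production codes would normally call a library implementation rather than hand-code the iteration; a minimal sketch using SciPy's scipy.sparse.linalg.cg (which also accepts dense NumPy arrays):

import scipy.sparse.linalg

A_cg = create_stiffness_matrix(50)
b_cg = np.ones(A_cg.shape[1])

# cg returns the solution and an exit flag (0 indicates convergence)
x_cg, info = scipy.sparse.linalg.cg(A_cg, b_cg)
print("SciPy CG exit flag: {}".format(info))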

Below we create our own function for the conjugate gradient (CG) method. It also solves the system directly so we can compare the error at each iteration (this would of course not be done in practice, but we do it here to illustrate some properties of the CG method). The function performs a maximum of 200 iterations, but will exit sooner if the residual is very small.


In [21]:
def cg(A):
    "Conjugate gradient solver"
    
    # RHS vector
    b = np.ones(A.shape[1])

    # Initial guess (zero vector)
    x0 = np.zeros(A.shape[1])

    # Compute exact solution (to use in computing error at each iterate)
    x_exact = np.linalg.solve(A, b)
    e = x_exact - x0
    e0_norm = np.sqrt(e.dot(A.dot(e)))
    print("Initial error (A-norm): {}".format(e.dot(A.dot(e)))) 

    # Convergence tolerance to exit solver
    tolerance = 1.0e-9

    # Create starting vectors
    r0 = b - A.dot(x0)
    r0_norm = np.linalg.norm(r0)
    p0 = r0.copy()

    # Start iterations    
    for k in range(200):
        print("Step: {}".format(k))

        alpha = r0.dot(r0)/(p0.dot(A.dot(p0)))
        x1 = x0 + alpha*p0
        r1 = r0 - alpha*A.dot(p0)

        # Compute error in x (this is for studying the algorithm, and of 
        # course would not be done in practice)
        e = x_exact - x1
        e_norm = np.sqrt(e.dot(A.dot(e)))
        print("  Relative error in x (A-norm): {}".format(e_norm/e0_norm)) 
    
        # Compute norm of residual and check for convergence
        r_norm = np.linalg.norm(r1) 
        print("  Relative residual (l2-norm):  {}".format(r_norm/r0_norm)) 
        if r_norm < tolerance:
            break

        beta = r1.dot(r1)/r0.dot(r0)
        p1 = r1 + beta*p0    
    
        # Update for next step
        p0, r0, x0 = p1, r1, x1

Below we solve the $50 \times 50$ stiffness matrix problem with the conjugate gradient method. The iterations terminate once the residual drops below a specified threshold. The solver prints at each iteration the relative error in the solution in the $\boldsymbol{A}$-norm and the relative residual in the $l_{2}$ norm.


In [22]:
cg(create_stiffness_matrix(50))


Initial error (A-norm squared): 42925.0
Step: 0
  Relative error in x (A-norm): 0.9704426215756036
  Relative residual (l2-norm):  7.0
Step: 1
  Relative error in x (A-norm): 0.9411822946820383
  Relative residual (l2-norm):  6.858571279792899
Step: 2
  Relative error in x (A-norm): 0.9122220657617681
  Relative residual (l2-norm):  6.717142249498665
Step: 3
  Relative error in x (A-norm): 0.883565076944029
  Relative residual (l2-norm):  6.575712889109439
Step: 4
  Relative error in x (A-norm): 0.8552145711607609
  Relative residual (l2-norm):  6.434283176858164
Step: 5
  Relative error in x (A-norm): 0.8271738976538208
  Relative residual (l2-norm):  6.292853089020909
Step: 6
  Relative error in x (A-norm): 0.799446517912807
  Relative residual (l2-norm):  6.151422599691879
Step: 7
  Relative error in x (A-norm): 0.7720360120877756
  Relative residual (l2-norm):  6.009991680526688
Step: 8
  Relative error in x (A-norm): 0.7449460859268152
  Relative residual (l2-norm):  5.868560300448483
Step: 9
  Relative error in x (A-norm): 0.7181805782950318
  Relative residual (l2-norm):  5.727128425310541
Step: 10
  Relative error in x (A-norm): 0.6917434693391226
  Relative residual (l2-norm):  5.585696017507576
Step: 11
  Relative error in x (A-norm): 0.6656388893705926
  Relative residual (l2-norm):  5.444263035526479
Step: 12
  Relative error in x (A-norm): 0.6398711285510341
  Relative residual (l2-norm):  5.302829433425141
Step: 13
  Relative error in x (A-norm): 0.614444647475032
  Relative residual (l2-norm):  5.161395160225576
Step: 14
  Relative error in x (A-norm): 0.5893640887605548
  Relative residual (l2-norm):  5.019960159204453
Step: 15
  Relative error in x (A-norm): 0.5646342897735833
  Relative residual (l2-norm):  4.878524367060186
Step: 16
  Relative error in x (A-norm): 0.5402602966337725
  Relative residual (l2-norm):  4.737087712930804
Step: 17
  Relative error in x (A-norm): 0.5162473796718534
  Relative residual (l2-norm):  4.595650117230423
Step: 18
  Relative error in x (A-norm): 0.4926010505381244
  Relative residual (l2-norm):  4.454211490264018
Step: 19
  Relative error in x (A-norm): 0.46932708119589217
  Relative residual (l2-norm):  4.3127717305695645
Step: 20
  Relative error in x (A-norm): 0.4464315250755107
  Relative residual (l2-norm):  4.171330722922842
Step: 21
  Relative error in x (A-norm): 0.4239207407155879
  Relative residual (l2-norm):  4.029888335921976
Step: 22
  Relative error in x (A-norm): 0.4018014182803376
  Relative residual (l2-norm):  3.888444419044716
Step: 23
  Relative error in x (A-norm): 0.38008060941907756
  Relative residual (l2-norm):  3.7469987990390385
Step: 24
  Relative error in x (A-norm): 0.358765761029573
  Relative residual (l2-norm):  3.605551275463989
Step: 25
  Relative error in x (A-norm): 0.33786475360676677
  Relative residual (l2-norm):  3.4641016151377544
Step: 26
  Relative error in x (A-norm): 0.3173859450096858
  Relative residual (l2-norm):  3.3226495451672298
Step: 27
  Relative error in x (A-norm): 0.2973382206719067
  Relative residual (l2-norm):  3.181194744117373
Step: 28
  Relative error in x (A-norm): 0.2777310515284708
  Relative residual (l2-norm):  3.0397368307141326
Step: 29
  Relative error in x (A-norm): 0.25857456125348366
  Relative residual (l2-norm):  2.8982753492378874
Step: 30
  Relative error in x (A-norm): 0.23987960482441265
  Relative residual (l2-norm):  2.756809750418044
Step: 31
  Relative error in x (A-norm): 0.22165786098936555
  Relative residual (l2-norm):  2.6153393661244038
Step: 32
  Relative error in x (A-norm): 0.2039219419676107
  Relative residual (l2-norm):  2.473863375370596
Step: 33
  Relative error in x (A-norm): 0.18668552474285424
  Relative residual (l2-norm):  2.33238075793812
Step: 34
  Relative error in x (A-norm): 0.16996350973611107
  Relative residual (l2-norm):  2.1908902300206643
Step: 35
  Relative error in x (A-norm): 0.1537722146591016
  Relative residual (l2-norm):  2.0493901531919194
Step: 36
  Relative error in x (A-norm): 0.13812961424681666
  Relative residual (l2-norm):  1.9078784028338913
Step: 37
  Relative error in x (A-norm): 0.12305564082829443
  Relative residual (l2-norm):  1.7663521732655694
Step: 38
  Relative error in x (A-norm): 0.10857256711363478
  Relative residual (l2-norm):  1.624807680927192
Step: 39
  Relative error in x (A-norm): 0.09470550251879972
  Relative residual (l2-norm):  1.4832396974191324
Step: 40
  Relative error in x (A-norm): 0.08148305025070908
  Relative residual (l2-norm):  1.3416407864998738
Step: 41
  Relative error in x (A-norm): 0.06893819875457113
  Relative residual (l2-norm):  1.2
Step: 42
  Relative error in x (A-norm): 0.05710956680671217
  Relative residual (l2-norm):  1.058300524425836
Step: 43
  Relative error in x (A-norm): 0.04604320474893888
  Relative residual (l2-norm):  0.916515138991168
Step: 44
  Relative error in x (A-norm): 0.03579531535059218
  Relative residual (l2-norm):  0.7745966692414834
Step: 45
  Relative error in x (A-norm): 0.026436592419478728
  Relative residual (l2-norm):  0.6324555320336759
Step: 46
  Relative error in x (A-norm): 0.01805963072947595
  Relative residual (l2-norm):  0.4898979485566356
Step: 47
  Relative error in x (A-norm): 0.01079269366094211
  Relative residual (l2-norm):  0.3464101615137754
Step: 48
  Relative error in x (A-norm): 0.004826639337239525
  Relative residual (l2-norm):  0.2
Step: 49
  Relative error in x (A-norm): 0.0
  Relative residual (l2-norm):  0.0

We see that, as expected, the CG method solves this $n \times n$ problem in $n$ iterations. However, if we were satisfied with a less accurate solution we could have terminated the iterations sooner.

Note that the error in the $\boldsymbol{A}$-norm ($\sqrt{\boldsymbol{e}^{T}\boldsymbol{A}\boldsymbol{e}}$) decreases monotonically. The $l_{2}$-norm of the residual $\boldsymbol{r}_{k} = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_{k}$ increases in the first step (the relative norm of the residual is greater than one), and only then starts to decrease. The monotonic convergence in the $\boldsymbol{A}$-norm is what we expect from analysis of the CG method; the residual norm is not guaranteed to decrease monotonically.

We now test for the $50 \times 50$ mass matrix:


In [23]:
cg(create_mass_matrix(50))


Initial error (A-norm): 52.73205080756888
Step: 0
  Relative error in x (A-norm): 0.18016449890252687
  Relative residual (l2-norm):  0.09997917317482359
Step: 1
  Relative error in x (A-norm): 0.0655801444866632
  Relative residual (l2-norm):  0.04946248473461363
Step: 2
  Relative error in x (A-norm): 0.0181218413344505
  Relative residual (l2-norm):  0.014103099535049533
Step: 3
  Relative error in x (A-norm): 0.004864134800672649
  Relative residual (l2-norm):  0.0037942074598772625
Step: 4
  Relative error in x (A-norm): 0.0013027052574813938
  Relative residual (l2-norm):  0.001016310514848461
Step: 5
  Relative error in x (A-norm): 0.0003488109735467035
  Relative residual (l2-norm):  0.00027212380386149157
Step: 6
  Relative error in x (A-norm): 9.338892122885745e-05
  Relative residual (l2-norm):  7.285536671029193e-05
Step: 7
  Relative error in x (A-norm): 2.5001179307324556e-05
  Relative residual (l2-norm):  1.9503576950218716e-05
Step: 8
  Relative error in x (A-norm): 6.692349938843105e-06
  Relative residual (l2-norm):  5.220566613372581e-06
Step: 9
  Relative error in x (A-norm): 1.791186177458801e-06
  Relative residual (l2-norm):  1.3972110120425353e-06
Step: 10
  Relative error in x (A-norm): 4.793305996763896e-07
  Relative residual (l2-norm):  3.7388229415470185e-07
Step: 11
  Relative error in x (A-norm): 1.2824689935057262e-07
  Relative residual (l2-norm):  1.0002766413163705e-07
Step: 12
  Relative error in x (A-norm): 3.430488952248435e-08
  Relative residual (l2-norm):  2.67544587273984e-08
Step: 13
  Relative error in x (A-norm): 9.173516172674862e-09
  Relative residual (l2-norm):  7.153751393309936e-09
Step: 14
  Relative error in x (A-norm): 2.452162828314951e-09
  Relative residual (l2-norm):  1.9120205550942666e-09
Step: 15
  Relative error in x (A-norm): 6.551547166325977e-10
  Relative residual (l2-norm):  5.107567583615517e-10
Step: 16
  Relative error in x (A-norm): 1.7492108619935422e-10
  Relative residual (l2-norm):  1.3633647839665992e-10

The CG method converges rapidly for the mass matrix: it terminates after just 17 iterations, fewer than the size of the matrix and far fewer than the 50 iterations required for the stiffness matrix.

Exercise: Examine the condition number $\kappa_{2}$ for the mass and stiffness matrices as the matrix size increases.
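
A possible starting point for the first exercise (a sketch using np.linalg.cond to compute $\kappa_{2}$):

for n in [10, 20, 40, 80]:
    kappa_stiffness = np.linalg.cond(create_stiffness_matrix(n), 2)
    kappa_mass = np.linalg.cond(create_mass_matrix(n), 2)
    print("n = {:3d}: cond(stiffness) = {:.3e}, cond(mass) = {:.3e}".format(n, kappa_stiffness, kappa_mass))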

Exercise: Test how the number of CG iterations changes for the mass and stiffness matrices as the matrix size is increased.