In this notebook we explore some iterative techniques for linear algebra problems. The implementations are written to expose how the methods work and are not optimised; production-level implementations typically involve specialised optimisations.
This notebook illustrates:
- power iteration for estimating the largest (in absolute value) eigenvalue and the corresponding eigenvector;
- stationary iterative methods (Jacobi and Gauss-Seidel) for solving $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$; and
- the conjugate gradient (CG) method.
In [1]:
import numpy as np
In [2]:
A = np.array([[3 + 1j, 4 - 2j, 1], [0, 0, 2], [1, 2, 1]])
print(A)
Compute eigenvalues directly:
In [3]:
evals = np.linalg.eigvals(A)
print("Exact evals: {}".format(evals))
Perform power iterations for the non-symmetric case:
In [4]:
x0 = np.ones(3) + 1j*np.ones(3)
print(x0)
for i in range(10):
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0, 1)
eig_est = np.vdot(x0, A.dot(x0)) / np.vdot(x0, x0)
print("Eigen estimate: {}".format(eig_est))
# Compare against the exact eigenvalue of largest magnitude
print("  Error: {}".format(np.abs(eig_est - evals[np.argmax(np.abs(evals))])))
Now make the matrix Hermitian and perform power iterations:
In [5]:
A = 0.5*(A + A.T.conjugate())
print(A)
evals = np.linalg.eigvals(A)
print("Exact evals: {}".format(evals))
x0 = np.ones(3) + 1j*np.ones(3)
for i in range(10):
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0, 1)
eig_est = np.vdot(x0, A.dot(x0)) / np.vdot(x0, x0)
print("Eigen estimate: {}".format(eig_est))
# Compare against the exact eigenvalue of largest magnitude
print("  Error: {}".format(np.abs(eig_est - evals[np.argmax(np.abs(evals))])))
We will test some methods with two common matrices - the stiffness matrix and the mass matrix. Both are symmetric positive definite (SPD), but otherwise have very different properties. One matrix will be easy to solve using iterative methods; the other is much more challenging. To work with these matrices, we implement functions to generate matrices of a given size.
The stiffness matrix (this matrix comes up in the module 3D7) tends to arise in elliptic differential equations. We will start with a function to build an $n \times n$ stiffness matrix:
In [6]:
def create_stiffness_matrix(n):
    "Create a stiffness matrix of size n x n"
    # Create a zero matrix
    A = np.zeros((n + 1, n + 1))
    # Add components of small matrix k to A
    k = np.array([[1.0, -1.0], [-1.0, 1.0]])
    for i in range(len(A) - 1):
        A[i:i+2, i:i+2] += k
    # We will remove the last row and column of the matrix. This will
    # make the matrix invertible (SPD)
    A = np.delete(A, -1, 0)
    A = np.delete(A, -1, 1)
    return A
The second matrix we will look at is the mass matrix. It appears commonly in unsteady problems and when performing projections (least-squares fits of functions).
In [7]:
def create_mass_matrix(n):
    "Create a mass matrix of size n x n"
    A = np.zeros((n, n))
    m = np.array([[1.0/3.0, 1.0/6.0], [1.0/6.0, 1.0/3.0]])
    for i in range(len(A) - 1):
        A[i:i+2, i:i+2] += m
    return A
A very simple algorithm that you have seen before in Part IB is power iteration for estimating the maximum (absolute) eigenvalue and corresponding eigenvector.
To test power iterations, we will create a small stiffness matrix. The stiffness matrix has eigenvalues that are close, and which get closer for large $n$. To measure the error in the estimated eigenvalues and eigenvectors, we will first add a function to compute the maximum eigenpair of a matrix directly:
In [8]:
def max_eigenpair(A, print_values=False):
    "Compute the eigenpair for the largest eigenvalue. Matrix A is assumed symmetric"
    # Compute eigenpairs
    evals, evecs = np.linalg.eig(A)
    # Get index of largest absolute eigenvalue
    index = np.argmax(np.abs(evals))
    # Get largest eigenvalue and corresponding eigenvector
    eval_max = evals[index]
    evec_max = evecs[:, index]/np.linalg.norm(evecs[:, index])
    if print_values:
        print("  Largest eigenvalue: {}".format(eval_max))
        # Get second largest eigenvalue to compare to largest
        eval1 = evals[np.argsort(abs(evals))[-2]]
        print("  Second largest eigenvalue: {}".format(eval1))
    return eval_max, evec_max
Computing the largest and second largest eigenvalues for $5 \times 5$ and $20 \times 20$ matrices, we see that they get closer as the matrix size increases:
In [9]:
# Check largest and second eigenvalues of 5x5 stiffness matrix
print("Stiffness matrix 5 x 5:")
max_eigenpair(create_stiffness_matrix(5), print_values=True);
# Check largest and second eigenvalues of 20x20 stiffness matrix
print("Stiffness matrix 20 x 20:")
max_eigenpair(create_stiffness_matrix(20), print_values=True);
We will now apply power iteration to estimate the largest eigenvalue and the corresponding eigenvector. We know that convergence will be poor as the matrix size increases because the largest and second largest eigenvalues become close, so we will test with the small $10 \times 10$ matrix.
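Since the error in power iteration decreases roughly like $|\lambda_{2}/\lambda_{1}|^{k}$, where $\lambda_{1}$ and $\lambda_{2}$ are the largest and second largest eigenvalues in magnitude, a quick sketch (reusing the create_stiffness_matrix function above) of the expected reduction per iteration for the $10 \times 10$ matrix:
import numpy as np

# Sketch: expected error reduction per power iteration is roughly the ratio
# of the two largest (in magnitude) eigenvalues
A = create_stiffness_matrix(10)
evals = np.sort(np.abs(np.linalg.eigvals(A)))
print("Expected reduction factor per iteration: {}".format(evals[-2]/evals[-1]))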
We perform a power iteration with a random starting vector $\boldsymbol{u}_{0}$ and 10 iterations. At each iteration we compute the error in the eigenvalue estimated (i) from the entrywise scaling of $\boldsymbol{u}_{k+1} = \boldsymbol{A}\boldsymbol{u}_{k}$ relative to $\boldsymbol{u}_{k}$, and (ii) by the Rayleigh quotient.
We also compute the error in eigenvector as $\| \boldsymbol{u}_{\text{exact}} - \boldsymbol{u}_{k} \|_{2}$, where $\boldsymbol{u}_{k}$ is the estimate of the eigenvector at the $k$th iterate.
In [10]:
# Create 10x10 matrix
A = create_stiffness_matrix(10)
# Get exact eigenpair to compute errors
lambda_max, u_max = max_eigenpair(A)
# Create random starting vector and normalise
np.random.seed(3)
u0 = np.random.rand(A.shape[1])
u0 = u0/np.linalg.norm(u0)
# Perform power iteration
for k in range(10):
    print("Step: {}".format(k))
    # Compute u_{k+1} = A u_{k}
    u1 = A.dot(u0)
    # Estimate eigenvalue (from scaling of each entry and by Rayleigh quotient)
    lambda_est = np.divide(u1, u0)
    rayleigh_est = u1.dot(A.dot(u1))/(u1.dot(u1))
    # Normalise estimated eigenvector and assign to u0
    u0 = u1/np.linalg.norm(u1)
    # Print errors in eigenvalue
    print("  Relative errors")
    print("    lambda (scaling): {}".format(np.abs(lambda_max - np.average(lambda_est))/lambda_max))
    print("    lambda (Rayleigh): {}".format(np.abs(lambda_max - rayleigh_est)/lambda_max))
    # Get signs on eigenvectors (could be pointing in opposite directions) and print error
    dir0, dir_est = abs(u_max[0])/u_max[0], abs(u0[0])/u0[0]
    print("    u (l2): {}".format(np.linalg.norm(dir0*u_max - dir_est*u0)))
Some observations:
- The Rayleigh quotient estimate of the eigenvalue converges considerably faster than the estimate based on entrywise scaling.
- Convergence of the eigenvector estimate is slow because the two largest eigenvalues are close.
We now look at methods for the very important problem of finding (approximate) solutions to $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$. We look first at simple stationary methods, and then at the more elaborate conjugate gradient method.
We consider methods that involve splitting $\boldsymbol{A}$ such that $\boldsymbol{A} = \boldsymbol{N} - \boldsymbol{P}$, and then solving
$$
\boldsymbol{x}_{k+1} = \boldsymbol{N}^{-1} \left(\boldsymbol{b} + \boldsymbol{P}\boldsymbol{x}_{k} \right)
$$

We assume that $\boldsymbol{A}$ is hard/expensive to solve, and it is split such that $\boldsymbol{N}$ is easy/cheap to solve. We will experiment with the Jacobi and Gauss-Seidel methods.
A very simple method is the Jacobi method, in which $\boldsymbol{N} = \boldsymbol{D} = \text{diag}(\boldsymbol{A})$, which is trivial to invert. The method becomes:

$$
\boldsymbol{x}_{k+1} = \boldsymbol{D}^{-1}\boldsymbol{b} + \left(\boldsymbol{I} - \boldsymbol{D}^{-1}\boldsymbol{A}\right)\boldsymbol{x}_{k}
$$
We will test first with a $50 \times 50$ stiffness matrix.
In [11]:
A = create_stiffness_matrix(50)
To implement the Jacobi method, we first create the $\boldsymbol{N} = \boldsymbol{D} = \text{diag}(\boldsymbol{A})$ matrix and its inverse:
In [12]:
D = np.diag(np.diag(A))
Dinv = np.diag(1.0/np.diag(A))
We have seen in the lectures that stationary methods will only converge if $\rho(\boldsymbol{N}^{-1} \boldsymbol{P}) < 1$, where $\rho$ is the spectral radius (recall that the spectral radius of a matrix is its largest absolute eigenvalue). Let's compute the spectral radius for this problem:
In [13]:
N = D
P = N - A
evals = np.linalg.eigvals(Dinv.dot(P))
print("Spectral radius (rho) is: {}".format(np.max(abs(evals))))
The largest eigenvalue is less than one (only just!), so we can expect the Jacobi method to converge. However, it is close to one, so we expect the convergence to be slow.
Experiment: compare the spectral radius for different size matrices.
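As a starting point for this experiment, here is a minimal sketch that repeats the spectral radius computation above for a few matrix sizes (it only assumes the create_stiffness_matrix function defined earlier); we expect the spectral radius to creep towards one as $n$ increases:
import numpy as np

# Sketch: Jacobi spectral radius for stiffness matrices of increasing size
for n in (10, 20, 50, 100):
    A = create_stiffness_matrix(n)
    Dinv = np.diag(1.0/np.diag(A))
    P = np.diag(np.diag(A)) - A
    rho = np.max(np.abs(np.linalg.eigvals(Dinv.dot(P))))
    print("n = {:3d}: spectral radius = {}".format(n, rho))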
Let's try solving $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$ with Jacobi's method using 15 iterations and with $\boldsymbol{b} = \boldsymbol{1}$:
In [14]:
I = np.identity(A.shape[0])
b = np.ones(A.shape[1])
x = np.zeros(A.shape[1])
r0_norm = np.linalg.norm(b - A.dot(x), 2)
for k in range(15):
    x = Dinv.dot(b) + (I - Dinv.dot(A)).dot(x)
    r = b - A.dot(x)
    print("Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))
We can see that the residual is decreasing, but extremely slowly. This is to be expected because $\rho(\boldsymbol{N}^{-1} \boldsymbol{P})$ is very close to one.
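Since the error of a stationary method shrinks by roughly a factor of $\rho(\boldsymbol{N}^{-1}\boldsymbol{P})$ per iteration, here is a back-of-the-envelope sketch (assuming this geometric reduction, and reusing the create_stiffness_matrix function) of how many Jacobi iterations would be needed to reduce the error by six orders of magnitude for this matrix:
import numpy as np

# Rough estimate: if the error shrinks like rho**k, then a reduction by a
# factor of 1e-6 needs roughly k = log(1e-6)/log(rho) iterations
A = create_stiffness_matrix(50)
Dinv = np.diag(1.0/np.diag(A))
rho = np.max(np.abs(np.linalg.eigvals(Dinv.dot(np.diag(np.diag(A)) - A))))
print("Estimated iterations for a 1e-6 reduction: {:.0f}".format(np.log(1.0e-6)/np.log(rho)))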
We now try with the mass matrix. For this we collect the above steps in a function, and then call the function for a $50 \times 50$ mass matrix.
In [15]:
def jacobi_solver(A):
    N = np.diag(np.diag(A))
    Dinv = np.diag(1.0/np.diag(A))
    P = N - A
    evals = np.linalg.eigvals(Dinv.dot(P))
    print("Spectral radius (rho) is: {}".format(np.max(abs(evals))))
    I = np.identity(A.shape[0])
    b = np.ones(A.shape[1])
    x = np.zeros(A.shape[1])
    r0_norm = np.linalg.norm(b - A.dot(x), 2)
    for k in range(15):
        x = Dinv.dot(b) + (I - Dinv.dot(A)).dot(x)
        r = b - A.dot(x)
        print("  Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))

jacobi_solver(create_mass_matrix(50))
The spectral radius is considerably smaller than one, and much smaller than it was for the stiffness matrix, which is evident in the much faster convergence.
For the Gauss-Seidel method, $\boldsymbol{N}$ is the lower triangular part of $\boldsymbol{A}$ (including the diagonal) and $\boldsymbol{P} = \boldsymbol{N} - \boldsymbol{A}$ is the negative of the strictly upper triangular part of $\boldsymbol{A}$ ($\boldsymbol{P}$ is zero on the diagonal). Solving $\boldsymbol{N} \boldsymbol{y} = \boldsymbol{f}$ is then analogous to the forward substitution step in an LU solver.
From the general expression for a stationary method, we multiply both sides by $\boldsymbol{N}$:
$$
\boldsymbol{N} \boldsymbol{x}_{k+1} = \boldsymbol{b} + \boldsymbol{P}\boldsymbol{x}_{k}
$$

We will solve our problem using this formulation. We create a $50 \times 50$ stiffness matrix, and then form the $\boldsymbol{N}$ and $\boldsymbol{P}$ matrices:
In [16]:
# Create stiffness matrix
A = create_stiffness_matrix(50)
# Form lower-triangular part of A
N = np.tril(A)
# Form P
P = N - A
We now check the spectral radius for the stiffness matrix and the Gauss-Seidel method:
In [17]:
evals = np.linalg.eigvals(np.linalg.inv(N).dot(P))
print("Largest eigenvalue (rho) is: {}".format(np.max(abs(evals))))
The spectral radius is very slightly smaller than that for the Jacobi case, but still very close to one, so we cannot expect good convergence.
Now the solver. We write a function to perform 15 Gauss-Seidel iterations, using SciPy for the forward substitution step involving $\boldsymbol{N}$:
In [18]:
import scipy.linalg as LA

def gauss_seidel_solver(A):
    "Compute spectral radius and perform 15 Gauss-Seidel iterations"
    N = np.tril(A)
    P = N - A
    evals = np.linalg.eigvals(np.linalg.inv(N).dot(P))
    print("Largest eigenvalue (rho) is: {}".format(np.max(abs(evals))))
    b = np.ones(A.shape[1])
    x = np.zeros(A.shape[1])
    r0_norm = np.linalg.norm(b - A.dot(x), 2)
    for k in range(15):
        c = b + P.dot(x)
        x = LA.solve_triangular(N, c, lower=True)
        # Compute residual to monitor convergence
        r = b - A.dot(x)
        print("  Step: {}, relative norm of residual (l2): {}".format(k + 1, np.linalg.norm(r, 2)/r0_norm))
Calling the function for a $50 \times 50$ stiffness matrix:
In [19]:
gauss_seidel_solver(create_stiffness_matrix(50))
We can see from the decrease in the residual that the Gauss-Seidel method converges somewhat faster than the Jacobi method, but it is still far too slow to be of any practical use on its own.
As for the Jacobi method, we now test with the $50\times 50$ mass matrix, calling the function defined above.
In [20]:
gauss_seidel_solver(create_mass_matrix(50))
The spectral radius is only half that for the Jacobi method, and this is manifest in the faster observed convergence.
Stationary methods are of limited use on their own as they tend to converge very slowly. For matrices for which they do work well, e.g. the mass matrix, there are other iterative methods that are faster.
Stationary methods are however very useful in combination with other methods, e.g. as preconditioners in more sophisticated iterative methods and as 'smoothers' in multigrid methods.
We now look at one of the most important algorithms for solving $\boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$, the conjugate gradient (CG) method. The CG method is applicable to symmetric positive-definite (SPD) matrices.
The CG method is technically a direct method since in exact arithmetic it solves an $n \times n$ system in $k$ iterations, where $k$ is the number of distinct eigenvalues of $\boldsymbol{A}$. In the worst case (all eigenvalues are distinct) it requires $n$ iterations.
In practice, the CG method is used as an iterative method to find approximate solutions. If solving an $n \times n$ system required $n$ steps it would generally not be competitive with other methods, and round-off error can spoil the party.
The error in the CG method decreases monotonically in the energy norm, i.e. if $\boldsymbol{x}$ is the exact solution then:
$$
(\boldsymbol{x} - \boldsymbol{x}_{k+1})^{T} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{x}_{k+1}) < (\boldsymbol{x} - \boldsymbol{x}_{k})^{T} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{x}_{k})
$$

This implies that if we stop iterating before the residual is (almost) zero, we will have a better solution than we started with.
Below we create a function for the conjugate gradient (CG) method. It will also solve the system directly so we can compare the error at each iteration (this would of course not be done in practice, but we do it here to illustrate some properties of the CG method). The function will perform a maximum of 200 iterations, but will exit sooner if the residual is very small.
In [21]:
def cg(A):
    "Conjugate gradient solver"
    # RHS vector
    b = np.ones(A.shape[1])
    # Initial guess (zero vector)
    x0 = np.zeros(A.shape[1])
    # Compute exact solution (to use in computing error at each iterate)
    x_exact = np.linalg.solve(A, b)
    e = x_exact - x0
    e0_norm = np.sqrt(e.dot(A.dot(e)))
    print("Initial error (A-norm): {}".format(e0_norm))
    # Convergence tolerance to exit solver
    tolerance = 1.0e-9
    # Create starting vectors
    r0 = b - A.dot(x0)
    r0_norm = np.linalg.norm(r0)
    p0 = r0.copy()
    # Start iterations
    for k in range(200):
        print("Step: {}".format(k))
        alpha = r0.dot(r0)/(p0.dot(A.dot(p0)))
        x1 = x0 + alpha*p0
        r1 = r0 - alpha*A.dot(p0)
        # Compute error in x (this is for studying the algorithm, and of
        # course would not be done in practice)
        e = x_exact - x1
        e_norm = np.sqrt(e.dot(A.dot(e)))
        print("  Relative error in x (A-norm): {}".format(e_norm/e0_norm))
        # Compute norm of residual and check for convergence
        r_norm = np.linalg.norm(r1)
        print("  Relative residual (l2-norm): {}".format(r_norm/r0_norm))
        if r_norm < tolerance:
            break
        beta = r1.dot(r1)/r0.dot(r0)
        p1 = r1 + beta*p0
        # Update for next step
        p0, r0, x0 = p1, r1, x1
Below we solve the $50 \times 50$ stiffness matrix problem with the conjugate gradient method. The iterations terminate once the residual drops below a specified threshold. The solver prints at each iteration the relative error in the solution in the $\boldsymbol{A}$-norm and the relative residual in the $l_{2}$ norm.
In [22]:
cg(create_stiffness_matrix(50))
We see that, as expected, the CG method solves this $n \times n$ problem in $n$ iterations. However, if we were satisfied with a less accurate solution we could have terminated the iterations sooner.
Note that the error in the $\boldsymbol{A}$-norm ($\sqrt{\boldsymbol{e}^{T}\boldsymbol{A}\boldsymbol{e}}$) decreases monotonically. The $l_{2}$-norm of the residual $\boldsymbol{r}_{k} = \boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}_{k}$ increases in the first step (the relative norm of the residual is greater than one), and then starts to decrease. The monotonic convergence in the $\boldsymbol{A}$-norm is what we expect from the analysis of the CG method.
We now test for the $50 \times 50$ mass matrix:
In [23]:
cg(create_mass_matrix(50))
The CG method converges rapidly for the mass matrix, terminating after just 17 iterations, fewer than the size of the matrix and far fewer than the 50 needed for the stiffness matrix.
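The fast convergence for the mass matrix reflects the clustering of its eigenvalues. As a small check of the earlier claim that, in exact arithmetic, CG terminates after as many iterations as there are distinct eigenvalues, the sketch below builds an SPD matrix with only three distinct eigenvalues (the construction via a random orthogonal matrix is purely for illustration) and passes it to the cg function above:
import numpy as np

# Sketch: SPD matrix Q D Q^T with only three distinct eigenvalues (1, 2, 3);
# CG should (up to round-off) terminate after at most three iterations
np.random.seed(1)
Q, _ = np.linalg.qr(np.random.rand(50, 50))
D = np.diag(np.repeat([1.0, 2.0, 3.0], [20, 20, 10]))
cg(Q.dot(D).dot(Q.T))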
Exercise: Examine the condition number $\kappa_{2}$ for the mass and stiffness matrices as the matrix size increases.
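A possible starting point for this exercise (a sketch only, using np.linalg.cond and the matrix functions defined above):
import numpy as np

# Sketch: 2-norm condition number of the stiffness and mass matrices for a
# range of sizes
for n in (10, 20, 50, 100):
    kappa_K = np.linalg.cond(create_stiffness_matrix(n), 2)
    kappa_M = np.linalg.cond(create_mass_matrix(n), 2)
    print("n = {:3d}: cond(stiffness) = {:.1f}, cond(mass) = {:.2f}".format(n, kappa_K, kappa_M))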
Exercise: Test how the number of CG iterations changes for the mass and stiffness matrices as the matrix size is increased.
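One possible approach for this exercise is sketched below, using scipy.sparse.linalg.cg (with its default tolerance) rather than the verbose cg function above, and a simple callback to count iterations; the IterationCounter helper is introduced here just for this sketch.
import numpy as np
import scipy.sparse.linalg as spla

# Sketch: count CG iterations for the stiffness and mass matrices as the
# matrix size grows, using SciPy's CG and a callback that counts calls
class IterationCounter:
    def __init__(self):
        self.count = 0
    def __call__(self, xk):
        self.count += 1

for n in (10, 20, 50, 100):
    for name, A in (("stiffness", create_stiffness_matrix(n)), ("mass", create_mass_matrix(n))):
        counter = IterationCounter()
        spla.cg(A, np.ones(A.shape[0]), callback=counter)
        print("n = {:3d}, {} matrix: {} iterations".format(n, name, counter.count))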