In [1]:
using Plots,LaTeXStrings
default(markersize=3,linewidth=1.5)
using Images,TestImages
using MatrixDepot,JLD
using LinearAlgebra
using SparseArrays,IncompleteLU,Arpack
using IterativeSolvers,LinearMaps,Preconditioners
include("FNC.jl");



Example 8.1.1

Here we load the adjacency matrix of a graph with 2790 nodes. Each node is a web page referring to Roswell, NM, and the edges represent links between web pages.


In [2]:
vars = load("roswelladj.jld")       # get from the book's website
i = vars["i"];  j = vars["j"];

A = sparse(i,j,fill(1.0,size(i)),2790,2790)
varinfo(r"A")                       # to see memory consumption


Out[2]:
name    size          summary
A       110.516 KiB   2790×2790 SparseMatrixCSC{Float64,Int32}

We may define the density of $\mathbf{A}$ as the number of nonzeros divided by the total number of entries.


In [3]:
m,n = size(A)
@show density = nnz(A) / (m*n);


density = nnz(A) / (m * n) = 0.0010902994565845762

We can compare the storage space needed for the sparse $\mathbf{A}$ with the space needed for its dense or full counterpart. This ratio can never be as small as the density of nonzeros, because of the need to store locations as well as data. However, it's still quite small here, even though the matrix is not really large.


In [4]:
F = Matrix(A)
varinfo(r"F")


Out[4]:
name    size          summary
F       59.388 MiB    2790×2790 Array{Float64,2}
FNC     172.025 KiB   Module

In [5]:
@show storageratio = 111000/59e6;


storageratio = 111000 / 5.9e7 = 0.00188135593220339
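As an aside, the ratio can be computed programmatically instead of typed in by hand. A minimal sketch, assuming A and F are both still defined as above; Base.summarysize reports memory footprints in bytes, which may differ slightly from the figures shown by varinfo.

    # hypothetical alternative: measure both footprints directly (in bytes)
    storageratio = Base.summarysize(A) / Base.summarysize(F)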

Matrix-vector products are also much faster using the sparse form, because operations with structural zeros are skipped.


In [6]:
x = randn(n)
@elapsed for i = 1:200; A*x; end


Out[6]:
0.035085828

In [7]:
@elapsed for i = 1:200; F*x; end


Out[7]:
0.779797328

However, the sparse storage format is column-oriented. Operations on rows may take longer than similar ones on columns. (This effect is dramatic in MATLAB but, as the timings below show, much milder in Julia.)


In [8]:
v = A[:,1000]
println("time for replacing columns:")
for i = 1:n; A[:,i]=v; end     # run once to improve timing accuracy
@elapsed for i = 1:n; A[:,i]=v; end


time for replacing columns:
Out[8]:
0.395109955

In [9]:
r = v'
println("time for replacing rows:")
for i = 1:n; A[i,:]=r; end     # run once to improve timing accuracy
@elapsed for i = 1:n; A[i,:]=r; end


time for replacing rows:
Out[9]:
0.428002265

Example 8.1.2

Here is the adjacency matrix of a graph representing a "small world" network featuring connections to neighbors and a small number of strangers.


In [10]:
A = matrixdepot("smallworld",100,4,0.2);
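MatrixDepot's smallworld generator is used here as a black box. Purely for illustration, a hypothetical Watts–Strogatz-style construction of such an adjacency matrix, with n nodes on a ring, k forward neighbors per node, and rewiring probability p, might look like the following sketch (the actual generator may differ in its details):

    # hypothetical sketch of a small-world adjacency matrix (not MatrixDepot's code)
    using SparseArrays
    function smallworld_sketch(n, k, p)
        rows = Int[];  cols = Int[]
        for u in 1:n, d in 1:k
            v = mod1(u + d, n)                     # neighbor d steps around the ring
            if rand() < p
                v = rand(setdiff(1:n, [u]))        # rewire to a random "stranger"
            end
            push!(rows, u, v);  push!(cols, v, u)  # store both directions for symmetry
        end
        return sparse(rows, cols, ones(length(rows)), n, n, max)   # symmetric 0-1 matrix
    end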

In [11]:
spy(A,title="Nonzeros in A",color=:bluesreds)


Out[11]:
[Plot: spy plot titled "Nonzeros in A"]

The number of vertex pairs connected by a path of length $k>1$ grows with $k$, as can be seen here for $k=4$. (This would be "four degrees of separation.")


In [12]:
spy(A^4,title="Nonzeros in A^4",color=:bluesreds)


Out[12]:
[Plot: spy plot titled "Nonzeros in A^4"]

Example 8.1.3

Here is a banded matrix with lower bandwidth 3 and upper bandwidth 1. (A matrix whose lower and upper bandwidths both equal one is called tridiagonal.) The spdiagm command creates a sparse matrix given its diagonal elements. The main or central diagonal is numbered zero; diagonals above and to the right of it are numbered positively, and those below and to the left negatively.


In [13]:
n = 50;
A = spdiagm(-3=>fill(n,n-3),0=>ones(n),1=>-(1:n-1))
Matrix( A[1:7,1:7] )


Out[13]:
7×7 Array{Float64,2}:
  1.0  -1.0   0.0   0.0   0.0   0.0   0.0
  0.0   1.0  -2.0   0.0   0.0   0.0   0.0
  0.0   0.0   1.0  -3.0   0.0   0.0   0.0
 50.0   0.0   0.0   1.0  -4.0   0.0   0.0
  0.0  50.0   0.0   0.0   1.0  -5.0   0.0
  0.0   0.0  50.0   0.0   0.0   1.0  -6.0
  0.0   0.0   0.0  50.0   0.0   0.0   1.0

Without pivoting, the LU factors have the same lower and upper bandwidth as the original matrix.


In [14]:
L,U = FNC.lufact(A)
spy(sparse(L),layout=2,subplot=1,markersize=2,title="L")
spy!(sparse(U),layout=2,subplot=2,markersize=2,title="U")


Out[14]:
[Plot: spy plots of L (left) and U (right)]

However, if we introduce row pivoting, bandedness may be expanded or destroyed.


In [15]:
fact = lu(A)
spy(sparse(fact.L),layout=2,subplot=1,markersize=2,title="L")
spy!(sparse(fact.U),layout=2,subplot=2,markersize=2,title="U")


Out[15]:
[Plot: spy plots of L (left) and U (right)]

Example 8.1.4

The following generates a random sparse matrix with prescribed eigenvalues.


In [16]:
n = 4000
density = 1.23e-3
lambda = @. 1/(1:n)
A = FNC.sprandsym(n,density,lambda);
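FNC.sprandsym is one of the book's helper functions. One hypothetical way to get a sparse symmetric matrix with a prescribed spectrum, sketched below under the assumption that lambda is a vector of eigenvalues, is to start from a diagonal matrix and apply random Givens rotations; each rotation is an orthogonal similarity, so the eigenvalues are unchanged while off-diagonal nonzeros accumulate. This is illustrative only (and slow), not the book's implementation.

    # hypothetical sketch: orthogonal similarities preserve the prescribed spectrum
    using LinearAlgebra, SparseArrays
    function sprandsym_sketch(n, density, lambda)
        A = spdiagm(0 => float.(lambda))
        while nnz(A) < density*n^2
            i, j = rand(1:n), rand(1:n)
            i == j && continue
            θ = 2π*rand();  c, s = cos(θ), sin(θ)
            G = sparse(1.0I, n, n)                 # Givens rotation in the (i,j) plane
            G[i,i] = c;  G[j,j] = c;  G[i,j] = s;  G[j,i] = -s
            A = G' * A * G                         # similarity transform: same eigenvalues
        end
        return (A + A') / 2                        # symmetrize against roundoff
    end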

In [17]:
spy(A,title="Sparse symmetric matrix",color=:bluesreds)


Out[17]:
[Plot: spy plot titled "Sparse symmetric matrix"]

In [18]:
lambda,V = eigs(A,nev=5,which=:LM)    # largest magnitude
lambda


Out[18]:
5-element Array{Complex{Float64},1}:
   0.999999999999999 + 0.0im
                 0.5 + 0.0im
  0.3333333333333345 + 0.0im
 0.24999999999999944 + 0.0im
  0.2000000000000004 + 0.0im

In [19]:
lambda,V = eigs(A,nev=5,sigma=0)    # closest to zero
1 ./ lambda


Out[19]:
5-element Array{Complex{Float64},1}:
 4000.0000000000528 - 0.0im
 3999.0000000000437 - 0.0im
 3997.9999999999673 - 0.0im
 3996.9999999999673 - 0.0im
 3996.0000000000027 - 0.0im

The scaling of time to solve a sparse linear system is not easy to predict unless you have some more information about the matrix (such as bandedness). But it will typically be a great deal faster than the dense or full matrix case.


In [20]:
x = @. 1/(1:n);  b = A*x;
@elapsed sparse_err = norm(x - A\b)


Out[20]:
1.025542645

In [21]:
A = Matrix(A)  # convert to regular matrix
x = @. 1/(1:n);  b = A*x;
@elapsed dense_err = norm(x - A\b)


Out[21]:
1.637381834

In [22]:
@show sparse_err,dense_err;


(sparse_err, dense_err) = (5.93535007111627e-15, 6.068388344273622e-15)

Example 8.2.1

Here we let $\mathbf{A}$ be a $5\times 5$ random matrix whose columns are each scaled to sum to one. We also choose a random 5-vector.


In [23]:
A = rand(1.:9.,5,5)
A = A./sum(A,dims=1)
x = randn(5)


Out[23]:
5-element Array{Float64,1}:
  1.5190434457022512
 -1.0048106042505094
 -1.855619393509681 
  0.6462151410181369
 -1.069795021230948 

Applying matrix-vector multiplication once doesn't do anything recognizable.


In [24]:
y = A*x


Out[24]:
5-element Array{Float64,1}:
 -0.6185029241139731 
  0.402917325288195  
 -0.664638994284783  
 -0.05461520330049485
 -0.830126635859694  

Repeating the multiplication still doesn't do anything obvious.


In [25]:
z = A*y


Out[25]:
5-element Array{Float64,1}:
 -0.2117031175511901 
 -0.302275006673409  
 -0.4816101088547412 
 -0.45687873833179127
 -0.31249946085961844

But if we keep repeating the matrix-vector multiplication, something remarkable happens: $\mathbf{A}\mathbf{x}\approx \mathbf{x}$.


In [26]:
for j = 1:8;  x = A*x;  end
[x A*x]


Out[26]:
5×2 Array{Float64,2}:
 -0.343991  -0.343971
 -0.327046  -0.327044
 -0.406943  -0.406948
 -0.385992  -0.386021
 -0.300995  -0.300983

This seems to occur regardless of the starting value of $\mathbf{x}$.


In [27]:
x = randn(5)
for j = 1:8;  x = A*x;  end
[x A*x]


Out[27]:
5×2 Array{Float64,2}:
 0.133468  0.13344 
 0.126853  0.126882
 0.157877  0.157878
 0.149747  0.149761
 0.116781  0.116765

Example 8.2.2

We set up a $5\times 5$ matrix with prescribed eigenvalues, then apply the power iteration.


In [28]:
lambda = [1,-0.75,0.6,-0.4,0]
A = triu(ones(5,5),1) + diagm(0=>lambda)   # triangular matrix, eigs on diagonal


Out[28]:
5×5 Array{Float64,2}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We run the power iteration 60 times. The best estimate of the dominant eigenvalue is the last entry of gamma.


In [29]:
gamma,x = FNC.poweriter(A,60)
eigval = gamma[end]


Out[29]:
1.0000000039769024
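FNC.poweriter is one of the book's helper functions. A minimal sketch of a power iteration is given below for reference; it is a hypothetical stand-in, not necessarily the book's implementation, and it takes the eigenvalue estimate to be the growth ratio of the largest component of the iterate.

    # hypothetical sketch of a basic power iteration (poweriter_sketch is not FNC code)
    using LinearAlgebra
    function poweriter_sketch(A, numiter)
        n = size(A, 1)
        x = normalize(randn(n), Inf)         # random start, unit infinity-norm
        gamma = zeros(numiter)
        for k in 1:numiter
            y = A * x
            m = argmax(abs.(y))              # index of the dominant component
            gamma[k] = y[m] / x[m]           # eigenvalue estimate at step k
            x = y / y[m]                     # rescale so the dominant entry is 1
        end
        return gamma, x
    end

A call like gamma, x = poweriter_sketch(A, 60) would play the same role as the cell above.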

We check linear convergence using a log-linear plot of the error. We use our best estimate in order to compute the error at each step.


In [30]:
err = @. eigval - gamma
plot(0:59,abs.(err),m=:o,label="", 
    title="Convergence of power iteration",
    xlabel=L"k",yaxis=(L"|\lambda_1 - \gamma_k|",:log10,[1e-10,1]) )


Out[30]:
[Plot: Convergence of power iteration; log-scale error $|\lambda_1 - \gamma_k|$ versus $k$]

The trend is clearly a straight line asymptotically. We can get a refined estimate of the error reduction in each step by using the exact eigenvalues.


In [31]:
@show theory = lambda[2]/lambda[1];
@show observed = err[40]/err[39];


theory = lambda[2] / lambda[1] = -0.75
observed = err[40] / err[39] = -0.7474001045295228

Note that the error is supposed to change sign on each iteration. An effect of these alternating signs is that estimates oscillate around the exact value.


In [32]:
gamma[36:40]


Out[32]:
5-element Array{Float64,1}:
 1.0000039723613892
 0.999997032800801 
 1.0000022326420288
 0.9999983298638231
 1.0000012552091928

Example 8.3.1

We set up a $5\times 5$ triangular matrix with prescribed eigenvalues on its diagonal.


In [33]:
lambda = [1,-0.75,0.6,-0.4,0]
A = triu(ones(5,5),1) + diagm(0=>lambda)   # triangular matrix, eigs on diagonal


Out[33]:
5×5 Array{Float64,2}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We run inverse iteration with the shift $s=0.7$ and take the final estimate as our "exact" answer to observe the convergence.


In [34]:
gamma,x = FNC.inviter(A,0.7,30)
eigval = gamma[end]


Out[34]:
0.5999999999999988
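FNC.inviter is another of the book's helper functions. A minimal sketch of shifted inverse iteration, hypothetical and not necessarily matching the book's implementation, factors $\mathbf{A}-s\mathbf{I}$ once and reuses it at every step:

    # hypothetical sketch of inverse iteration with a fixed shift s
    using LinearAlgebra
    function inviter_sketch(A, s, numiter)
        n = size(A, 1)
        x = normalize(randn(n), Inf)
        gamma = zeros(numiter)
        fact = lu(A - s*I)                   # factor once; reuse for every solve
        for k in 1:numiter
            y = fact \ x                     # y = (A - s*I) \ x
            m = argmax(abs.(y))
            gamma[k] = x[m] / y[m] + s       # estimate of the eigenvalue nearest s
            x = y / y[m]
        end
        return gamma, x
    end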

As expected, the eigenvalue that was found is the one closest to $0.7$. The convergence is again linear.


In [35]:
err = @. eigval - gamma
plot(0:29,abs.(err),m=:o,label="", 
    title="Convergence of inverse iteration",
    xlabel=L"k",yaxis=(L"|\lambda_3 - \gamma_k|",:log10,[1e-16,1]) )


Out[35]:
[Plot: Convergence of inverse iteration; log-scale error $|\lambda_3 - \gamma_k|$ versus $k$]

The observed linear convergence rate is found from the data.


In [36]:
@show observed_rate = err[22]/err[21];


observed_rate = err[22] / err[21] = -0.33326822443491533

In the numbering of this example, the eigenvalue closest to $s=0.7$ is $\lambda_3$ and the next-closest is $\lambda_1$.


In [37]:
@show theoretical_rate = (lambda[3]-0.7) / (lambda[1]-0.7);


theoretical_rate = (lambda[3] - 0.7) / (lambda[1] - 0.7) = -0.3333333333333332

Example 8.3.2


In [38]:
lambda = [1,-0.75,0.6,-0.4,0]
A = triu(ones(5,5),1) + diagm(0=>lambda)   # triangular matrix, eigs on diagonal


Out[38]:
5×5 Array{Float64,2}:
 1.0   1.0   1.0   1.0  1.0
 0.0  -0.75  1.0   1.0  1.0
 0.0   0.0   0.6   1.0  1.0
 0.0   0.0   0.0  -0.4  1.0
 0.0   0.0   0.0   0.0  0.0

We begin with a shift $s=0.7$, which is closest to the eigenvalue 0.6.


In [39]:
s = 0.7
x = ones(5)
y = (A-s*I)\x
gamma = x[1]/y[1] + s


Out[39]:
0.7034813925570228

Note that the result is not yet any closer to the targeted $0.6$. But we proceed (without being too picky about normalization here).


In [40]:
s = gamma
x = y/y[1]
y = (A-s*I)\x;  gamma = x[1]/y[1] + s


Out[40]:
0.5612761406172997

Still not much apparent progress. However, each new shift is much closer to the target eigenvalue than the last, so the convergence accelerates, and in just a few more iterations the results are dramatically better.


In [41]:
for k = 1:4
    s = gamma  
    x = y/y[1]
    y = (A-s*I)\x  
    gamma = x[1]/y[1] + s
    @show gamma
end


gamma = 0.5964312884753865
gamma = 0.5999717091820104
gamma = 0.5999999978556353
gamma = 0.6

Example 8.4.1

First we define a triangular matrix with known eigenvalues and a random vector $b$.


In [42]:
lambda = @. 10 + (1:100)
A = triu(rand(100,100),1) + diagm(0=>lambda)
b = rand(100);

Next we build up the first thirty Krylov matrices iteratively, using renormalization after each matrix-vector multiplication.


In [43]:
Km = [b zeros(100,29)]
for m = 1:29      
    v = A*Km[:,m]
    Km[:,m+1] = v/norm(v)
end

Now we solve a least squares problem for Krylov matrices of increasing dimension.


In [44]:
resid = zeros(30)
for m = 1:30  
    z = (A*Km[:,1:m])\b
    x = Km[:,1:m]*z
    resid[m] = norm(b-A*x)
end

The linear system approximations show smooth linear convergence at first, but the convergence stagnates after only a few digits have been found.


In [45]:
plot(0:29,resid,m=:o,
    xaxis=(L"m"),yaxis=(:log10,L"\| b-Ax_m \|"), 
    title="Residual for linear systems",leg=:none)


Out[45]:
[Plot: Residual for linear systems; log-scale $\| b-Ax_m \|$ versus $m$]

Example 8.4.2

We illustrate a few steps of the Arnoldi iteration for a small matrix.


In [46]:
A = rand(1.:9.,6,6)


Out[46]:
6×6 Array{Float64,2}:
 8.0  5.0  6.0  1.0  7.0  6.0
 3.0  9.0  1.0  9.0  1.0  3.0
 8.0  2.0  7.0  5.0  1.0  6.0
 6.0  5.0  2.0  8.0  3.0  9.0
 5.0  4.0  8.0  8.0  7.0  9.0
 7.0  8.0  2.0  6.0  5.0  4.0

The seed vector determines the first member of the orthonormal basis.


In [47]:
u = randn(6)
Q = u/norm(u);

Multiplication by $\mathbf{A}$ gives us a new vector in $\mathcal{K}_2$.


In [48]:
Aq = A*Q[:,1];

We subtract off its projection in the previous direction. The remainder is rescaled to give us the next orthonormal column.


In [49]:
v = Aq - dot(Q[:,1],Aq)*Q[:,1]
Q = [Q v/norm(v)];

On the next pass, we have to subtract off the projections in two previous directions.


In [50]:
Aq = A*Q[:,2]
v = Aq - dot(Q[:,1],Aq)*Q[:,1] - dot(Q[:,2],Aq)*Q[:,2]
Q = [Q v/norm(v)];

At every step, $Q_m$ is an ONC (orthonormal columns) matrix.


In [51]:
opnorm( Q'*Q - I )


Out[51]:
7.87506378657921e-16

And the columns of $Q_m$ span the same space as those of the 3-dimensional Krylov matrix.


In [52]:
K = [ u A*u A*A*u ];
@show rank( [Q K] )


rank([Q K]) = 3
Out[52]:
3

Example 8.5.1

We define a triangular matrix with known eigenvalues and a random vector $b$.


In [53]:
lambda = @. 10 + (1:100)
A = triu(rand(100,100),1) + diagm(0=>lambda)
b = rand(100);

Instead of building up the Krylov matrices, we use the Arnoldi iteration to generate equivalent orthonormal vectors.


In [54]:
Q,H = FNC.arnoldi(A,b,60);
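FNC.arnoldi is one of the book's helper functions. Collecting the steps of Example 8.4.2 into a loop gives the following minimal sketch (hypothetical, not necessarily the book's implementation), returning $\mathbf{Q}_{m+1}$ and the $(m+1)\times m$ upper Hessenberg matrix $\mathbf{H}_m$ satisfying $\mathbf{A}\mathbf{Q}_m = \mathbf{Q}_{m+1}\mathbf{H}_m$:

    # hypothetical sketch of the Arnoldi iteration (arnoldi_sketch is not FNC code)
    using LinearAlgebra
    function arnoldi_sketch(A, u, m)
        n = length(u)
        Q = zeros(n, m+1)
        H = zeros(m+1, m)
        Q[:,1] = u / norm(u)                 # first basis vector from the seed
        for j in 1:m
            v = A * Q[:,j]
            for i in 1:j                     # subtract projections on previous columns
                H[i,j] = dot(Q[:,i], v)
                v -= H[i,j] * Q[:,i]
            end
            H[j+1,j] = norm(v)
            Q[:,j+1] = v / H[j+1,j]          # normalize the remainder
        end
        return Q, H
    end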

The Arnoldi bases are used to solve the least squares problems defining the GMRES iterates.
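Specifically, since the Arnoldi iteration was seeded with $\mathbf{b}$, the first column of $\mathbf{Q}$ is $\mathbf{b}/\|\mathbf{b}\|$. Using $\mathbf{A}\mathbf{Q}_m = \mathbf{Q}_{m+1}\mathbf{H}_m$ and the orthonormality of the columns of $\mathbf{Q}_{m+1}$, for $\mathbf{x}=\mathbf{Q}_m\mathbf{z}$ we get

$$\|\mathbf{b}-\mathbf{A}\mathbf{x}\| = \bigl\|\, \|\mathbf{b}\|\,\mathbf{e}_1 - \mathbf{H}_m\mathbf{z} \,\bigr\|,$$

which is the small least squares problem solved below via H[1:m+1,1:m]\s.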


In [55]:
resid = [norm(b);zeros(60)]
for m = 1:60  
    s = [norm(b); zeros(m)]
    z = H[1:m+1,1:m]\s
    x = Q[:,1:m]*z
    resid[m+1] = norm(b-A*x)
end

The approximations converge smoothly, practically all the way to machine epsilon.


In [56]:
plot(0:60,resid,m=:o,
    xaxis=(L"m"),yaxis=(:log10,L"\| b-Ax_m \|"), 
    title="Residual for GMRES",leg=:none)


Out[56]:
[Plot: Residual for GMRES; log-scale $\| b-Ax_m \|$ versus $m$]

Example 8.5.2

The following experiments are based on a matrix resulting from discretization of a partial differential equation.


In [57]:
d = 50;
A = d^2*matrixdepot("poisson",d)
@show n = size(A,1)
b = ones(n);


n = size(A, 1) = 2500

We compare unrestarted GMRES with three different thresholds for restarting.


In [58]:
maxit = 120;  rtol = 1e-8;
rest = [maxit,20,40,60]
plt = plot([],[],label="",title="Convergence of restarted GMRES",leg=:bottomleft,
    xaxis=("m"), yaxis=(:log10,"residual norm",[1e-8,100]))
for j = 1:4
    x,hist = gmres(A,b,restart=rest[j],tol=rtol,maxiter=maxit,log=true)
    plot!(hist[:resnorm],label="restart = $(rest[j])")
end
display(plt)


[Plot: Convergence of restarted GMRES; residual norm versus $m$ for restart = 120, 20, 40, 60]

The "pure" curve is the lowest one. All of the other curves agree with it until they encounter their first restart.

Example 8.6.1

In this example we compare MINRES and CG on some pseudorandom SPD problems. The first matrix has a condition number of 100.


In [59]:
n = 2000
density = 0.005
A = FNC.sprandsym(n,density,1e-2)
@show nnz(A);


nnz(A) = 20014

We cook up a linear system whose solution we happen to know exactly.


In [60]:
x = (1:n)/n
b = A*x;

Now we apply both methods and compare the convergence of the system residuals, using the minres and cg functions from IterativeSolvers.


In [61]:
xMR,histMR = minres(A,b,tol=1e-12,maxiter=101,log=true)
xCG,histCG = cg(A,b,tol=1e-12,maxiter=101,log=true)

plot(0:100,[histMR[:resnorm] histCG[:resnorm]]/norm(b),m=:o,label=["MINRES" "CG"], 
    title="Convergence of MINRES and CG",
    xaxis=("Krylov dimension \$m\$"), yaxis=(:log10,L"\|r_m\| / \|b\|") )


Out[61]:
[Plot: Convergence of MINRES and CG; $\|r_m\| / \|b\|$ versus Krylov dimension $m$]

There is virtually no difference between the two methods here when measuring the residual. We see little difference in the errors as well.


In [62]:
@show errorMR = norm( xMR - x ) / norm(x);
@show errorCG = norm( xCG - x) / norm(x);


errorMR = norm(xMR - x) / norm(x) = 3.7249045318228777e-9
errorCG = norm(xCG - x) / norm(x) = 2.3645594512800837e-9

Next we use a system matrix whose condition number is $10^4$.


In [63]:
A = FNC.sprandsym(n,density,1e-4);

Now we find that the CG residual jumps unexpectedly, but overall both methods converge at about the same linear rate. Note from the scales that both methods have actually made very little progress after 100 iterations, though.


In [64]:
xMR,histMR = minres(A,b,tol=1e-12,maxiter=101,log=true)
xCG,histCG = cg(A,b,tol=1e-12,maxiter=101,log=true)

plot(0:100,[histMR[:resnorm] histCG[:resnorm]]/norm(b),m=:o,label=["MINRES" "CG"], 
    title="Convergence of MINRES and CG",
    xaxis=("Krylov dimension \$m\$"), yaxis=(:log10,L"\|r_m\| / \|b\|") )


Out[64]:
[Plot: Convergence of MINRES and CG; $\|r_m\| / \|b\|$ versus Krylov dimension $m$]

The errors confirm that we are nowhere near the correct solution in either case.


In [65]:
@show errorMR = norm( xMR - x ) / norm(x);
@show errorCG = norm( xCG - x) / norm(x);


errorMR = norm(xMR - x) / norm(x) = 516.5992587885062
errorCG = norm(xCG - x) / norm(x) = 603.3633521404208

Example 8.7.1

We use a readily available test image.


In [66]:
img = testimage("mandrill");
@show m,n = size(img)
plot(Gray.(img),title="Original image",aspect_ratio=1)


(m, n) = size(img) = (512, 512)
Out[66]:
[Image: Original image]

We define the one-dimensional tridiagonal blurring matrices.


In [67]:
X = @. Float64(Gray(img))

B = spdiagm(0=>fill(0.5,m),1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
C = spdiagm(0=>fill(0.5,n),1=>fill(0.25,n-1),-1=>fill(0.25,n-1));

Finally, we show the results of using $k=12$ repetitions of the blur in each direction.


In [68]:
blur = X -> B^12 * X * C^12;
plot(Gray.(blur(X)),title="Blurred image",aspect_ratio=1)


Out[68]:
[Image: Blurred image]

Example 8.7.2

We repeat the earlier process to blur the original image $X$ to get $Z$.


In [69]:
img = testimage("mandrill");
m,n = size(img)

X = @. Float64(Gray(img))

B = spdiagm(0=>fill(0.5,m),1=>fill(0.25,m-1),-1=>fill(0.25,m-1));
C = spdiagm(0=>fill(0.5,n),1=>fill(0.25,n-1),-1=>fill(0.25,n-1));
blur = X -> B^12 * X * C^12;
Z = blur(X);

Now we imagine that $X$ is unknown and that the blurred $Z$ is given. We want to invert the blur transformation using only the ability to apply it. Since a Krylov method operates on vectors, we have to translate between vectors and images each time the transformation is applied.


In [70]:
unvec = z -> reshape(z,m,n)
T = LinearMap(x -> vec(blur(unvec(x))),m*n);
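The reason a matrix-free operator suffices: by the identity $\operatorname{vec}(\mathbf{B}\mathbf{X}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{B})\operatorname{vec}(\mathbf{X})$, applying the blur to $\operatorname{vec}(\mathbf{X})$ is the same as multiplying by the $mn\times mn$ matrix $(\mathbf{C}^{12})^T \otimes \mathbf{B}^{12}$, a matrix we never need to form explicitly; the LinearMap object works with it through its action alone.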

Now we apply gmres to the composite blurring transformation T.


In [71]:
y = gmres(T,vec(Z),maxiter=50,tol=1e-5);
Y = unvec(@. max(0,min(1,y)));

In [72]:
plot(Gray.(X),layout=2,subplot=1,title="Original",aspect_ratio=1)
plot!(Gray.(Y),layout=2,subplot=2,title="Deblurred",aspect_ratio=1)


Out[72]:
[Images: Original (left) and Deblurred (right)]

The reconstruction isn't perfect because the condition number of repeated blurring happens to be very large.

Example 8.8.1

Here is a large sparse matrix.


In [73]:
A = 2.8I + sprand(10000,10000,0.002);

Without a preconditioner, GMRES takes a large number of iterations.


In [74]:
b = rand(10000)
gmres(A,b,maxiter=300,tol=1e-10,restart=50,log=true);
time_plain = @elapsed x,hist1 = gmres(A,b,maxiter=300,tol=1e-10,restart=50,log=true)


Out[74]:
0.142286662

In [75]:
plot(hist1[:resnorm],label="", 
    title="GMRES with no preconditioning",
    xaxis=("iteration number"), yaxis=(:log10,"residual norm") )


Out[75]:
[Plot: GMRES with no preconditioning; log-scale residual norm versus iteration number]

This version of incomplete $LU$ factorization drops all sufficiently small values (i.e., replaces them with zeros). This keeps the number of nonzeros in the factors under control.


In [76]:
iLU = ilu(A,τ=0.2);
@show nnz(A),nnz(iLU.L);


(nnz(A), nnz(iLU.L)) = (209999, 157678)

It does not produce a true factorization of $A$. However, it's close enough to serve as an "approximate inverse" in a preconditioner.

The actual preconditioning matrix is $\mathbf{M}=\mathbf{L}\mathbf{U}$. However, we just supply the factorization as a left preconditioner, since the preconditioning step is to solve a system with the matrix $\mathbf{M}$.
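For illustration only, here is a tiny self-contained sketch of what the preconditioning solve $\mathbf{M}\mathbf{z}=\mathbf{r}$ with $\mathbf{M}=\mathbf{L}\mathbf{U}$ amounts to: a forward and then a backward triangular substitution. The factors here are generic dense triangular matrices built on the spot, not the IncompleteLU object above, which IterativeSolvers applies internally during GMRES.

    # illustrative sketch: solving M*z = r when M = L*U is given by triangular factors
    using LinearAlgebra
    let n = 5
        L = LowerTriangular(Matrix(1.0I, n, n) + tril(rand(n,n), -1))   # unit lower triangular
        U = UpperTriangular(diagm(0 => 2 .+ rand(n)) + triu(rand(n,n), 1))
        r = rand(n)
        z = U \ (L \ r)               # forward substitution, then backward substitution
        @show norm(L*(U*z) - r)       # confirms that M*z = r
    end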


In [77]:
time_prec = @elapsed x,hist2 = gmres(A,b,Pl=iLU,maxiter=300,tol=1e-10,restart=50,log=true)


Out[77]:
0.409312086

The preconditioning is fairly successful in this case.


In [78]:
plot(hist1[:resnorm],label="no prec.", 
    xaxis=("iteration number"), yaxis=(:log10,"residual norm") )
plot!(hist2[:resnorm],label="iLU prec.")


Out[78]:
[Plot: residual norm versus iteration number for "no prec." and "iLU prec."]

Each GMRES iteration is slower with the preconditioner because of the need to apply it (here, by solving sparse triangular systems), but far fewer iterations are needed. The wall-clock times above do not show a gain in this run, in part because the preconditioned call was not run once beforehand to exclude compilation time, as the unpreconditioned one was.

Example 8.8.2

First we create a large, sparse, positive definite matrix that arises in the solution of differential equations.


In [79]:
n = 100
D2 = spdiagm(0=>fill(2,n-1),1=>-ones(n-2),-1=>-ones(n-2))
IA = spdiagm(0=>ones(n-1))
A = kron(IA,D2) + kron(D2,IA);

Now we solve a linear system with a random right-hand side, without preconditioner.


In [80]:
b = rand(size(A,1))
cg(A,b,maxiter=4,tol=1e-10,log=true);  # make timing more accurate
time_plain = @elapsed x,hist1 = cg(A,b,maxiter=400,tol=1e-10,log=true)


Out[80]:
0.025621319

In [81]:
plot(hist1[:resnorm],label="", 
    title="CG with no preconditioning",
    xaxis=("iteration number"), yaxis=(:log10,"residual norm") )


Out[81]:
[Plot: CG with no preconditioning; log-scale residual norm versus iteration number]

For an SPD matrix we can use an incomplete Cholesky factorization. (It uses a lower triangular $\mathbf{L}=\mathbf{R}^T$ rather than an upper triangular $\mathbf{R}$.)


In [82]:
P = CholeskyPreconditioner(A,1);

Now we apply CG using this preconditioner.


In [83]:
cg(A,b,Pl=P,maxiter=4,tol=1e-10,log=true);  # make timing more accurate
time_prec = @elapsed x,hist2 = cg(A,b,Pl=P,maxiter=400,tol=1e-10,log=true)


Out[83]:
0.021515163

In [84]:
plot(hist1[:resnorm],label="no prec.", 
    xaxis=("iteration number"), yaxis=(:log10,"residual norm"),
    title="CG with incomplete Cholesky preconditioning")
plot!(hist2[:resnorm],label="iChol prec.")


Out[84]:
[Plot: CG with incomplete Cholesky preconditioning; residual norm versus iteration number for "no prec." and "iChol prec."]

You can see we got a little improvement with this preconditioner. It saved enough iterations to more than make up for the fact that each iteration now involves extra work.