Before $$A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k,$$
there was the LU-UL (LR) algorithm by Heinz Rutishauser:
$$A_k = L_k U_k, \quad A_{k+1} = U_k L_k.$$This also generates a sequence of similar matrices, but it is less stable than the QR algorithm.
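A minimal sketch of this iteration (an illustration, not part of the original notes), assuming the LU factorization exists without pivoting at every step, e.g., for a diagonally dominant matrix:

import numpy as np
import scipy.linalg

A = np.array([[4.0, 1.0], [1.0, 3.0]])
for _ in range(50):
    # For this diagonally dominant matrix no row swaps occur, so P = I
    # and A_{k+1} = U_k L_k is similar to A_k
    P, L, U = scipy.linalg.lu(A)
    A = U.dot(L)
print(np.diag(A))  # approaches the eigenvalues (7 +- sqrt(5))/2 of the original matrix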
Interesting fact: Heinz Rutishauser also proposed the for keyword in programming languages.
The recent paper "Strassen Algorithm Reloaded" claims to break the conventional wisdom that the Strassen algorithm is not practical.
For dense linear algebra problems we are limited by the memory needed to store the full matrix: $N^2$ parameters.
The class of sparse matrices, where most of the elements are zero, at least allows us to store such matrices.
The question is whether we can also perform the basic linear algebra operations (first of all, solving linear systems) fast with sparse matrices.
After the discretization of the one-dimensional Laplace equation with Dirichlet boundary conditions we have $$\frac{u_{i+1} + u_{i-1} - 2u_i}{h^2} = f_i,\quad i=1,\dots,N-1$$ $$ u_{0} = u_N = 0$$ or in the matrix form
$$ A u = f, $$ and for $n = 5$ the matrix looks as follows:
\begin{equation}
A=-\frac{1}{h^2}\begin{bmatrix}
2& -1 & 0 & 0 & 0\\
-1 & 2 & -1 & 0 &0 \\
0 & -1 & 2& -1 & 0 \\
0 & 0 & -1 & 2 &-1\\
0 & 0 & 0 & -1 & 2
\end{bmatrix}
\end{equation}
The matrix is tridiagonal and sparse
(and also Toeplitz: all elements on each diagonal are the same).
In two dimensions, we get equation of the form
$$-\frac{4u_{ij} -u_{(i-1)j} - u_{(i+1)j} - u_{i(j-1)}-u_{i(j+1)}}{h^2} = f_{ij},$$
or in the Kronecker product form
$$\Delta_{2D} = \Delta_{1D} \otimes I + I \otimes \Delta_{1D},$$where $\Delta_{1D}$ is a 1D Laplacian, and $\otimes$ is a Kronecker product of matrices.
For matrices $A\in\mathbb{R}^{n\times m}$ and $B\in\mathbb{R}^{l\times k}$ its Kronecker product is defined as a block matrix of the form $$ A\otimes B = \begin{bmatrix}a_{11}B & \dots & a_{1m}B \\ \vdots & \ddots & \vdots \\ a_{n1}B & \dots & a_{nm}B\end{bmatrix}\in\mathbb{R}^{nl\times mk}. $$
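A quick numerical check of this definition (an illustration, not from the original notes):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.eye(2)
# Every element a_ij is replaced by the block a_ij * B
print(np.kron(A, B))
print(np.kron(A, B).shape)  # (n*l, m*k) = (4, 4)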
In block matrix form, the 2D Laplace matrix can be written as follows:
$$ A = -\frac{1}{h^2}\begin{bmatrix} \Delta_1 + 2I & -I & 0 & 0 & 0\\ -I & \Delta_1 + 2I & -I & 0 &0 \\ 0 & -I & \Delta_1 + 2I & -I & 0 \\ 0 & 0 & -I & \Delta_1 + 2I &-I\\ 0 & 0 & 0 & -I & \Delta_1 + 2I \end{bmatrix} $$We can create this matrix using the scipy.sparse package (which, admittedly, is not the best sparse matrix package).
We can go to really large sizes (at least, large enough to store this matrix in memory).
Please note the following functions: spdiags and kron.
In [12]:
import numpy as np
import scipy as sp
import scipy.sparse
from scipy.sparse import csc_matrix, csr_matrix
import matplotlib.pyplot as plt
import scipy.linalg
import scipy.sparse.linalg
%matplotlib inline
n = 5
ex = np.ones(n)
# 1D Laplacian as a sparse tridiagonal matrix
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
# 2D Laplacian via the Kronecker sum
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
plt.spy(A, aspect='equal', marker='.', markersize=5)
Out[12]:
The spy command plots the sparsity pattern of the matrix: the $(i, j)$ pixel is drawn if the corresponding matrix element is non-zero.
The sparsity pattern is really important for understanding the complexity of sparse linear algebra algorithms.
Often, only the sparsity pattern is needed to analyze how "complex" the matrix is.
A matrix is called sparse if its number of non-zero elements is much smaller than the total number of elements, so that the basic linear algebra operations (solving linear systems in the first place) can be done faster than when working with the full matrix.
The scipy.sparse package has tools for solving sparse linear systems:
In [13]:
n = 10
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A)
rhs = np.ones(n * n)
sol = sp.sparse.linalg.spsolve(A, rhs)
_, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(sol)
ax1.set_title('Not reshaped solution')
ax2.contourf(sol.reshape((n, n), order='f'))
ax2.set_title('Reshaped solution')
Out[13]:
The simplest storage scheme is the coordinate (COO) format, which represents a sparse matrix by the positions and values of its non-zero elements: three arrays
i, j, val,
where i and j are integer arrays of indices and val is a real array of matrix elements.
So we need to store $3\cdot{\bf nnz}$ elements, where nnz denotes the number of nonzeros in the matrix.
Q: What is good and what is bad in this format?
In the CSR (compressed sparse row) format a matrix is stored as 3 different arrays:
ia, ja, sa
where:
ia is the array of row pointers of length $n+1$ (ia[i] marks where row $i$ starts in ja and sa),
ja is the array of column indices of the non-zero elements (length nnz),
sa is the array of the non-zero values themselves (length nnz).
So, we store $2\cdot{\bf nnz} + n+1$ elements.
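A small illustration (not from the original notes) of the internal arrays of both formats in scipy:

import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]], dtype=float)
coo = coo_matrix(dense)
print(coo.row, coo.col, coo.data)         # i, j, val: 3 * nnz numbers
csr = csr_matrix(dense)
print(csr.indptr, csr.indices, csr.data)  # ia (n+1), ja (nnz), sa (nnz)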
Let us do a short timing test
In [14]:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse import csc_matrix, csr_matrix, coo_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 400
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A)
rhs = np.ones(n * n)
B = coo_matrix(A)
%timeit A.dot(rhs)
%timeit B.dot(rhs)
As you see, CSR is faster, and for more unstructured patterns the gain will be larger.
The standard way to measure the efficiency of linear algebra operations on a particular computing architecture is to
use flops (the number of floating point operations per second).
The peak performance is determined as
frequency $\times$ number of cores $\times$ pipeline size $\times$ 2,
where the factor of 2 accounts for fused multiply-add instructions.
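For example (hypothetical numbers, for illustration only):

# A hypothetical machine: 2.5 GHz, 4 cores, 8-wide SIMD pipeline,
# and 2 flops per cycle per lane thanks to fused multiply-add
frequency = 2.5 * 10 ** 9
cores = 4
pipeline = 8
peak = frequency * cores * pipeline * 2
print('Peak performance: {0:3.1f} Gflops'.format(peak / 10 ** 9))  # 160.0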
We can measure the actual efficiency of an ordinary matrix-by-vector product and compare it with the peak.
In [15]:
import numpy as np
import time
n = 4000
a = np.random.randn(n, n)
v = np.random.randn(n)
t = time.time()
np.dot(a, v)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, ((2 * n ** 2)/t) / 10 ** 9))
In [16]:
n = 4000
ex = np.ones(n)
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
rhs = np.random.randn(n)
t = time.time()
a.dot(rhs)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, (3 * n) / t / 10 ** 9))
Sparse matrix computations dominate linear algebra computations nowadays.
They allow us to work with much larger matrices, but they typically utilize only $10\%-15\%$ of the peak computer performance.
This means that our computer architectures are not well suited for standard sparse matrix algorithms.
There are many possible solutions to this problem; one example is to multiply the matrix by a block of $k$ vectors at once instead of a single vector, as in the following cell:
In [17]:
n = 1000000
k = 1
ex = np.ones(n)
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
rhs = np.random.randn(n, k)
t = time.time()
a.dot(rhs)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, (3 * n * k) / t / 10 ** 9))
There are many other types of sparse matrices besides tridiagonal/block tridiagonal ones.
A good source is the Florida sparse matrix collection, which contains all sorts of matrices from different applications and also makes it easy to find test matrices.
We can have a look:
In [18]:
from IPython.display import HTML
HTML('<iframe src=http://yifanhu.net/GALLERY/GRAPHS/search.html width=700 height=450></iframe>')
Out[18]:
Sparse matrices and fast algorithms (especially for linear systems) have a deep connection with graph theory.
First of all, a sparse matrix can be treated as the adjacency matrix of a certain graph:
the vertices $i$ and $j$ are connected by an edge if the corresponding matrix element $a_{ij}$ is non-zero.
If the matrix is symmetric, the graph can be undirected; otherwise it is directed.
Why can sparse linear systems be solved faster, and what is the technique?
In the LU factorization of the matrix $A$ the factors $L$ and $U$ can also be sparse:
$$A = L U$$And solving linear systems with sparse triangular matrices is very easy.
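For instance (a sketch, not from the original notes), a sparse lower triangular system is solved by forward substitution in $\mathcal{O}(\textrm{nnz})$ operations:

import numpy as np
import scipy.sparse
from scipy.sparse.linalg import spsolve_triangular

n = 5
ex = np.ones(n)
# Sparse lower bidiagonal matrix: 2 on the diagonal, -1 below it
L = scipy.sparse.spdiags(np.vstack((-ex, 2*ex)), [-1, 0], n, n, 'csr')
b = np.ones(n)
x = spsolve_triangular(L, b, lower=True)  # forward substitution
print(x)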
Note that the inverse matrix, in contrast, is generally not sparse!
In [19]:
# Indeed, the inverse is not sparse
n = 7
ex = np.ones(n)
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csc')
b = np.linalg.inv(a.toarray())
print(a.toarray())
print(b)
In [25]:
from scipy.sparse.linalg import splu
T = splu(a, permc_spec="NATURAL")
print(T.L.todense())
It is interesting to note that splu without permc_spec will produce permutations which do not yield a bidiagonal L factor:
In [21]:
from scipy.sparse.linalg import splu
T = splu(a)
print(T.L.todense())
print(T.perm_c)
In [27]:
n = 20
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
T = scipy.sparse.linalg.spilu(A)
plt.spy(T.L, marker='.', color='k', markersize=8)
Out[27]:
For a good permutation in the 2D case (e.g., nested dissection), the number of nonzeros in the $L$ factor grows as $\mathcal{O}(N \log N)$, but the computational complexity of the factorization is $\mathcal{O}(N^{3/2})$.
The number of nonzeros in the LU decomposition has a deep connection to graph theory (recall: there is an edge between vertices $i$ and $j$ if $a_{ij} \ne 0$).
The networkx package can be used to visualize graphs, given only the adjacency matrix; it may even recover the graph structure to some extent.
In [26]:
import networkx as nx
n = 10
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
G = nx.Graph(A)
nx.draw(G, pos=nx.spring_layout(G), node_size=10)
The fill-in of a matrix consists of those entries which change from an initial zero to a non-zero value during the execution of an algorithm.
The fill-in is different for different permutations, so before the factorization we want to find a reordering which produces the smallest fill-in.
Example $$A = \begin{bmatrix}
* & * & * & * & *\\
* & * & 0 & 0 & 0 \\
* & 0 & * & 0 & 0 \\
* & 0 & 0& * & 0 \\
* & 0 & 0& 0 & *
\end{bmatrix}
$$ If we eliminate elements from the top to the bottom, we will obtain a dense matrix. However, we can maintain sparsity if the elimination is done from the bottom to the top.
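A quick sketch of this effect (an illustration, not from the original notes): compare the number of nonzeros in the LU factors of the arrowhead matrix under the natural and the reversed ordering.

import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

n = 100
A = np.eye(n)
A[0, :] = 1.0
A[:, 0] = 1.0
A[0, 0] = n  # make the matrix diagonally dominant, so no row pivoting occurs
rev = np.arange(n)[::-1]  # the reversed (bottom-to-top) ordering
lu_nat = splu(csc_matrix(A), permc_spec='NATURAL')
lu_rev = splu(csc_matrix(A[np.ix_(rev, rev)]), permc_spec='NATURAL')
print(lu_nat.L.nnz + lu_nat.U.nnz)  # O(n^2): the factors are dense
print(lu_rev.L.nnz + lu_rev.U.nnz)  # O(n): the sparsity is preserved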
Reordering the rows and columns of a sparse matrix in order to reduce the number of nonzeros in the $L$ and $U$ factors is called fill-in minimization.
Examples of algorithms for fill-in minimization: minimum degree ordering, Cuthill-McKee ordering (and reverse Cuthill-McKee ordering), and nested dissection.
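As a small experiment (not from the original notes), we can compare the fill-in produced by the column orderings available in SuperLU through splu (COLAMD and MMD are approximate/multiple minimum degree orderings):

import numpy as np
import scipy.sparse
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

n = 20
ex = np.ones(n)
lp1 = scipy.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = scipy.sparse.eye(n)
A = csc_matrix(scipy.sparse.kron(lp1, e) + scipy.sparse.kron(e, lp1))
for spec in ['NATURAL', 'MMD_AT_PLUS_A', 'COLAMD']:
    lu = splu(A, permc_spec=spec)
    print(spec, lu.L.nnz + lu.U.nnz)  # total fill-in for each ordering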