Lecture 8: Sparse matrices

Syllabus

Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues
Week 4: Singular value decomposition + test + homework seminar
Week 5: Sparse & structured matrices

Recap of the previous lecture

  • SVD, best low-rank approximations
  • Applications
    (Eigenfaces, collaborative filtering, integral equations, latent semantic indexing)

Today's lecture

Today we will talk about sparse matrices.

  • Formats: list of lists and compressed sparse row, and their relation to sparse graphs
  • Fast matrix-by-vector product
  • Fast direct solvers for Gaussian elimination

Why sparse matrices are important

  • Describe the local nature of physical interactions
  • Reduce storage and the cost of the matrix-by-vector product
  • Describe graphs: $a_{ij} \ne 0$ if there is an edge between vertex $i$ and vertex $j$.

Sparse matrices are ubiquitous in PDEs

Consider the simplest partial differential equation (PDE), the Poisson equation:
$$ \Delta T = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f. $$

Discretization

$$\frac{\partial^2 T}{\partial x^2} = \frac{T(x+h) + T(x-h) - 2T(x)}{h^2} + \mathcal{O}(h^2),$$

and similarly for $\frac{\partial^2 T}{\partial y^2}$; after discretization we get a linear system.
First, let us consider the one-dimensional case:

$$ \frac{u_{i+1} + u_{i-1} - 2u_i}{h^2} = f_i,$$

or in the matrix form

$$ A u = f, $$

and, as an $n = 5$ illustration, $$ A = -\frac{1}{h^2}\begin{pmatrix} 2& -1 & 0 & 0 & 0\\ -1 & 2 & -1 & 0 &0 \\ 0 & -1 & 2& -1 & 0 \\ 0 & 0 & -1 & 2 &-1\\ 0 & 0 & 0 & -1 & 2 \end{pmatrix} $$

The matrix is tridiagonal and sparse
(and also Toeplitz: the elements along each diagonal are the same).

Block structure in 2D

In two dimensions, we get an equation of the form
$$\frac{4u_{ij} -u_{(i-1)j} - u_{(i+1)j} - u_{i(j-1)}-u_{i(j+1)}}{h^2} = f_{ij},$$ or in the Kronecker product form

$$\Delta_2 = \Delta_1 \otimes I + I \otimes \Delta_1,$$

where $\Delta_1$ is a 1D Laplacian, and $\otimes$ is a Kronecker product of matrices.
For matrices $A$ and $B$, their Kronecker product $A \otimes B$ is defined as the block matrix $$ [a_{ij} B]. $$ For the model problem this gives the block tridiagonal matrix (here $\Delta_1$ denotes the unscaled tridiagonal part):

$$ A = -\frac{1}{h^2}\begin{pmatrix} \Delta_1 + 2I & -I & 0 & 0 & 0\\ -I & \Delta_1 + 2I & -I & 0 &0 \\ 0 & -I & \Delta_1 + 2I & -I & 0 \\ 0 & 0 & -I & \Delta_1 + 2I &-I\\ 0 & 0 & 0 & -I & \Delta_1 + 2I \end{pmatrix} $$
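As a quick sanity check of the Kronecker form, we can verify $\Delta_2 = \Delta_1 \otimes I + I \otimes \Delta_1$ on a tiny grid with dense numpy (a minimal sketch using the unscaled tridiagonal matrix):

    import numpy as np
    n = 3
    # unscaled 1D Laplacian: tridiagonal with stencil (1, -2, 1)
    d1 = np.diag(-2 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    I = np.eye(n)
    d2 = np.kron(d1, I) + np.kron(I, d1)
    print(d2)  # 9 x 9 matrix with -4 on the diagonal: the five-point stencil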

We can create this matrix using the scipy.sparse package:


In [5]:
import numpy as np
import scipy as sp
import scipy.sparse
from scipy.sparse import csc_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 20
ex = np.ones(n)
# 1D Laplacian (unscaled): tridiagonal with stencil (1, -2, 1)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
# 2D Laplacian as a Kronecker sum
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
plt.spy(A, aspect='equal', marker='.', markersize=1)


Out[5]:
<matplotlib.lines.Line2D at 0x10ecb85d0>


We can now solve the system


In [33]:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse import csc_matrix, csr_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 256
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A) * (n + 1) ** 2  # scale by 1/h^2 with h = 1/(n + 1)
rhs = np.ones(n * n)
sol = sp.sparse.linalg.spsolve(A, rhs)

And plot the solution


In [34]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
t = np.linspace(0, 1, n)
x, y = np.meshgrid(t, t)
sol = np.reshape(sol, (n, n))
ax.plot_wireframe(x, y, -sol)


Out[34]:
<mpl_toolkits.mplot3d.art3d.Line3DCollection at 0x111207610>

In [41]:
print A.shape
v256 = np.max(-sol)
# v128 is the same maximum computed in an earlier run with n = 128;
# Richardson extrapolation estimates the exact value from the two grids
vex = (4 * v256 - v128) / 3
print vex - v256


(65536, 65536)
2.74259855287e-06

How to store a sparse matrix

At first glance it is easy: you can use the coordinate (COO) format and store triplets

    (i, j, value)

i.e. two integer arrays and one real array. However, this is not optimal for arithmetic: how do we multiply such a matrix by a vector?
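In scipy the coordinate format is available as coo_matrix; a minimal sketch of the three arrays:

    import numpy as np
    from scipy.sparse import coo_matrix
    i = np.array([0, 1, 2])        # row indices
    j = np.array([1, 2, 0])        # column indices
    v = np.array([10., 20., 30.])  # values
    A = coo_matrix((v, (i, j)), shape=(3, 3))
    print(A.todense())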

Compressed sparse row

A matrix is stored as 3 different arrays:

ia, ja, sa

where:

  • ia is an integer array of length $n+1$
  • ja is an integer array of length nnz
  • sa is a real-valued array of length nnz
  • nnz is the total number of non-zeros in the matrix

Idea behind CSR.

  • For each row $i$ we store the column numbers of its non-zeros and their values
  • We stack these together into the ja and sa arrays
  • In ia we store the position in ja/sa where each row starts (see the sketch below)
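In scipy these three arrays are exposed as the indptr, indices and data attributes of a csr_matrix (scipy uses 0-based indexing); a minimal sketch:

    import numpy as np
    from scipy.sparse import csr_matrix
    A = csr_matrix(np.array([[1., 0., 2.],
                             [0., 0., 3.],
                             [4., 5., 0.]]))
    print(A.indptr)   # ia: [0 2 3 5]
    print(A.indices)  # ja: [0 2 2 0 1]
    print(A.data)     # sa: [ 1.  2.  3.  4.  5.]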

CSR is also convenient for the matrix-by-vector product:

    for i in xrange(n):
        for k in xrange(ia[i], ia[i + 1]):
            y[i] += sa[k] * x[ja[k]]
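A runnable version of this loop (the function name csr_matvec is ours), checked against scipy's own product:

    import numpy as np
    from scipy.sparse import csr_matrix

    def csr_matvec(ia, ja, sa, x):
        # y = A x for A stored in CSR with 0-based indexing
        n = len(ia) - 1
        y = np.zeros(n)
        for i in xrange(n):
            for k in xrange(ia[i], ia[i + 1]):
                y[i] += sa[k] * x[ja[k]]
        return y

    A = csr_matrix(np.random.randn(5, 5))
    x = np.random.randn(5)
    print(np.allclose(csr_matvec(A.indptr, A.indices, A.data, x), A.dot(x)))  # True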

Let us do a short timing test


In [44]:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse import csc_matrix, csr_matrix, coo_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 256
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A)
rhs = np.ones(n * n)
B = coo_matrix(A)
%timeit A.dot(rhs)
%timeit B.dot(rhs)


1000 loops, best of 3: 542 µs per loop
1000 loops, best of 3: 758 µs per loop

As you can see, CSR is faster here, and for more unstructured sparsity patterns the gain is typically larger.

Sparse matrices and efficiency

Sparse matrices give a complexity reduction,
but they are not very well suited for parallel implementation,
and they do not reach maximal efficiency because of irregular memory access.
Typically, reaching 10-15% of peak performance is considered good.
We can measure the efficiency of an ordinary (dense) matrix-by-vector product:


In [45]:
import numpy as np
import time
n = 1000
a = np.random.randn(n, n)
v = np.random.randn(n)
t = time.time()
np.dot(a, v)
t = time.time() - t
print 'Efficiency:', ((2 * n ** 2)/t) / 10 ** 9, 'Gflops'  # a dense matvec costs ~2n^2 flops


Efficiency: 2.3723438914 Gflops

And for the sparse one:


In [46]:
n = 1000
ex = np.ones(n);
a = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr'); 
rhs = np.random.randn(n)
t = time.time()
a.dot(rhs)
t = time.time() - t
print 'Efficiency:', (3 * n) / t / 10 ** 9, 'Gflops'  # counting one multiplication per non-zero (~3n)


Efficiency: 0.0164913656619 Gflops

How to solve linear systems?

How do we solve linear systems with sparse matrices?
Direct methods use sparse Gaussian elimination, i.e.
they eliminate variables while trying to keep
the matrix as sparse as possible. Note that the inverse of a sparse matrix is in general not sparse (but the $L$ and $U$ factors can be).


In [55]:
n = 5
ex = np.ones(n)
a = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
a = a.todense()
b = np.array(np.linalg.inv(a))
#print b
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
t = np.linspace(0, 1, n)
x, y = np.meshgrid(t, t)
ax.plot_wireframe(x, y, -b)


Out[55]:
<mpl_toolkits.mplot3d.art3d.Line3DCollection at 0x115e30610>

The $L$ and $U$ factors are typically much sparser than the inverse


In [56]:
import scipy.linalg
p, l, u = scipy.linalg.lu(a)
print l


[[ 1.          0.          0.          0.          0.        ]
 [-0.5         1.          0.          0.          0.        ]
 [-0.         -0.66666667  1.          0.          0.        ]
 [-0.         -0.         -0.75        1.          0.        ]
 [-0.         -0.         -0.         -0.8         1.        ]]

However, in the 2D case things are much worse:


In [58]:
n = 20
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
T = scipy.sparse.linalg.splu(A)

In [60]:
plt.spy(T.U, marker='.', markersize=0.4)


Out[60]:
<matplotlib.lines.Line2D at 0x1140d3590>

In the two-dimensional case the number of non-zeros in the $L$ factor grows as $\mathcal{O}(N^{3/2})$, where $N = n^2$ is the total number of unknowns.
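We can check this growth empirically by counting the non-zeros of the factor as the grid is refined (a small experiment; the exact counts depend on the ordering chosen by SuperLU):

    import numpy as np
    import scipy.sparse
    import scipy.sparse.linalg
    from scipy.sparse import csc_matrix
    for n in [16, 32, 64]:
        ex = np.ones(n)
        lp1 = scipy.sparse.spdiags(np.vstack((ex, -2 * ex, ex)), [-1, 0, 1], n, n, 'csr')
        e = scipy.sparse.eye(n)
        A = csc_matrix(scipy.sparse.kron(lp1, e) + scipy.sparse.kron(e, lp1))
        T = scipy.sparse.linalg.splu(A)
        # N = n^2 unknowns; compare nnz(L) against N^{3/2}
        N = n * n
        print('N = %6d, nnz(L) = %8d, N^1.5 = %8.0f' % (N, T.L.nnz, N ** 1.5))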

Sparse matrices and graph ordering

The number of non-zeros in the LU decomposition has a deep connection to graph theory:
associate with the matrix a graph that has an edge between vertices $i$ and $j$ whenever $a_{ij} \ne 0$.


In [61]:
import networkx as nx
n = 10
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
G = nx.Graph(A)
nx.draw(G, pos=nx.spring_layout(G), node_size=10)


Strategies for elimination

The reordering that minimizes the fill-in is important, and we can use graph theory to find one.

  • Minimum degree ordering: eliminate the vertex of smallest degree first
  • Cuthill–McKee algorithm (and reverse Cuthill–McKee): order the vertices so that the matrix has small bandwidth
  • Nested dissection: split the graph into two parts with a minimal number of vertices on the separator, and recurse

In [65]:
import networkx as nx
from networkx.utils import reverse_cuthill_mckee_ordering, cuthill_mckee_ordering
n = 10
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
G = nx.Graph(A)
rcm = list(reverse_cuthill_mckee_ordering(G))
A1 = A[rcm, :][:, rcm]
plt.spy(A1, marker='.', markersize=3)
#p, L, U = scipy.linalg.lu(A1.todense())
#plt.spy(L, marker='.', markersize=0.8)
#nx.draw(G, pos=nx.spring_layout(G), node_size=10)


Out[65]:
<matplotlib.lines.Line2D at 0x116f0d8d0>

The most efficient one is the nested dissection algorithm, which builds the ordering recursively by computing separators. At the moment I have not found a good general-purpose Python implementation.
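For the model grid, however, a toy version is easy to write by hand: recursively order each half of the grid first and put the middle-line separator last. This is a minimal sketch for the structured grid only (the function grid_nested_dissection is ours, not a library routine):

    import numpy as np

    def grid_nested_dissection(block):
        # block: 2D array with the vertex indices of a (sub)grid;
        # returns an elimination order with separators eliminated last
        r, c = block.shape
        if r * c <= 3:
            return list(block.flatten())
        if c >= r:  # cut across the longer side
            m = c // 2
            left, sep, right = block[:, :m], block[:, m], block[:, m + 1:]
        else:
            m = r // 2
            left, sep, right = block[:m, :], block[m, :], block[m + 1:, :]
        return grid_nested_dissection(left) + grid_nested_dissection(right) + list(sep)

    n = 10
    nd = grid_nested_dissection(np.arange(n * n).reshape(n, n))
    # permute the 2D Laplacian from above and inspect the pattern:
    # A1 = A[nd, :][:, nd]; plt.spy(A1, marker='.', markersize=3)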

Florida sparse matrix collection

The Florida sparse matrix collection contains all sorts of matrices from different applications and is a convenient source of test matrices. We can have a look.


In [66]:
from IPython.display import HTML
HTML('<iframe src=http://yifanhu.net/GALLERY/GRAPHS/search.html width=700 height=450></iframe>')


Out[66]:

Testing a matrix from the collection

Let us examine one sparse matrix (and its LU factorization).


In [67]:
fname = 'crystm02.mat'
!wget http://www.cise.ufl.edu/research/sparse/mat/Boeing/$fname


--2014-10-02 10:32:43--  http://www.cise.ufl.edu/research/sparse/mat/Boeing/crystm02.mat
Resolving www.cise.ufl.edu... 128.227.205.222
Connecting to www.cise.ufl.edu|128.227.205.222|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 797624 (779K) [text/plain]
Saving to: ‘crystm02.mat’

100%[======================================>] 797,624      362KB/s   in 2.2s   

2014-10-02 10:32:46 (362 KB/s) - ‘crystm02.mat’ saved [797624/797624]


In [68]:
from scipy.io import loadmat
import scipy.sparse
q = loadmat(fname)
print q
mat = q['Problem']['A'][0, 0]
T = scipy.sparse.linalg.splu(mat)


{'Problem': array([[ ([u'FEM CRYSTAL FREE VIBRATION MASS MATRIX'], <13965x13965 sparse matrix of type '<type 'numpy.float64'>'
	with 322905 stored elements in Compressed Sparse Column format>, [u'Boeing/crystm02'], [[354]], [u'1995'], [u'R. Grimes'], [u'T. Davis'], [u'materials problem'])]], 
      dtype=[('title', 'O'), ('A', 'O'), ('name', 'O'), ('id', 'O'), ('date', 'O'), ('author', 'O'), ('ed', 'O'), ('kind', 'O')]), '__version__': '1.0', '__header__': 'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sat Sep  6 04:12:47 2008', '__globals__': []}

In [69]:
#Compute its LU
%matplotlib inline
import matplotlib.pyplot as plt
plt.spy(T.L, markersize=0.1)


Out[69]:
<matplotlib.lines.Line2D at 0x1142977d0>

Take home message

  • CSR format for storage
  • Sparse matrices and graph orderings
  • Ordering is important for LU fill-in

Next lecture

  • Structured matrices: the idea
  • Circulant matrices and the Fast Fourier Transform
  • Toeplitz matrices and shift-invariant structure
  • Displacement structure, solution of linear systems with Toeplitz matrices
  • Toeplitz and low-rank matrices

Questions?
