Lecture 11: Matrix functions

Syllabus

Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues
Week 4: Singular value decomposition + test + homework seminar
Week 5: Sparse & structured matrices
Week 6: Iterative methods, preconditioners, matrix functions
Week 7: Test week
Week 8: App Period

Recap of the previous lecture

  • Concept of iterative methods
  • Richardson iteration, Chebyshev acceleration
  • Idea of Krylov methods
  • Conjugate gradient methods for symmetric positive definite matrices
  • Generalized minimal residual methods
  • The Zoo of iterative methods: BiCGStab, QMR, IDR, ...
  • The idea of preconditioners
  • Jacobi/Gauss-Seidel methods, incomplete LU (ILU)

Today's lecture

Today we will talk about matrix functions.

  • What is a matrix function
  • Matrix exponential
  • (Some) applications

Book to read: Nick Higham, Functions of matrices

Where matrix functions come from

  • Solution of PDEs and ODEs (matrix exponential, matrix sine function, matrix cosine function, square root,
    sign function)
  • The matrix exponential has applications in localized PageRank and graph centrality
  • Quantum physics
  • The inverse is a matrix function as well!

What is a matrix function?

A matrix function is a function of a matrix. It is not the scalar function applied to the elements!

I.e., $B = \sqrt{A}$ does not mean that $B_{ij} = \sqrt{A_{ij}}$.


In [3]:
import numpy as np
import scipy.linalg as sla

A = np.array([[4.0, 4.0], [4.0, 4.0]])
# funm applies the scalar function to the matrix as a whole (not elementwise):
# the result B satisfies B @ B = A, while the elementwise square root would be [[2, 2], [2, 2]].
sla.funm(A, lambda x: np.sqrt(x))


Out[3]:
array([[ 1.41421356,  1.41421356],
       [ 1.41421356,  1.41421356]])

The simplest matrix function: matrix polynomial

It is very easy to define a matrix polynomial as
$$ P(A) = \sum_{k=0}^n c_k A^k. $$ Side note: the Cayley-Hamilton theorem states that $F(A) = 0$, where $F(\lambda) = \det(A - \lambda I)$ is the characteristic polynomial.
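
As a quick numerical illustration, here is a minimal sketch (the helper matrix_polynomial and the random test matrix are our own, not part of the lecture) that evaluates a matrix polynomial by Horner's rule and checks the Cayley-Hamilton theorem.


In [ ]:
import numpy as np

def matrix_polynomial(coeffs, A):
    # Evaluate P(A) = sum_k coeffs[k] * A^k by Horner's rule (coeffs[k] multiplies A^k).
    n = A.shape[0]
    P = coeffs[-1] * np.eye(n)
    for c in reversed(coeffs[:-1]):
        P = P @ A + c * np.eye(n)
    return P

np.random.seed(0)
A = np.random.randn(4, 4)
# np.poly returns the characteristic polynomial coefficients, highest degree first;
# reverse them so that coeffs[k] corresponds to A^k.
char_coeffs = np.poly(A)[::-1]
# Cayley-Hamilton: the characteristic polynomial annihilates the matrix,
# so the norm below should be at machine-precision level.
print(np.linalg.norm(matrix_polynomial(char_coeffs, A)))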

Matrix polynomials as building blocks

We can define a function of the matrix by Taylor series:
$$ f(A) = \sum_{k=0}^{\infty} c_k A^k. $$ The convergence is understood as the convergence in some matrix norm.
Example of such series is the Neumann series
$$ (I - F)^{-1} = \sum_{k=0}^{\infty} F^k, $$ which is well defined for $\rho(F) < 1$.
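
A minimal numerical check (the random matrix F and its rescaling are our own toy choices): truncate the Neumann series and compare with the exact inverse.


In [ ]:
import numpy as np

np.random.seed(0)
F = np.random.randn(5, 5)
F = 0.5 * F / max(abs(np.linalg.eigvals(F)))   # rescale so that rho(F) = 0.5 < 1

exact = np.linalg.inv(np.eye(5) - F)
partial = np.zeros((5, 5))
term = np.eye(5)
for k in range(50):          # sum the first 50 terms I + F + F^2 + ...
    partial += term
    term = term @ F
print(np.linalg.norm(partial - exact))   # tiny: the tail decays roughly like rho(F)^k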

Matrix exponential series

The most well-known matrix function is the matrix exponential. In the scalar case,
$$ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!}, $$ and it directly translates to the matrix case:
$$ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}, $$ the series that always converges (why?).
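
As a sketch (the truncation length 30 and the random test matrix are arbitrary choices of ours), the truncated Taylor series already matches scipy's expm for a matrix of moderate norm.


In [ ]:
import numpy as np
import scipy.linalg as sla

np.random.seed(0)
A = np.random.randn(4, 4)

S = np.eye(4)
term = np.eye(4)
for k in range(1, 30):
    term = term @ A / k      # term = A^k / k!
    S += term

print(np.linalg.norm(S - sla.expm(A)))   # small for moderate ||A||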

Why matrix exponential is important

A lot of practical problems are reduced to a system of linear ODEs of the form
$$ \frac{dy}{dt} = Ay, \quad y(0) = y_0. $$ The formal solution is given by $y(t) = e^{At} y_0$, so if we know
$e^{At}$ (or can compute its product with a vector fast), there is a big gain over
time-stepping schemes.
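
A minimal sketch (the 2x2 damped-oscillator system and the final time are our own choices), comparing the matrix-exponential solution with scipy's general-purpose ODE integrator:


In [ ]:
import numpy as np
import scipy.linalg as sla
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, -0.1]])   # damped oscillator as a first-order system
y0 = np.array([1.0, 0.0])
t = 5.0

y_expm = sla.expm(A * t) @ y0                                    # solution via the matrix exponential
sol = solve_ivp(lambda s, y: A @ y, (0, t), y0, rtol=1e-10, atol=1e-12)
print(y_expm)
print(sol.y[:, -1])                                              # should agree to high accuracy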

How to compute matrix functions?

See C. Moler, C. Van Loan, Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later

The simplest way is to diagonalize the matrix:
$$ A = S \Lambda S^{-1}, $$ where the columns of $S$ are the eigenvectors of the matrix $A$; then

$$ F(A) = S F(\Lambda) S^{-1}. $$

Problem: diagonalization can be unstable! (and not every matrix is diagonalizable)

Let us look at what happens when a matrix is nearly non-diagonalizable:


In [21]:
import numpy as np

eps = 0.5
p = 4
a = np.eye(p)
for i in range(p - 1):
    a[i, i + 1] = 1

a[p - 1, 2] = eps   # perturb the Jordan-like structure
val, vec = np.linalg.eig(a)
# Reconstruction error of S * Lambda * S^{-1}: it is large, because the
# eigenvector matrix is ill-conditioned for this (nearly) defective matrix.
print(np.linalg.norm(a - vec.dot(np.diag(val)).dot(np.linalg.inv(vec))))
print(a)


2.2360679775
[[ 1.   1.   0.   0. ]
 [ 0.   1.   1.   0. ]
 [ 0.   0.   1.   1. ]
 [ 0.   0.   0.5  1. ]]

Now we can compute a function


In [23]:
import numpy as np
import scipy.linalg as sla

eps = 1e-10
p = 5
a = np.eye(p)
for i in range(p - 1):
    a[i, i + 1] = 1

a[p - 1, 0] = eps   # a tiny perturbation makes the matrix diagonalizable, but barely
val, vec = np.linalg.eig(a)
print(np.linalg.norm(a - vec.dot(np.diag(val)).dot(np.linalg.inv(vec))))

fun = lambda x: np.exp(x)

# Using diagonalization
fun_diag = vec.dot(np.diag(fun(val))).dot(np.linalg.inv(vec))

# Using scipy's expm
fun_m = sla.expm(a)
np.linalg.norm(fun_m - fun_diag)


4.84290066277e-08
Out[23]:
1.3563815727960955e-07

How the funm function works

The exponential of a matrix is a special function, so there are special methods for its computation. For a general function,
there is the beautiful Schur-Parlett algorithm, which is based on the Schur decomposition.

Schur-Parlett algorithm

Given a matrix $A$, we want to compute $F(A)$, and we can only evaluate $F$ at scalar points.
First, we reduce $A$ to triangular form via the Schur decomposition:
$$ A = U T U^*. $$

Therefore, $F(A) = U F(T) U^*$, and the problem reduces to computing a function of an upper triangular matrix.

Computing functions of triangular matrices

The diagonal values are known:
$$F_{ii} = F(T_{ii}),$$ and $F(T)$ commutes with $T$:
$$F T = T F.$$ Equating the $(i, j)$ entries of this identity for $i < j$ gives the Parlett recurrence

$$f_{ij} = t_{ij} \frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik} t_{kj} - t_{ik} f_{kj}}{t_{ii} - t_{jj}}.$$
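
A minimal sketch of this recurrence (assuming the diagonal entries of $T$ are distinct so that we never divide by zero; the upper triangular test matrix is our own):


In [ ]:
import numpy as np
import scipy.linalg as sla

def parlett(T, f):
    # Compute F = f(T) for an upper triangular T with distinct diagonal entries.
    n = T.shape[0]
    F = np.diag(f(np.diag(T)))
    for d in range(1, n):              # fill superdiagonal by superdiagonal
        for i in range(n - d):
            j = i + d
            s = T[i, j] * (F[i, i] - F[j, j])
            for k in range(i + 1, j):
                s += F[i, k] * T[k, j] - T[i, k] * F[k, j]
            F[i, j] = s / (T[i, i] - T[j, j])
    return F

np.random.seed(0)
T = np.triu(np.random.randn(5, 5), 1) + np.diag(np.arange(1.0, 6.0))  # distinct diagonal
print(np.linalg.norm(parlett(T, np.exp) - sla.expm(T)))   # machine-precision agreement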

Pade approximations

The matrix exponential is well approximated by a rational function:

$$\exp(x) \approx \frac{p(x)}{q(x)},$$

where $p(x)$ and $q(x)$ are polynomials, and evaluating a rational function of a matrix reduces to matrix-matrix products and matrix inversions.
The rational form is also very useful when only the product of the matrix exponential with a vector is needed, since
the evaluation then reduces to matrix-by-vector products and linear system solves.
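
As a sketch at the matrix level (using the well-known diagonal $[2/2]$ Pade approximant of $e^x$, $p(x) = 1 + x/2 + x^2/12$ and $q(x) = 1 - x/2 + x^2/12$; the small-norm test matrix is our own), the rational approximation costs two matrix products and one linear solve:


In [ ]:
import numpy as np
import scipy.linalg as sla

np.random.seed(0)
A = 0.1 * np.random.randn(4, 4)    # small norm, so no scaling-and-squaring is needed
I = np.eye(4)

P = I + A / 2 + A @ A / 12         # p(A)
Q = I - A / 2 + A @ A / 12         # q(A)
expm_pade = np.linalg.solve(Q, P)  # q(A)^{-1} p(A)

print(np.linalg.norm(expm_pade - sla.expm(A)))   # very small for this small ||A||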

A short demo on Pade approximation via the mpmath package


In [31]:
import numpy as np
import mpmath
from mpmath import pade, taylor, polyval
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(0, 1, 128)
a = taylor(mpmath.exp, 0, 100)     # Taylor coefficients of exp at 0
p, q = pade(a, 4, 2)               # [4/2] Pade approximant
r = np.array([float(polyval(p[::-1], xi) / polyval(q[::-1], xi)) for xi in x])
plt.plot(x, r - np.exp(x))         # approximation error on [0, 1]


Out[31]:
[<matplotlib.lines.Line2D at 0x111e40390>]

Application to graph centrality

One recent application is the computation of node centralities via the matrix exponential. This example was taken from here. Take the following network.


In [32]:
import networkx as nx
import numpy as np
%matplotlib inline
Adj = np.array([[0, 1, 1, 0, 0, 0], 
                [1, 0, 1, 1, 1, 0], 
                [1, 1, 0, 0, 0, 0], 
                [0, 1, 0, 0, 0, 1], 
                [0, 1, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0]])

G = nx.from_numpy_array(Adj)   # called from_numpy_matrix in older networkx versions
nx.draw(G, node_color='y', node_size=1000, with_labels=True)


One measure of the importance of each node is its centrality. We can count the number of closed walks of length $k$ from $i$ back to $i$,
given by $(A^k)_{ii}$, and add them up with weights to get the centrality of node $i$: $$c(i) = \alpha_1 A_{ii} + \alpha_2 (A^2)_{ii} + \alpha_3 (A^3)_{ii} + \cdots,$$ where the coefficients $\alpha_k$ remain to be chosen.
With $\alpha_k = \frac{1}{k!}$ we get
$$ c = \mathrm{diag}(\exp(A)). $$


In [33]:
centralities = np.diag(sla.expm(np.array(Adj, dtype=np.double)))
nodeorder = np.argsort(centralities)[::-1]

print(np.array([nodeorder, centralities[nodeorder]]))

# Note: this is already built into networkx:
# print(nx.subgraph_centrality_exp(G))   # called communicability_centrality_exp in older versions


[[ 1.          0.          2.          3.          4.          5.        ]
 [ 4.44723536  2.86427609  2.86427609  2.36018456  1.71615913  1.59432922]]

Some other matrix functions

  • Sign function: $F(A) = \mathrm{sign}(A)$ (no Taylor series), used for spectrum partitioning problems.
    Newton iteration to compute the sign function (a minimal sketch follows after this list): $$A_{k+1} = \frac{1}{2} (A_k + A^{-1}_k), \quad A_0 = A$$
  • Cosine/sine functions, used to solve $$\frac{d^2 y}{dt^2} = A y$$
  • Square root $\sqrt{A}$, used to solve Lyapunov equations
    $$A X + X A^{\top} = B;$$ the Lyapunov equation is used in control theory, model order reduction and many other applications.
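
A minimal sketch of the Newton iteration for the matrix sign function (the symmetric test matrix and the fixed iteration count are our own choices; in practice one iterates until convergence and uses scaling to accelerate it):


In [ ]:
import numpy as np

np.random.seed(0)
B = np.random.randn(5, 5)
A = B + B.T                        # symmetric, so sign(A) has eigenvalues +1 and -1

X = A.copy()
for k in range(30):                # Newton iteration X <- (X + X^{-1}) / 2
    X = 0.5 * (X + np.linalg.inv(X))

# sign(A) squares to the identity and commutes with A
print(np.linalg.norm(X @ X - np.eye(5)), np.linalg.norm(X @ A - A @ X))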

Take home message

  • Matrix exponential
  • Schur-Parlett
  • Pade approximation
  • Graphs & PDEs

Next lecture

We are done!
Friday - homework seminar

Questions?
