Lecture 11: Matrix functions

Syllabus

Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues
Week 4: Singular value decomposition + test + homework seminar
Week 5: Sparse & structured matrices
Week 6: Iterative methods, preconditioners, matrix functions
Week 7: Test week
Week 8: App Period

Recap of the previous lecture

  • Concept of iterative methods
  • Richardson iteration, Chebyshev acceleration
  • Idea of Krylov methods
  • Conjugate gradient methods for symmetric positive definite matrices
  • Generalized minimal residual methods
  • The Zoo of iterative methods: BiCGStab, QMR, IDR, ...
  • The idea of preconditioners
  • Jacobi/Gauss-Seidel methods, incomplete LU (ILU)

Today's lecture

Today we will talk about matrix functions.

  • What is a matrix function
  • Matrix exponential
  • (Some) applications

Book to read: Nick Higham, Functions of matrices

Where matrix functions come from

  • Solution of PDEs and ODEs (matrix exponential, matrix sine function, matrix cosine function, square root,
    sign function)
  • The matrix exponential has applications in localized PageRank and graph centrality
  • Quantum physics
  • The inverse is a matrix function as well!

What is a matrix function?

A matrix function is a function of a matrix. It is not the scalar function applied to the elements!

I.e., $B = \sqrt{A}$ does not mean that $B_{ij} = \sqrt{A_{ij}}$.


In [3]:
import numpy as np
import scipy.linalg as sla

A = np.array([[4.0, 4.0], [4.0, 4.0]])
# funm applies the scalar function to the matrix as a whole (not elementwise):
# the result B satisfies B @ B = A, while the elementwise square root would be [[2, 2], [2, 2]].
sla.funm(A, lambda x: np.sqrt(x))


Out[3]:
array([[ 1.41421356,  1.41421356],
       [ 1.41421356,  1.41421356]])

The simplest matrix function: matrix polynomial

It is very easy to define a matrix polynomial as
$$ P(A) = \sum_{k=0}^n c_k A^k. $$ Side note: the Cayley-Hamilton theorem states that $F(A) = 0$, where $F(\lambda) = \det(A - \lambda I)$ is the characteristic polynomial.
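
As a quick numerical illustration, here is a minimal sketch (the helper matrix_polynomial and the random test matrix are our own, not part of the lecture) that evaluates a matrix polynomial by Horner's rule and checks the Cayley-Hamilton theorem.


In [ ]:
import numpy as np

def matrix_polynomial(coeffs, A):
    # Evaluate P(A) = sum_k coeffs[k] * A^k by Horner's rule (coeffs[k] multiplies A^k).
    n = A.shape[0]
    P = coeffs[-1] * np.eye(n)
    for c in reversed(coeffs[:-1]):
        P = P @ A + c * np.eye(n)
    return P

np.random.seed(0)
A = np.random.randn(4, 4)
# np.poly returns the characteristic polynomial coefficients, highest degree first;
# reverse them so that coeffs[k] corresponds to A^k.
char_coeffs = np.poly(A)[::-1]
# Cayley-Hamilton: the characteristic polynomial annihilates the matrix,
# so the norm below should be at machine-precision level.
print(np.linalg.norm(matrix_polynomial(char_coeffs, A)))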

Matrix polynomials as building blocks

We can define a function of the matrix by Taylor series:
$$ f(A) = \sum_{k=0}^{\infty} c_k A^k. $$ The convergence is understood as the convergence in some matrix norm.
Example of such series is the Neumann series
$$ (I - F)^{-1} = \sum_{k=0}^{\infty} F^k, $$ which is well defined for $\rho(F) < 1$.
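
A minimal numerical check (the random matrix F and its rescaling are our own toy choices): truncate the Neumann series and compare with the exact inverse.


In [ ]:
import numpy as np

np.random.seed(0)
F = np.random.randn(5, 5)
F = 0.5 * F / max(abs(np.linalg.eigvals(F)))   # rescale so that rho(F) = 0.5 < 1

exact = np.linalg.inv(np.eye(5) - F)
partial = np.zeros((5, 5))
term = np.eye(5)
for k in range(50):          # sum the first 50 terms I + F + F^2 + ...
    partial += term
    term = term @ F
print(np.linalg.norm(partial - exact))   # tiny: the tail decays roughly like rho(F)^k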

Matrix exponential series

The most well-known matrix function is the matrix exponential. In the scalar case,
$$ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!}, $$ and it directly translates to the matrix case:
$$ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}, $$ the series that always converges (why?).
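
As a sketch (the truncation length 30 and the random test matrix are arbitrary choices of ours), the truncated Taylor series already matches scipy's expm for a matrix of moderate norm.


In [ ]:
import numpy as np
import scipy.linalg as sla

np.random.seed(0)
A = np.random.randn(4, 4)

S = np.eye(4)
term = np.eye(4)
for k in range(1, 30):
    term = term @ A / k      # term = A^k / k!
    S += term

print(np.linalg.norm(S - sla.expm(A)))   # small for moderate ||A||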

Why matrix exponential is important

A lot of practical problems are reduced to a system of linear ODEs of the form
$$ \frac{dy}{dt} = Ay, \quad y(0) = y_0. $$ The formal solution is given by $y(t) = e^{At} y_0$, so if we know
$e^{At}$ (or can compute its product with a vector fast), there is a big gain over
time-stepping schemes.
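
A minimal sketch (the 2x2 damped-oscillator system and the final time are our own choices), comparing the matrix-exponential solution with scipy's general-purpose ODE integrator:


In [ ]:
import numpy as np
import scipy.linalg as sla
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, -0.1]])   # damped oscillator as a first-order system
y0 = np.array([1.0, 0.0])
t = 5.0

y_expm = sla.expm(A * t) @ y0                                    # solution via the matrix exponential
sol = solve_ivp(lambda s, y: A @ y, (0, t), y0, rtol=1e-10, atol=1e-12)
print(y_expm)
print(sol.y[:, -1])                                              # should agree to high accuracy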

How to compute matrix functions?

See C. Moler, C. Van Loan, Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later

The simplest way is to diagonalize the matrix:
$$ A = S \Lambda S^{-1}, $$ where the columns of $S$ are the eigenvectors of the matrix $A$; then

$$ F(A) = S F(\Lambda) S^{-1}. $$

Problem: diagonalization can be unstable! (and not every matrix is diagonalizable)

Let us look at what happens when a matrix is nearly non-diagonalizable:


In [21]:
import numpy as np

eps = 0.5
p = 4
a = np.eye(p)
for i in range(p - 1):
    a[i, i + 1] = 1

a[p - 1, 2] = eps   # perturb the Jordan-like structure
val, vec = np.linalg.eig(a)
# Reconstruction error of S * Lambda * S^{-1}: it is large, because the
# eigenvector matrix is ill-conditioned for this (nearly) defective matrix.
print(np.linalg.norm(a - vec.dot(np.diag(val)).dot(np.linalg.inv(vec))))
print(a)


2.2360679775
[[ 1.   1.   0.   0. ]
 [ 0.   1.   1.   0. ]
 [ 0.   0.   1.   1. ]
 [ 0.   0.   0.5  1. ]]

Now we can compute a function


In [23]:
import numpy as np
import scipy.linalg as sla

eps = 1e-10
p = 5
a = np.eye(p)
for i in range(p - 1):
    a[i, i + 1] = 1

a[p - 1, 0] = eps   # a tiny perturbation makes the matrix diagonalizable, but barely
val, vec = np.linalg.eig(a)
print(np.linalg.norm(a - vec.dot(np.diag(val)).dot(np.linalg.inv(vec))))

fun = lambda x: np.exp(x)

# Using diagonalization
fun_diag = vec.dot(np.diag(fun(val))).dot(np.linalg.inv(vec))

# Using scipy's expm
fun_m = sla.expm(a)
np.linalg.norm(fun_m - fun_diag)


4.84290066277e-08
Out[23]:
1.3563815727960955e-07

How the funm function works

The exponential of a matrix is a special function, so there are special methods for its computation. For a general function,
there is the beautiful Schur-Parlett algorithm, which is based on the Schur decomposition.

Schur-Parlett algorithm

Given a matrix $A$, we want to compute $F(A)$, and we can only evaluate $F$ at scalar points.
First, we reduce $A$ to triangular form via the Schur decomposition:
$$ A = U T U^*. $$

Therefore, $F(A) = U F(T) U^*$, and the problem reduces to computing a function of an upper triangular matrix.

Computing functions of triangular matrices

The diagonal values are known:
$$F_{ii} = F(T_{ii}),$$ and $F(T)$ commutes with $T$:
$$F T = T F.$$ Equating the $(i, j)$ entries of this identity for $i < j$ gives the Parlett recurrence

$$f_{ij} = t_{ij} \frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik} t_{kj} - t_{ik} f_{kj}}{t_{ii} - t_{jj}}.$$
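
A minimal sketch of this recurrence (assuming the diagonal entries of $T$ are distinct so that we never divide by zero; the upper triangular test matrix is our own):


In [ ]:
import numpy as np
import scipy.linalg as sla

def parlett(T, f):
    # Compute F = f(T) for an upper triangular T with distinct diagonal entries.
    n = T.shape[0]
    F = np.diag(f(np.diag(T)))
    for d in range(1, n):              # fill superdiagonal by superdiagonal
        for i in range(n - d):
            j = i + d
            s = T[i, j] * (F[i, i] - F[j, j])
            for k in range(i + 1, j):
                s += F[i, k] * T[k, j] - T[i, k] * F[k, j]
            F[i, j] = s / (T[i, i] - T[j, j])
    return F

np.random.seed(0)
T = np.triu(np.random.randn(5, 5), 1) + np.diag(np.arange(1.0, 6.0))  # distinct diagonal
print(np.linalg.norm(parlett(T, np.exp) - sla.expm(T)))   # machine-precision agreement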

Pade approximations

The matrix exponential is well approximated by a rational function:

$$\exp(x) \approx \frac{p(x)}{q(x)},$$

where $p(x)$ and $q(x)$ are polynomials, and evaluating a rational function of a matrix reduces to matrix-matrix products and matrix inversions.
The rational form is also very useful when only the product of the matrix exponential with a vector is needed, since
the evaluation then reduces to matrix-by-vector products and linear system solves.
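
As a sketch at the matrix level (using the well-known diagonal $[2/2]$ Pade approximant of $e^x$, $p(x) = 1 + x/2 + x^2/12$ and $q(x) = 1 - x/2 + x^2/12$; the small-norm test matrix is our own), the rational approximation costs two matrix products and one linear solve:


In [ ]:
import numpy as np
import scipy.linalg as sla

np.random.seed(0)
A = 0.1 * np.random.randn(4, 4)    # small norm, so no scaling-and-squaring is needed
I = np.eye(4)

P = I + A / 2 + A @ A / 12         # p(A)
Q = I - A / 2 + A @ A / 12         # q(A)
expm_pade = np.linalg.solve(Q, P)  # q(A)^{-1} p(A)

print(np.linalg.norm(expm_pade - sla.expm(A)))   # very small for this small ||A||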

A short demo on Pade approximation via the mpmath package


In [31]:
import numpy as np
import mpmath
from mpmath import pade, taylor, polyval
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(0, 1, 128)
a = taylor(mpmath.exp, 0, 100)     # Taylor coefficients of exp at 0
p, q = pade(a, 4, 2)               # [4/2] Pade approximant
r = np.array([float(polyval(p[::-1], xi) / polyval(q[::-1], xi)) for xi in x])
plt.plot(x, r - np.exp(x))         # approximation error on [0, 1]


Out[31]:
[<matplotlib.lines.Line2D at 0x111e40390>]

Application to graph centrality

One recent application is the computation of node centralities via the matrix exponential. This example was taken from here. Take the following network.


In [32]:
import networkx as nx
import numpy as np
%matplotlib inline
Adj = np.array([[0, 1, 1, 0, 0, 0], 
                [1, 0, 1, 1, 1, 0], 
                [1, 1, 0, 0, 0, 0], 
                [0, 1, 0, 0, 0, 1], 
                [0, 1, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0]])

G = nx.from_numpy_array(Adj)   # called from_numpy_matrix in older networkx versions
nx.draw(G, node_color='y', node_size=1000, with_labels=True)


One measure of the importance of each node is its centrality. We can count the number of closed walks of length $k$ from $i$ back to $i$,
given by $(A^k)_{ii}$, and add them up with weights to get the centrality of node $i$: $$c(i) = \alpha_1 A_{ii} + \alpha_2 (A^2)_{ii} + \alpha_3 (A^3)_{ii} + \cdots,$$ where the coefficients $\alpha_k$ remain to be chosen.
With $\alpha_k = \frac{1}{k!}$ we get
$$ c = \mathrm{diag}(\exp(A)). $$


In [33]:
centralities = np.diag(sla.expm(np.array(Adj, dtype=np.double)))
nodeorder = np.argsort(centralities)[::-1]

print(np.array([nodeorder, centralities[nodeorder]]))

# Note: this is already built into networkx:
# print(nx.subgraph_centrality_exp(G))   # called communicability_centrality_exp in older versions


[[ 1.          0.          2.          3.          4.          5.        ]
 [ 4.44723536  2.86427609  2.86427609  2.36018456  1.71615913  1.59432922]]

Some other matrix functions

  • Sign function: $F(A) = \mathrm{sign}(A)$ (no Taylor series), used for spectrum partitioning problems.
    Newton iteration to compute the sign function (a minimal sketch follows after this list): $$A_{k+1} = \frac{1}{2} (A_k + A^{-1}_k), \quad A_0 = A$$
  • Cosine/sine functions, used to solve $$\frac{d^2 y}{dt^2} = A y$$
  • Square root $\sqrt{A}$, used to solve Lyapunov equations
    $$A X + X A^{\top} = B;$$ the Lyapunov equation is used in control theory, model order reduction and many other applications.
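
A minimal sketch of the Newton iteration for the matrix sign function (the symmetric test matrix and the fixed iteration count are our own choices; in practice one iterates until convergence and uses scaling to accelerate it):


In [ ]:
import numpy as np

np.random.seed(0)
B = np.random.randn(5, 5)
A = B + B.T                        # symmetric, so sign(A) has eigenvalues +1 and -1

X = A.copy()
for k in range(30):                # Newton iteration X <- (X + X^{-1}) / 2
    X = 0.5 * (X + np.linalg.inv(X))

# sign(A) squares to the identity and commutes with A
print(np.linalg.norm(X @ X - np.eye(5)), np.linalg.norm(X @ A - A @ X))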

Take home message

  • Matrix exponential
  • Schur-Parlett
  • Pade approximation
  • Graphs & PDEs

Next lecture

We are done!
Friday - homework seminar

Questions?
