Lecture 11 - diagonalising matrices

By an appropriate transformation, most matrices can be made diagonal. The diagonal form of a matrix can be convenient for a number of purposes. For one, it is easy to interpret the action of a diagonal matrix on a vector - it simply rescales the $i$th vector component by the $(i, i)$ component of the matrix. Some operations, such as raising a matrix to a power or computing the determinant, are trivial for diagonal matrices.
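
As a quick illustration (a minimal sketch, not part of the lecture notes), the following shows these properties for a small diagonal matrix:

import numpy as np

# Example diagonal matrix and a vector
D = np.diag([2.0, 3.0, 5.0])
x = np.array([1.0, 1.0, 1.0])

# Action on a vector: the ith component of x is scaled by D[i, i]
print(D.dot(x))

# kth power: raise each diagonal entry to the kth power
k = 3
print(np.diag(np.diag(D)**k), np.linalg.matrix_power(D, k))

# Determinant: product of the diagonal entries
print(np.prod(np.diag(D)), np.linalg.det(D))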

Diagonalisation

Most matrices can be diagonalised. It is shown in the lecture notes that

$$ \boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{-1} $$

where the $i$th column of $\boldsymbol{U}$ is the $i$th eigenvector of $\boldsymbol{A}$, and $\boldsymbol{\Lambda}$ is a diagonal matrix where $\Lambda_{ii}$ is the $i$th eigenvalue.

If the eigenvectors of $\boldsymbol{A}$ are linearly independent, then $\boldsymbol{U}^{-1}$ exists and we can rearrange the above to give

$$ \boldsymbol{\Lambda} = \boldsymbol{U}^{-1} \boldsymbol{A} \boldsymbol{U} $$

We can explore this property with a concrete example. We first create a matrix:


In [1]:
# Import NumPy and seed random number generator to make generated matrices deterministic
import numpy as np
np.random.seed(2)

# Create a matrix with random entries
A = np.random.rand(4, 4)
print(A)


[[0.4359949  0.02592623 0.54966248 0.43532239]
 [0.4203678  0.33033482 0.20464863 0.61927097]
 [0.29965467 0.26682728 0.62113383 0.52914209]
 [0.13457995 0.51357812 0.18443987 0.78533515]]

We can compute the eigenvectors and eigenvalues using the NumPy function np.linalg.eig:


In [2]:
# Compute eigenvectors of A
evalues, evectors = np.linalg.eig(A)

print("Eigenvalues: {}".format(evalues))
print("Eigenvectors: {}".format(evectors))


Eigenvalues: [1.60005805+0.j         0.07591753+0.13825201j 0.07591753-0.13825201j
 0.42090561+0.j        ]
Eigenvectors: [[ 0.45781186+0.j         -0.33035733-0.48998366j -0.33035733+0.48998366j
  -0.66963039+0.j        ]
 [ 0.48624863+0.j         -0.61697886+0.j         -0.61697886-0.j
  -0.08306861+0.j        ]
 [ 0.54605694+0.j         -0.01299484+0.11318301j -0.01299484-0.11318301j
  -0.44437871+0.j        ]
 [ 0.50575922+0.j          0.48202011+0.15746263j  0.48202011-0.15746263j
   0.58925572+0.j        ]]

The matrix A is not symmetric, so it is no surprise that some of the eigenvalues and eigenvectors are complex. The $i$th column of evectors (evectors[:, i]) is the $i$th eigenvector.
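
As a quick check (a sketch, not in the original notebook), we can verify the defining relation $\boldsymbol{A} \boldsymbol{u}_{i} = \lambda_{i} \boldsymbol{u}_{i}$ using the arrays computed above:

# Check A u_i = lambda_i u_i for every eigenpair (columns of evectors);
# evectors*evalues scales the ith column of evectors by evalues[i]
print(np.allclose(A.dot(evectors), evectors*evalues))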

We can now verify that $\boldsymbol{\Lambda} = \boldsymbol{U}^{-1} \boldsymbol{A} \boldsymbol{U}$


In [3]:
Lambda = np.linalg.inv(evectors).dot(A.dot(evectors))
print(Lambda)


[[ 1.60005805e+00-2.85887993e-17j -1.18657001e-16+1.99707178e-16j
  -1.19787281e-16-1.89699473e-16j  1.68390188e-16+7.12642078e-18j]
 [ 3.22689803e-16+3.82717446e-16j  7.59175252e-02+1.38252015e-01j
   7.63278329e-17-1.94289029e-16j  3.11594203e-16+6.87454656e-17j]
 [ 3.01511645e-16-5.14917906e-16j  4.85722573e-17+2.08166817e-16j
   7.59175252e-02-1.38252015e-01j  3.66885645e-16-2.73318111e-17j]
 [-4.27202655e-16-1.22305447e-17j -1.11865415e-16-1.60527316e-17j
  -1.27399364e-16+2.63251409e-17j  4.20905607e-01-8.21244250e-18j]]

Note that the matrix Lambda ($\boldsymbol{\Lambda}$) is diagonal (the off-diagonal entries are zero to machine precision), and the diagonal entries are the eigenvalues of $\boldsymbol{A}$.
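
We can also go the other way and reconstruct $\boldsymbol{A}$ from its eigenvectors and eigenvalues via $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{-1}$ (a sketch, not in the original notebook):

# Reconstruct A from the eigendecomposition A = U Lambda U^{-1}
A_reconstructed = evectors.dot(np.diag(evalues)).dot(np.linalg.inv(evectors))

# The imaginary parts cancel to round-off, so this should print True
print(np.allclose(A_reconstructed, A))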

Diagonalisation of symmetric matrices

In the above, we have only required that the eigenvectors be linearly independent. In this case, the matrix $\boldsymbol{U}$ can be inverted. For a symmetric matrix, we have proved in the lecture notes that the eigenvectors are orthogonal. Therefore, for a real, symmetric matrix $\boldsymbol{S}$, if the eigenvectors have been normalised, the matrix $\boldsymbol{U}$ is an orthogonal matrix. In this case

$$ \boldsymbol{S} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{T} $$

In terms of the notation used earlier for rotation matrices, $\boldsymbol{R} = \boldsymbol{U}^{T}$, where in this case $\boldsymbol{R}$ changes the basis to one aligned with the eigenvectors:

$$ \boldsymbol{S} = \boldsymbol{R}^{T} \boldsymbol{\Lambda} \boldsymbol{R} $$

Since $\boldsymbol{R}$ is an orthogonal matrix,

$$ \boldsymbol{\Lambda} = \boldsymbol{R} \boldsymbol{S} \boldsymbol{R}^{T} $$

We can test this for a given matrix:


In [4]:
# Create a symmetric matrix
S = A + A.T

# Compute eigenvectors of S and print eigenvalues
lmbda, U = np.linalg.eig(S)
print(lmbda)

# R matrix
R = U.T

# Diagonalise S
Lambda = R.dot(S.dot(R.T))
print(Lambda)


[ 3.26926434  0.9944455   0.19425012 -0.11236256]
[[ 3.26926434e+00 -8.72873947e-17  4.75656397e-18 -5.79016057e-17]
 [-2.81794239e-16  9.94445502e-01  1.51350239e-18  1.14273841e-16]
 [-1.04638202e-16 -1.38293617e-17  1.94250119e-01  1.17155540e-17]
 [ 5.23814681e-17  1.09491250e-16 -2.67482965e-17 -1.12362557e-01]]

The above matrix is diagonal, and the diagonal entries are the eigenvalues.
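
As an additional check (a sketch, not in the original notebook), we can confirm that $\boldsymbol{U}$ is indeed orthogonal and that $\boldsymbol{S} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{T}$:

# U should be an orthogonal matrix: U^T U = I (to round-off)
print(np.allclose(U.T.dot(U), np.eye(U.shape[0])))

# Reconstruct S from its eigenvectors and eigenvalues
print(np.allclose(U.dot(np.diag(lmbda)).dot(U.T), S))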

Power iteration

It was shown in the lectures that the maximum eigenvalue, and the associated eigenvector, can be estimated via repeated matrix-vector products. The method is known as power iteration. Below is sample code for the power iteration.


In [5]:
# Create starting vector
x0 = np.random.rand(S.shape[0])

# Perform power iteration
for i in range(10):
    x0 = S.dot(x0)
    x0 = x0/np.linalg.norm(x0)
x1 = S.dot(x0)

# Get maximum exact eigenvalue (absolute value)
eval_max_index = abs(lmbda).argmax()
max_eig = lmbda[eval_max_index]

# Print estimated max eigenvalue and error 
max_eig_est = np.sign(x1.dot(x0))*np.linalg.norm(x1)/np.linalg.norm(x0)
print("Estimate of largest eigenvalue: {}".format(max_eig_est))
print("Error: {}".format(abs(max_eig - max_eig_est)))


Estimate of largest eigenvalue: 3.26926434315769
Error: 3.1603608618979706e-11

After just 10 iterations, the estimated eigenvalue is very accurate.
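
The rate of convergence is governed by the ratio of the two largest eigenvalue magnitudes, $|\lambda_{2}/\lambda_{1}|$: the smaller the ratio, the faster the error shrinks. Below is a sketch (not in the original notebook, using the same eigenvalue estimate as above) that tracks the error at each iteration:

# Track the error in the eigenvalue estimate at each power iteration
x = np.random.rand(S.shape[0])
for i in range(10):
    y = S.dot(x)
    est = np.sign(y.dot(x))*np.linalg.norm(y)/np.linalg.norm(x)
    print("Iteration {}: error = {}".format(i, abs(max_eig - est)))
    x = y/np.linalg.norm(y)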

We now demonstrate a case where the power iteration fails - namely, when the starting vector is orthogonal to the eigenvector associated with the largest eigenvalue. We compute such a starting vector, and then perform power iterations:


In [6]:
# Create starting vector
x0 = np.random.rand(S.shape[0])

# Get eigenvector associated with maximum eigenvalue
eval_max_index = abs(lmbda).argmax()
evec_max = U[:,eval_max_index]

# Make starting vector orthogonal to the eigenvector associated with the maximum eigenvalue
x0 = x0 - x0.dot(evec_max)*evec_max

# Perform power iteration
for i in range(10):
    x0 = S.dot(x0)
    x0 = x0/np.linalg.norm(x0)
x1 = S.dot(x0)

# Print estimated max eigenvalue and error
max_eig_est = np.sign(x1.dot(x0))*np.linalg.norm(x1)/np.linalg.norm(x0)
print("Estimate of largest eigenvalue: {}".format(max_eig_est))
print("Error: {}".format(abs(max_eig - max_eig_est)))   

# Get second largest eigenvalue
print("Second largest eigenvalue (exact): {}".format(lmbda[np.argsort(abs(lmbda))[-2]]))


Estimate of largest eigenvalue: 0.9944455023193194
Error: 2.2748188408699743
Second largest eigenvalue (exact): 0.9944455023193202

It is clear that in this case the iteration has converged to the second largest eigenvalue rather than the largest.
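
We can confirm this by comparing the final iterate with the exact eigenvector of the second largest (in magnitude) eigenvalue (a sketch, not in the original notebook):

# Compare the final iterate with the eigenvector of the second largest eigenvalue
second_index = np.argsort(abs(lmbda))[-2]
evec_second = U[:, second_index]

# Both vectors have unit length, so the absolute value of the dot
# product is close to 1 if they are parallel (up to sign)
print(abs(x0.dot(evec_second)))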

Computing the eigenvalue

A better way to compute the largest eigenvalue, given the corresponding (approximate) eigenvector $\boldsymbol{x}_{n}$, is to use the Rayleigh quotient. We want to find the $\mu$ that minimises

$$ \| \boldsymbol{A} \boldsymbol{x}_{n} - \mu \boldsymbol{x}_{n} \| $$

Minimising the square of this expression with respect to $\mu$ (differentiating and setting the derivative to zero) gives

$$ \mu = \frac{\boldsymbol{x}_{n}^{T} \boldsymbol{A} \boldsymbol{x}_{n}}{\boldsymbol{x}_{n}^{T} \boldsymbol{x}_{n}} $$

which is known as the Rayleigh quotient. We can use this for the preceding power iteration:


In [7]:
rayleigh_quotient = x1.dot(S).dot(x1)/(x1.dot(x1))
print("Rayleigh_quotient: {}".format(rayleigh_quotient))


Rayleigh_quotient: 0.9944455023193196

Sheep flock example

In the lecture, we considered the growth rate and composition of a sheep flock over a large number of farming cycles. The equation of interest was

$$ \begin{bmatrix} x_{n+1} \\ y_{n+1} \\ z_{n+1} \end{bmatrix} = \begin{bmatrix} 0 & 2 & 0.9633 \\ 0.545 & 0 & 0 \\ 0 & 0.8 & 0 \end{bmatrix} \begin{bmatrix} x_{n} \\ y_{n} \\ z_{n} \end{bmatrix} $$

We can use power iteration to estimate the maximum eigenvalue and the corresponding eigenvector. First, we create the matrix:


In [8]:
A = np.array([[0, 2, 0.9663], [0.545, 0 ,0], [0, 0.8, 0]])

Next, we use power iterations:


In [9]:
# Create starting vector
x0 = np.random.rand(A.shape[0])

# Perform power iteration
for i in range(10):
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0)

# Normalise eigenvector using l1 norm
x0 = x0/np.linalg.norm(x0, 1)

# Print estimated eigenvector associated with largest eigenvalue
print("Estimate of eigenvector for the largest eigenvalue: {}".format(x0))

# Print estimated max eigenvalue (using Rayleigh quotient)
print("Estimate of largest eigenvalue: {}".format(x0.dot(A).dot(x0)/x0.dot(x0)))


Estimate of eigenvector for the largest eigenvalue: [0.56772176 0.26294235 0.1693359 ]
Estimate of largest eigenvalue: 1.2101959188653684

The computed flock distribution and growth rate are nearly identical to what was computed in the lectures.
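
As a cross-check (a sketch, not part of the lecture output), we can compare against a direct eigensolve of the same matrix:

# Direct eigensolve of the sheep-flock matrix for comparison
evals, evecs = np.linalg.eig(A)
imax = abs(evals).argmax()

# The dominant eigenvalue of this non-negative matrix is real
print("Largest eigenvalue (direct solve): {}".format(evals[imax].real))

# Normalise the corresponding eigenvector so its entries sum to one
# (dividing by the sum also fixes the arbitrary sign returned by eig)
v = evecs[:, imax].real
print("Corresponding eigenvector: {}".format(v/v.sum()))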