By an appropriate transformation, most matrices can be made diagonal. The diagonal form of a matrix can be convenient for a number of purposes. For one, it is easy to interpret the action of a diagonal matrix on a vector - it simply rescales the $i$th vector component by the $(i, i)$ component of the matrix. Some operations, such as raising a matrix to a power or computing the determinant, are trivial for diagonal matrices.
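As a small illustration (not from the lecture notes; the variables below are purely for demonstration), for a diagonal matrix the $k$th power simply raises each diagonal entry to the power $k$, and the determinant is the product of the diagonal entries:

import numpy as np
# Diagonal entries of an example diagonal matrix
d = np.array([2.0, -1.0, 0.5])
D = np.diag(d)
# k-th power: raise each diagonal entry to the power k
k = 3
print(np.allclose(np.diag(d**k), np.linalg.matrix_power(D, k)))
# Determinant: product of the diagonal entries
print(np.isclose(np.prod(d), np.linalg.det(D)))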
Most matrices can be diagonalised. It is shown in the lecture notes that
$$ \boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{-1} $$where the $i$th column of $\boldsymbol{U}$ is the $i$th eigenvector of $\boldsymbol{A}$, and $\boldsymbol{\Lambda}$ is a diagonal matrix with $\Lambda_{ii}$ equal to the $i$th eigenvalue.
If the eigenvectors of $\boldsymbol{A}$ are linearly independent, then $\boldsymbol{U}$ is invertible. Therefore
$$ \boldsymbol{\Lambda} = \boldsymbol{U}^{-1} \boldsymbol{A} \boldsymbol{U} $$We can explore this property with a concrete example. We first create a matrix:
In [1]:
# Import NumPy and seed random number generator to make generated matrices deterministic
import numpy as np
np.random.seed(2)
# Create a matrix with random entries
A = np.random.rand(4, 4)
print(A)
We can compute the eigenvalues and eigenvectors using the NumPy function np.linalg.eig.
In [2]:
# Compute eigenvectors of A
evalues, evectors = np.linalg.eig(A)
print("Eigenvalues: {}".format(evalues))
print("Eigenvectors: {}".format(evectors))
The matrix A is non-symmetric, hence it is no surprise that the eigenvalues and eigenvectors are complex. The $i$th column of evectors (evectors[:, i]) is the eigenvector associated with the $i$th eigenvalue.
We can now verify that $\boldsymbol{\Lambda} = \boldsymbol{U}^{-1} \boldsymbol{A} \boldsymbol{U}$
In [3]:
# Diagonalise A: Lambda = U^{-1} A U
Lambda = np.linalg.inv(evectors).dot(A.dot(evectors))
print(Lambda)
Note that the matrix Lambda ($\boldsymbol{\Lambda}$) is diagonal (the off-diagonal entries are zero to within round-off error), and the diagonal entries are the eigenvalues.
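One practical consequence of this factorisation is that matrix powers become cheap, since $\boldsymbol{A}^{k} = \boldsymbol{U} \boldsymbol{\Lambda}^{k} \boldsymbol{U}^{-1}$. A short check (a sketch, not part of the original notebook, reusing A, evalues and evectors from the cells above):

# Compute A^5 via the eigendecomposition and compare with direct matrix multiplication
k = 5
A_pow = evectors.dot(np.diag(evalues**k)).dot(np.linalg.inv(evectors))
print(np.allclose(A_pow, np.linalg.matrix_power(A, k)))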
In the above, we have only required that the eigenvectors be linearly independent. In this case, the matrix $\boldsymbol{U}$ can be inverted. For a symmetric matrix, we have proved in the lecture notes that the eigenvectors are orthogonal. Therefore, for a real, symmetric matrix $\boldsymbol{S}$, if the eigenvectors have been normalised, the matrix $\boldsymbol{U}$ is an orthogonal matrix. In this case
$$ \boldsymbol{S} = \boldsymbol{U} \boldsymbol{\Lambda} \boldsymbol{U}^{T} $$In terms of the notation used for the rotation of matrices, $\boldsymbol{R} = \boldsymbol{U}^{T}$, where in this case $\boldsymbol{R}$ is used to change the basis to one aligned with the eigenvectors:
$$ \boldsymbol{S} = \boldsymbol{R}^{T} \boldsymbol{\Lambda} \boldsymbol{R} $$Since $\boldsymbol{R}$ is an orthogonal matrix,
$$ \boldsymbol{\Lambda} = \boldsymbol{R} \boldsymbol{S} \boldsymbol{R}^{T} $$We can test this for a given matrix:
In [4]:
# Create a symmetric matrix
S = A + A.T
# Compute eigenvectors of S and print eigenvalues
lmbda, U = np.linalg.eig(S)
print(lmbda)
# R matrix
R = U.T
# Diagonalise S
Lambda = R.dot(S.dot(R.T))
print(Lambda)
The above matrix is diagonal, and the diagonal entries are the eigenvalues.
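As a quick numerical check (a sketch, not part of the original notebook), we can confirm that the off-diagonal entries of Lambda vanish to within round-off and that the diagonal entries match the eigenvalues returned by np.linalg.eig:

# Off-diagonal entries of Lambda should be zero to machine precision
print(np.allclose(Lambda, np.diag(np.diag(Lambda))))
# Diagonal entries should match the eigenvalues (in the ordering returned by eig)
print(np.allclose(np.diag(Lambda), lmbda))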
It was shown in the lectures that the eigenvalue of largest absolute value, and the associated eigenvector, can be estimated via repeated matrix-vector products. The method is known as power iteration. Below is sample code for the power iteration.
In [5]:
# Create starting vector
x0 = np.random.rand(S.shape[0])
# Perform power iteration
for i in range(10):
    x0 = S.dot(x0)
    x0 = x0/np.linalg.norm(x0)
# One further matrix-vector product for estimating the eigenvalue
x1 = S.dot(x0)
# Get exact eigenvalue with the largest absolute value
eval_max_index = abs(lmbda).argmax()
max_eig = lmbda[eval_max_index]
# Print estimated max eigenvalue and error
max_eig_est = np.sign(x1.dot(x0))*np.linalg.norm(x1)/np.linalg.norm(x0)
print("Estimate of largest eigenvalue: {}".format(max_eig_est))
print("Error: {}".format(abs(max_eig - max_eig_est)))
After just 10 iterations, the estimated eigenvalue is very accurate.
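The rate of convergence depends on the ratio of the second largest to the largest eigenvalue magnitude. A minimal sketch (not in the original notebook) that prints the error in the estimate at each iteration:

# Track the error in the eigenvalue estimate at each power iteration
x = np.random.rand(S.shape[0])
for i in range(10):
    y = S.dot(x)
    est = np.sign(y.dot(x))*np.linalg.norm(y)/np.linalg.norm(x)
    print("Iteration {}: error = {}".format(i, abs(max_eig - est)))
    x = y/np.linalg.norm(y)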
We now demonstrate a case where the power iteration fails - namely, when the starting vector is orthogonal to the eigenvector associated with the largest eigenvalue. We construct a starting vector that is orthogonal to this eigenvector, and then perform the power iteration:
In [6]:
# Create starting vector
x0 = np.random.rand(S.shape[0])
# Get eigenvector associated with the maximum eigenvalue
eval_max_index = abs(lmbda).argmax()
evec_max = U[:,eval_max_index]
# Make starting vector orthogonal to eigenvector associated with maximum
x0 = x0 - x0.dot(evec_max)*evec_max
# Perform power iteration
for i in range(10):
    x0 = S.dot(x0)
    x0 = x0/np.linalg.norm(x0)
x1 = S.dot(x0)
# Print estimated max eigenvalue and error
max_eig_est = np.sign(x1.dot(x0))*np.linalg.norm(x1)/np.linalg.norm(x0)
print("Estimate of largest eigenvalue: {}".format(max_eig_est))
print("Error: {}".format(abs(max_eig - max_eig_est)))
# Get second largest eigenvalue
print("Second largest eigenvalue (exact): {}".format(lmbda[np.argsort(abs(lmbda))[-2]]))
It is clear that in this case the iteration has approached the second largest eigenvalue (in absolute value) rather than the largest.
A better way to estimate the largest eigenvalue, given an approximation $\boldsymbol{x}_{n}$ of the corresponding eigenvector, is to use the Rayleigh quotient. We want to find the $\mu$ that minimises
$$ \| \boldsymbol{A} \boldsymbol{x}_{n} - \mu \boldsymbol{x}_{n} \| $$It turns out that
$$ \mu = \frac{\boldsymbol{x}_{n}^{T} \boldsymbol{A} \boldsymbol{x}_{n}}{\boldsymbol{x}_{n}^{T} \boldsymbol{x}_{n}} $$which is known as the Rayleigh quotient. We can use this for the preceding power iteration:
In [7]:
# Rayleigh quotient using the final iterate x1 from the previous power iteration
rayleigh_quotient = x1.dot(S).dot(x1)/(x1.dot(x1))
print("Rayleigh_quotient: {}".format(rayleigh_quotient))
In the lecture, we considered the growth rate and composition of a sheep flock over a large number of farming cycles. The equation of interest was
$$ \begin{bmatrix} x_{n+1} \\ y_{n+1} \\ z_{n+1} \end{bmatrix} = \begin{bmatrix} 0 & 2 & 0.9633 \\ 0.545 & 0 & 0 \\ 0 & 0.8 & 0 \end{bmatrix} \begin{bmatrix} x_{n} \\ y_{n} \\ z_{n} \end{bmatrix} $$We can use the power iteration to estimate the maximum eigenvalue and the corresponding eigenvector. First, we create the matrix:
In [8]:
A = np.array([[0, 2, 0.9663], [0.545, 0 ,0], [0, 0.8, 0]])
Next, we use power iterations:
In [9]:
# Create starting vector
x0 = np.random.rand(A.shape[0])
# Perform power iteration
for i in range(10):
    x0 = A.dot(x0)
    x0 = x0/np.linalg.norm(x0)
# Normalise eigenvector using l1 norm
x0 = x0/np.linalg.norm(x0, 1)
# Print estimated eigenvector associated with largest eigenvalue
print("Estimate of eigenvector for the largest eigenvalue: {}".format(x0))
# Print estimated max eigenvalue (using Rayleigh quotient)
print("Estimate of largest eigenvalue: {}".format(x0.dot(A).dot(x0)/x0.dot(x0)))
The computed flock distribution and growth rate are nearly identical to what was computed in the lectures.
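As a cross-check (a sketch, not part of the original notebook), the dominant eigenpair can also be computed directly with np.linalg.eig and scaled in the same way:

# Compute the dominant eigenpair of A directly and normalise the eigenvector
evals, evecs = np.linalg.eig(A)
imax = abs(evals).argmax()
v = np.real(evecs[:, imax])
v = v/np.sum(v)  # scale so the entries sum to one (the l1 normalisation for a positive vector)
print("Exact dominant eigenvalue: {}".format(np.real(evals[imax])))
print("Exact eigenvector (l1-normalised): {}".format(v))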