Lecture 7: Matrix decompositions and how we compute them

Recap of the previous lecture

  • Eigenvectors and eigenvalues
  • Characteristic polynomial and why using it to compute eigenvalues is a bad idea
  • Power method: $x := Ax$
  • Gershgorin theorem
  • Schur theorem: $A = U T U^*$
  • Normal matrices: $A^* A = A A^*$
  • Advanced topic: pseudospectrum

Today's lecture

Today we will talk about matrix factorizations

  • LU decomposition and Gaussian elimination -- already covered
  • QR decomposition and Gram-Schmidt algorithm
  • Schur decomposition and QR-algorithm
  • A few words about the SVD

We already introduced the QR decomposition some time ago, but now we are going to discuss it in more detail.

These are the basic matrix factorizations in numerical linear algebra.

General concept of matrix factorization

In numerical linear algebra we need to solve different tasks, for example:

  • Solve linear systems $Ax = f$
  • Compute eigenvalues / eigenvectors
  • Compute singular values / singular vectors
  • Compute inverses, and sometimes even determinants
  • Compute matrix functions like $\exp(A), \cos(A)$ (these are not elementwise functions)

In order to do this, we represent the matrix as a sum and/or a product of matrices with simpler structure,
so that we can solve the above tasks faster and in a more numerically stable way.

What is a simpler structure?

What is a simpler structure

We already encountered several classes of matrices with structure.

In dense matrix business, the most important classes are

  • unitary matrices ($U^* U = I$, why?)
  • upper/lower triangular matrices
  • diagonal matrices

Other classes of structured matrices

For sparse matrices, sparsity constraints are often included in the factorizations.

For Toeplitz matrices, an important class is the class of matrices with low displacement rank,

which builds on low-rank matrices.

The class of low-rank matrices and block low-rank matrices appears everywhere.

Plan

The plan for today's lecture is to discuss the decompositions one-by-one and point out:

  • How to compute a particular decomposition
  • When the decomposition exists
  • What is done in real life (LAPACK).

Decompositions we want to discuss today

  • LU factorization & Cholesky factorization -- quick reminder, already done.
  • QR-decomposition and Gram-Schmidt algorithm
  • 1 slide about the SVD (more tomorrow)

PLU decomposition

Any nonsingular matrix can be factored as

$$A = P L U, $$

where $P$ is a permutation matrix, $L$ is a lower triangular matrix, and $U$ is an upper triangular matrix.

The main goal of the LU decomposition is to solve linear systems, because (taking $P = I$ for simplicity)

$$ A^{-1} f = (L U)^{-1} f = U^{-1} L^{-1} f, $$

and this reduces to the solution of two triangular linear systems $$ L y = f, \quad U x = y. $$
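
As a quick illustration (a sketch, not part of the original lecture code; it assumes SciPy is available), solving a system through the PLU factorization might look like this:

import numpy as np
import scipy.linalg

n = 5
A = np.random.randn(n, n)
f = np.random.randn(n)

P, L, U = scipy.linalg.lu(A)                                     # A = P L U
y = scipy.linalg.solve_triangular(L, P.T.dot(f), lower=True)     # L y = P^T f
x = scipy.linalg.solve_triangular(U, y, lower=False)             # U x = y
print('Residual:', np.linalg.norm(A.dot(x) - f))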

Positive definite matrices and Cholesky decomposition, reminder

If the matrix is Hermitian Positive Definite,

i.e.

$$A = A^*, \quad (Ax, x) > 0, \quad x \ne 0,$$

then it can be factored as

$$A = R^* R,$$

where $R$ is upper triangular (the Cholesky factor).

We will need this for the QR decomposition.
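
A small sanity check (a sketch, not from the lecture): note that np.linalg.cholesky returns the lower triangular factor $L$ with $A = L L^*$, so the upper triangular factor from above is $R = L^*$.

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # Hermitian (here real symmetric) positive definite
L = np.linalg.cholesky(A)                # lower triangular factor, A = L L^T
R = L.T                                  # upper triangular Cholesky factor
print(np.linalg.norm(A - R.T.dot(R)))    # ~ machine precision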

QR-decomposition

The next decomposition is the QR decomposition. As the name suggests, a matrix is represented as a product
$$ A = Q R, $$ where $Q$ is an orthogonal (unitary) matrix and $R$ is upper triangular.

The matrix sizes: $Q$ is $n \times m$ with orthonormal columns, $R$ is $m \times m$, assuming $n \geq m$ (the so-called thin QR).

QR-factorization is defined for any rectangular matrix.

QR-decomposition: applications

This algorithm plays a crucial role in many problems:

  • Computing orthogonal bases in a linear space
  • Used in the preprocessing step for the SVD
  • QR-algorithm for the computation of eigenvectors and eigenvalues (one of the 10 most important algorithms of the 20th century) is based on the QR-decomposition
  • Solving overdetermined systems of linear equations

QR-factorization and least squares

Suppose we need to solve

$$\Vert A x - f \Vert_2 \rightarrow \min_x,$$

where $A$ is $n \times m$, $n \geq m$.

Then we factorize

$$A = Q R,$$

and, denoting $z = R x$, we obtain

$$\Vert Q z - f \Vert_2 \rightarrow \min_z, $$

thus

$$z = (Q^* Q)^{-1} Q^* f = Q^* f,$$

and $x$ can be recovered from

$$R x = z.$$

Note that this is a square (and triangular) system of linear equations, which is easy to solve.
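
A minimal sketch of this least-squares recipe in NumPy (the matrix sizes are illustrative):

import numpy as np

n, m = 100, 5
A = np.random.randn(n, m)
f = np.random.randn(n)

Q, R = np.linalg.qr(A)            # thin QR: Q is n x m, R is m x m
z = Q.T.dot(f)                    # z = Q^* f
x = np.linalg.solve(R, z)         # solve the square triangular system R x = z
print(np.linalg.norm(x - np.linalg.lstsq(A, f, rcond=None)[0]))   # agrees with lstsq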

Short side note: randomized least squares

One of the efficient ways to solve really overdetermined ($n\gg m$) systems of linear equations is the Kaczmarz method.

Instead of using all the equations at once, pick one at random; it reads

$$a^{\top}_i x = f_i,$$

and given an approximation $x_k$ try to find $x_{k+1}$ as

$$x_{k+1} = \arg \min_x \Vert x - x_k \Vert, \quad \mbox{s. t.} \quad a^{\top}_i x = f_i.$$

A simple analysis gives

$$x_{k+1} = x_k - \frac{(a_i, x_k) - f_i}{(a_i, a_i)} a_i. $$

A cheap update, but the analysis is quite complicated. If time permits, we will cover randomized algorithms of linear algebra in a separate lecture.
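
A minimal sketch of the randomized Kaczmarz iteration (rows are picked uniformly at random here; the right-hand side is made consistent on purpose):

import numpy as np

np.random.seed(0)
n, m = 500, 20
A = np.random.randn(n, m)
x_true = np.random.randn(m)
f = A.dot(x_true)                 # consistent overdetermined system

x = np.zeros(m)
for k in range(10000):
    i = np.random.randint(n)      # pick a random equation
    a_i = A[i]
    x = x - (a_i.dot(x) - f[i]) / a_i.dot(a_i) * a_i   # project onto the i-th hyperplane
print('Error:', np.linalg.norm(x - x_true))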

Theorem: Every rectangular $n \times m$ matrix with $n \geq m$ has a QR-decomposition.

There are several ways to prove it and compute it:

  • Theoretical: using the Gram matrices and Cholesky factorization
  • Geometrical: using the Gram-Schmidt orthogonalization
  • Practical: using Householder/Givens transformations

Proof using Cholesky decomposition

If we have a representation of the form $$A = QR,$$ then $A^* A = (Q R)^* (QR) = R^* (Q^* Q) R = R^* R$. The matrix $A^* A$ is called the Gram matrix, and its elements are scalar products of the columns of $A$.

Proof using Cholesky decomposition (full column rank)

Assume that $A$ has full column rank. Then it is simple to show that $A^* A$ is positive definite:

$$ (A^* A y, y) = (Ay, Ay) = \Vert Ay \Vert_2^2 > 0, \quad y \ne 0. $$

Therefore, the Cholesky factorization $A^* A = R^* R$ exists (with a nonsingular upper triangular $R$).

Then the matrix $A R^{-1}$ is unitary:

$$ (A R^{-1})^* (AR^{-1})= R^{-*} A^* A R^{-1} = R^{-*} R^* R R^{-1} = I. $$

Proof using Cholesky decomposition (rank-deficient case)

When an $n \times m$ matrix does not have full column rank, it is said to be rank-deficient.

The QR-decomposition, however, also exists.

For any rank-deficient matrix there is a sequence of full-column rank matrices $A_k$ such that $A_k \rightarrow A$ (why?).

Each $A_k$ can be decomposed as $A_k = Q_k R_k$.

The set of all unitary matrices is compact, thus there exists a converging subsequence $Q_{n_k} \rightarrow Q$ (why?),

and $R_{n_k} = Q^*_{n_k} A_{n_k} \rightarrow Q^* A = R$, which is upper triangular as a limit of upper triangular matrices; hence $A = QR$.

Stability of QR-decomposition via Cholesky decomposition

So, the simplest way to compute QR-decomposition is then

$$A^* A = R^* R,$$

(Cholesky factorization)

$$Q = A R^{-1}.$$

It is a bad idea for numerical stability. Let us run a small demo (for a submatrix of the Hilbert matrix).


In [1]:
import numpy as np
n = 20
r = 8
#a = np.random.randn(n, r)
a = [[1.0/(i + j + 0.5) for i in range(r)] for j in range(n)]
a = np.array(a)
q, Rmat = np.linalg.qr(a)
e = np.eye(r)
print('Built-in QR orth', np.linalg.norm(np.dot(q.T, q) - e))
gram_matrix = a.T.dot(a)
# np.linalg.cholesky returns the lower triangular factor L, so R = L^T
Rmat1 = np.linalg.cholesky(gram_matrix)
q1 = np.dot(a, np.linalg.inv(Rmat1.T))
print('Via Cholesky:', np.linalg.norm(np.dot(q1.T, q1) - e))


Built-in QR orth 1.11469516053e-15
Via Cholesky: 0.105536980788

Second way: Gram-Schmidt orthogonalization

QR-decomposition is a mathematical way of writing down the Gram-Schmidt orthogonalization process.
Given a sequence of vectors $a_1, \ldots, a_m$, we want to find an orthonormal basis $q_1, \ldots, q_m$ such that every $a_i$ is a linear combination of $q_1, \ldots, q_i$.

Gram-Schmidt:

  1. $q_1 := a_1/\Vert a_1 \Vert$
  2. $q_2 := a_2 - (a_2, q_1) q_1, \quad q_2 := q_2/\Vert q_2 \Vert$
  3. $q_3 := a_3 - (a_3, q_1) q_1 - (a_3, q_2) q_2, \quad q_3 := q_3/\Vert q_3 \Vert$
  4. And go on

Note that the transformation from $Q$ to $A$ has triangular structure, since from the $k$-th vector we subtract only the previous ones; hence each $a_k$ is a linear combination of $q_1, \ldots, q_k$, which is exactly $A = QR$ with an upper triangular $R$.
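
A minimal sketch of classical Gram-Schmidt that also collects the coefficients into the triangular factor $R$ (a toy implementation, not a robust one):

import numpy as np

def gram_schmidt(A):
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for k in range(m):
        q = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j].dot(A[:, k])   # projection coefficient onto q_j
            q = q - R[j, k] * Q[:, j]        # subtract the projection
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R

A = np.random.randn(10, 4)
Q, R = gram_schmidt(A)
print(np.linalg.norm(A - Q.dot(R)), np.linalg.norm(Q.T.dot(Q) - np.eye(4)))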

Modified Gram-Schmidt

Gram-Schmidt can be numerically very unstable (i.e., the produced vectors will not be orthogonal, especially if $q_k$ has a small norm before normalization).
This is called loss of orthogonality.

There is a remedy, called modified Gram-Schmidt method. Instead of doing

$$q_k := a_k - (a_k, q_1) q_1 - \ldots - (a_k, q_{k-1}) q_{k-1}$$

we do it step-by-step. First we set $q_k := a_k$ and orthogonalize sequentially:

$$ q_k := q_k - (q_k, q_1)q_1, \quad q_k := q_{k} - (q_k,q_2)q_2, \ldots $$

In exact arithmetic the two variants are the same. In floating point arithmetic they behave completely differently!

Note that the complexity is $\mathcal{O}(nm^2)$ operations
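
A minimal sketch of modified Gram-Schmidt; on the Hilbert-type matrix from the demo above it typically loses much less orthogonality than the classical version (though still more than Householder QR):

import numpy as np

def modified_gram_schmidt(A):
    A = A.copy()
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for k in range(m):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for j in range(k + 1, m):            # immediately orthogonalize the remaining columns
            R[k, j] = Q[:, k].dot(A[:, j])
            A[:, j] = A[:, j] - R[k, j] * Q[:, k]
    return Q, R

n, r = 20, 8
a = np.array([[1.0/(i + j + 0.5) for i in range(r)] for j in range(n)])
Q, R = modified_gram_schmidt(a)
print('MGS loss of orthogonality:', np.linalg.norm(Q.T.dot(Q) - np.eye(r)))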

QR-decomposition: the (almost) practical way

If $A = QR$, then
$$ R = Q^* A, $$ and we need to find an orthogonal (unitary) matrix $Q$ that brings the matrix into upper triangular form.
For simplicity, we will look for an $n \times n$ matrix $Q$ such that

$$ Q^* A = \begin{bmatrix} R \\ 0_{(n-m) \times m} \end{bmatrix}, $$

where $R$ is an $m \times m$ upper triangular matrix.

We will do it column-by-column.
First, we find a Householder matrix $H_1 = I - 2 uu^{\top}$ with $\Vert u \Vert_2 = 1$ such that (we illustrate it on a $4 \times 3$ matrix)

$$ H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \end{bmatrix} $$

Then,
$$ H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}, $$
where
$$ H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H'_2 \end{bmatrix} $$
and $H'_2$ is a $3 \times 3$ Householder matrix.

Finally,
$$ H_3 H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{bmatrix}. $$

You can try to implement it yourself; it is simple.
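
A minimal sketch of the column-by-column Householder QR for a real matrix with $n \geq m$ (the reflections are accumulated into $Q$ explicitly, which a practical code would avoid):

import numpy as np

def householder_qr(A):
    A = A.astype(float).copy()
    n, m = A.shape
    Q = np.eye(n)
    for k in range(m):
        x = A[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])       # reflect x onto a multiple of e_1
        v = v / np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(v, v.dot(A[k:, k:]))   # apply H_k = I - 2 v v^T from the left
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:].dot(v), v)     # accumulate Q = H_1 H_2 ... H_m
    return Q, A    # A has been overwritten by R

A = np.random.randn(6, 4)
Q, R = householder_qr(A)
print(np.linalg.norm(A - Q.dot(R)), np.linalg.norm(Q.T.dot(Q) - np.eye(6)))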

QR-decomposition: real life

In reality, since this is a dense matrix factorization, you should implement the algorithm in terms of blocks.

Instead of a single Householder transformation, we use a block Householder transformation of the form

$$H = (I - 2UU^*), $$

where $U^* U = I$.

This allows us to use BLAS-3 operations.

Rank-revealing QR-decomposition

The QR-decomposition can also be used to compute the (numerical) rank of a matrix, see

Rank-Revealing QR Factorizations and the Singular Value Decomposition, Y. P. Hong; C.-T. Pan

It is done via a so-called rank-revealing factorization.

It is based on the representation

$$A P = Q R, $$

where $P$ is a permutation matrix (it permutes the columns of $A$), and $R$ has the block form

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22}\end{bmatrix}.$$

The goal is to find $P$ such that the norm of $R_{22}$ is small, so you can find the numerical rank by looking at it.

An estimate is $\sigma_{r+1} \leq \Vert R_{22} \Vert_2$ (check why).
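
In SciPy, column-pivoted QR is available through scipy.linalg.qr with pivoting=True; a small sketch (sizes are illustrative) of using the diagonal of $R$ as a rank indicator:

import numpy as np
import scipy.linalg

n, r = 50, 5
A = np.random.randn(n, r).dot(np.random.randn(r, n))   # exactly rank-r matrix
Q, R, piv = scipy.linalg.qr(A, pivoting=True)           # A[:, piv] = Q R
print(np.abs(np.diag(R))[:8])   # the diagonal drops sharply after the first r entries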

Summary

LU and QR decompositions can be computed using direct methods in a finite number of operations.

What about Schur form and SVD?

They cannot be computed by direct methods (why?); they can only be computed by iterative methods.

Nevertheless, iterative methods still cost $\mathcal{O}(n^3)$ operations in floating point arithmetic thanks to fast convergence.

Schur form

Recall that every matrix can be written in the Schur form

$$A = Q T Q^*,$$

with upper triangular $T$ and unitary $Q$, and this decomposition gives the eigenvalues of the matrix (they are on the diagonal of $T$).

The first and the main algorithm for computing the Schur form is the QR-algorithm.

QR-algorithm

The QR algorithm was proposed independently by Kublanovskaya and Francis in 1961.

Do not **confuse** the QR algorithm with the QR decomposition!

The QR decomposition is a representation of a matrix, whereas the QR algorithm is an iterative procedure that uses the QR decomposition to compute the eigenvalues!

Way to QR-algorithm

Consider the equation

$$A = Q T Q^*,$$

and rewrite it in the form

$$ Q T = A Q. $$

On the left-hand side we see a QR factorization of the matrix $AQ$.

We can use this to derive a fixed-point iteration for the Schur form, known as the QR algorithm.

Derivation of the QR-algorithm as fixed-point iteration

Then we can write down the iterative process

$$ Q_{k+1} R_{k+1} = A Q_k, \quad Q_{k+1}^* A = R_{k+1} Q^*_k $$

Introduce

$$A_k = Q^* _k A Q_k = Q^*_k Q_{k+1} R_{k+1} = \widehat{Q}_k R_{k+1}$$

and the new approximation reads

$$A_{k+1} = Q^*_{k+1} A Q_{k+1} = R_{k+1} Q^*_k Q_{k+1} = R_{k+1} \widehat{Q}_k,$$ where we used $Q_{k+1}^* A = R_{k+1} Q^*_k$.

So we arrive at the standard form of the QR algorithm, classically written in the QRRQ form:

  1. Start from $A_0 = A$.
  2. Compute QR factorization of $A_k = Q_k R_k$.
  3. Set $A_{k+1} = R_k Q_k$.

Iterate until $A_k$ is triangular enough (e.g. norm of subdiagonal part is small enough).

What about the convergence and complexity?

Statement

Matrices $A_k$ are unitarily similar to $A$:

$A_k = Q^*_{k-1} A_{k-1} Q_{k-1} = (Q_0 Q_1 \ldots Q_{k-1})^* A (Q_0 Q_1 \ldots Q_{k-1}),$

and the product of unitary matrices is a unitary matrix.

Complexity of each step is $\mathcal{O}(n^3)$, if a general QR-decomposition is done.

Our hope is that $A_k$ will be very close to a triangular matrix for sufficiently large $k$.


In [5]:
import numpy as np
n = 4
a = np.array([[1.0/(i + j + 0.5) for i in range(n)] for j in range(n)])
niters = 100
for k in range(niters):
    q, rmat = np.linalg.qr(a)
    a = rmat.dot(q)      # A_{k+1} = R_k Q_k
print('Leading 3x3 block of a:')
print(a[:3, :3])


Leading 3x3 block of a:
[[  2.41052440e+000   3.22429032e-017  -5.75075656e-017]
 [  1.55227759e-084   3.49984625e-001   5.16395464e-017]
 [  1.66750853e-220   4.85294271e-137   1.53236733e-002]]

Convergence and complexity of the QR-algorithm

The QR algorithm converges starting from the largest eigenvalues and going down to the smallest.

At least 2-3 iterations are needed per eigenvalue.

Each step is one QR-factorization and one matrix-by-matrix product. $\mathcal{O}(n^3)$ complexity.

Does that mean $\mathcal{O}(n^4)$ total complexity?

Fortunately, not.

We can also speed up the QR algorithm by using shifts, since $A_k - \lambda I$ has the same Schur vectors.

We will discuss these details tomorrow.
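
As a small teaser for tomorrow, a rough sketch of a single-shift QR iteration (the shift here is simply the bottom-right entry; a practical implementation would first reduce the matrix to Hessenberg form and use deflation):

import numpy as np

def shifted_qr(A, niters=50):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(niters):
        mu = A[-1, -1]                            # simple shift
        q, rmat = np.linalg.qr(A - mu * np.eye(n))
        A = rmat.dot(q) + mu * np.eye(n)          # still unitarily similar to the original matrix
    return A

a = [[1.0/(i + j + 0.5) for i in range(4)] for j in range(4)]
print(np.diag(shifted_qr(a)))                     # approximations to the eigenvalues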

A few words about the SVD

Last but not least: the singular value decomposition of matrices.

$$A = U \Sigma V^*.$$

We can compute it via the eigendecomposition of

$$A^* A = V \Sigma^2 V^*,$$

but it is a bad idea numerically (cf. the Gram matrix example above).
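
A small sketch illustrating why, again on the Hilbert-type matrix from the earlier demo: the small singular values computed through $A^* A$ lose their accuracy.

import numpy as np

n, r = 20, 8
a = np.array([[1.0/(i + j + 0.5) for i in range(r)] for j in range(n)])
sv = np.linalg.svd(a, compute_uv=False)          # singular values from the SVD
ev = np.linalg.eigvalsh(a.T.dot(a))[::-1]        # eigenvalues of the Gram matrix, descending
print(sv[-3:])                                   # three smallest singular values
print(np.sqrt(np.abs(ev[-3:])))                  # the same computed via A^* A: accuracy is lost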

We will discuss methods for computing SVD tomorrow.

Summary of today's lecture

  • QR decomposition and Gram-Schmidt algorithm, reduction to a simpler form by Householder transformations
  • Schur decomposition and QR-algorithm

Next lecture

  • Efficient implementation of QR algorithm, convergence properties
  • Efficient computation of the SVD: 4 algorithms
  • It is done by reduction to simpler forms (tridiagonal, bidiagonal).
  • Applications of the SVD.
Questions?
