From the name it is clear that the LU factorization is the representation of a given matrix $A$ as a product
of a lower triangular matrix $L$ and an upper triangular matrix $U$.
The matrix $L$ has to be non-singular with ones on the diagonal, and the matrix $U$ has to be non-singular.
The main goal of the LU decomposition is to solve linear systems: since $$ A^{-1} f = (L U)^{-1} f = U^{-1} L^{-1} f, $$ the problem reduces to the solution of two linear systems with triangular matrices, $$ L y = f $$ and $$ U x = y. $$ Solving a linear system with a triangular matrix is easy: you solve the equation with one unknown, substitute the result into the next equation, and so on. The computational cost of such an algorithm is $\mathcal{O}(n^2)$ arithmetic operations, once the $LU$ factorization is known.
Note that the inverse of a triangular matrix is again triangular.
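To make this concrete, here is a minimal sketch of forward and backward substitution in NumPy (the helper names solve_lower and solve_upper are ours, not library routines):
In [ ]:
import numpy as np

def solve_lower(L, f):
    # Forward substitution: solve L y = f for a lower triangular L in O(n^2)
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (f[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    return y

def solve_upper(U, y):
    # Backward substitution: solve U x = y for an upper triangular U in O(n^2)
    n = U.shape[0]
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x

# With L, U from an LU factorization of A, the solution of A x = f is
# x = solve_upper(U, solve_lower(L, f))
As a first example, let us compute the LU factorization of a Hilbert-type matrix symbolically with sympy: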
In [39]:
import numpy as np
import sympy
from sympy.matrices import Matrix
import IPython
sympy.init_printing(use_latex=True)
n = 5
w = Matrix(n, n, lambda i, j: 1/(i + j + sympy.Integer(1)/2))
L, U, tmp = w.LUdecomposition()
#Typeset the final expression L * U = A
fn = '%s \\times %s = %s' % (sympy.latex(L), sympy.latex(U), sympy.latex(w))
IPython.display.Math(fn)
Out[39]:
A simple implementation of the LU factorization consists of three nested loops:
In [40]:
L = Matrix(n, n, lambda i, j: 0)
U = Matrix(n, n, lambda i, j: 0)
a = w.copy() #Our matrix
for k in xrange(n):  #Eliminate one row
    L[k, k] = 1
    for i in xrange(k+1, n):
        L[i, k] = a[i, k] / a[k, k]
        for j in xrange(k+1, n):
            a[i, j] = a[i, j] - L[i, k] * a[k, j]
    for j in xrange(k, n):
        U[k, j] = a[k, j]
In [41]:
U
Out[41]:
Theorem. The LU decomposition exists if and only if the matrix is strictly regular, i.e. all of its leading principal minors are non-singular.
Proof. If there is an LU decomposition, then the matrix is strictly regular: each leading principal submatrix of $A$ is the product of the corresponding submatrices of $L$ and $U$, which are non-singular (non-singularity of a triangular matrix is equivalent to all of its diagonal elements being non-zero).
The other direction can be proven by induction. Suppose that the LU decomposition exists for all $(n-1) \times (n-1)$ matrices with non-singular leading minors.
Then, consider the block form $$ A = \begin{bmatrix} a & c^{\top} \\ b & D \end{bmatrix}, $$ where $D$ is $(n-1) \times (n-1)$.
Then we do "block elimination":
$$
\begin{bmatrix}
1 & 0 \\
-z & I
\end{bmatrix}
\begin{bmatrix}
a & c^{\top} \\
b & D
\end{bmatrix}=
\begin{bmatrix}
a & c^{\top} \\
0 & A_1
\end{bmatrix},
$$
where $z = \frac{b}{a}, \quad A_1 = D - \frac{1}{a} b c^{\top}$.
We can show that $A_1$ is also strictly regular, thus it has (by induction) the LU-decomposition:
$$
A_1 = L_1 U_1,
$$
Setting $$ L = \begin{bmatrix} 1 & 0 \\ z & L_1 \end{bmatrix}, \quad U = \begin{bmatrix} a & c^{\top} \\ 0 & U_1 \end{bmatrix}, $$ we get $A = LU$, and $L$ and $U$ have the required triangular structure.
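The proof is constructive: here is a minimal sketch of this recursive block elimination (no pivoting; it assumes all the pivots it meets are nonzero):
In [ ]:
import numpy as np

def lu_recursive(A):
    # LU decomposition by the block elimination from the proof above
    n = A.shape[0]
    if n == 1:
        return np.eye(1), A.copy()
    a = A[0, 0]
    b = A[1:, 0:1]       # the column below the pivot
    c = A[0:1, 1:]       # the row to the right of the pivot
    D = A[1:, 1:]
    z = b / a
    A1 = D - z.dot(c)    # the Schur complement A_1 = D - (1/a) b c^T
    L1, U1 = lu_recursive(A1)
    L = np.eye(n)
    L[1:, 0:1] = z
    L[1:, 1:] = L1
    U = np.zeros((n, n))
    U[0, 0] = a
    U[0:1, 1:] = c
    U[1:, 1:] = U1
    return L, U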
What happens if the matrix is not strictly regular, or the pivots in the Gaussian elimination are very small? There is a classical $2 \times 2$ example of a bad LU decomposition. The matrix we look at is
$$
A = \begin{pmatrix}
\varepsilon & 1 \\
1 & 1
\end{pmatrix}
$$
If $\varepsilon$ is sufficiently small, the computed factorization can be completely inaccurate.
In [47]:
import numpy as np
eps = 1.1 * 1e-16
a = [[eps, 1],[1, 1]]
a = np.array(a)
a0 = a.copy()
n = a.shape[0]
L = np.zeros((n, n))
U = np.zeros((n, n))
for k in xrange(n):  #Eliminate one row
    L[k, k] = 1
    for i in xrange(k+1, n):
        L[i, k] = a[i, k] / a[k, k]
        for j in xrange(k+1, n):
            a[i, j] = a[i, j] - L[i, k] * a[k, j]
    for j in xrange(k, n):
        U[k, j] = a[k, j]
print 'L * U - A:'
print np.dot(L, U) - a0
print L
We can do pivoting, i.e. permute rows (and columns) to maximize the pivot $A_{kk}$ that we divide by.
The simplest but effective strategy is row pivoting: at each step, select the row whose element is maximal in modulus in the current column, and put it onto the diagonal.
It gives us the decomposition
$$A = P L U,$$ where $P$ is a permutation matrix.
What makes pivoting good?
It guarantees that all the elements of $L$ are bounded by one in modulus, $|L_{ij}| \leq 1$;
but the elements of $U$ can still grow, up to a factor of $2^{n-1}$! (In practice, such growth is very rarely encountered.)
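As a quick check, we can redo the $2 \times 2$ example above with scipy.linalg.lu, which does row pivoting by default:
In [ ]:
import numpy as np
import scipy.linalg
eps = 1.1 * 1e-16
a = np.array([[eps, 1], [1, 1]])
p, l, u = scipy.linalg.lu(a)  # a = p l u, p is a permutation matrix
print 'P L U - A:'
print p.dot(l).dot(u) - a
Now the factorization is computed accurately.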
If the matrix is Hermitian positive definite, the LU factorization always exists and, moreover, it can be written in the form of the Cholesky factorization:
$$
A = R R^*,
$$
where $R$ is a triangular matrix. And we can always select pivots on the diagonal!
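A quick sanity check with NumPy's built-in routine (np.linalg.cholesky returns the lower triangular factor; our test matrix is made positive definite by construction):
In [ ]:
import numpy as np
n = 5
b = np.random.randn(n, n)
a = b.dot(b.T) + n * np.eye(n)  # symmetric positive definite by construction
r = np.linalg.cholesky(a)       # lower triangular factor, a = r r^T
print 'Cholesky residual:', np.linalg.norm(r.dot(r.T) - a)
LU-type factorizations are also routinely computed for sparse matrices; there the main concern is the fill-in of the factors, which the following example with a two-dimensional Laplacian illustrates.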
In [26]:
import numpy as np
import scipy as sp
from scipy.sparse import csc_matrix, linalg as sla
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
#A = csc_matrix([[1,2,0,4],[1,0,0,1],[1,0,2,1],[2,2,1,0.]])
n = 10
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csc');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].spy(A)
ax[0].set_title('Sparsity pattern of a matrix')
lu = sla.splu(A)
ax[1].spy(lu.L)
ax[1].set_title('Sparsity of an L factor')
Out[26]:
The QR decomposition plays a crucial role in many problems, for example in solving overdetermined linear systems (linear least squares) and in computing eigenvalues via the QR algorithm (see below).
Every rectangular $n \times m$ matrix with $n \geq m$ has a QR decomposition $A = QR$, where $Q$ has orthonormal columns and $R$ is upper triangular. There are several ways to prove this and to compute it.
It is simple to show that $A^* A$ is positive semi-definite (and positive definite if $A$ has full column rank):
$$
(A^* A y, y) = (Ay, Ay) = \Vert Ay \Vert^2 \geq 0.
$$
Therefore, the Cholesky factorization $A^* A = R^* R$ always exists.
Then the matrix $A R^{-1}$ is orthogonal:
$$
(A R^{-1})^* (AR^{-1})= R^{-*} A^* A R^{-1} = R^{-*} R^* R R^{-1} = I.
$$
Gram matrices are not good for numerical stability, since the condition number of $A^* A$ is the square of the condition number of $A$!
Let us show this on a Hilbert-type matrix (a random matrix is left in the comments): the Gram-matrix method may lead to the loss of orthogonality.
In [50]:
import numpy as np
n = 10
r = 5
#a = np.random.randn(n, r)
a = [[1.0/(i+j+0.5) for i in xrange(r)] for j in xrange(n)]
a = np.array(a)
q, Rmat = np.linalg.qr(a)
e = np.eye(r)
print 'Built-in QR orth', np.linalg.norm(np.dot(q.T, q) - e)
gram_matrix = a.T.dot(a)
Rmat1 = np.linalg.cholesky(gram_matrix)  #gram_matrix = Rmat1 * Rmat1.T, Rmat1 lower triangular
q1 = np.dot(a, np.linalg.inv(Rmat1.T))   #Q = A R^{-1} with R = Rmat1.T upper triangular
print 'Via Gram matrix:', np.linalg.norm(np.dot(q1.T, q1) - e)
QR-decomposition is a mathematical way of writing down the Gram-Schmidt orthogonalization process.
Given a sequence of vectors $a_1, \ldots, a_m$, we want to find an orthonormal basis $q_1, \ldots, q_m$ such that every $a_i$ is a linear combination of $q_1, \ldots, q_i$.
Gram-Schmidt:
$$ q_1 := a_1 / \Vert a_1 \Vert, $$
$$ q_2 := a_2 - (a_2, q_1) q_1, \quad q_2 := q_2 / \Vert q_2 \Vert, $$
$$ \ldots $$
$$ q_k := a_k - (a_k, q_1) q_1 - \ldots - (a_k, q_{k-1}) q_{k-1}, \quad q_k := q_k / \Vert q_k \Vert. $$
Gram-Schmidt can be very unstable (i.e., the produced vectors will not be orthogonal, especially if $q_k$ has a small norm).
This is called loss of orthogonality.
There is a remedy, called modified Gram-Schmidt method.
Instead of doing
$$q_k := a_k - (a_k, q_1) q_1 - \ldots - (a_k, q_{k-1}) q_{k-1}$$
we do it step-by-step. First we set $q_k := a_k$ and orthogonalize sequentially:
$$
q_k := q_k - (q_k, q_1)q_1, \quad q_k := q_{k} - (q_k,q_2)q_2, \ldots
$$
In exact arithmetic, the two procedures are the same. In floating point arithmetic, they behave completely differently!
Note that the complexity is $\mathcal{O}(nm^2)$ operations.
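Here is a minimal sketch of the modified Gram-Schmidt process (the function mgs is our own helper, not a library routine):
In [ ]:
import numpy as np

def mgs(A):
    # Modified Gram-Schmidt QR: A = Q R with orthonormal columns in Q
    A = np.array(A, dtype=float)
    n, m = A.shape
    Q = A.copy()
    R = np.zeros((m, m))
    for k in range(m):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] = Q[:, k] / R[k, k]
        for j in range(k + 1, m):
            # Orthogonalize the remaining columns against q_k right away
            R[k, j] = np.dot(Q[:, k], Q[:, j])
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
    return Q, R

# On the Hilbert-type matrix from the experiment above:
# q2, r2 = mgs(a)
# print 'MGS orth:', np.linalg.norm(q2.T.dot(q2) - np.eye(r))
A more robust way still is to compute the QR decomposition via Householder reflections.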
If $A = QR$, then
$$
R = Q^* A,
$$
and we need to find a certain orthogonal matrix $Q$ that brings the matrix into upper triangular form.
For simplicity, we will look for an $n \times n$ orthogonal matrix $Q$ such that
$$
Q^* A = \begin{bmatrix} R \\ 0 \end{bmatrix},
$$
where $R$ is an $m \times m$ upper triangular matrix.
We will do it column-by-column.
First, we find a Householder matrix $H_1 = (I - 2 uu^{\top})$ such that (we illustrate on a $4 \times 3$ matrix)
$$
H_1 A = \begin{bmatrix}
* & * & * \\
0 & * & * \\
0 & * & * \\
0 & * & *
\end{bmatrix}.
$$
Then, $$ H_2 H_1 A = \begin{bmatrix}
* & * & * \\
0 & * & * \\
0 & 0 & * \\
0 & 0 & *
\end{bmatrix}, $$ where $$ H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H'_2 \end{bmatrix}, $$ and $H'_2$ is a $3 \times 3$ Householder matrix.
Finally, $$ H_3 H_2 H_1 A = \begin{bmatrix}
* & * & * \\
0 & * & * \\
0 & 0 & * \\
0 & 0 & 0
\end{bmatrix}, $$
A simple implementation of the Householder method in Python looks as follows.
In [28]:
#Initialization
import numpy as np
n = 4
r = 3
a = np.random.randn(n, r)
k = 0
In [31]:
def householder(x):
    u = x.copy()
    u = u / np.linalg.norm(u)
    u[0] = u[0] - 1  #Or plus; the sign is chosen to avoid cancellation
    u = u / np.linalg.norm(u)
    u = np.reshape(u, (-1, 1))
    return u
u0 = householder(a[k:, k:k+1])
u = np.zeros((n, 1))
u[k:] = u0
H = np.eye(n) - 2 * u.dot(u.T)
a = H.T.dot(a)
print a
k = k + 1
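Each rerun of the previous cell eliminates one more column. For completeness, here is a sketch that loops over all the columns and assembles the full factorization; householder_qr is our own wrapper around the householder function above, and it assumes no working column degenerates to zero:
In [ ]:
def householder_qr(a):
    # Householder QR: returns Q (n x n) and R (n x r) with a = Q.dot(R)
    a = np.array(a, dtype=float)
    n, r = a.shape
    Q = np.eye(n)
    for k in range(r):
        u = np.zeros((n, 1))
        u[k:] = householder(a[k:, k:k+1])
        H = np.eye(n) - 2 * u.dot(u.T)
        a = H.dot(a)   # zero out the k-th column below the diagonal
        Q = Q.dot(H)   # accumulate Q = H_1 H_2 ...; each H is its own inverse
    return Q, a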
The QR algorithm (proposed independently by Kublanovskaya and Francis in 1961). Do not mix up the QR algorithm and the QR decomposition: the QR decomposition is a representation of a matrix, whereas the QR algorithm uses the QR decomposition to compute the eigenvalues (and eigenvectors) of a matrix.
In [32]:
import numpy as np
n = 4
a = [[1.0/(i + j + 0.5) for i in xrange(n)] for j in xrange(n)]
niters = 10
for k in xrange(niters):
    q, rmat = np.linalg.qr(a)
    a = rmat.dot(q)
print 'current a:'
print a
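Since the matrix here is symmetric, the iterates approach a diagonal matrix, and the diagonal entries approach the eigenvalues.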
In [51]:
import scipy.linalg
n = 100
a = np.random.randn(n, n)
%timeit np.linalg.qr(a)
%timeit scipy.linalg.lu(a)
In [34]:
b = a.dot(a.T) + n * np.eye(n)  #make a symmetric positive definite matrix
%timeit np.linalg.cholesky(b)