Lecture 4: Linear systems


Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues

Recap of the previous lecture

  • Scalar product
  • Unitary/orthogonal matrices and norm conservation
  • Matrix rank definition
  • Skeleton approximation and dyadic representation of a rank-$r$ matrix
  • Singular value decomposition

Today lecture

Today we will talk about:

  • Linear systems, inverse matrices
  • Gaussian elimination
  • Sparse matrices
  • Condition numbers

Before that I will spend a few minutes on certain techinal notes on the HW1.

Use numpy arrays in homework!

  • Use

      a = np.zeros((n, n)) #Note for the double brackets

    instead of

      a = [[0 for i in xrange(n)] for j in xrange(n)]

    In this case you can not add arrays like

      a = a + a

    and have to write down functions/cycles instead one line

  • Use numpy arrays indexing like a[i, j] instead of a[i][j]

  • Use %matplotib inline and plt.plot() (without plt.show()) in the Notebook
  • For code writing, there is a PEP8 style guide for Python code - try to follow

Motivational video on algebra

And before we start - a very simple and intuitive video on algebra. It also contains a short ove

In [2]:
from IPython.display import YouTubeVideo


Linear equations and matrices

A linear system of equations can be written in the form \begin{equation} \begin{split} &2 y + 3 x = 5 \longrightarrow \quad &3x + 2 y + 0 z = 5\\ &2 x + 3z = 5 \longrightarrow\quad &2 x + 0 y + 3 z = 5\\ &x + y = 2 \longrightarrow\quad & 1 x + 1 y + 0 z = 2\\ \end{split} \end{equation}

Matrix form

$$ \begin{pmatrix} 3 & 2 & 0 \\ 2 & 0 & 3 \\ 1 & 1 & 0 \\ \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 5 \\ 2 \end{pmatrix} $$

The "matrix form" is $$ A x = f, $$

where $A$ is a $3 \times 3$ matrix.

Linear systems in applications

Linear systems are everywhere - in modelling, even if you have nonlinear systems, by linearization they are reduced to a sequence of linear systems. They appear in different applications:

  • Circuit modelling (Kirchoffs law)
  • Photonics (Maxwell equation, electrodynamics)
  • Computational fluid dynamics (Navier-Stokes equation)
  • +$\infty$ more

Linear systems are big

We take a continious problem, discretize on a mesh with $N$ elements and get a linear system.
Example of a mesh around A319 aircraft (taken from GMSH website).
The main difficulty is that these systems are big: millions or billions of unknowns!

Linear systems are structured

Storing $N^2$ elements of a matrix is prohibitive even for $N = 100000$. How to work with such matrices?
Fortunately, those matrices are structured and require $\mathcal{O}(N)$ parameters to be stored.
The most widespread structure are sparse matrices: in the matrix there are $\mathcal{O}(N)$ non-zeros!
Example (one of the famous matrices around for $n = 5$): $$ \begin{pmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 &-1& 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \\ \end{pmatrix} $$ At least you can store such matrices, and multiply by vector fast; but how to solve linear systems (and that is typically the final goal).

How to solve linear systems

Important: forget about determinants and the Cramer rule (it is are good for $2 \times 2$ matrices still).

How to solve linear systems

The main tool is variable elimination. \begin{equation} \begin{split} &2 y + 3 x = 5 \longrightarrow \quad &y = 5/2 - 3/2 x \\ &2 x + 3z = 5 \longrightarrow\quad &z = 5/3 - 2/3 x\\ &x + y = 2 \longrightarrow\quad & 5/2 + 5/3 - (3/2 + 2/3) x = 2,\\ \end{split} \end{equation} and that is how you find $x$ (and all previous ones).
This process is called Gaussian elimination and is one of the most widely used algorithms.

Gaussian elimination

Gaussian elimination consists of two steps:

  1. Forward step
  2. Backward step

Forward step

In the forward step, we eliminate $x_1$: $$ x_1 = f_1 - (a_{12} x_2 + \ldots + a_{1n} x_n)/a_{11}, $$ and then substitute this into the equations $2, \ldots, n$. Then we eliminate $x_2$ and so on from the second equation. The important thing is that the pivots (that we divide over) are not equal to $0$.

Backward step

In the backward step, we solve equation for $x_n$, put it into the equation for $x_{n-1}$ and so on, until we compute all $x_i, i=1,\ldots, n$.

Complexity of the Gaussian elimination

Each elimination step requires $\mathcal{O}(n^2)$ operations. Thus, the cost of the naive algorithm is $\mathcal{O}(n^3)$.
Think a little bit: Can Strassen help here?

A short demo

Now we can do a short demo (with a Hilbert matrix)

In [8]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

n = 500
a = [[1.0/(i + j + 1) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
rhs = np.ones(n) #Right-hand side
x = np.linalg.solve(a, rhs)

#And check if everything is fine
er = np.linalg.norm(a.dot(x) - rhs) / np.linalg.norm(rhs)
print er

<matplotlib.text.Text at 0x10fb80ed0>

As you see, the error grows with larger $n$, and we have to find out why.
Important point is that it is not a problem of the algorithm: it is a problem of representing
the matrix in the memory. The error occurs in the moment when the matrix elements are evaluated approximately.

Linear systems and inverse matrix

What was the problem in the previous example? Why the error grows so quickly?
And here is one of the main concepts of numerical linear algebra: the concept of condition number of a matrix.
But before that we have to define the inverse.

The inverse of a matrix $A$ is defined as a matrix $X$ denoted by $A^{-1}$ such that
$$ AX = XA = I, $$ where $I$ is the identity matrix (i.e., $I_{ij} = 0$ if $i \ne j$ and $1$ otherwise). The computation of the inverse is linked to the solution of linear systems. Indeed, $i$-th column of the product gives
$$ A x_i = e_i, $$ where $e_i$ is the $i$-th column of the identity matrix. Thus, we can apply Gaussian elimination to solve this system. Moreover, if there are no divisions by zero in this process (and the pivots do not depend on the right-hand side), the it is possible to solve the system.

Inverse matrix and linear systems

If we have computed $A^{-1}$, the solution of linear system
$$Ax = f$$ is just $x = A^{-1} f$.
$$ A(A^{-1} f) = (AA^{-1})f = I f = f. $$

Neumann series

To study, why there can be such big errors in a solution (see the example above on the Hilbert matrix) we need an important auxilary result: Neumann series:
If for a matrix $\Vert F \Vert < 1$ then the matrix $(I - F)$ is invertible and $$(I - F)^{-1} = I + F + F^2 + F^3 + \ldots = \sum_{k=0}^{\infty} F^k.$$ Note that it is a matrix version of the geometric progression.


The proof is constructive. First of all, the show prove that the series $\sum_{k=0}^{\infty} F^k$ converges.
Like in the scalar case, we have
$$ (I - F) \sum_{k=0}^N F^k = (I - F^{N+1}) \rightarrow I $$ We can also estimate the norm of the inverse: $$ \Vert \sum_{k=0}^N F^k \Vert \leq \sum_{k=0}^N \Vert F \Vert^k \Vert I \Vert = \frac{\Vert I \Vert}{I - \Vert F \Vert} $$

Small perturbation of the inverse

Using this result, we can estimate, how the perturbation of the matrix influences the inverse matrix. We assume that the perturbatin $E$ is small in the sense that $\Vert A^{-1} E \Vert < 1$. Then $$(A + E)^{-1} = \sum_{k=0}^{\infty} (-A^{-1} E)^k A^{-1}$$ and moreover, $$ \frac{\Vert (A + E)^{-1} - A^{-1} \Vert}{\Vert A^{-1} \Vert} \leq \frac{\Vert A^{-1} \Vert \Vert E \Vert}{1 - \Vert A^{-1} E \Vert}. $$ As you see, the norm of the inverse enters the estimate.

Condition number of a linear system

Now consider the perturbed linear system: $$ (A + \Delta A) \widehat{x} = f + \Delta f. $$ Then the algebra begins!


$\widehat{x} - x = (A + \Delta A)^{-1} (f + \Delta f) - A^{-1} f = ((A + \Delta A)^{-1} - A^{-1})f + (A + \Delta A)^{-1} \Delta f = $
$ = \Big[\sum_{k=0}^{\infty} (-A^{-1} \Delta A)^k\Big] A^{-1} f + \Big[\sum_{k=0}^{\infty} (A^{-1} \Delta A)^k \Big] A^{-1} \Delta f,$
therefore $\frac{\Vert \widehat{x} - x \Vert}{\Vert x \Vert} \leq \frac{\Vert A \Vert \Vert A^{-1} \Vert}{1 - \Vert A^{-1}\Delta A\Vert}\Big(\frac{\Vert\Delta A\Vert}{\Vert A \Vert} + \frac{\Vert \Delta f \Vert}{ \Vert f \Vert}\Big)$.

The crucial role is played by the condition number $\mathrm{cond}(A) = \Vert A \Vert \Vert A^{-1} \Vert$.
The larger the condition number, the less number of digits we can recover.

The spectral norm of the matrix is equal to largest singular value, and the singular values of the inverse matrix are equal to the inverses of the singular values. Thus, the condition number is equal to the ratio of the largest singular value and the smallest singular value. $$ \mathrm{cond(A)} = \Vert A \Vert \Vert A^{-1} \Vert = \frac{\sigma_{\max}}{\sigma_{\min}} $$

Hilbert matrix (again)

We can also try to test how tight is the estimate, both with ones in the right-hand side, and with a random vector in the right-hand side. The results are strickingly different

In [10]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

n = 100
a = [[1.0/(i + j + 1) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
rhs = np.ones(n) #Right-hand side
f = np.linalg.solve(a, rhs)

#And check if everything is fine
er = np.linalg.norm(a.dot(f) - rhs) / np.linalg.norm(rhs)
cn = np.linalg.cond(a)
print 'Error:', er, 'Condition number:', cn

Error: 2.81450551264e-08 Condition number: 3.33972881047e+19

And with random right-hand side...

In [14]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

n = 500
a = [[1.0/(i + j + 1) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
rhs = np.random.randn(n) #Right-hand side
f = np.linalg.solve(a, rhs)

#And check if everything is fine
er = np.linalg.norm(a.dot(f) - rhs) / np.linalg.norm(rhs)
cn = np.linalg.cond(a)
print 'Error:', er, 'Condition number:', cn

Error: 292.997678103 Condition number: 3.5650102203e+20

Can you think about an explanation?

Todays lecture

  • Gaussian elimination
  • Condition number

In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)