Use
a = np.zeros((n, n))   # note the double parentheses: the shape is passed as a tuple
instead of
a = [[0 for i in range(n)] for j in range(n)]
With the list-of-lists version you cannot add arrays in one line like
a = a + a
and have to write functions/loops instead of a single line.
Use numpy array indexing a[i, j]
instead of a[i][j].
Use %matplotlib inline
and plt.plot()
(without plt.show())
in the Notebook.
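A minimal illustration of these points (the names a, b and the size n are just placeholders):
import numpy as np
n = 3
a = np.zeros((n, n))                            # numpy array: shape given as a tuple
b = [[0 for i in range(n)] for j in range(n)]   # plain list of lists
a = a + a                                       # elementwise addition, one line
# b + b would concatenate the nested lists, not add them elementwise
a[1, 2] = 1.0                                   # numpy indexing with a single bracket pair
print(a)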
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('8CX-Q0gtSp8')
Out[2]:
A linear system of equations can be written in the form \begin{equation} \begin{split} &2 y + 3 x = 5 \longrightarrow \quad &3x + 2 y + 0 z = 5\\ &2 x + 3z = 5 \longrightarrow\quad &2 x + 0 y + 3 z = 5\\ &x + y = 2 \longrightarrow\quad & 1 x + 1 y + 0 z = 2\\ \end{split} \end{equation}
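For instance, this small system can be written in matrix form and solved directly with numpy (a minimal sketch; A and f are just placeholder names):
import numpy as np
A = np.array([[3.0, 2.0, 0.0],
              [2.0, 0.0, 3.0],
              [1.0, 1.0, 0.0]])   # coefficients of x, y, z, row by row
f = np.array([5.0, 5.0, 2.0])     # right-hand side
print(np.linalg.solve(A, f))      # expected solution: x = y = z = 1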
Linear systems are everywhere: in modelling, even if you have a nonlinear system, linearization reduces it to a sequence of linear systems. They appear in many different applications:
We take a continuous problem, discretize it on a mesh with $N$ elements and get a linear system.
Example of a mesh around an A319 aircraft
(taken from the GMSH website).
Storing all $N^2$ elements of a matrix is prohibitive even for $N = 100000$. How can we work with such matrices?
Fortunately, these matrices are structured and require only $\mathcal{O}(N)$ parameters to be stored.
The most widespread structure is sparsity: the matrix has only $\mathcal{O}(N)$ non-zero elements!
Example (one of the famous matrices around for $n = 5$):
$$
\begin{pmatrix}
2 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 &-1& 2 & -1 \\
0 & 0 & 0 & -1 & 2 \\
\end{pmatrix}
$$
Such matrices can at least be stored, and multiplied by a vector quickly; but how do we solve linear systems with them (which is typically the final goal)?
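For instance, the matrix above can be stored in a sparse format that keeps only the non-zero entries (a sketch assuming scipy is available; the names are placeholders):
import numpy as np
import scipy.sparse as sp
n = 5
ex = np.ones(n)
# second-difference matrix: 2 on the diagonal, -1 on the two off-diagonals
A = sp.spdiags([-ex, 2 * ex, -ex], [-1, 0, 1], n, n, format='csr')
print(A.toarray())    # dense view, only for small n
x = np.ones(n)
print(A.dot(x))       # fast O(n) matrix-by-vector product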
The main tool is variable elimination.
\begin{equation}
\begin{split}
&2 y + 3 x = 5 \longrightarrow \quad &y = 5/2 - 3/2 x \\
&2 x + 3z = 5 \longrightarrow\quad &z = 5/3 - 2/3 x\\
&x + y = 2 \longrightarrow\quad & x + 5/2 - 3/2 \, x = 2,\\
\end{split}
\end{equation}
and that is how we find $x$ (and then, by back substitution, all the previous unknowns).
This process is called Gaussian elimination and is one of the most widely used algorithms.
In the forward step, we eliminate $x_1$: $$ x_1 = \big(f_1 - (a_{12} x_2 + \ldots + a_{1n} x_n)\big)/a_{11}, $$ and substitute this expression into equations $2, \ldots, n$. Then we eliminate $x_2$ from equations $3, \ldots, n$, and so on. The important thing is that the pivots (the elements we divide by) are not equal to $0$.
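A minimal sketch of this forward elimination with back substitution (no pivoting, so it assumes all pivots are non-zero; the function name is a placeholder and this is not how systems should be solved in practice):
import numpy as np
def gauss_solve(a, f):
    a = a.astype(float).copy()
    f = f.astype(float).copy()
    n = a.shape[0]
    # forward step: eliminate x_k from equations k+1, ..., n-1
    for k in range(n):
        for i in range(k + 1, n):
            c = a[i, k] / a[k, k]          # assumes the pivot a[k, k] is non-zero
            a[i, k:] -= c * a[k, k:]
            f[i] -= c * f[k]
    # back substitution
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (f[k] - a[k, k+1:].dot(x[k+1:])) / a[k, k]
    return x
A = np.array([[3.0, 2.0, 0.0], [2.0, 0.0, 3.0], [1.0, 1.0, 0.0]])
f = np.array([5.0, 5.0, 2.0])
print(gauss_solve(A, f))   # expected: [1. 1. 1.]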
In [8]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 500
a = [[1.0/(i + j + 1) for i in range(n)] for j in range(n)] # Hilbert matrix
a = np.array(a)
rhs = np.ones(n) #Right-hand side
x = np.linalg.solve(a, rhs)
#And check if everything is fine
er = np.linalg.norm(a.dot(x) - rhs) / np.linalg.norm(rhs)
print(er)
plt.xkcd()
plt.plot(x)
plt.title('Yea')
Out[8]:
As you can see, the error grows with larger $n$, and we have to find out why.
The important point is that this is not a problem of the algorithm: it is a problem of representing
the matrix in memory. The error is introduced at the moment when the matrix elements are evaluated approximately in finite-precision arithmetic.
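For example, even a single matrix entry such as $1/3$ is already stored inexactly in double precision (a small illustration using the standard fractions module):
from fractions import Fraction
x = 1.0 / 3                                  # the double-precision number actually stored
print(Fraction(x))                           # the exact binary fraction behind it
print(Fraction(1, 3))                        # the value we wanted to store
print(float(Fraction(x) - Fraction(1, 3)))   # representation error, about -1.9e-17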
The inverse of a matrix $A$ is defined as a matrix $X$, denoted by $A^{-1}$, such that
$$
AX = XA = I,
$$
where $I$ is the identity matrix (i.e., $I_{ij} = 0$ if $i \ne j$ and $1$ otherwise).
The computation of the inverse is linked to the solution of linear systems. Indeed, the $i$-th column of the product $AX = I$ gives
$$
A x_i = e_i,
$$
where $e_i$ is the $i$-th column of the identity matrix. Thus, we can apply Gaussian elimination to solve each of these systems. Moreover, if there are no divisions by zero in this process (and the pivots do not depend on the right-hand side), then it is possible to solve the system.
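A short sketch of this idea: the inverse can be obtained by solving $n$ systems with the columns of the identity matrix as right-hand sides (placeholder names; in practice one would simply call np.linalg.inv):
import numpy as np
n = 4
a = np.random.randn(n, n)                    # a random (generically non-singular) matrix
inv_cols = np.linalg.solve(a, np.eye(n))     # solves A x_i = e_i for all columns at once
print(np.linalg.norm(inv_cols - np.linalg.inv(a)))   # should be close to machine precision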
To study why there can be such big errors in the solution (see the example above with the Hilbert matrix) we need an important auxiliary result: the Neumann series.
If a matrix $F$ satisfies $\Vert F \Vert < 1$, then the matrix $(I - F)$ is invertible and
$$(I - F)^{-1} = I + F + F^2 + F^3 + \ldots = \sum_{k=0}^{\infty} F^k.$$
Note that it is a matrix version of the geometric progression.
The proof is constructive. First of all, we show that the series $\sum_{k=0}^{\infty} F^k$ converges.
Like in the scalar case, we have
$$
(I - F) \sum_{k=0}^N F^k = I - F^{N+1} \rightarrow I, \quad N \rightarrow \infty,
$$
since $\Vert F^{N+1} \Vert \leq \Vert F \Vert^{N+1} \rightarrow 0$ when $\Vert F \Vert < 1$.
We can also estimate the norm of the inverse:
$$
\Vert (I - F)^{-1} \Vert = \Big\Vert \sum_{k=0}^{\infty} F^k \Big\Vert \leq \sum_{k=0}^{\infty} \Vert F \Vert^k = \frac{1}{1 - \Vert F \Vert}
$$
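A quick numerical check of the Neumann series (a sketch with a random $F$ scaled so that $\Vert F \Vert < 1$):
import numpy as np
n = 5
F = np.random.randn(n, n)
F = 0.5 * F / np.linalg.norm(F, 2)       # scale so that the spectral norm equals 0.5 < 1
partial_sum = np.zeros((n, n))
term = np.eye(n)
for k in range(100):                     # sum the first 100 terms of I + F + F^2 + ...
    partial_sum += term
    term = term.dot(F)
print(np.linalg.norm(partial_sum - np.linalg.inv(np.eye(n) - F)))   # tiny residual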
Using this result, we can estimate how a perturbation of the matrix influences its inverse. We assume that the perturbation $E$ is small in the sense that $\Vert A^{-1} E \Vert < 1$. Then $$(A + E)^{-1} = \sum_{k=0}^{\infty} (-A^{-1} E)^k A^{-1}$$ and moreover, $$ \frac{\Vert (A + E)^{-1} - A^{-1} \Vert}{\Vert A^{-1} \Vert} \leq \frac{\Vert A^{-1} \Vert \Vert E \Vert}{1 - \Vert A^{-1} E \Vert}. $$ As you see, the norm of the inverse enters the estimate.
Now consider the perturbed linear system $(A + \Delta A)\, \widehat{x} = f + \Delta f$ and compare its solution $\widehat{x}$ with the solution $x = A^{-1} f$ of the unperturbed system:
$\widehat{x} - x = (A + \Delta A)^{-1} (f + \Delta f) - A^{-1} f = ((A + \Delta A)^{-1} - A^{-1})f + (A + \Delta A)^{-1} \Delta f = $
$ = \Big[\sum_{k=1}^{\infty} (-A^{-1} \Delta A)^k\Big] A^{-1} f + \Big[\sum_{k=0}^{\infty} (-A^{-1} \Delta A)^k \Big] A^{-1} \Delta f,$
therefore
$\frac{\Vert \widehat{x} - x \Vert}{\Vert x \Vert} \leq \frac{\Vert A \Vert \Vert A^{-1} \Vert}{1 - \Vert A^{-1}\Delta A\Vert}\Big(\frac{\Vert\Delta A\Vert}{\Vert A \Vert} + \frac{\Vert \Delta f \Vert}{ \Vert f \Vert}\Big)$.
The crucial role is played by the condition number $\mathrm{cond}(A) = \Vert A \Vert \Vert A^{-1} \Vert$.
The larger the condition number, the fewer digits of the solution we can recover.
The spectral norm of a matrix is equal to its largest singular value, and the singular values of the inverse matrix are the inverses of the singular values. Thus, in the spectral norm the condition number is equal to the ratio of the largest and the smallest singular values: $$ \mathrm{cond}(A) = \Vert A \Vert \Vert A^{-1} \Vert = \frac{\sigma_{\max}}{\sigma_{\min}}. $$
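A quick check of this relation on a small Hilbert matrix (a sketch; the size n = 10 is arbitrary):
import numpy as np
n = 10
a = np.array([[1.0 / (i + j + 1) for i in range(n)] for j in range(n)])   # Hilbert matrix
s = np.linalg.svd(a, compute_uv=False)   # singular values in decreasing order
print(s[0] / s[-1])                      # sigma_max / sigma_min
print(np.linalg.cond(a, 2))              # spectral-norm condition number, the same value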
In [10]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 100
a = [[1.0/(i + j + 1) for i in range(n)] for j in range(n)] # Hilbert matrix
a = np.array(a)
rhs = np.ones(n) #Right-hand side
f = np.linalg.solve(a, rhs)
#And check if everything is fine
er = np.linalg.norm(a.dot(f) - rhs) / np.linalg.norm(rhs)
cn = np.linalg.cond(a)
print('Error:', er, 'Condition number:', cn)
And with a random right-hand side...
In [14]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 500
a = [[1.0/(i + j + 1) for i in range(n)] for j in range(n)] # Hilbert matrix
a = np.array(a)
rhs = np.random.randn(n) #Right-hand side
f = np.linalg.solve(a, rhs)
#And check if everything is fine
er = np.linalg.norm(a.dot(f) - rhs) / np.linalg.norm(rhs)
cn = np.linalg.cond(a)
print('Error:', er, 'Condition number:', cn)
Can you think of an explanation?
Today's lecture
In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()
Out[1]: