A matrix is a two-dimensional table. Here is an example of a $3 \times 3$ matrix \begin{equation} A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \end{equation}
A vector is a $n \times 1$ vector (there are row and column vectors).
Many physical models are formulated as linear equations:
which is based on the fact that if the change is small, everything can be approximated by a linear function: $$ f(x + \delta x) \approx f(x) + \delta x. $$ Of course, nonlinearities may come into play, but even in this case the numerical methods linearize the problem around the current approximation. But matrices and linear dependence also play important role in data analysis as well.
We generate a sequence of random points in two dimensions, and setup a rotation matrix $A$.
In [57]:
%matplotlib inline
import matplotlib.pyplot as plt
import prettyplotlib as ppl
from sklearn.decomposition import PCA
import numpy as np
P = 1000
points = np.random.randn(P,2)
A = [[2, 1], [0, 1]]
A = np.array(A)
ppl.plot(points[:, 0], points[:, 1], ls='', marker='o')
Out[57]:
We can also plot the rotated points: look at how they are "skewed".
In [32]:
R = np.dot(points, A)
ppl.plot(R[:, 0], R[:, 1], ls='', marker='o')
Out[32]:
We can also rotate them back by finding principal components, which is equivalent to singular value decomposition (SVD). We will talk about SVD later, now we just call it from numpy.linalg module. As we can see, the points are back to symmetric distribution, so we have recovered the matrix.
Principal component analysis is used for many applications, in particular, for the classification problem: after the rotation, the points, belonging to to different classes, can be visually separated.
In [40]:
%matplotlib inline
u, s, v = np.linalg.svd(R, full_matrices=False)
unrotated = R.dot(v.T)
ppl.plot(unrotated[:, 0], unrotated[:, 1], ls='', marker='o')
Out[40]:
The linear models and the factors may have a real physical meaning.
One of the most interesting illustrations is the cocktail party problem, which is defined as follows.
We have a set of sources $x(t)$ (people talking) and a set of microphones.
At each microphone we record a linear mixture:
$$
y = A x(t) + \eta(t),
$$
where $\eta(t)$ is some noise.
We do not know $A$ and want to recover it.
Consider composition of two linear operators:
Then, $z = Ay = A B x = C x$, where $C$ is the matrix-by-matrix product.
A product of an $n \times k$ matrix $A$ and a $k \times m$ matrix $B$ is a $n \times m$ matrix $C$ with the elements
$$
c_{ij} = \sum_{s=1}^k a_{is} b_{sj}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m
$$
Complexity of a naive algorithm for MM is $\mathcal{O}(n^3)$.
Matrix-by-matrix product is the core for almost all efficient algorithms in linear algebra. Basically, all the NLA algorithms are reduced to a sequence of matrix-by-matrix products, so efficient implementation of MM reduces the complexity of numerical algorithms by the same factor. However, implementing MM is not easy at all!
In [16]:
import numpy as np
def matmul(a, b):
n = a.shape[0]
k = a.shape[1]
m = b.shape[1]
c = np.zeros((n, m))
for i in xrange(n):
for j in xrange(m):
for s in xrange(k):
c[i, j] += a[i, s] * b[s, j]
load_ext cythonmagic
makes it possible to use %%cython
magic for an automatic compilation of the Cython code.
Cython is a Python compiler with additional directives that allow to specify the type of some variables. In many cases it may lead to significant speedup.
For other possibilities you may look at the following comparison
In [13]:
%load_ext cythonmagic
In [18]:
%%cython
import numpy as np
def cython_matmul(double [:, :] a, double[:, :] b):
cdef int n = a.shape[0]
cdef int k = a.shape[1]
cdef int m = b.shape[1]
cdef int i
cdef int j
cdef int s
c = np.zeros((n, m))
cdef double[:, :] cview = c
for i in xrange(n):
for j in xrange(m):
for s in xrange(k):
c[i, j] += a[i, s] * b[s, j]
return c
Then we just time three different routines. Guess the answer.
In [19]:
n = 100
a = np.random.randn(n, n)
b = np.random.randn(n, n)
%timeit c = matmul(a, b)
%timeit cf = cython_matmul(a, b)
%timeit c = np.dot(a, b)
Why it is so?
There are two important issues:
We need to reduce the number of read/write operations! This is typically achieved in efficient implementations of the BLAS libraries, one of which (Intel MKL) we now use.
The main difference is the number of operations vs. the number of data!
Remark: a quest for $\mathcal{O}(n^2)$ matrix-by-matrix multiplication algorithm is not yet done.
Strassen gives $\mathcal{O}(n^{2.78...})$
World record $\mathcal{O}(n^{2.37})$ Reference
The constant is unfortunately too big to make it practical!
Break the matrix into blocks! ($2 \times 2$ is an illustration) $$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} $$
Then, $$ A B = \begin{bmatrix} A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\ A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22} \end{bmatrix} $$
If $A_{11}, B_{11}$ and their product fit into the cache memory (which is 1024 Kb for the Haswell Intel Chip), then we load them only once into the memory.
In [10]:
## This demo requires Anaconda distribution to be installed
import mkl
import numpy as np
n = 1000
a = np.random.randn(n)
mkl.set_num_threads(1)
%timeit a + a
mkl.set_num_threads(2)
%timeit a + a
In [11]:
## This demo requires Anaconda distribution to be installed
import mkl
n = 500
a = np.random.randn(n, n)
mkl.set_num_threads(1)
%timeit a.dot(a)
mkl.set_num_threads(2)
%timeit a.dot(a)
Typically, two cases are distinguished:
In both cases, the efficiency is governed by a bandwidth:
I.e., for BLAS-1 routines (like sum of two vectors) reads/writes take all the time.
For BLAS-3 routines, the speedup can be obtained that is more noticable.
For large-scale clusters (>100 000 cores, see the Top500 list) there is still scaling.
In [64]:
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()
Out[64]:
In [ ]: