Primer on Linear Algebra


In [1]:
# The line below enables plots inside the notebook
%matplotlib inline
from __future__ import division  # division always returns a float!
import numpy as np
import matplotlib.pyplot as plt  # standard way to import matplotlib
import seaborn as sns    # standard way to import seaborn

Linear algebra is the branch of mathematics concerning vector spaces and linear mappings between such spaces. Linear algebra is also about solving systems of linear equations including numerical methods to solve those systems.

A matrix is a rectangular array of numbers. The dimension of a matrix is determined by the number of rows and columns. A matrix of $n$ rows and $m$ columns is said to have dimensions $n \times m$. Each entry inside a matrix is called an element of that matrix. Elements are referenced using two subscripts, hence the element $a_{ij}$ lies in the $i^{th}$ row and $j^{th}$ column.

An array of numbers arranged in a single column is called a column vector, or simply a vector. If the numbers are set out in a row, the term row vector is used. Thus, a column vector is a matrix of dimensions $n \times 1$, and a row vector can be viewed as a matrix of dimensions $1 \times n$. As you can see vectors are just a special case of matrices.

A particularly important kind of matrix is the square matrix, which has the same number of rows and columns. In such a matrix, the elements $a_{ij}$ for $i=j$ ($a_{00}$, $a_{11}$, $a_{22}$, ... $a_{nn}$) are called the diagonal elements of the matrix. The rest of the elements, i.e. those for which $i \neq j$, are called the off-diagonal elements. Most people start counting elements from 1. Instead, and to keep it Pythonic, we will start counting from 0.

A common notation is to use boldface uppercase letters for matrices and boldface lowercase letters for vectors.

$$ \mathbf{A} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix} \quad \mathbf{b} = \begin{bmatrix} b_{0} \\ b_{1} \\ b_{2} \end{bmatrix} $$

In NumPy we can use the array object to work with mathematical matrices, although, as we will explore soon, a NumPy array does not behave exactly like a mathematical matrix.


In [2]:
A = np.arange(9).reshape(3,3)
b = np.array([[0, 1, 2]])
c = np.array([0, -1, 2])

print A
print b
print c


[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[0 1 2]]
[ 0 -1  2]

NumPy arrays have dimensions defined in the same way as matrices.


In [3]:
A.shape, b.shape, c.shape


Out[3]:
((3, 3), (1, 3), (3,))

Trace

The trace of a square matrix is the sum of the elements on the principal diagonal.

$$ \mathbf{A}=\begin{bmatrix} A_{00} & A_{01} & \cdots & A_{0n} \\ A_{10} & A_{11} & \cdots & A_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n0} & A_{n1} & \cdots & A_{nn} \\ \end{bmatrix} $$$$ Tr(\mathbf{A}) = A_{00} + A_{11} + \dots + A_{nn}=\sum_{i=0}^{n} A_{ii} $$

For our purposes the trace has no special meaning; it is introduced here solely because it is a useful concept for simplifying other matrix calculations.


In [5]:
print A
np.trace(A)


[[0 1 2]
 [3 4 5]
 [6 7 8]]
Out[5]:
12

Determinants

The determinant is another scalar that can be obtained from a square matrix.

The determinant of a matrix is zero if the rows of that matrix are linearly dependent. Linearly dependent means that at least one row can be written as a linear combination of the other rows. For example:

$$\det(\mathbf{A}) = \begin{vmatrix} 1 & 2 \\ 3 & 6 \end{vmatrix} = 0$$

This is because the second row is the first one multiplied by 3 (or the first row is the second one multiplied by $\frac{1}{3}$).


In [6]:
B = np.array([[1,2], [3, 6]])
np.linalg.det(B)


Out[6]:
0.0

The determinant can be used, for example, to find out if a system of linear equations has a unique solution (non-zero determinant) or, otherwise, no solution or more than one solution (zero determinant).
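
As a minimal sketch of this idea (the small 2x2 system below, with the names M and rhs, is made up just for illustration): a matrix with a non-zero determinant corresponds to a system with a unique solution, which np.linalg.solve can find.

In [ ]:
M = np.array([[1., 2.], 
              [3., 5.]])
rhs = np.array([1., 2.])
print np.linalg.det(M)         # non-zero, so the system M x = rhs has a unique solution
print np.linalg.solve(M, rhs)  # that unique solution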

Historically, determinants predate linear algebra by more than a century, and for a long time the study of linear algebra had a strong focus on dealing with determinants (and computing them by hand!). Now, with the extensive use of numerical methods to solve problems in linear algebra, determinants have become somewhat less important than before. Still, determinants are very important from a theoretical point of view.

Inverse

The inverse of a matrix $A$ (denoted as $A^{-1}$) is the matrix that, when multiplied by $A$, gives the identity matrix

$$A \cdot A^{-1} = I$$

where:

$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$

Only square matrices with non-zero determinant are invertible. A non-invertible matrix is called singular. Singular matrices do not occur very often; a square matrix constructed by taking its elements from a continuous uniform distribution will rarely be singular.

In geometrical terms the inverse matrix $A^{-1}$ reverses the stretching and rotating accomplished by the matrix, $A$.


In [6]:
C = np.random.rand(3,3)
C_inv = np.linalg.inv(C)
print C_inv


[[-0.45752658  4.63544076 -7.42088672]
 [ 2.39217988 -4.25027667 -0.62773568]
 [-0.40608589  0.19955427  5.10514455]]

Now we can check that the above computation was correct by multiplying the matrix and its inverse (we will see more on matrix multiplication soon). Notice that the off-diagonal elements are zero (at least for all practical purposes).


In [7]:
np.dot(C, C_inv)


Out[7]:
array([[  1.00000000e+00,   2.22044605e-16,   0.00000000e+00],
       [  5.55111512e-17,   1.00000000e+00,   0.00000000e+00],
       [  0.00000000e+00,  -1.38777878e-17,   1.00000000e+00]])

Transpose

The transpose $\mathbf{A}^T$ of an $m \times n$ matrix $\mathbf{A}$ is the $n \times m$ matrix obtained by interchanging the rows and columns of $\mathbf{A}$:

$$ \mathbf{A} = \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0n} \\ a_{10} & a_{11} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m0} & a_{m1} & \cdots & a_{mn} \\ \end{bmatrix} $$$$\mathbf{A}^T = \begin{bmatrix} a_{00} & a_{10} & \cdots & a_{m0} \\ a_{01} & a_{11} & \cdots & a_{m1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{0n} & a_{1n} & \cdots & a_{mn} \\ \end{bmatrix}$$

In [14]:
A.T  # equivalent to np.transpose(A)


Out[14]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

Matrix equality

Two matrices are equal if they have the same dimensions and their corresponding elements are equal. To check if two NumPy arrays are equal you can use the array_equal function.


In [16]:
np.array_equal(A, C)


Out[16]:
False

For two arrays, the == operator returns an array of booleans if both arrays have the same dimensions, as in


In [17]:
A == C


Out[17]:
array([[False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

Otherwise the result is just a simple boolean, True or False.


In [18]:
A == [1,2]


Out[18]:
False

But you can also get an array of booleans even if the dimensions do not agree


In [19]:
A == 1


Out[19]:
array([[False,  True, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

In [20]:
A == [0,1,2]


Out[20]:
array([[ True,  True,  True],
       [False, False, False],
       [False, False, False]], dtype=bool)

This is possible because NumPy broadcasts (stretches) the smaller array across the larger array so that they have compatible shapes. Two dimensions are compatible when:

  1. They are equal, or
  2. One of them is 1

Comparing floating point numbers for equality can be tricky. Computers have finite memory to represent numbers, and when those numbers are manipulated, truncation and rounding errors are introduced. In general, it is a good idea to check whether two floats are equal within a certain tolerance. If we need to compare arrays of floats we can use the function allclose instead of the function array_equal, as in the sketch below.
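
A minimal sketch of this idea (the arrays x and y below are made up for illustration): two arrays that differ only by floating point error fail the exact comparison but pass allclose.

In [ ]:
x = np.array([0.1, 0.2, 0.3])
y = np.array([0.1, 0.2, 0.1 + 0.2])   # 0.1 + 0.2 is not exactly 0.3 in floating point
print np.array_equal(x, y)            # False: the exact comparison fails
print np.allclose(x, y)               # True: equal within the default tolerance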

Matrix and vector addition

The usual matrix addition is defined for two matrices of the same dimensions. The sum of two $m \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$, denoted by $\mathbf{A} + \mathbf{B}$, is again an $m \times n$ matrix computed by adding corresponding elements:

$$ \mathbf{A}+\mathbf{B} = \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0n} \\ a_{10} & a_{11} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m0} & a_{m1} & \cdots & a_{mn} \\ \end{bmatrix} + \begin{bmatrix} b_{00} & b_{01} & \cdots & b_{0n} \\ b_{10} & b_{11} & \cdots & b_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m0} & b_{m1} & \cdots & b_{mn} \\ \end{bmatrix} $$$$\mathbf{A}+\mathbf{B} = \begin{bmatrix} a_{00} + b_{00} & a_{01} + b_{01} & \cdots & a_{0n} + b_{0n} \\ a_{10} + b_{10} & a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m0} + b_{m0} & a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \\ \end{bmatrix} $$

for example:


In [9]:
A+A


Out[9]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [10]:
A-A


Out[10]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

CAUTION: Matrix addition (and subtraction) is defined only for matrices with the same dimensionality. Nevertheless, within NumPy it is possible to add arrays with different dimensionality. This works because of broadcasting, as we saw when comparing arrays for equality.

For example, in mathematical notation you can write $\mathbf{A} + 2$, but this is a shorthand (or an abuse of notation) for writing:

$$\begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0n} \\ a_{10} & a_{11} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m0} & a_{m1} & \cdots & a_{mn} \\ \end{bmatrix} + \begin{bmatrix} 2 & 0 & \cdots & 0 \\ 0 & 2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 2 \\ \end{bmatrix} $$

NumPy, instead, broadcasts the scalar to match the shape of the array, adding 2 to every element.


In [11]:
A + 2


Out[11]:
array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10]])

In [24]:
A + b


Out[24]:
array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10]])

Vector addition and forces

Vectors can be seen as arrows in an n-dimensional space starting at the origin. Thus, vectors can be used to represent forces. A classic example of the application of vector addition is computing the resultant force when more than one force is applied to an object.


In [15]:
f1 = np.array([1, 5])
f2 = np.array([6, 4])
f3 = f1 + f2
print 'f3 = %s' % f3

for f, color in zip([f1, f2, f3], ['red', 'blue', 'violet']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=2, 
              length_includes_head=True, head_width=.2)

plt.xlim(0, 8)
plt.ylim(0, 10);


f3 = [7 9]

Scalar Multiplication

The scalar multiplication of a matrix is defined as:

$$\mathbf{A}\lambda = \begin{bmatrix} A_{00} & A_{01} & \cdots & A_{0m} \\ A_{10} & A_{11} & \cdots & A_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n0} & A_{n1} & \cdots & A_{nm} \\ \end{bmatrix}\lambda = \begin{bmatrix} A_{00} \lambda & A_{01} \lambda & \cdots & A_{0m} \lambda \\ A_{10} \lambda & A_{11} \lambda & \cdots & A_{1m} \lambda \\ \vdots & \vdots & \ddots & \vdots \\ A_{n0} \lambda & A_{n1} \lambda & \cdots & A_{nm} \lambda \\ \end{bmatrix}$$

The geometrical interpretation of scalar multiplication corresponds to stretching or compressing a vector (changing its magnitude) without changing its direction. Hence, scalars are what scales vectors.


In [16]:
A*10


Out[16]:
array([[ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])

In [17]:
b*0.5


Out[17]:
array([[ 0. ,  0.5,  1. ]])

In [18]:
f4 = f1 * 2
f5 = f2 * -0.5

for f, color, ls in zip([f1, f2, f4, f5], ['red', 'blue']*2, 
                    ['solid', 'solid', 'dotted', 'dotted']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=1, ls=ls, 
              length_includes_head=True, head_width=.2)

plt.xlim(-5, 6)
plt.ylim(-2, 10);


Matrix and vector multiplication

The dot product (or scalar product or inner product)

This product can be defined in geometrical terms:

$$ \mathbf{a} \cdot \mathbf{b} = \left\| \mathbf{a} \right\| \left\| \mathbf{b} \right\| \cos \theta $$

where:
$\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are the magnitudes (or lengths, or Euclidean norms) of the vectors $\mathbf{a}$ and $\mathbf{b}$,
and $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$.
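
In particular, since $\cos 0 = 1$, the magnitude of a vector is the square root of the dot product of the vector with itself. A quick numerical check of this (a minimal sketch reusing the force vector f1 defined above):

In [ ]:
print np.linalg.norm(f1)        # the magnitude (Euclidean norm) of f1
print np.sqrt(np.dot(f1, f1))   # the same value from the dot product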

In NumPy, the * operator performs an element-wise (term by term) multiplication. If you need a dot product it is necessary to call the dot function.


In [29]:
print b, c
print b*c
print np.dot(b, c)  # this is NOT the same as the above operation!


[[0 1 2]] [ 0 -1  2]
[[ 0 -1  4]]
[3]

From the definition of the dot product it is possible to compute the angle between two vectors. The $\frac{180}{\pi}$ factor is there to convert from radians to degrees. Alternatively, you can use the NumPy function degrees.


In [30]:
angle = np.arccos(np.dot(f1, f2) 
                  / (np.linalg.norm(f1)*np.linalg.norm(f2)))*180/np.pi

for f, color in zip([f1, f2], ['red', 'blue']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=2, 
              length_includes_head=True, head_width=.2)


plt.xlim(0, 6)
plt.ylim(0, 6)
plt.plot(0, label=r'$\theta = %.1f^\circ$' % angle, alpha=0)
plt.legend(loc=0, fontsize=20);


The product of two matrices can be defined in very different ways (each definition providing a different result). The most common definition of matrix multiplication is a generalization of the dot product. This definition is the most common because it has many applications in mathematics, engineering and science. The dot product can also be defined using an algebraic definition, which is easier to understand (for matrices) than the geometrical one:

$$ \mathbf{a}=\begin{bmatrix} a_{0} & a_{1} & \cdots & a_{m} \\ \end{bmatrix},\quad\mathbf{b}=\begin{bmatrix} b_{0} \\ b_{1} \\ \vdots \\ b_{m} \\ \end{bmatrix} $$$$ \mathbf{a} \cdot \mathbf{b} =\begin{bmatrix} a_{0}b_{0} + a_{1}b_{1} + \cdots + a_{m}b_{m} \\ \end{bmatrix} $$

In a shorter form $$\mathbf{a} \cdot \mathbf{b} = \sum_{i=0}^m a_{i}b_{i}$$
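
In other words, the dot product is just the sum of the element-wise products. A quick check with two made-up vectors (the names u and w below are arbitrary, for illustration only):

In [ ]:
u = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print np.dot(u, w)    # 1*4 + 2*5 + 3*6 = 32
print np.sum(u * w)   # the same result from the element-wise product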

And now for matrices

$$\mathbf{A}=\begin{bmatrix} A_{00} & A_{01} & \cdots & A_{0m} \\ A_{10} & A_{11} & \cdots & A_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n0} & A_{n1} & \cdots & A_{nm} \\ \end{bmatrix},\quad\mathbf{B}=\begin{bmatrix} B_{00} & B_{01} & \cdots & B_{0p} \\ B_{10} & B_{11} & \cdots & B_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ B_{m0} & B_{m1} & \cdots & B_{mp} \\ \end{bmatrix}$$$$\mathbf{A} \cdot \mathbf{B} = \begin{bmatrix} \sum_{i=0}^m A_{0i}B_{i0} & \sum_{i=0}^m A_{0i}B_{i1} & \cdots & \sum_{i=0}^m A_{0i}B_{ip} \\ \sum_{i=0}^m A_{1i}B_{i0} & \sum_{i=0}^m A_{1i}B_{i1} & \cdots & \sum_{i=0}^m A_{1i}B_{ip} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=0}^m A_{ni}B_{i0} & \sum_{i=0}^m A_{ni}B_{i1} & \cdots & \sum_{i=0}^m A_{ni}B_{ip} \\ \end{bmatrix}$$

In [19]:
print A
print A*A
print np.dot(A, A)


[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0  1  4]
 [ 9 16 25]
 [36 49 64]]
[[ 15  18  21]
 [ 42  54  66]
 [ 69  90 111]]

Notice that the matrix product $\mathbf{A} \cdot \mathbf{B}$ is defined only if the number of columns of $\mathbf{A}$ is equal to the number of rows of $\mathbf{B}$. If $\mathbf{A}$ has dimensions $n \times m$ and $\mathbf{B}$ dimensions $m \times p$ then, $\mathbf{A} \cdot \mathbf{B}$ will have dimensions $n \times p$.

The matrix product has some properties that may be counterintuitive given what you know from high school algebra. For example:

  • In general the matrix product is non-commutative, i.e. $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$. Only square matrices (of the same dimensions) can possibly commute, and even then they generally do not.

For example if we have the matrices $\mathbf{b}$ and $\mathbf{A}$ with shapes:


In [20]:
b.shape, A.shape


Out[20]:
((1, 3), (3, 3))

We can multiply them in the order $\mathbf{bA}$, but not $\mathbf{Ab}$


In [21]:
np.dot(b, A)  # This is a valid operation


Out[21]:
array([[15, 18, 21]])

In [24]:
np.dot(A, b)  # This is not a valid operation


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-5e57adadb135> in <module>()
----> 1 np.dot(A, b)  # This is not a valid operation

ValueError: objects are not aligned

Since $\mathbf{A}$ is a square matrix we can multiply it by its transpose (the dimensions agree)


In [25]:
np.dot(A, A.T)


Out[25]:
array([[  5,  14,  23],
       [ 14,  50,  86],
       [ 23,  86, 149]])

But even when the dimensions agree, this does not mean that matrix multiplication is necessarily commutative.


In [26]:
np.dot(A.T, A)


Out[26]:
array([[45, 54, 63],
       [54, 66, 78],
       [63, 78, 93]])
  • $\mathbf{A}\mathbf{V} = \mathbf{S}\mathbf{V}$ does not necessarily imply that $\mathbf{A} = \mathbf{S}$

In [28]:
v = np.array([ 0.16476382,  0.50577448,  0.84678513])
s = 13.34846922834954

print A
print np.dot(A, v)
print np.dot(s, v)

np.allclose(np.dot(A, v), np.dot(s, v))


[[0 1 2]
 [3 4 5]
 [6 7 8]]
[  2.19934474   6.75131503  11.30328532]
[  2.19934478   6.75131508  11.30328525]
Out[28]:
True

  • If $\mathbf{Y}\mathbf{Z}$ equals the zero matrix, then it is not necessarily true that $\mathbf{Y} = 0$ or $\mathbf{Z} = 0$


In [29]:
Y = np.array([[2, 3, -2],[0, 0, 0],[0, 0,0]])
Z = np.array([[0, 0,  0],[0, 0, 2],[0, 0,3]])

In [30]:
np.dot(Y, Z)


Out[30]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

The cross product (or vector product)

The cross product of the vectors $a × b$ is defined as the vector $c$ that is perpendicular to both $a$ and $b$, with a direction given by the right-hand rule and a magnitude equal to the area of the parallelogram that the vectors span.

\begin{equation} \mathbf{a} \times \mathbf{b} = \left\| \mathbf{a} \right\| \left\| \mathbf{b} \right\| \sin \theta \ \mathbf{n} \end{equation}

where:

$\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$ in the plane containing them.
$\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are the magnitudes of the vectors $\mathbf{a}$ and $\mathbf{b}$.
$\mathbf{n}$ is a unit vector perpendicular to the plane containing $\mathbf{a}$ and $\mathbf{b}$ in the direction given by the right-hand rule.
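
We can check the magnitude part of this definition numerically. Below is a minimal sketch reusing the 2D force vectors f1 and f2 from above; the vectors are padded with a zero third component so that np.cross returns a full 3D vector, whose length is the area of the parallelogram.

In [ ]:
a3 = np.append(f1, 0)   # embed the 2D force vectors in 3D
b3 = np.append(f2, 0)
theta = np.arccos(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))
print np.linalg.norm(np.cross(a3, b3))                          # area of the parallelogram
print np.linalg.norm(f1) * np.linalg.norm(f2) * np.sin(theta)   # |f1| |f2| sin(theta)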

The vector product is not commutative (it is anti-commutative), as can be seen in the example below.


In [31]:
print b, c
print np.cross(b, c)
print np.cross(-c, b)
print np.cross(b, b)


[[0 1 2]] [ 0 -1  2]
[[4 0 0]]
[[4 0 0]]
[[0 0 0]]

Linear transformations

Matrix algebra originated to describe coordinate transformations. In the following examples we explore how to apply transformations to a set of points. Just for fun, let the set of points be a smiley face.


In [33]:
# Define a function that draws circles and circular sections
def circle(r=1., phi_a=0., phi_b=2*np.pi):
    """
    A function to create arcs.
    
    Parameters
    ----------
    r : float. 
        Radius of the circular section
    phi_a: float. 
        Initial angle (in radians)
    phi_b: float. 
        Final angle (in radians)
    
    Returns
    -------
    Two arrays providing the x and y coordinates of the arc. The default 
    values correspond to a circle of unit radius.
    
    """
    phi = np.arange(phi_a, phi_b, 0.05)
    x_ori, y_ori = 0, 0
    # change from polar to cartesian coordinates
    x = r*np.cos(phi) + x_ori
    y = r*np.sin(phi) + y_ori
    return x, y

# draw the face
x0, y0 = circle()
# draw the mouth
x1, y1 = circle(r=.5, phi_a=np.pi, phi_b=2*np.pi)
x0 = np.append(x0, x1)
y0 = np.append(y0, y1)
# draw the eyes
x = np.append(x0, (-0.5, 0.5))
y = np.append(y0, (0.5, 0.5))
# create the face matrix
face = np.vstack((x,y))
# Plot the results
plt.plot(face[0], face[1], 'ko' )
plt.axis('equal');


Now we will rotate the smiley face by applying to it the matrix

$$\mathbf{B} = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix}$$

In [36]:
theta = np.radians(30)
B = np.array([[np.cos(theta), -np.sin(theta)], 
            [np.sin(theta),  np.cos(theta)]])

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


We can also scale our face (notice how the numbers on the axes change)


In [40]:
B = np.array([[4, 0], 
            [0, 4]])

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


The scaling does not necessarily have to be symmetric.


In [44]:
B = np.array([[8, 0], 
            [0,  2]])

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


We can also shear our face horizontally (or vertically), as shown next.


In [48]:
B = np.array([[1, 2],   # horizontal shear matrix
            [0, 1]])

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


Orthogonal matrices and orthogonal transformations

If we have a square matrix $\mathbf{A}$ whose inverse $\mathbf{A}^{-1}$ is equal to its transpose $\mathbf{A}^T$, then $\mathbf{A}$ is an orthogonal matrix. For example, the rotation matrix discussed earlier,

$$A = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix}$$

is orthogonal:

$$A^{T} = A^{-1} = \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \\ \end{bmatrix}$$

The columns and rows of an orthogonal matrix form a system of orthogonal unit vectors (orthonormal vectors).

An orthogonal transformation is a linear transformation whose transformation matrix is orthogonal. Orthogonal transformations are important because they preserve the lengths and the angles between vectors. All orthogonal transformations in a two or three dimensional space are either rotations or reflections (or combinations of both).
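
We can verify both claims numerically. The sketch below reuses the 30-degree rotation matrix from the linear transformation examples and checks that its transpose equals its inverse and that the length of a vector is preserved (the vector v is made up just for illustration):

In [ ]:
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)], 
              [np.sin(theta),  np.cos(theta)]])

print np.allclose(R.T, np.linalg.inv(R))    # True: the transpose equals the inverse
v = np.array([3., 4.])                      # an arbitrary vector, for illustration
print np.linalg.norm(v), np.linalg.norm(np.dot(R, v))   # the length is preserved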

Eigenvalues and eigenvectors

An eigenvector (eigen means characteristic, unique, proper...) of a square matrix $\mathbf{A}$ is a vector $\mathbf{v}$ (other than the null vector) that satisfies the following equation:

$$\mathbf{A} \mathbf{v} = \lambda \mathbf{v}$$

Where:
$\lambda$ is the eigenvalue (scalar) associated with the eigenvector $\mathbf{v}$.

Solving the above equation (i.e. finding the eigenvalues and eigenvectors) is known as the eigenvalue problem, and it is of central importance in many branches of science and engineering.

Notice from the above equation that $\lambda$ scales $\mathbf{v}$ and that $\mathbf{A}$ is also scaling $\mathbf{v}$ (because both terms are equal). One way to visualize eigenvectors and eigenvalues is to think about them in terms of the linear transformations we saw above. In that context, an eigenvector is a vector (other than the null vector), which does not change its direction when a linear transformation is applied (or is exactly reversed). Eigenvectors can change in length, and the magnitude of the change is indicated by its associated eigenvalue.

If you need to compute eigenvalues and eigenvectors with NumPy you can use the linalg.eig function:


In [51]:
np.linalg.eig(A)


Out[51]:
(array([  1.33484692e+01,  -1.34846923e+00,  -2.48477279e-16]),
 array([[ 0.16476382,  0.79969966,  0.40824829],
        [ 0.50577448,  0.10420579, -0.81649658],
        [ 0.84678513, -0.59128809,  0.40824829]]))

The first array contains the eigenvalues and the second the corresponding eigenvectors. The eigenvalues are not necessarily sorted (linalg.eigh returns them in ascending order, but linalg.eig does not guarantee any particular order).

The eigenvectors are listed as columns of the second array, not rows!
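
As a quick sanity check, we can verify that the first column of the second array really satisfies $\mathbf{A} \mathbf{v} = \lambda \mathbf{v}$:

In [ ]:
eigen_vals, eigen_vecs = np.linalg.eig(A)
v0 = eigen_vecs[:, 0]    # the first eigenvector is the first *column*
print np.allclose(np.dot(A, v0), eigen_vals[0] * v0)   # True: A v = lambda v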

There is also another function called linalg.eigh (notice the h at the end). The function linalg.eig is for non-symmetric matrices and the result can contain complex numbers; linalg.eigh is for symmetric (AKA Hermitian) matrices and the results are always real numbers. Although linalg.eig also works for symmetric matrices, it is a better idea to use linalg.eigh for those matrices.

As we have already seen, a vector may be seen as an arrow in an n-dimensional space. In that case, an eigenvector $\mathbf{v}$ is an arrow whose direction is either preserved or exactly reversed after multiplication by $\mathbf{A}$. The corresponding eigenvalue $\lambda$ determines how the length of the arrow is modified: the length will increase if $|\lambda| > 1$, remain the same if $|\lambda| = 1$, and decrease if $|\lambda| < 1$. The direction will be reversed if the eigenvalue is negative, or preserved if it is positive.

As examples, we can compute the eigenvalues and eigenvectors of the transformation matrices we saw before.


In [52]:
# Symmetric scaling
B = np.array([[4, 0], 
            [0,  4]])
eigen_vals, eigen_vecs = np.linalg.eigh(B)
f0 = eigen_vals[0] * eigen_vecs[:,0]
f1 = eigen_vals[1] * eigen_vecs[:,1]

print eigen_vals
print eigen_vecs

plt.axis('equal')
for f, color in zip([f0, f1], ['red', 'blue']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=2, 
              length_includes_head=True, head_width=.2)

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


[ 4.  4.]
[[ 1.  0.]
 [ 0.  1.]]

In [55]:
# Asymmetric scaling
B = np.array([[8, 0], 
            [0,  2]])
eigen_vals, eigen_vecs = np.linalg.eigh(B)
f0 = eigen_vals[0] * eigen_vecs[:,0]
f1 = eigen_vals[1] * eigen_vecs[:,1]

print eigen_vals
print eigen_vecs

plt.axis('equal')
for f, color in zip([f0, f1], ['red', 'blue']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=2, 
              length_includes_head=True, head_width=.2)

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


[ 2.  8.]
[[ 0.  1.]
 [ 1.  0.]]

In [59]:
# horizontal shear
B = np.array([[1, 2], 
            [0,  1]])

eigen_vals, eigen_vecs = np.linalg.eig(B)
f0 = eigen_vals[0] * eigen_vecs[:,0]
f1 = eigen_vals[1] * eigen_vecs[:,1]

print eigen_vals
print eigen_vecs

plt.axis('equal')
for f, color in zip([f0, f1], ['red', 'blue']):
    plt.arrow(0, 0, f[0], f[1], color=color, lw=2, 
              length_includes_head=True, head_width=.2)

new_face = np.dot(B, face)
plt.plot(new_face[0], new_face[1], 'ko')
plt.axis('equal');


[ 1.  1.]
[[  1.00000000e+00  -1.00000000e+00]
 [  0.00000000e+00   1.11022302e-16]]

Rotation

Eigenvalues and eigenvectors are not restricted to real values. For example, the eigenvalues and eigenvectors of rotation matrices are complex numbers (except when $\theta$ is a multiple of $\pi$; and $\theta = 0$ is not a rotation at all!)


In [71]:
theta = np.radians(30)
B = np.array([[np.cos(theta), -np.sin(theta)], 
            [np.sin(theta),  np.cos(theta)]])

np.linalg.eig(B)


Out[71]:
(array([ 0.8660254+0.5j,  0.8660254-0.5j]),
 array([[ 0.70710678+0.j        ,  0.70710678-0.j        ],
        [ 0.00000000-0.70710678j,  0.00000000+0.70710678j]]))

Apart from real and complex numbers, eigenvectors can also be functions. In such contexts people sometimes talk about eigenfunctions, eigenfaces, eigenstates, and eigenstuff.

One of the reasons that eigenvectors are ubiquitous in science is that they can be used as an alternative (and in general more useful) coordinate system to represent our data/models. We will see an example of this idea when we discuss Principal Component Analysis (PCA) in chapter 4.