Linear Algebra with examples using NumPy


In [1]:
import numpy as np
from numpy.random import randn, randint

Linear Algebra and Machine Learning

  • Ranking web pages in order of importance
    • Solved (as in PageRank) as the problem of finding the principal eigenvector of the link matrix; the entries of that eigenvector give the page scores
  • Dimensionality reduction - Principal Component Analysis
  • Movie recommendation
    • Use singular value decomposition (SVD) to break the user-movie ratings matrix into user-feature and movie-feature matrices, keeping only the top $k$ ranks to identify the best matches
  • Topic modeling
    • Extensive use of SVD and matrix factorization can be found in Natural Language Processing, specifically in topic modeling and semantic analysis

Vectors

A vector can be represented by an array of real numbers

$$\mathbf{x} = [x_1, x_2, \ldots, x_n]$$

Geometrically, a vector specifies the coordinates of the tip of the vector if the tail were placed at the origin


In [2]:
from IPython.display import Image
Image('images/vector.png')


Out[2]:

In [3]:
x = np.array([1,2,3,4])
print x
print x.shape


[1 2 3 4]
(4,)

The norm of a vector $\mathbf{x}$ is defined by

$$||\boldsymbol{x}|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

In [4]:
print np.sqrt(np.sum(x**2))
print np.linalg.norm(x)


5.47722557505
5.47722557505

Adding a constant to a vector adds the constant to each element

$$a + \boldsymbol{x} = [a + x_1, a + x_2, \ldots, a + x_n]$$

In [5]:
a = 4
print x
print x + a


[1 2 3 4]
[5 6 7 8]

Multiplying a vector by a constant multiplies each term by the constant.

$$a \boldsymbol{x} = [ax_1, ax_2, \ldots, ax_n]$$

In [6]:
print x
print x*4
print np.linalg.norm(x*4)
#print np.linalg.norm(x)


[1 2 3 4]
[ 4  8 12 16]
21.9089023002

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the dot product is given by

$$\boldsymbol{x} \cdot \boldsymbol{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

In [7]:
y = np.array([4, 3, 2, 1])
print x
print y
np.dot(x,y)


[1 2 3 4]
[4 3 2 1]
Out[7]:
20

If $\mathbf{x} \cdot \mathbf{y} = 0$ then $\mathbf{x}$ and $\mathbf{y}$ are orthogonal (this aligns with the intuitive notion of perpendicular)


In [8]:
w = np.array([1, 2])
v = np.array([-2, 1])
np.dot(w,v)


Out[8]:
0

The norm squared of a vector is just the vector dot product with itself $$ ||x||^2 = x \cdot x $$


In [9]:
print np.linalg.norm(x)**2
print np.dot(x,x)


30.0
30

The distance between two vectors is the norm of the difference. $$ d(x,y) = ||x-y|| $$


In [10]:
np.linalg.norm(x-y)


Out[10]:
4.4721359549995796

Cosine similarity is the cosine of the angle between the two vectors, given by

$$\cos(\theta) = \frac{\boldsymbol{x} \cdot \boldsymbol{y}}{||\boldsymbol{x}|| \text{ } ||\boldsymbol{y}||}$$

In [11]:
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
np.dot(x,y)/(np.linalg.norm(x)*np.linalg.norm(y))


Out[11]:
0.96886393162696616

If both $\boldsymbol{x}$ and $\boldsymbol{y}$ are zero-centered, this calculation is the correlation between $\boldsymbol{x}$ and $\boldsymbol{y}$


In [12]:
x_centered = x - np.mean(x)
y_centered = y - np.mean(y)
# The following gives the "Centered Cosine Similarity"
# ... which is equivalent to the "Sample Pearson Correlation Coefficient"
# ... (in the correlation case, we're interpreting the vector as a list of samples)
# ... see: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample
np.dot(x_centered,y_centered)/(np.linalg.norm(x_centered)*np.linalg.norm(y_centered))


Out[12]:
0.99999999999999978

Linear Combinations of Vectors

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then

$$\boldsymbol{x} + \boldsymbol{y} = [x_1+y_1, x_2+y_2, \ldots, x_n+y_n]$$

In [13]:
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
print x+y


[ 6  8 10 12]

In [14]:
a=2
x = np.array([1,2,3,4])
print a*x


[2 4 6 8]

A linear combination of a collection of vectors $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_m)$ is a vector of the form

$$a_1 \cdot \boldsymbol{x}_1 + a_2 \cdot \boldsymbol{x}_2 + \cdots + a_m \cdot \boldsymbol{x}_m$$

In [15]:
a1=2
x1 = np.array([1,2,3,4])
print a1*x1
a2=4
x2 = np.array([5,6,7,8])
print a2*x2
print a1*x1 + a2*x2


[2 4 6 8]
[20 24 28 32]
[22 28 34 40]

Matrices

An $n \times p$ matrix is an array of numbers with $n$ rows and $p$ columns:

$$ X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} $$

$n$ = the number of subjects
$p$ = the number of features

For the following $2 \times 3$ matrix $$ X = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix} $$

We can create it in Python using NumPy:


In [16]:
X = np.array([[1,2,3],[4,5,6]])
print X[1, 2]  # element in row 1, column 2 (zero-indexed)
print X
print X.shape


6
[[1 2 3]
 [4 5 6]]
(2, 3)

Basic Properties

Let $X$ and $Y$ be matrices of dimension $n \times p$, and let $x_{ij}$ and $y_{ij}$, for $i=1,2,\ldots,n$ and $j=1,2,\ldots,p$, denote the entries in these matrices, then

  1. $X+Y$ is the matrix whose $(i,j)^{th}$ entry is $x_{ij} + y_{ij}$
  2. $X-Y$ is the matrix whose $(i,j)^{th}$ entry is $x_{ij} - y_{ij}$
  3. $aX$, where $a$ is any real number, is the matrix whose $(i,j)^{th}$ entry is $ax_{ij}$

In [17]:
X = np.array([[1,2,3],[4,5,6]])
print X
Y = np.array([[7,8,9],[10,11,12]])
print Y
print X+Y


[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
[[ 8 10 12]
 [14 16 18]]

In [18]:
X = np.array([[1,2,3],[4,5,6]])
print X
Y = np.array([[7,8,9],[10,11,12]])
print Y
print X-Y


[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
[[-6 -6 -6]
 [-6 -6 -6]]

In [19]:
X = np.array([[1,2,3],[4,5,6]])
print X
a=5
print a*X


[[1 2 3]
 [4 5 6]]
[[ 5 10 15]
 [20 25 30]]

In order to multiply two matrices, they must be conformable: the number of columns of the first matrix must equal the number of rows of the second matrix.

Let $X$ be a matrix of dimension $n \times k$ and let $Y$ be a matrix of dimension $k \times p$, then the product $XY$ will be a matrix of dimension $n \times p$ whose $(i,j)^{th}$ element is given by the dot product of the $i^{th}$ row of $X$ and the $j^{th}$ column of $Y$

$$\sum_{s=1}^k x_{is}y_{sj} = x_{i1}y_{1j} + \cdots + x_{ik}y_{kj}$$

Note:

$$XY \neq YX$$

If $X$ and $Y$ are square matrices of the same dimension, then both products $XY$ and $YX$ exist; however, there is no guarantee the two products will be the same.
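
As a quick sketch of this, using two arbitrary $2 \times 2$ matrices chosen only for illustration, we can check numerically that square matrices generally do not commute:


In [ ]:
A = np.array([[1, 2], [3, 4]])   # arbitrary square matrices, for illustration only
B = np.array([[0, 1], [1, 0]])
print(A.dot(B))                             # AB
print(B.dot(A))                             # BA
print(np.array_equal(A.dot(B), B.dot(A)))   # False for this pair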


In [20]:
X = np.array([[2,1,0],[-1,2,3]])
print X
Y = np.array([[0,-2],[1,2],[1,1]])
print Y

# Matrix multiply with dot operator
print np.dot(X,Y)
print X.dot(Y)


[[ 2  1  0]
 [-1  2  3]]
[[ 0 -2]
 [ 1  2]
 [ 1  1]]
[[ 1 -2]
 [ 5  9]]
[[ 1 -2]
 [ 5  9]]

In [21]:
# Regular multiply operator is just element-wise multiplication
print X
print Y.transpose()
print X*Y.T


[[ 2  1  0]
 [-1  2  3]]
[[ 0  1  1]
 [-2  2  1]]
[[0 1 0]
 [2 4 3]]

Additional Properties of Matrices

  1. If $X$ and $Y$ are both $n \times p$ matrices, then $$X+Y = Y+X$$
  2. If $X$, $Y$, and $Z$ are all $n \times p$ matrices, then $$X+(Y+Z) = (X+Y)+Z$$
  3. If $X$, $Y$, and $Z$ are all conformable, then $$X(YZ) = (XY)Z$$
  4. If $X$ is of dimension $n \times k$ and $Y$ and $Z$ are of dimension $k \times p$, then $$X(Y+Z) = XY + XZ$$
  5. If $X$ is of dimension $p \times n$ and $Y$ and $Z$ are of dimension $k \times p$, then $$(Y+Z)X = YX + ZX$$
  6. If $a$ and $b$ are real numbers, and $X$ is an $n \times p$ matrix, then $$(a+b)X = aX+bX$$
  7. If $a$ is a real number, and $X$ and $Y$ are both $n \times p$ matrices, then $$a(X+Y) = aX+aY$$
  8. If $a$ is a real number, and $X$ and $Y$ are conformable, then $$X(aY) = a(XY)$$
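
As a minimal sketch, a couple of these properties (3 and 4 above) can be spot-checked numerically; the small matrices below are arbitrary and chosen only for illustration:


In [ ]:
X = np.array([[1, 2], [3, 4]])   # arbitrary conformable matrices
Y = np.array([[5, 6], [7, 8]])
Z = np.array([[2, 0], [0, 2]])
print(np.array_equal(X.dot(Y.dot(Z)), (X.dot(Y)).dot(Z)))   # X(YZ) == (XY)Z
print(np.array_equal(X.dot(Y + Z), X.dot(Y) + X.dot(Z)))    # X(Y+Z) == XY + XZ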

Matrix Transpose

The transpose of an $n \times p$ matrix is a $p \times n$ matrix with rows and columns interchanged

$$ X^T = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix} $$

In [22]:
print X
X_T = X.transpose()
print X_T
print X_T.shape


[[ 2  1  0]
 [-1  2  3]]
[[ 2 -1]
 [ 1  2]
 [ 0  3]]
(3, 2)

Properties of Transpose

  1. Let $X$ be an $n \times p$ matrix and $a$ a real number, then $$(aX)^T = aX^T$$
  2. Let $X$ and $Y$ be $n \times p$ matrices, then $$(X \pm Y)^T = X^T \pm Y^T$$
  3. Let $X$ be an $n \times k$ matrix and $Y$ be a $k \times p$ matrix, then $$(XY)^T = Y^TX^T$$
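
As a sketch, property 3 can be checked numerically, reusing the same $2 \times 3$ and $3 \times 2$ matrices from In [20]:


In [ ]:
X = np.array([[2, 1, 0], [-1, 2, 3]])    # 2 x 3
Y = np.array([[0, -2], [1, 2], [1, 1]])  # 3 x 2
print(np.array_equal(X.dot(Y).T, Y.T.dot(X.T)))   # (XY)^T == Y^T X^T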

Vector in Matrix Form

A column vector is a matrix with $n$ rows and 1 column; to distinguish it from a standard matrix $X$ of higher dimension, it is denoted by a bold lowercase $\boldsymbol{x}$

$$ \boldsymbol{x} = \begin{bmatrix} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{bmatrix} $$

In NumPy, a vector entered this way will not normally have a second dimension, so we can reshape it:


In [23]:
x = np.array([1,2,3,4])
print x
print x.shape


[1 2 3 4]
(4,)

In [24]:
y = x.reshape(4,1)
z = x[:,np.newaxis]
print y
print z
print y.shape
print z.shape


[[1]
 [2]
 [3]
 [4]]
[[1]
 [2]
 [3]
 [4]]
(4, 1)
(4, 1)

and a row vector is generally written as the transpose

$$\boldsymbol{x}^T = [x_1, x_2, \ldots, x_n]$$

In [25]:
x_T = y.transpose()
print x_T
print x_T.shape
print x


[[1 2 3 4]]
(1, 4)
[1 2 3 4]

If we have two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same length $(n)$, then the dot product is given by matrix multiplication

$$\boldsymbol{x}^T \boldsymbol{y} = \begin{bmatrix} x_1& x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix} y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{bmatrix} = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$
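
A small sketch of this, reshaping two arbitrary vectors into column form; the $1 \times n$ by $n \times 1$ product is a $1 \times 1$ matrix holding the dot product:


In [ ]:
x_col = np.array([1, 2, 3, 4]).reshape(4, 1)   # column vector, shape (4, 1)
y_col = np.array([5, 6, 7, 8]).reshape(4, 1)   # column vector, shape (4, 1)
print(x_col.T.dot(y_col))                      # 1x1 matrix containing the dot product
print(np.dot(np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])))  # same value as a scalar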

Inverse of a Matrix

The inverse of a square $n \times n$ matrix $X$ is an $n \times n$ matrix $X^{-1}$ such that

$$X^{-1}X = XX^{-1} = I$$

Where $I$ is the identity matrix, an $n \times n$ diagonal matrix with 1's along the diagonal.

If such a matrix exists, then $X$ is said to be invertible or nonsingular, otherwise $X$ is said to be noninvertible or singular.


In [26]:
print np.identity(4)
X = np.array([[1,2,3], [0,1,0], [-2, -1, 0]])
Y = np.linalg.inv(X)
print Y
print Y.dot(X)
print np.allclose(np.identity(3), Y.dot(X))  # check that Y.dot(X) is (numerically) the identity


[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]
[[ 0.         -0.5        -0.5       ]
 [ 0.          1.          0.        ]
 [ 0.33333333 -0.5         0.16666667]]
[[  1.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   1.00000000e+00   0.00000000e+00]
 [  0.00000000e+00  -2.77555756e-17   1.00000000e+00]]
True

Properties of Inverse

  1. If $X$ is invertible, then $X^{-1}$ is invertible and $$(X^{-1})^{-1} = X$$
  2. If $X$ and $Y$ are both $n \times n$ invertible matrices, then $XY$ is invertible and $$(XY)^{-1} = Y^{-1}X^{-1}$$
  3. If $X$ is invertible, then $X^T$ is invertible and $$(X^T)^{-1} = (X^{-1})^T$$
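
A minimal sketch checking property 2 numerically with two small invertible matrices chosen only for illustration (np.allclose allows for floating point error):


In [ ]:
X = np.array([[2.0, 1.0], [1.0, 3.0]])         # arbitrary invertible matrices
Y = np.array([[1.0, 2.0], [0.0, 1.0]])
lhs = np.linalg.inv(X.dot(Y))                  # (XY)^{-1}
rhs = np.linalg.inv(Y).dot(np.linalg.inv(X))   # Y^{-1} X^{-1}
print(np.allclose(lhs, rhs))                   # True, up to floating point error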

Orthogonal Matrices

Let $X$ be an $n \times n$ matrix such that $X^TX = I$; then $X$ is said to be orthogonal, which implies that $X^T=X^{-1}$

This is equivalent to saying that the columns of $X$ are all orthogonal to each other (and have unit length).
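
As an example, a $2 \times 2$ rotation matrix is orthogonal; a minimal sketch (the angle below is arbitrary):


In [ ]:
theta = np.pi / 6                       # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T.dot(Q), np.identity(2)))   # Q^T Q = I
print(np.allclose(Q.T, np.linalg.inv(Q)))        # Q^T = Q^{-1}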

Matrix Equations

A system of equations of the form: \begin{align*} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ \vdots \hspace{1in} \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \end{align*} can be written as a matrix equation: $$ A\mathbf{x} = \mathbf{b} $$ and hence, provided $A$ is square and invertible, has the solution $$ \mathbf{x} = A^{-1}\mathbf{b} $$
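
A minimal sketch solving a small system with arbitrary coefficients; in practice np.linalg.solve is preferred over forming the inverse explicitly:


In [ ]:
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # arbitrary invertible coefficient matrix
b = np.array([9.0, 8.0])
x_inv = np.linalg.inv(A).dot(b)      # x = A^{-1} b
x_solve = np.linalg.solve(A, b)      # numerically preferable
print(x_inv)                         # both should give (approximately) [2., 3.]
print(x_solve)
print(np.allclose(x_inv, x_solve))   # True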

Eigenvectors and Eigenvalues

Let $A$ be an $n \times n$ matrix and $\boldsymbol{x}$ be an $n \times 1$ nonzero vector. An eigenvalue of $A$ is a number $\lambda$ such that

$$A \boldsymbol{x} = \lambda \boldsymbol{x}$$

A vector $\boldsymbol{x}$ satisfying this equation is called an eigenvector associated with $\lambda$

Eigenvectors and eigenvalues will play a huge role in matrix methods later in the course (PCA, SVD, NMF).


In [27]:
A = np.array([[1, 1], [1, 2]])
vals, vecs = np.linalg.eig(A)
print vals
print vecs


[ 0.38196601  2.61803399]
[[-0.85065081 -0.52573111]
 [ 0.52573111 -0.85065081]]

In [28]:
lam = vals[0]
vec = vecs[:,0]
print A.dot(vec)
print lam * vec


[-0.3249197   0.20081142]
[-0.3249197   0.20081142]
