1. Linear Algebra

In the context of deep learning, linear algebra is a mathematical toolbox that offers helpful techniques for manipulating groups of numbers simultaneously. It provides structures like vectors and matrices (akin to spreadsheets) to hold these numbers, and rules for how to add, subtract, multiply, and divide them.

1.1 Vector

A vector of n dimensions is an ordered collection of n coordinates, where each coordinate is a scalar from the underlying field. An n-dimensional vector v with real coordinates is an element of R^n.


In [1]:
import numpy as np

In [2]:
y = np.array([1,2,3])
x = np.array([2,3,4])

1.1.1 Elementwise Operations


In [3]:
y + x


Out[3]:
array([3, 5, 7])

In [4]:
y-x


Out[4]:
array([-1, -1, -1])

In [5]:
y/x


Out[5]:
array([0.5       , 0.66666667, 0.75      ])

1.1.2 Dot product

The dot product of two vectors is a scalar. The dot product generalizes to matrices as matrix multiplication, which is one of the most important operations in deep learning.


In [6]:
np.dot(y,x)


Out[6]:
20

1.1.3 Hadamard product

The Hadamard product is elementwise multiplication, and it results in another vector.


In [7]:
x * y


Out[7]:
array([ 2,  6, 12])

2. Matrices

A matrix is a rectangular array of scalars. An n × m matrix A is primarily used to describe a linear transformation from m dimensions to n dimensions; in this sense the matrix is an operator. We describe the dimensions of a matrix in terms of rows by columns.

\begin{split}\begin{bmatrix} 2 & 4 \\ 5 & -7 \\ 12 & 5 \\ \end{bmatrix} \quad \begin{bmatrix} a^2 & 2a & 8 \\ 18 & 7a-4 & 10 \\ \end{bmatrix}\end{split}

The first has dimensions (3,2). The second (2,3).


In [10]:
a = np.array([[1,2,3],[4,5,6]])

b = np.array([[1,2,3]])
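
The dimensions of the arrays just defined can be inspected with NumPy's shape attribute. A minimal sketch, assuming a and b are defined as above:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2 rows, 3 columns
b = np.array([[1, 2, 3]])              # 1 row, 3 columns

print(a.shape)  # (2, 3)
print(b.shape)  # (1, 3)
```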

2.1 Scalar Operations

Scalar operations with matrices work the same way as they do for vectors. Simply apply the scalar to every element in the matrix — add, subtract, divide, multiply, etc.

\begin{split}\begin{bmatrix} 2 & 3 \\ 2 & 3 \\ 2 & 3 \\ \end{bmatrix} + 1 = \begin{bmatrix} 3 & 4 \\ 3 & 4 \\ 3 & 4 \\ \end{bmatrix}\end{split}

In [11]:
a + 1


Out[11]:
array([[2, 3, 4],
       [5, 6, 7]])

2.2 Elementwise operations

In order to add, subtract, or divide two matrices they must have equal dimensions. We combine corresponding values in an elementwise fashion to produce a new matrix.

\begin{split}\begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} a+1 & b+2 \\ c+3 & d+4 \\ \end{bmatrix}\end{split}


In [16]:
a = np.array([[1,2],[3,4]])
b = np.array([[3,4],[5,6]])

In [17]:
a + b


Out[17]:
array([[ 4,  6],
       [ 8, 10]])

In [18]:
b-a


Out[18]:
array([[2, 2],
       [2, 2]])

2.3 Hadamard product

The Hadamard product of matrices is an elementwise operation. Values that correspond positionally are multiplied to produce a new matrix.

\begin{split}\begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \\ \end{bmatrix} \odot \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \\ \end{bmatrix} = \begin{bmatrix} a_1 \cdot b_1 & a_2 \cdot b_2 \\ a_3 \cdot b_3 & a_4 \cdot b_4 \\ \end{bmatrix}\end{split}

Matrix multiplication is a different operation from the Hadamard product. AB is a valid matrix product if A is p × q and B is q × r (the left matrix has the same number of columns as the right matrix has rows).

NOTE: Not all matrices are eligible for multiplication. Here are the rules:

  • The number of columns of the 1st matrix must equal the number of rows of the 2nd
  • The product of an M x N matrix and an N x K matrix is an M x K matrix. The new matrix takes the rows of the 1st and the columns of the 2nd

Matrix multiplication relies on the dot product to multiply various combinations of rows and columns. In the example below, adapted from Khan Academy's excellent linear algebra course, each entry in matrix C is the dot product of a row in matrix A and a column in matrix B.

\begin{split}\begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} 1a + 3b & 2a + 4b \\ 1c + 3d & 2c + 4d \\ 1e + 3f & 2e + 4f \\ \end{bmatrix}\end{split}
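
NumPy implements matrix multiplication with the @ operator (np.matmul). A sketch using the 2 × 2 arrays a and b defined above:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 4], [5, 6]])

# Matrix product: each entry is the dot product of a row of `a`
# and a column of `b`. Note this differs from `a * b` (Hadamard).
c = a @ b
print(c)  # [[13 16]
          #  [29 36]]
```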

In [19]:
a*b


Out[19]:
array([[ 3,  8],
       [15, 24]])

2.4 Matrix transpose

Neural networks frequently process weights and inputs of different sizes where the dimensions do not meet the requirements of matrix multiplication. Matrix transpose provides a way to “rotate” one of the matrices so that the operation complies with multiplication requirements and can continue. There are two steps to transpose a matrix:

  • Rotate the matrix right 90°
  • Reverse the order of elements in each row (e.g. [a b c] becomes [c b a])

As an example, transpose matrix M into T:

\begin{split}\begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} a & c & e \\ b & d & f \\ \end{bmatrix}\end{split}
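
In NumPy the transpose is available as the .T attribute (or np.transpose). A minimal sketch mirroring the 3 × 2 example above, with hypothetical placeholder values:

```python
import numpy as np

m = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)

# Transposing swaps rows and columns: shape (3, 2) becomes (2, 3).
print(m.T)        # [[1 3 5]
                  #  [2 4 6]]
print(m.T.shape)  # (2, 3)
```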

3. Calculus

You need to know some basic calculus in order to understand how functions change over time (derivatives), and to calculate the total amount of a quantity that accumulates over a time period (integrals). The language of calculus will allow you to speak precisely about the properties of functions and better understand their behaviour.

3.1 Derivatives

The derivative of a function is its instantaneous rate of change: how quickly the output changes as the input changes.


In [20]:
def get_derivative(func, x):
    """Compute the derivative of `func` at the location `x`."""
    h = 0.0001                          # step size
    return (func(x+h) - func(x)) / h    # rise-over-run

def f(x): return x**2                   # some test function f(x)=x^2
x = 3                                   # the location of interest
computed = get_derivative(f, x)
actual = 2*x

computed, actual   # = 6.0001, 6        # pretty close if you ask me...


Out[20]:
(6.000100000012054, 6)
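
The intro to this section also mentions integrals, the total amount that accumulates over an interval. A minimal sketch in the same finite-step spirit as get_derivative, approximating the integral of f(x) = x^2 over [0, 3] with a Riemann sum (the helper name get_integral is hypothetical):

```python
def get_integral(func, a, b, n=100_000):
    """Approximate the integral of `func` over [a, b] with a left Riemann sum."""
    h = (b - a) / n                             # width of each slice
    return sum(func(a + i * h) for i in range(n)) * h

def f(x): return x**2

computed = get_integral(f, 0, 3)
actual = 3**3 / 3                               # exact value: x^3/3 at x=3
print(computed, actual)                         # both approximately 9
```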
