In the context of deep learning, linear algebra is a mathematical toolbox that offers helpful techniques for manipulating groups of numbers simultaneously. It provides structures like vectors and matrices (spreadsheets) to hold these numbers and new rules for how to add, subtract, multiply, and divide them.
A vector of n dimensions is an ordered collection of n coordinates, where each coordinate is a scalar from the underlying field. An n-dimensional vector v with real coordinates is an element of $\mathbb{R}^n$.
In [1]:
import numpy as np
In [2]:
y = np.array([1,2,3])
x = np.array([2,3,4])
In [3]:
y + x
Out[3]:
array([3, 5, 7])
In [4]:
y-x
Out[4]:
array([-1, -1, -1])
In [5]:
y/x
Out[5]:
array([0.5       , 0.66666667, 0.75      ])
In [6]:
np.dot(y,x)
Out[6]:
20
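Here `np.dot` computes the dot product of the two vectors, the scalar 1·2 + 2·3 + 3·4 = 20, while the `*` operator in the next cell multiplies them elementwise.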
In [7]:
x * y
Out[7]:
array([ 2,  6, 12])
A matrix is a rectangular array of scalars. Primarily, an n × m matrix A is used to describe a linear transformation from m dimensions to n dimensions, where the matrix acts as an operator. We describe the dimensions of a matrix in terms of rows by columns.
\begin{split}\begin{bmatrix} 2 & 4 \\ 5 & -7 \\ 12 & 5 \\ \end{bmatrix} \qquad \begin{bmatrix} a^2 & 2a & 8 \\ 18 & 7a-4 & 10 \\ \end{bmatrix}\end{split}
The first has dimensions (3, 2); the second, (2, 3).
In [10]:
a = np.array([[1,2,3],[4,5,6]])  # 2 rows, 3 columns: shape (2, 3)
b = np.array([[1,2,3]])          # 1 row, 3 columns: shape (1, 3)
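numpy reports these row-by-column dimensions through an array's `shape` attribute. A small illustrative cell (added here; expected values shown as comments):
In [ ]:
a.shape  # = (2, 3): 2 rows, 3 columns
b.shape  # = (1, 3): 1 row, 3 columns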
Scalar operations with matrices work the same way as they do for vectors. Simply apply the scalar to every element in the matrix — add, subtract, divide, multiply, etc.
\begin{split}\begin{bmatrix} 2 & 3 \\ 2 & 3 \\ 2 & 3 \\ \end{bmatrix} + 1 = \begin{bmatrix} 3 & 4 \\ 3 & 4 \\ 3 & 4 \\ \end{bmatrix}\end{split}
In [11]:
a + 1
Out[11]:
array([[2, 3, 4],
       [5, 6, 7]])
In order to add, subtract, or divide two matrices they must have equal dimensions. We combine corresponding values in an elementwise fashion to produce a new matrix.
\begin{split}\begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} a+1 & b+2 \\ c+3 & d+4 \\ \end{bmatrix}\end{split}
In [16]:
a = np.array([[1,2],[3,4]])
b = np.array([[3,4],[5,6]])
In [17]:
a + b
Out[17]:
array([[ 4,  6],
       [ 8, 10]])
In [18]:
b-a
Out[18]:
array([[2, 2],
       [2, 2]])
The Hadamard product of matrices is an elementwise operation. Values that correspond positionally are multiplied to produce a new matrix; numpy's `*` operator (demonstrated below) computes it.
\begin{split}\begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \\ \end{bmatrix} \odot \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \\ \end{bmatrix} = \begin{bmatrix} a_1 \cdot b_1 & a_2 \cdot b_2 \\ a_3 \cdot b_3 & a_4 \cdot b_4 \\ \end{bmatrix}\end{split}
Matrix multiplication, in contrast, follows stricter rules: not all matrices are eligible. AB is a valid matrix product only if A is p × q and B is q × r, that is, the left matrix must have the same number of columns as the right matrix has rows.
Matrix multiplication relies on the dot product to multiply various combinations of rows and columns. As Khan Academy's excellent linear algebra course illustrates, each entry in the product matrix C is the dot product of a row in matrix A and a column in matrix B.
\begin{split}\begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} 1a + 3b & 2a + 4b \\ 1c + 3d & 2c + 4d \\ 1e + 3f & 2e + 4f \\ \end{bmatrix}\end{split}
In [19]:
a * b  # Hadamard (elementwise) product, not matrix multiplication
Out[19]:
array([[ 3,  8],
       [15, 24]])
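For true matrix multiplication, numpy provides `np.dot` and the `@` operator. A small illustrative cell (added here; results shown as comments) multiplying the two 2 × 2 arrays defined above:
In [ ]:
a @ b            # matrix product: each entry is a row-column dot product
# = array([[13, 16],
#          [29, 36]])
np.dot(a, b)     # equivalent to a @ b for 2-D arrays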
Neural networks frequently process weights and inputs of different sizes, where the dimensions do not meet the requirements of matrix multiplication. Matrix transpose provides a way to “rotate” one of the matrices so that the operation complies with multiplication requirements and can continue. There are two steps to transpose a matrix:
1. Rotate the matrix right 90°
2. Reverse the order of elements in each row (e.g. [a b c] becomes [c b a])
As an example, transpose matrix M into T:
\begin{split}M = \begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} \quad \Rightarrow \quad T = \begin{bmatrix} a & c & e \\ b & d & f \\ \end{bmatrix}\end{split}
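In numpy the transpose is available as the `.T` attribute. For the 2 × 2 array `a` defined earlier (an added illustrative cell; result shown as a comment):
In [ ]:
a.T  # rows become columns: [[1, 3], [2, 4]]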
You need to know some basic calculus in order to understand how functions change over time (derivatives), and to calculate the total amount of a quantity that accumulates over a time period (integrals). The language of calculus will allow you to speak precisely about the properties of functions and better understand their behaviour.
A derivative measures an instantaneous rate of change: the slope of a function at a single point.
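Formally, the derivative is the limit of a difference quotient as the step size h shrinks to zero:
\begin{split}f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\end{split}
The code below approximates this limit numerically by fixing a small but finite step size h.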
In [20]:
def get_derivative(func, x):
    """Compute the derivative of `func` at the location `x`."""
    h = 0.0001                            # step size
    return (func(x+h) - func(x)) / h      # rise-over-run

def f(x): return x**2                     # some test function f(x)=x^2
x = 3                                     # the location of interest
computed = get_derivative(f, x)           # numerical approximation
actual = 2*x                              # analytic answer: f'(x) = 2x
computed, actual                          # = 6.0001, 6  # pretty close if you ask me...
Out[20]:
(6.000100000012054, 6)
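The same helper works for any differentiable function of one variable. As an added sanity check (expected values shown as comments):
In [ ]:
def g(x): return x**3   # g'(x) = 3x^2
get_derivative(g, 2)    # = roughly 12.0006
3 * 2**2                # = 12, the exact answer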