The foundations we'll assume throughout this course are: Python; Python standard modules (non-DL); PyTorch's indexable tensor and tensor creation (random numbers, zeros, ones, etc.); and fastai.datasets to download the data.
In [ ]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
In [ ]:
#export
from exp.nb_00 import *
import operator
def test(a,b,cmp,cname=None):
    if cname is None: cname=cmp.__name__
    assert cmp(a,b),f"{cname}:\n{a}\n{b}"

def test_eq(a,b): test(a,b,operator.eq,'==')
In [ ]:
test_eq(TEST,'test')
In [ ]:
# To run tests in console:
# ! python run_notebook.py 01_matmul.ipynb
In [ ]:
#export
from pathlib import Path
from IPython.core.debugger import set_trace
from fastai import datasets
import pickle, gzip, math, torch, matplotlib as mpl
import matplotlib.pyplot as plt
from torch import tensor
MNIST_URL='http://deeplearning.net/data/mnist/mnist.pkl'
In [ ]:
path = datasets.download_data(MNIST_URL, ext='.gz'); path
Out[ ]:
In [ ]:
with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
In [ ]:
x_train,y_train,x_valid,y_valid = map(tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train, x_train.shape, y_train, y_train.shape, y_train.min(), y_train.max()
Out[ ]:
In [ ]:
assert n==y_train.shape[0]==50000
test_eq(c,28*28)
test_eq(y_train.min(),0)
test_eq(y_train.max(),9)
In [ ]:
mpl.rcParams['image.cmap'] = 'gray'
In [ ]:
img = x_train[0]
In [ ]:
img.view(28,28).type()
Out[ ]:
In [ ]:
plt.imshow(img.view((28,28)));
In [ ]:
weights = torch.randn(784,10)
In [ ]:
bias = torch.zeros(10)
In [ ]:
def matmul(a,b):
    ar,ac = a.shape # n_rows * n_cols
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac): # or br
                c[i,j] += a[i,k] * b[k,j]
    return c
In [ ]:
m1 = x_valid[:5]
m2 = weights
In [ ]:
m1.shape,m2.shape
Out[ ]:
In [ ]:
%time t1=matmul(m1, m2)
In [ ]:
t1.shape
Out[ ]:
This is kinda slow - what if we could speed it up by 50,000 times? Let's try!
In [ ]:
len(x_train)
Out[ ]:
Operators (+,-,*,/,>,<,==) are usually element-wise.
Examples of element-wise operations:
In [ ]:
a = tensor([10., 6, -4])
b = tensor([2., 8, 7])
a,b
Out[ ]:
In [ ]:
a + b
Out[ ]:
In [ ]:
(a < b).float().mean()
Out[ ]:
In [ ]:
m = tensor([[1., 2, 3], [4,5,6], [7,8,9]]); m
Out[ ]:
Frobenius norm:
$$\| A \|_F = \left( \sum_{i,j=1}^n | a_{ij} |^2 \right)^{1/2}$$

Hint: you don't normally need to write equations in LaTeX yourself; instead, you can click 'edit' in Wikipedia and copy the LaTeX from there (which is what I did for the above equation). Or on arxiv.org, click "Download: Other formats" in the top right, then "Download source"; rename the downloaded file to end in .tgz if it doesn't already, and you should find the source there, including the equations to copy and paste.
In [ ]:
(m*m).sum().sqrt()
Out[ ]:
In [ ]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # Any trailing ",:" can be removed
            c[i,j] = (a[i,:] * b[:,j]).sum()
    return c
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
890.1/5
Out[ ]:
In [ ]:
#export
def near(a,b): return torch.allclose(a, b, rtol=1e-3, atol=1e-5)
def test_near(a,b): test(a,b,near)
In [ ]:
test_near(t1,matmul(m1, m2))
Broadcasting describes how arrays with different shapes are treated during arithmetic operations. The term was first used by NumPy.
From the NumPy documentation:
The term broadcasting describes how numpy treats arrays with
different shapes during arithmetic operations. Subject to certain
constraints, the smaller array is “broadcast” across the larger
array so that they have compatible shapes. Broadcasting provides a
means of vectorizing array operations so that looping occurs in C
instead of Python. It does this without making needless copies of
data and usually leads to efficient algorithm implementations.
In addition to being efficient, broadcasting allows developers to write less code, which typically leads to fewer errors.
This section was adapted from Chapter 4 of the fast.ai Computational Linear Algebra course.
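As a minimal sketch of "looping in C instead of Python": adding 1 to every element with an explicit Python loop versus a single broadcast operation (the tensor t here is just for illustration):

t = torch.arange(6).float()
slow = torch.stack([x + 1 for x in t])   # Python-level loop over the elements
fast = t + 1                             # the scalar 1 is broadcast; the loop runs in C
assert (slow == fast).all()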
In [ ]:
a
Out[ ]:
In [ ]:
a > 0
Out[ ]:
How are we able to do a > 0? 0 is being broadcast to have the same dimensions as a.
For instance, you can normalize our dataset by subtracting the mean (a scalar) from the entire dataset (a matrix) and dividing by the standard deviation (another scalar), using broadcasting, as sketched below.
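A minimal sketch of that normalization, using the x_train tensor loaded above (the names train_mean and train_std are just illustrative):

train_mean = x_train.mean()                    # a scalar
train_std  = x_train.std()                     # another scalar
x_norm = (x_train - train_mean) / train_std    # both scalars broadcast over the whole matrix
x_norm.mean(), x_norm.std()                    # roughly 0 and 1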
Other examples of broadcasting with a scalar:
In [ ]:
a + 1
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
2*m
Out[ ]:
We can also broadcast a vector to a matrix:
In [ ]:
c = tensor([10.,20,30]); c
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
m.shape,c.shape
Out[ ]:
In [ ]:
m + c
Out[ ]:
In [ ]:
c + m
Out[ ]:
We don't really copy the rows, but it looks as if we did. In fact, the rows are given a stride of 0.
In [ ]:
t = c.expand_as(m)
In [ ]:
t
Out[ ]:
In [ ]:
m + t
Out[ ]:
In [ ]:
t.storage()
Out[ ]:
In [ ]:
t.stride(), t.shape
Out[ ]:
You can index with the special value [None] or use unsqueeze() to convert a 1-dimensional array into a 2-dimensional array (although one of those dimensions has size 1).
In [ ]:
c.unsqueeze(0)
Out[ ]:
In [ ]:
c.unsqueeze(1)
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
c.shape, c.unsqueeze(0).shape,c.unsqueeze(1).shape
Out[ ]:
In [ ]:
c.shape, c[None].shape,c[:,None].shape
Out[ ]:
You can always skip trailing ':'s, and '...' means 'all preceding dimensions'.
In [ ]:
c[None].shape,c[...,None].shape
Out[ ]:
In [ ]:
c[:,None].expand_as(m)
Out[ ]:
In [ ]:
m + c[:,None]
Out[ ]:
In [ ]:
c[:,None]
Out[ ]:
In [ ]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # c[i,j] = (a[i,:] * b[:,j]).sum() # previous
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
885000/277
Out[ ]:
In [ ]:
test_near(t1, matmul(m1, m2))
In [ ]:
c[None,:]
Out[ ]:
In [ ]:
c[None,:].shape
Out[ ]:
In [ ]:
c[:,None]
Out[ ]:
In [ ]:
c[:,None].shape
Out[ ]:
In [ ]:
c[None,:] * c[:,None]
Out[ ]:
In [ ]:
c[None] > c[:,None]
Out[ ]:
When operating on two arrays/tensors, NumPy/PyTorch compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1.
Arrays do not need to have the same number of dimensions. For example, if you have a 256 x 256 x 3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:
Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3
The NumPy documentation includes several examples of which dimensions can and cannot be broadcast together; the image-scaling case above is sketched below.
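A minimal sketch of that image-scaling case (the image and scale tensors are made up for illustration):

image = torch.rand(256, 256, 3)   # fake RGB image
scale = tensor([0.5, 1.0, 2.0])   # one factor per colour channel
(image * scale).shape             # the (3,) vector lines up with the trailing axis: torch.Size([256, 256, 3])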
Einstein summation (einsum) is a compact representation for combining products and sums in a general way. From the numpy docs:
"The subscripts string is a comma-separated list of subscript labels, where each label refers to a dimension of the corresponding operand. Whenever a label is repeated it is summed, so np.einsum('i,i', a, b) is equivalent to np.inner(a,b). If a label appears only once, it is not summed, so np.einsum('i', a) produces a view of a with no changes."
In [ ]:
# c[i,j] += a[i,k] * b[k,j]
# c[i,j] = (a[i,:] * b[:,j]).sum()
def matmul(a,b): return torch.einsum('ik,kj->ij', a, b)
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
885000/55
Out[ ]:
In [ ]:
test_near(t1, matmul(m1, m2))
We can use PyTorch's matmul function or the @ operator directly for matrix multiplication.
In [ ]:
%timeit -n 10 t2 = m1.matmul(m2)
In [ ]:
# time comparison vs pure python:
885000/18
Out[ ]:
In [ ]:
t2 = m1@m2
In [ ]:
test_near(t1, t2)
In [ ]:
m1.shape,m2.shape
Out[ ]:
In [ ]:
!python notebook2script.py 01_matmul.ipynb
In [ ]: