The foundations we'll assume throughout this course are: Python; Python standard modules (non-DL); PyTorch's indexable tensor and tensor creation (random numbers, zeros, ones, etc.); and fastai.datasets to download the data.
In [ ]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
In [ ]:
#export
from exp.nb_00 import *
import operator
def test(a,b,cmp,cname=None):
    if cname is None: cname=cmp.__name__
    assert cmp(a,b),f"{cname}:\n{a}\n{b}"

def test_eq(a,b): test(a,b,operator.eq,'==')
In [ ]:
test_eq(TEST,'test')
In [ ]:
# To run tests in console:
# ! python run_notebook.py 01_matmul.ipynb
In [ ]:
#export
from pathlib import Path
from IPython.core.debugger import set_trace
from fastai import datasets
import pickle, gzip, math, torch, matplotlib as mpl
import matplotlib.pyplot as plt
from torch import tensor
MNIST_URL='http://deeplearning.net/data/mnist/mnist.pkl'
In [ ]:
path = datasets.download_data(MNIST_URL, ext='.gz'); path
Out[ ]:
In [ ]:
with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
In [ ]:
x_train,y_train,x_valid,y_valid = map(tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train, x_train.shape, y_train, y_train.shape, y_train.min(), y_train.max()
Out[ ]:
In [ ]:
assert n==y_train.shape[0]==50000
test_eq(c,28*28)
test_eq(y_train.min(),0)
test_eq(y_train.max(),9)
In [ ]:
mpl.rcParams['image.cmap'] = 'gray'
In [ ]:
img = x_train[0]
In [ ]:
img.view(28,28).type()
Out[ ]:
In [ ]:
plt.imshow(img.view((28,28)));
In [ ]:
weights = torch.randn(784,10)
In [ ]:
bias = torch.zeros(10)
In [ ]:
def matmul(a,b):
    ar,ac = a.shape # n_rows * n_cols
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac): # or br
                c[i,j] += a[i,k] * b[k,j]
    return c
In [ ]:
m1 = x_valid[:5]
m2 = weights
In [ ]:
m1.shape,m2.shape
Out[ ]:
In [ ]:
%time t1=matmul(m1, m2)
In [ ]:
t1.shape
Out[ ]:
This is kinda slow - what if we could speed it up by 50,000 times? Let's try!
In [ ]:
len(x_train)
Out[ ]:
Operators (+,-,*,/,>,<,==) are usually element-wise.
Examples of element-wise operations:
In [ ]:
a = tensor([10., 6, -4])
b = tensor([2., 8, 7])
a,b
Out[ ]:
In [ ]:
a + b
Out[ ]:
In [ ]:
(a < b).float().mean()
Out[ ]:
In [ ]:
m = tensor([[1., 2, 3], [4,5,6], [7,8,9]]); m
Out[ ]:
Frobenius norm:
$$\| A \|_F = \left( \sum_{i,j=1}^n | a_{ij} |^2 \right)^{1/2}$$

Hint: you don't normally need to write equations in LaTeX yourself; instead, you can click 'edit' in Wikipedia and copy the LaTeX from there (which is what I did for the above equation). Or on arxiv.org, click "Download: Other formats" in the top right, then "Download source"; rename the downloaded file to end in .tgz if it doesn't already, and you should find the source there, including the equations to copy and paste.
In [ ]:
(m*m).sum().sqrt()
Out[ ]:
In [ ]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # Any trailing ",:" can be removed
            c[i,j] = (a[i,:] * b[:,j]).sum()
    return c
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
890.1/5
Out[ ]:
In [ ]:
#export
def near(a,b): return torch.allclose(a, b, rtol=1e-3, atol=1e-5)
def test_near(a,b): test(a,b,near)
In [ ]:
test_near(t1,matmul(m1, m2))
Broadcasting describes how arrays with different shapes are treated during arithmetic operations. The term was first used by NumPy.
From the NumPy documentation:
The term broadcasting describes how numpy treats arrays with
different shapes during arithmetic operations. Subject to certain
constraints, the smaller array is “broadcast” across the larger
array so that they have compatible shapes. Broadcasting provides a
means of vectorizing array operations so that looping occurs in C
instead of Python. It does this without making needless copies of
data and usually leads to efficient algorithm implementations.
In addition to being efficient, broadcasting allows developers to write less code, which typically leads to fewer errors.
This section was adapted from Chapter 4 of the fast.ai Computational Linear Algebra course.
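As a minimal sketch of "looping in C instead of Python": adding 1 to every element with an explicit Python loop versus a single broadcast operation (the tensor t here is just for illustration):

t = torch.arange(6).float()
slow = torch.stack([x + 1 for x in t])   # Python-level loop over the elements
fast = t + 1                             # the scalar 1 is broadcast; the loop runs in C
assert (slow == fast).all()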
In [ ]:
a
Out[ ]:
In [ ]:
a > 0
Out[ ]:
How are we able to do a > 0? 0 is being broadcast to have the same dimensions as a.
For instance, you can normalize our dataset by subtracting the mean (a scalar) from the entire dataset (a matrix) and dividing by the standard deviation (another scalar), using broadcasting, as sketched below.
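A minimal sketch of that normalization, using the x_train tensor loaded above (the names train_mean and train_std are just illustrative):

train_mean = x_train.mean()                    # a scalar
train_std  = x_train.std()                     # another scalar
x_norm = (x_train - train_mean) / train_std    # both scalars broadcast over the whole matrix
x_norm.mean(), x_norm.std()                    # roughly 0 and 1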
Other examples of broadcasting with a scalar:
In [ ]:
a + 1
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
2*m
Out[ ]:
We can also broadcast a vector to a matrix:
In [ ]:
c = tensor([10.,20,30]); c
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
m.shape,c.shape
Out[ ]:
In [ ]:
m + c
Out[ ]:
In [ ]:
c + m
Out[ ]:
We don't really copy the rows, but it looks as if we did. In fact, the rows are given a stride of 0.
In [ ]:
t = c.expand_as(m)
In [ ]:
t
Out[ ]:
In [ ]:
m + t
Out[ ]:
In [ ]:
t.storage()
Out[ ]:
In [ ]:
t.stride(), t.shape
Out[ ]:
You can index with the special value [None] or use unsqueeze() to convert a 1-dimensional array into a 2-dimensional array (although one of those dimensions has size 1).
In [ ]:
c.unsqueeze(0)
Out[ ]:
In [ ]:
c.unsqueeze(1)
Out[ ]:
In [ ]:
m
Out[ ]:
In [ ]:
c.shape, c.unsqueeze(0).shape,c.unsqueeze(1).shape
Out[ ]:
In [ ]:
c.shape, c[None].shape,c[:,None].shape
Out[ ]:
You can always skip trailing ':'s, and '...' means 'all preceding dimensions'.
In [ ]:
c[None].shape,c[...,None].shape
Out[ ]:
In [ ]:
c[:,None].expand_as(m)
Out[ ]:
In [ ]:
m + c[:,None]
Out[ ]:
In [ ]:
c[:,None]
Out[ ]:
In [ ]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # c[i,j] = (a[i,:] * b[:,j]).sum() # previous
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
885000/277
Out[ ]:
In [ ]:
test_near(t1, matmul(m1, m2))
In [ ]:
c[None,:]
Out[ ]:
In [ ]:
c[None,:].shape
Out[ ]:
In [ ]:
c[:,None]
Out[ ]:
In [ ]:
c[:,None].shape
Out[ ]:
In [ ]:
c[None,:] * c[:,None]
Out[ ]:
In [ ]:
c[None] > c[:,None]
Out[ ]:
When operating on two arrays/tensors, NumPy/PyTorch compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1.
Arrays do not need to have the same number of dimensions. For example, if you have a 256 x 256 x 3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:
Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3
The NumPy documentation includes several examples of which dimensions can and cannot be broadcast together; the image-scaling case above is sketched below.
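A minimal sketch of that image-scaling case (the image and scale tensors are made up for illustration):

image = torch.rand(256, 256, 3)   # fake RGB image
scale = tensor([0.5, 1.0, 2.0])   # one factor per colour channel
(image * scale).shape             # the (3,) vector lines up with the trailing axis: torch.Size([256, 256, 3])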
Einstein summation (einsum) is a compact representation for combining products and sums in a general way. From the numpy docs:
"The subscripts string is a comma-separated list of subscript labels, where each label refers to a dimension of the corresponding operand. Whenever a label is repeated it is summed, so np.einsum('i,i', a, b) is equivalent to np.inner(a,b). If a label appears only once, it is not summed, so np.einsum('i', a) produces a view of a with no changes."
In [ ]:
# c[i,j] += a[i,k] * b[k,j]
# c[i,j] = (a[i,:] * b[:,j]).sum()
def matmul(a,b): return torch.einsum('ik,kj->ij', a, b)
In [ ]:
%timeit -n 10 _=matmul(m1, m2)
In [ ]:
# time comparison vs pure python:
885000/55
Out[ ]:
In [ ]:
test_near(t1, matmul(m1, m2))
We can use PyTorch's matmul function or the @ operator directly for matrix multiplication.
In [ ]:
%timeit -n 10 t2 = m1.matmul(m2)
In [ ]:
# time comparison vs pure python:
885000/18
Out[ ]:
In [ ]:
t2 = m1@m2
In [ ]:
test_near(t1, t2)
In [ ]:
m1.shape,m2.shape
Out[ ]:
In [ ]:
!python notebook2script.py 01_matmul.ipynb
In [ ]: