Welcome to IPython Notebook. We will be using these notebooks to run the labs. If you're reading this it likely means that you have successfully setup IPython on your computer.

As many of you likely already know python, feel free to skip to some of the harder stuff in sections 2.3, 3, 4 and 5. For those that have never seen python before please see these links:

https://groklearning.com/course/intro-python-1/

https://groklearning.com/course/intro-python-2/.


In [ ]:
# To allow compatibility between python 2 and python 3
from __future__ import print_function

1. Basic Python Syntax

If you just need a quick refresher here is some simple python syntax. Remember that indentation is important in python. To run the cell you can press Ctrl-Enter or hit the Play button at the top. Please edit the code and experiment.


In [ ]:
# This is a comment line
# numbers and variables
age = 26
pi = 3.14159

In [ ]:
# strings and methods
s = 'Hugh F Durrant-Whyte'

# Strings have a method `split`
tokens = s.split()
# tokens is now a list of strings
print(tokens)

In [ ]:
firstName = tokens[0]
middleName = tokens[1]
lastName = tokens[2]
s2 = firstName + ' ' + middleName + ' ' + lastName

In [ ]:
# 'if' statement - indentation matters
if s == s2:
    print('yes the strings are equal')
else:
    print('no')

# if statements can also be inline
answer = 'yes' if s == s2 else 'no'

In [ ]:
# list (mutable ordered sequence)
beatles = ['John', 'Paul', 'George']
beatles.append('Ringo')
print(beatles)
print('Ringo' in beatles)

In [ ]:
# 'for' loop - indentation matters
# Note that name is defined inside the for loop
for name in beatles:
    print('Hello ' + name)

# Iterating over a range of numbers is easy
# range has the following arguments (start, stop, step) where stop isn't included
for number in range(2, 10, 2):
    print(number)

In [ ]:
# tuple (immutable ordered sequence)
ages = (18, 21, 28, 21, 22, 18, 19, 34, 9)

# Note you can't change the contents of a tuple

# set (mutable, unordered, no duplicates)
uniqueAges = set(ages)
uniqueAges.add(18) # already in set, no effect
uniqueAges.remove(21)

# testing set membership (very fast)
if 18 in uniqueAges:
    print('There is an 18-year-old present!')

In [ ]:
# sorting a list
sorted(beatles) # returns a new sorted list
beatles.sort() # in-place - changes beatles list

# Sorting a set returns a list
orderedUniqueAges = sorted(uniqueAges)

# There is no guaranteed order when iterating over a set
for thisAge in uniqueAges:
    print(thisAge)

# Instead iterate over the sorted set:
for age in sorted(uniqueAges):
    print(age)

In [ ]:
# dict - mapping unique keys to values
netWorth = {}
netWorth['Donald Trump'] = 3000000000
netWorth['Bill Gates'] = 58000000000
netWorth['Tom Cruise'] = 40000000
netWorth['Joe Postdoc'] = 20000

# Access the value associated with a key
print(netWorth['Donald Trump'])

# iterating over a dict gives keys
for personName in netWorth:
    print(personName + " is worth: ")
    print(netWorth[personName])

# You can also iterate over key-value pairs:
for (person, worth) in netWorth.items():
    if worth < 1000000:
        print('haha ' + person + ' is not a millionaire')

# testing dict membership is the same as with a set
if 'Tom Cruise' in netWorth:
    print('show me the money!')

In [ ]:
def some_func(arg1, arg2, n=5):
    """
    Triple quoted strings act as function documentation.
    Arguments can be optional by including a default
    """
    # Inline comment
    ranges = [0, 1, 2, 3] # Create a list
    # Create a range object, similar to a list of numbers
    # starting at 0 going up to but not including n
    ranges = range(0, n)
    
    print('Fancy result formating using .format')
    for i in ranges:
        result = arg1 + arg2 
        print('result ({}) = arg1 ({}) + arg2 ({})'.format(result, arg1, arg2))
    
# Call the function with arguments (change me)
some_func(4, 5, 3)

You can import additional modules, including the builtin math module:


In [ ]:
import math
print(math.sqrt(1125899906842624))

1.2 File I/O

Now for a basic, but still very useful, data science example: a csv file parser.


In [ ]:
!head -n1 "sample-clusters.csv"

In [ ]:
csvfilename = "sample-clusters.csv"
all_data = []
with open(csvfilename) as f:
    for linenumber, line in enumerate(f, start=1):
        stringData = line.strip().split()
        floatData = list(map(float, stringData))
        if linenumber < 5: 
            print(linenumber, *floatData)
        all_data.append(floatData)

print("Parsed CSV comprising {} rows by {} columns".format(len(all_data), len(floatData)))

Depending on the file size you may not be able to fit everything in memory, so instead of appending each row to the all_data list you would do some more processing to extract just the feature you require. In more extreme cases you would open an output file and write the relevant features in a row by row fashion.

If you want a bit of practise writing Python before continuing, write code to sum up each column in the given csv file.

2. Linear Algebra in Python

2.1 Linear Algebra Examples

To perform linear algebra in python use the numpy library, you can import the numpy library like so:


In [ ]:
import numpy as np

Here we have imported the numpy library with the alias np. With numpy you can perform basic linear algebra operations. It's important to remember that Python uses zero-based-indexing and uses [ ] for indexing variables. Numpy also uses the : to span a dimension.

Finally numpy by default interprets the * to mean element-wise multiplication, which is the opposite of MATLAB. If you are a MATLAB user, you may find the following site helpful: http://wiki.scipy.org/NumPy_for_Matlab_Users

The following snippet is some examples of numpy usage:


In [ ]:
# Create a new array with three elements
A = np.array([1, 2, 3])
print(A)

# Note that the shape is the size of its dimensions, 
# while size is the number of elements.
print('A is of dimension {A.ndim}, with shape: {A.shape}, and size {A.size}'.format(A=A))

Note that unlike MATLAB's rand function, numpy.random.random has a single variable for size, but this can contain a list of dimension sizes. In the example below we pass the list [4,3] which is interpreted as the dimensions for the new matrix.


In [ ]:
# Create a random 4x3 array
B = np.random.random(size=[4,3]) # Equivalent to rand(4,3) in Matlab.
print(B)
print('B is of dimension {A.ndim}, with shape: {A.shape}, and size {A.size}'.format(A=B))

In [ ]:
# We can index an array like a list
print(B[0]) # Access the first row

# Multidimensions can be handled in two ways:
print(B[0][0])
print(B[0, 0])

In [ ]:
# We can slice arrays
print(B[0, :]) # Get the first row
print(B[:, -1])  # Get the last column

In [ ]:
# Slices can have start/stop/step points defined

# First two elements of the first column
print(B[:2, 0])

# Every second element of the first column
print(B[::2, 0])

# same again but starting from 1
print(B[1::2, 0])

For people familiar with MATLAB, zero indexing can be tricky. A common mistake is the length of vectors. In MATLAB [0:2] results in the vector [0, 1, 2], but in python... indexing is up to but not including the last element:


In [ ]:
# Construct the vector [0, 1, ... 9]
tmp = np.arange(0, 10)

print(tmp)
print(tmp[0:2])

In [ ]:
# Numpy Arrays have lots of methods defined on them
print(B.mean())
print(B.max())

In [ ]:
# Many of these methods can operate on one axis
# Eg to find the mean and max of each column of B:
print(B.mean(axis=0))
print(B.max(axis=0))

Examples of matrix manipulation


In [ ]:
# Note that you can use variable defined earlier in the notebook.
# Create a matrix of ones.
C = np.ones((4, 3))
print(C)

In [ ]:
# Calculate the element wise product.
print(B*C)

In [ ]:
# Calculate the matrix product. Note the dot method performs matrix mult
# .T is the transpose
print(B.dot(C.T))

We can also perform matrix inversions, using the linalg submodule.


In [ ]:
D = B.T.dot(B) # Form an invertible matrix (Positive Definite Matrix).
print(D)
print(np.linalg.inv(D)) # Matrix inverse.
print(np.linalg.cholesky(D)) # Cholesky decomposition.

In [ ]:
# Eigen decomposition returns two variables, eigen values and eigen vectors
vals, vecs = np.linalg.eig(D)
print('Eigen values = {}'.format(vals))
print('''Eigen vectors = 
{}'''.format(vecs))

You should also checkout scipy if you ever find that numpy is missing functionality that you require. http://www.scipy.org/

2.2 Getting Help

Occasionally, you will need to know the parameters of a function or need to read its description. You can use the ? operator to bring up a new window inside IPython's browser with that functions docstring. This works on functions and modules as well. Also IPython notebook can auto-complete for variables that have already been instanciated, just press <TAB>.


In [ ]:
np.random.random?

In [ ]:
np.random?

Also try auto-completing functions in the np.random module by typing np.random.<TAB>

2.3 Input/Output and a simple problem

Its very useful to be able to plot your data and read and write data to files. Lets construct a toy problem, plot the results and then save them to a file.

Problem: The position of a robot oscilates randomly in a 3-d volume for 1000 steps. The oscilation can be modelled as a random perturbation of Gaussian noise centred on its current location. The noise has a variance of 2 world units. It begins at the origin. Calculate a possible trajectory and save it to disk as a matlab matrix.

The robot state at time $t$ is: \begin{equation}\mathbf{x}_t = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \end{equation}

The robots position at time t=0 is:

\begin{equation}\mathbf{x}_0 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \end{equation}

The robots position is perturbed by Gaussian noise. The robots state at time $t+1$ is: \begin{equation}\mathbf{x}_{t+1} = \mathbf{x}_t + \mathcal{N}\left(\begin{bmatrix}0\\ 0 \\ 0\end{bmatrix}, \begin{bmatrix} 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{bmatrix}\right) \end{equation}


In [ ]:
%matplotlib inline  
import matplotlib.pyplot as pl 

dims = 3
steps = 1000

state = np.zeros((1, dims))

history = np.zeros([steps, dims]) #Observe that the dimension is a single variable.

step_size = 2
for i in range(0, steps):
    history[i,:] = state
    
    state = state + 2 * np.random.normal(size=(1,dims))

pl.plot(history[:,0], history[:,1]);

Plotting in 3D is not as simple in matplotlib but can be done as shown below. If you execute this code without the %matplotlib inline directive ever being called in this notebook, it should produce a new window that will allow you to manipulate the figures. You may need to comment %matplotlib inline above, restart the kernel (Kernel->Restart->Restart) and re-run all the above cells and add the command pl.show() to the below cell if you wish to do this.


In [ ]:
import mpl_toolkits.mplot3d.art3d
import mpl_toolkits.mplot3d.axes3d as a3

ax = a3.Axes3D(pl.figure())
ax.plot(history[:,0], history[:,1], history[:,2]);

In [ ]:
# Save our numpy array to disk in a Python format
np.save('robot.npy', history)

To save a matlab compatible file, use the scipy.io module. The second argument is a python dictionary that uses the name of the matrix as the key and the value as the actual numpy array.


In [ ]:
# Save as a matlab file 
import scipy.io
# Save mat takes a filename and a 'dictionary' of variables to save.
scipy.io.savemat('robot.mat', {'robot': history})

3. Principal Component Analysis (PCA)

$\newcommand{\X}{\mathbf{X}}$

Principal component analysis is an incredibly useful technique for visualising data and as a pre-processing step for machine learning algorithms like regression and clustering. Essentially it is a dimensionality reduction technique for data, e.g. if your data has 100 dimensions and you would like to visualise it in 2D, PCA will find a linear combination of the 100 features "best" for visualising in 2D. In this case, "best" is assumed to be dimensions with the most variance (which in many cases may not actually be a good assumption).

Alternatively, PCA can be thought of as a mechanism for de-correlating your data, i.e. it will find a projection of your data such that the covariance matrix of the projected data is diagonal only. Using this projected data can in some cases improve the speed and performance of other machine learning algorithms. The following description and exercises should give you a better intuition for that PCA is actually doing.

Let $\X$ be an $n\times m$ matrix, where $n$ is the number of observations and $m$ are the "features". Features are observed variables or combinations thereof, and relate to a specific observation.

If $\X$ has zero mean, then we can construct the empirical covariance matrix $\mathbf{C}$ as:

$\mathbf{C}$ = $\X^T\X$

PCA attempts to find a new coordinate system such that the variance is maximised along the first axis or component. It turns out that this is equivalent to the eigen-decomposition of $\mathbf{C} = \mathbf{V}\mathbf{\Sigma}\mathbf{V}^T$. Where $\mathbf{V}$ is a $m \times m$ matrix of where each column the new basis (or eigen-vector) and $\mathbf{\Sigma}$ is a diagonal matrix of eigenvalues.

Each eigen-vector is associated with one eigenvalue, by choosing a subset of the eigenvectors (typically those associated with the largest eigenvalues), we can reduce the dimensionality of the feature vector by projecting it into this lower dimensional space.

Let $\hat{\mathbf{V}}$ be a $m \times k$ matrix of eigenvectors where only $k$ of the $m$ columns associated with the largest eigenvalues have been chosen. We can project the data onto the new space to from $\hat{\X}$ as follows:

$\hat{\X} = \X\hat{\mathbf{V}}$

$\hat{\X}$ is an approximation of $\X$ with shape $n \times k$.

See Section 12.1 of Pattern Recognition and Machine Learning by Bishop for more details.

3.1 PCA on synthetic data

We will generate two Gaussians such that they are partially overlapping


In [ ]:
# Step 1: Generate a 2-d dataset.
cov1 = np.array([[6,3],[3,6]])
cov2 = np.array([[6,3],[3,6]])

data1 = np.random.multivariate_normal(np.array([4, 4]), cov1, 500)
data2 = np.random.multivariate_normal(np.array([-4, -4]), cov2, 500)
data = np.vstack([data1, data2])

# We need to 'centre' the data, by subtracting the mean of each column.
data = data - data.mean(axis=0)

# Slice the data by taking x = first column, y = second column.
x = data[:, 0]
y = data[:, 1]

# Plot the data
pl.scatter(x,y,marker='x',color='b')
pl.title('Random Dataset')
pl.xlabel('variable 1')
pl.ylabel('variable 2');

Since we will plot often, lets write a function to do this. Note that function variables can be given default values, below we will have a default variable name set to 'Random Dataset'.


In [ ]:
def plot_points(data, name='Random Dataset'):
    """Plot the given 2d data on a scatter plot"""
    pl.scatter(data[:,0], data[:,1], marker='x', color='b')
    pl.title(name)
    pl.xlabel('variable 1')
    pl.ylabel('variable 2')
    pl.axis('equal')

plot_points(data)

In [ ]:
S = data.T.dot(data)
S.shape

In [ ]:
u, v = np.linalg.eig(S)

Note that the eig function does not return ordered eigenvalues, thus we must inspect the eigenvalues to know which eigenvector is the principal component


In [ ]:
print('Eigenvalues are: {}'.format(u))
max_eigen_index = np.argsort(u)[-1] # np.argsort orders in ascending order, [-1] indexes the final element of the list
min_eigen_index = np.argsort(u)[0]
print('The largest eigenvalue is at index : {}'.format(max_eigen_index))

Plot the data, use pl.arrow to draw the arrows of the eigenvalues. Transform the data and plot it using the eigenvectors as the principal axes. You can transform the data by: $\mathbf{X}_p = \mathbf{X}\mathbf{V^T}$, where $\mathbf{V}$ is the matrix of eigen-vectors.


In [ ]:
plot_points(data)

pl.arrow(0, 0, 10*v[0,0], 10*v[1,0] , color='r', head_width=0.5)
pl.arrow(0, 0, 10*v[0,1],10*v[1,1] , color='m', head_width=0.5);

In [ ]:
transformed = data.dot(v)

plot_points(transformed, name='transformed')

pl.arrow(0, 0, 10,0 , color='r', head_width=0.5) 
pl.arrow(0, 0, 0,10 , color='m', head_width=0.5);

Lets project the data onto the first axis, and plot these points.


In [ ]:
first_component = np.zeros((2,2))
first_component[:,max_eigen_index] = v[:,max_eigen_index] 

reconstructed = data.dot(first_component.dot(first_component.T))


pl.scatter(reconstructed[:,0], reconstructed[:,1])

In the previous plot it is difficult to see if there has been any benefit from performing PCA, as all the points are overlayed ontop of each other. These exercises will explore the use of histograms to illustrate the benefit of PCA.

Exercise 3.1 (a)

Plot a histogram of all the data projected onto the first principal component of the data. Use the command pl.hist.


In [ ]:

You should see that two peaks are clearly visible. We can improve this plot by using color.

Exercise 3.1 (b)

Project the data from each of the generating distributions contained in data1 and data2 through the principal component of the whole data.

Plot histograms of both projections on the same plot. Change the color and alpha value of the plots so they are easily distinguished.

Use the ? operator if you need help using pl.hist . The attributes you need to change are color and alpha.

Do the same for the second principal component of the data.


In [ ]:

Exercise 3.1(c)

Construct a plot similar to the one in 3.1(b), but this time use the data's x-axis. Do the same for the data's y-axis. If your task was to seperate the clusters, which of the four plots would you prefer to use?


In [ ]:

3.2 PCA on faces dataset

So far we have seen that PCA can be used to produce a small number of informative features from a large set of features (in the previous case only 1 informative feature was created). However, is there any information left in the other principal components? This next section hopes to qualatatively demonstrate this using the Olivetti Faces dataset.

To do this we will need to download some open data. The ML package scikit-learn contains some useful tools for doing just this.


In [ ]:
import sklearn.datasets as datasets

Lets use the olivetti faces dataset. You can read about it here: http://scikit-learn.org/stable/datasets/olivetti_faces.html. Importantly it consists of 400 images of size 64x64 each stored in a single row of the data matrix. We can consider each pixel to be a feature, while each row is an observation.

(Note: If you have been switching between python 2 and python 3 you may get an error when you run the cell below. Just delete the folder ~/scikit_learn_data in your home directory and try the download again.)


In [ ]:
# Download the dataset (can take some time)
dataObject = datasets.fetch_olivetti_faces(shuffle=True, random_state=100)

In [ ]:
raw_faces = dataObject.data

In [ ]:
# To improve the contrast, lets remove the average value of each pixel, and then also for each image.
per_pixel_centred = raw_faces - raw_faces.mean(axis=0)

# Note that this line uses broadcasting, see Section 4.2 for more about Broadcasting.
faces = per_pixel_centred - per_pixel_centred.mean(axis=1).reshape((400, 1)) # Remove the mean for each image.

In [ ]:
def plot_face(face):
    vmax = np.max(face.max(), -face.min())
    pl.imshow(face.reshape((64,64)), cmap=pl.cm.gray,
              vmin=-vmax,vmax=vmax, interpolation='nearest')
    ax = pl.gca();ax.set_yticks([]);ax.set_xticks([])

def plot_faces(faces, number):
    for i in range(0, number):
        ax = pl.subplot(number//3, 3, i+1)
        plot_face(faces[i,:])

plot_faces(faces, 6)

Computing the eigen-decomposition gets more difficult as the number of dimensions increase. An alternative decompostion is the Singular Value Decomposition (SVD). The Singluar Value Decompostion is given by: $\newcommand{\U}{\mathbf{U}} \newcommand{\S}{\mathbf{S}} \newcommand{\W}{\mathbf{W}} \newcommand{\V}{\mathbf{V}}$ $\X = \mathbf{U}\mathbf{S}\mathbf{W}$, note that unlike the eigen decompostion, the SVD does not require a square matrix. Let $\X$ have dimensions $n\times m$ The matricies $\U$ and $\W$ are orthoganal and have dimensions $n \times n$ and $m \times m$.

Consider the empirical covariance and use substitute the SVD:

$\X^T\X = \W^T \S^T \U^T \U \S \W$.

Since $\U$ is an orthogonal matrix.

$\X^T\X = \W^T \S^T \S \W$

Comparing this to the eigen-decomposition of $\X^T\X = \V\mathbf{\Sigma}\V^T$, we can see that $\W = \V$. Using the SVD, we can compute the eigenvalues much faster.


In [50]:
%%time
# Time is an IPython 'magic' function that will record how long this block took to execute.

_, s, w = np.linalg.svd(faces) # Since we don't care about the u term, we can use the dummy varibale _

Lets see if we can reduce the size of the features in this data using PCA. Lets take the first 100 principal components.


In [51]:
num_components = 100
top_components = w[0:num_components, :]
plot_faces(top_components, 9) # Plot the top (most important) eigenfaces


Observe the principal components of the faces data set that seem to resemble faces. We can visualise how much information is lost by transforming the first few faces through only the principal components.


In [52]:
# For the first face, show the components of the top eigenfaces
face = faces[0, :]
plot_face(face)



In [53]:
first_face_components = face.dot(top_components[0:9,:].T)
print("The contribution from the first 9 eigenfaces are:")
for i, c in enumerate(first_face_components, start=1):
    print("{:6.2f} * eigenface {} {}".format(c, i, "+" if i < 9 else ""))


The contribution from the first 9 eigenfaces are:
 -3.45 * eigenface 1 +
  0.70 * eigenface 2 +
  0.48 * eigenface 3 +
 -1.30 * eigenface 4 +
  1.51 * eigenface 5 +
 -0.57 * eigenface 6 +
  1.73 * eigenface 7 +
 -0.44 * eigenface 8 +
  1.40 * eigenface 9 

In [54]:
# Build up a representation of the first face using just these first 9 eigenfaces
plot_face(first_face_components.dot(top_components[:9, :]))



In [55]:
# Compare to the representation if we use the top 75 eigenvectors
# Try other values!
plot_face(face.dot(top_components[0:75,:].T.dot(top_components[:75, :])))



In [56]:
transformed_faces = faces[0:5,:].dot(top_components.T).dot(top_components)
def compare_faces(face1, face2, number):
    for i in range(0,number):
        pl.figure()
        for face,plot_num in zip([face1, face2], [121,122]):
            pl.subplot(plot_num)
            vmax = np.max([face[i,:].max(), -face[i,:].min()])
            pl.imshow(face[i,:].reshape((64,64)),cmap=pl.cm.gray,
              vmin=-vmax,vmax=vmax, interpolation='nearest')
compare_faces(faces, transformed_faces, 5)


Exercise 3.2 (a)

Try with different values of principal components. See if you can identify faces with only 5 principal components.

Exercise 3.2 (b)

Try using an Eigen-decompostion instead of a SVD on the faces data-set. How long does each method take?

3.3 Scikit Learn

We can also use the scikit learn library to achieve a similar thing. For decompositions we need to import the scikit learn decomposition submodule sklearn.decomposition.

To use scikit learn we will first instanciate a PCA object, and specify the number of components for the problem.


In [57]:
import sklearn.decomposition as decomp
num_components = 100
pca = decomp.PCA(n_components = num_components)

Scikit-learn objects for supervised learning typically implement a fit method that is used to train the model. For the case of PCA, this training is equivalent to finding the n principal compenents of the faces as we did in section 3.2. These principal components are then stored inside the pca object so they can be used to transform the faces. The separation between finding the principal components and transforming the faces is useful if we wish to use the principal components of one set of faces as a basis for a different but hopefully very similar new set of faces.

Pass in the faces array as training data and fit the model:


In [58]:
%time
pca.fit(faces)


CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.91 µs
Out[58]:
PCA(copy=True, n_components=100, whiten=False)

To get the reducted dimensionality features, apply the transform method to the data. Since we are interested in visualising the difference between the true and approximate face, we need to immediately apply the inverse_transform to bring the data back into its original space. We can now plot once again.


In [59]:
new_faces = pca.inverse_transform(pca.transform(faces))
compare_faces(faces, new_faces, 5)


Exercise 3.3 (a)

Try modifying the number of components to get different results.

Exercise 3.3 (b)

Scikit learn also implements whitening. Whitening is a process where the components are scaled by their eigenvalues, and is useful as many machine learning algorithms expect all the features to be at the same scale. Experiment with setting whiten=True when constructing the PCA object. Can you tell the difference between whitened and unwhitened eigenfaces?

Exercise 3.3 (c)

Implement 'whitening' for the faces dataset. This requires dividing each principal component by the square root of the its corrosponding eigenvalue. This can be done by replacing each instance of $\V$ with $\mathbf{\Sigma}^{-1/2}\V$ in the PCA equations.

Note that if you choose to use the Singular Value Decomposition (SVD), note that the each singluar value is the square root of a corresponding eigenvalue.


In [59]:

4 Advanced Numpy Usage - Optional

In this section we will explore the concept of broadcasting in numpy arrays. This is a difficult topic, so we will motivate it by first demonstating its usefullness.

4.1 Motivating Example

In this example we will construct a Gram matrix using a kernel function. (Attribution to L. McCalman who showed me this cool application of broadcasting).


In [60]:
def kernel(x,xdash):
    """
    Simple exponential kernel.
    """
    sig = 2.0
    return np.exp(-(np.abs(np.sum(x-xdash)))/sig)

Imagine that this kernel encodes some concept of distance. We wish to know the 'distance' in as described by this kernel between n points. Lets use the data from the robot toy example at the beginning. This could be done niavely as follows:


In [61]:
%%time

n = 1000 # Number of samples
dim = 2 #Dimensionality of each sample.
#points = np.random.random((n,dim))
points = history
DistMatrix = np.zeros((n,n)) # Construct a matrix to hold the result.

for i in range(0, n):
    for j in range(0,n):
        DistMatrix[i,j] = kernel(points[i,:],points[j,:])
        
pl.imshow(DistMatrix)


CPU times: user 6.89 s, sys: 152 ms, total: 7.04 s
Wall time: 7.1 s

However, we can get a great speed-up by using the concept of broadcasting.


In [62]:
points.shape


Out[62]:
(1000, 3)

In [63]:
%time
bcDist = np.exp(-(np.abs((points[:,np.newaxis,:]-points[np.newaxis,:,:]).sum(axis=2))/2.0))
bcDist.shape


CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.01 µs
Out[63]:
(1000, 1000)

In [64]:
(DistMatrix-bcDist).sum()


Out[64]:
0.0

Observe the use of the np.newaxis command, which adds a new axis to the vector. The purpose of this will become clear in the next section.


In [65]:
print(points.shape)
print(points[:,np.newaxis,:].shape)


(1000, 3)
(1000, 1, 3)

4.2 How broadcasting works

Broadcasting allows the elementwise multiplication of matricies which upon first glance are an illegal matrix operation.


In [66]:
A = np.random.random((4,1)) # Create a 4x1 matrix
B = np.random.random((1,3)) # Create a 1x3 matrix
print(A*B) # elementwise multiplication 4x1 and 1x3, have different sizes so this shouldn't work?!?


[[ 0.08985985  0.42011342  0.39097342]
 [ 0.01030074  0.04815808  0.04481773]
 [ 0.04860729  0.22724916  0.21148666]
 [ 0.12201013  0.57042262  0.53085685]]

Broadcasting is a short-hand for copying the vector along the unitary dimension. It is activated when an elementwise operation is specified that has incorrect shape, but one of the dimensions sizes is 1. Thus A*B in the previous cell is actually equivalent to:


In [67]:
Abroadcast = np.repeat(A, 3, axis=1) # Repeat matrix A along the second axis 3 times.

Bbroadcast = np.repeat(B, 4, axis=0) # Repeat matrix B along the first axis 4 times.

diff  = Abroadcast*Bbroadcast - A*B #demonstrate equivalence

print('Matrix A is broadcast to: \n {A} \n'.format(A=Abroadcast))
print('Matrix B is broadcast to: \n {B} \n'.format(B=Bbroadcast))

print('If the following result is zero, then the two approaches are equivalent: \n {}'.format(diff))


Matrix A is broadcast to: 
 [[ 0.52024484  0.52024484  0.52024484]
 [ 0.05963626  0.05963626  0.05963626]
 [ 0.28141258  0.28141258  0.28141258]
 [ 0.70637931  0.70637931  0.70637931]] 

Matrix B is broadcast to: 
 [[ 0.17272608  0.80753019  0.75151812]
 [ 0.17272608  0.80753019  0.75151812]
 [ 0.17272608  0.80753019  0.75151812]
 [ 0.17272608  0.80753019  0.75151812]] 

If the following result is zero, then the two approaches are equivalent: 
 [[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]

4.3 Explaining the motivating example

The example in 4.1 demonstated an advanced usage of broadcasting, but the only difference is that matricies are copied accross dimensions instead of vectors in 4.2. Lets break it down further to show each step slowly.


In [68]:
print('inital shape is {}'.format(points.shape))
A = points[:, np.newaxis, :]
B = points[ np.newaxis,:,:]
print('the first matrix shape  is {}'.format(A.shape))
print('the second matrix shape is {}'.format(B.shape))


inital shape is (1000, 3)
the first matrix shape  is (1000, 1, 3)
the second matrix shape is (1, 1000, 3)

In [69]:
C = A-B
print('The result of A-B has shape {}'.format(C.shape))


The result of A-B has shape (1000, 1000, 3)

Observe that we have an extra axis. This contains the difference over every dimension. In order to calculate our kernel we need to sum over all these, so we can use the sum function. However, we need to specify the axis.


In [70]:
D = C.sum(axis=2) # Sum over the second axis.
print('The shape after the sum operation is {}'.format(D.shape))


The shape after the sum operation is (1000, 1000)

The rest is straightforward, as we need to take the absolute value of the sum, divde by sig and then the exponent, to finally arrive at the correct result.


In [71]:
E = np.exp(-np.abs(D)/2)
print((DistMatrix-E).sum())


0.0

Exercise 4.4 (a)

In the cell below is a naive implementation that constructs a polynomial feature matrix (this will be covered more in the next workshop).

Hints:

  • the ** operator is an element-wise operator similar to addition and subtraction.
  • Don't forget to use np.newaxis to change the shape of the vectors.

In [72]:
x = np.arange(5)
exponent = np.arange(5)

naive_result = np.zeros((5,5))
for i in range(x.shape[0]):
    for j in range(exponent.shape[0]):
        naive_result[i,j] = x[i]**exponent[j]

In the cell below, use broadcasting to re-implement the algorithm above:


In [72]:

5. Sparse Matricies - Optional - Advanced

Occasionally you may encounter a problem where you want to deal with a sparse matrix. Perhaps your problem is naturally sparse or you wish to make a sparse approximation. For simplicity's sake lets approximate the matrix in the toy robot example and then attempt to use it to solve a linear equation. This will allow us to compare dense vs sparse matrix techniques in python.


In [73]:
denseMat = DistMatrix.copy() # Lets make a copy so that we don't over-write the DistMatrix variable

denseMat[denseMat < 0.1] = 0 # Use boolean indexing to set matrix indecies to zero.

pl.imshow(denseMat)
pl.figure()
pl.imshow(DistMatrix)


Out[73]:
<matplotlib.image.AxesImage at 0x10d66c710>

In [74]:
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg
import numpy.linalg as linalg

In [75]:
sparseMat = sparse.csc_matrix(denseMat) #Construct a sparse matrix with column ordering.
b = np.ones((sparseMat.shape[0]))

In [76]:
%%time

# If this fails, attempt to make the matrix invertible by applying a small non-zero value to every diagonal entry. 
xsp = splinalg.spsolve(sparseMat, b)
print('done')
# sparseMat)


done
CPU times: user 17 ms, sys: 3.24 ms, total: 20.2 ms
Wall time: 22.4 ms

In [77]:
%%time

xdense = linalg.solve(denseMat,b)
    
print('done')


done
CPU times: user 90.8 ms, sys: 9.69 ms, total: 100 ms
Wall time: 50.1 ms

The above shows that you can get a speed increase by using sparse matricies. Note that it is strongly dependent upon the size and sparsity of the problem. Try again but this time set all values below 0.8 to zero, you should see a significant speed improvement.


In [78]:
# Verify that the results are numerically equivalent.
print(np.sum(np.abs(xsp-xdense)))


2.30344148323e-10

In [78]: