Multiclass Support Vector Machine exercise

(Adapted from Stanford University's CS231n Open Courseware)

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the HW page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM (the loss is recalled just after this list)
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights
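
For reference, the loss referred to above is the multiclass SVM (hinge) loss from the course notes. Assuming the usual margin of $\Delta = 1$, the per-example loss is

$$L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right), \qquad s = W x_i,$$

and the full objective averages $L_i$ over the training set and adds an L2 regularization term on $W$.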

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir, num_of_batches=6)
# Decrease num_of_batches if you do not have sufficient memory

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Subsample the data for more efficient code execution in this exercise.
num_training = 49000
# Decrease this if you are short on memory
num_validation = 1000
num_test = 1000

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]
Out[6]:
<matplotlib.image.AxesImage at 0x7f8a6f5bc290>

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
# Also, let's transform both data matrices so that each image is a column.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))]).T
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))]).T
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]).T

print X_train.shape, X_val.shape, X_test.shape


(3073, 49000) (3073, 1000) (3073, 1000)

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
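
For reference, here is a minimal sketch of what such a loop-based loss computation might look like, given the layout used above (W of shape (num_classes, dim), X with one image per column). The name is illustrative; the prefilled svm_loss_naive in linear_svm.py is the authoritative version, also computes the gradient, and its regularization convention may differ.

def svm_loss_naive_sketch(W, X, y, reg):
    # W: (num_classes, dim), X: (dim, num_examples), y: (num_examples,) of class indices
    num_classes = W.shape[0]
    num_train = X.shape[1]
    loss = 0.0
    for i in xrange(num_train):
        scores = W.dot(X[:, i])                         # class scores for example i
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1   # delta = 1
            if margin > 0:
                loss += margin
    loss /= num_train                                   # average over examples
    loss += reg * np.sum(W * W)                         # regularization (convention may differ)
    return loss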


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(10, 3073) * 0.0001 
loss, grad = svm_loss_naive(W, X_train, y_train, 0.00001)
print 'loss: %f' % (loss, )


loss: 9.286414

The grad returned from the function above is currently all zero. Derive the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code with the existing function body.
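
One way to arrive at the analytic gradient (a sketch to check your own derivation against): with scores $s = W x_i$ and margins $m_j = s_j - s_{y_i} + \Delta$, every active hinge term contributes $x_i$ to the gradient row of class $j$ and $-x_i$ to the row of the correct class:

$$\frac{\partial L_i}{\partial w_j} = \mathbf{1}(m_j > 0)\, x_i \;\; (j \neq y_i), \qquad \frac{\partial L_i}{\partial w_{y_i}} = -\Big(\sum_{j \neq y_i} \mathbf{1}(m_j > 0)\Big)\, x_i.$$

Averaging over all examples and adding the derivative of the regularization term gives the full gradient dW.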

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:
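
Conceptually, a sparse numerical check perturbs a handful of randomly chosen entries of W and compares a centered-difference estimate against the analytic gradient. The sketch below is only an illustration with a hypothetical name; the provided grad_check_sparse in cs231n/gradient_check.py is what the cell below actually uses.

def grad_check_sparse_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
    for _ in xrange(num_checks):
        ix = tuple([np.random.randint(m) for m in W.shape])   # random coordinate of W
        oldval = W[ix]
        W[ix] = oldval + h
        fxph = f(W)                        # f evaluated at W + h along this coordinate
        W[ix] = oldval - h
        fxmh = f(W)                        # f evaluated at W - h along this coordinate
        W[ix] = oldval                     # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / \
            (abs(grad_numerical) + abs(grad_analytic))
        print 'numerical: %f analytic: %f, relative error: %e' % (
            grad_numerical, grad_analytic, rel_error)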


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_train, y_train, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_train, y_train, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)


numerical: 17.077670 analytic: 17.077459, relative error: 6.175482e-06
numerical: 15.681245 analytic: 15.681636, relative error: 1.245179e-05
numerical: 30.130899 analytic: 30.132981, relative error: 3.455011e-05
numerical: -30.671397 analytic: -30.668096, relative error: 5.381098e-05
numerical: 7.154661 analytic: 7.154970, relative error: 2.160007e-05
numerical: -43.446672 analytic: -43.447884, relative error: 1.395028e-05
numerical: -2.358590 analytic: -2.358097, relative error: 1.045627e-04
numerical: -10.541419 analytic: -10.542922, relative error: 7.129146e-05
numerical: -10.033011 analytic: -10.033905, relative error: 4.457561e-05
numerical: 13.805891 analytic: 13.805620, relative error: 9.838833e-06

Inline Question 1: It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer: The discrepancy comes from points where the loss is not differentiable: each hinge term max(0, s_j - s_{y_i} + 1) has a kink where its argument is exactly zero. This is usually not a reason for concern, since only a few dimensions are affected; it would only matter if the optimum sat right at such a kink. A simple one-dimensional example is y = |x| at x = 0: it is not differentiable there, so the analytic approach gives either 1 or -1 depending on which side you take, while the numerical estimate depends on the step size and can be anything in [-1, 1].
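
A small numerical illustration of the one-dimensional example (not part of the assignment code), evaluated just next to the kink of |x|, where the analytic gradient is exactly 1:

# Centered-difference gradient of f(x) = |x| at x0 = 1e-5, for several step sizes.
# When h is larger than |x0| the difference straddles the kink and badly
# underestimates the true slope; once h is smaller than |x0| it recovers 1.0.
f = abs
x0 = 1e-5
for h in [1e-2, 1e-4, 1e-6]:
    num_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)
    print 'h = %g   numerical: %f   analytic: 1.0' % (h, num_grad)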


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 9.286414e+00 computed in 6.347606s
Vectorized loss: 9.286414e+00 computed in 0.844389s
difference: -0.000000
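
For reference, one common way to vectorize the loss computes all class scores at once and builds a margins matrix with broadcasting. This is only a sketch with an illustrative name; your svm_loss_vectorized in linear_svm.py is the authoritative version and its regularization convention may differ.

def svm_loss_vectorized_sketch(W, X, y, reg):
    # W: (num_classes, dim), X: (dim, num_train), y: (num_train,)
    num_train = X.shape[1]
    scores = W.dot(X)                                   # (num_classes, num_train)
    correct_scores = scores[y, np.arange(num_train)]    # score of the true class per column
    margins = np.maximum(0, scores - correct_scores + 1)
    margins[y, np.arange(num_train)] = 0                # do not count the correct class
    loss = np.sum(margins) / num_train + reg * np.sum(W * W)
    return loss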

In [12]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 6.235493s
Vectorized loss and gradient: computed in 0.844451s
difference: 0.000000
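
The gradient can be vectorized from the same margins matrix: a binary mask marks the classes with a positive margin, and the correct-class row collects minus the count of such classes for each example. Again a sketch with a hypothetical name, and the 2 * reg * W term assumes the reg * sum(W*W) convention used above.

def svm_grad_vectorized_sketch(W, X, y, reg):
    num_train = X.shape[1]
    scores = W.dot(X)
    correct_scores = scores[y, np.arange(num_train)]
    margins = np.maximum(0, scores - correct_scores + 1)
    margins[y, np.arange(num_train)] = 0
    # Each positive margin adds +x_i to row j of dW and -x_i to row y_i.
    binary = (margins > 0).astype(float)
    binary[y, np.arange(num_train)] = -np.sum(binary, axis=0)
    dW = binary.dot(X.T) / num_train + 2 * reg * W
    return dW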

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
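
Below is a minimal sketch of the minibatch SGD loop you will write in LinearSVM.train() (illustrative name and defaults; the starter code defines the actual interface, and train() should also keep the updated W on the object and return the loss history):

def sgd_train_sketch(W, X, y, learning_rate=1e-7, reg=5e4,
                     num_iters=1500, batch_size=200):
    num_train = X.shape[1]
    loss_history = []
    for it in xrange(num_iters):
        # Sample a minibatch of columns (images) with replacement.
        idx = np.random.choice(num_train, batch_size, replace=True)
        X_batch, y_batch = X[:, idx], y[idx]
        loss, grad = svm_loss_vectorized(W, X_batch, y_batch, reg)
        loss_history.append(loss)
        W -= learning_rate * grad          # step in the negative gradient direction
    return W, loss_history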


In [13]:
# Now implement SGD in LinearSVM.train() function and run it with the code below
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 790.863147
iteration 100 / 1500: loss 288.697490
iteration 200 / 1500: loss 108.350841
iteration 300 / 1500: loss 42.885596
iteration 400 / 1500: loss 19.135786
iteration 500 / 1500: loss 9.727853
iteration 600 / 1500: loss 7.153678
iteration 700 / 1500: loss 6.119707
iteration 800 / 1500: loss 5.272005
iteration 900 / 1500: loss 5.134032
iteration 1000 / 1500: loss 5.356107
iteration 1100 / 1500: loss 5.357455
iteration 1200 / 1500: loss 5.287627
iteration 1300 / 1500: loss 5.417358
iteration 1400 / 1500: loss 5.922196
That took 7.354084s

In [14]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')


Out[14]:
<matplotlib.text.Text at 0x7f8a6f1d4d50>

In [15]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.365673
validation accuracy: 0.380000
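
For reference, with this data layout prediction amounts to an argmax over the class scores of each column (a sketch with an illustrative name; LinearSVM.predict defines the actual method):

def predict_sketch(W, X):
    scores = W.dot(X)                      # (num_classes, num_examples)
    return np.argmax(scores, axis=0)       # highest-scoring class for each image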

In [16]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = np.arange(5)*2e-8+3e-8
regularization_strengths = np.arange(5)*5e3+2e4

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=lr, reg=rs,
                  num_iters=1500)
        results[(lr,rs)]= (np.mean(svm.predict(X_train)==y_train), 
                           np.mean(svm.predict(X_val)==y_val))
        if best_val < results[(lr,rs)][1]:
            best_val = results[(lr,rs)][1]
            best_svm = svm
            print best_val
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


0.325
0.346
0.372
0.386
0.388
0.393
0.396
lr 3.000000e-08 reg 2.000000e+04 train accuracy: 0.307551 val accuracy: 0.325000
lr 3.000000e-08 reg 2.500000e+04 train accuracy: 0.323980 val accuracy: 0.346000
lr 3.000000e-08 reg 3.000000e+04 train accuracy: 0.333082 val accuracy: 0.332000
lr 3.000000e-08 reg 3.500000e+04 train accuracy: 0.343306 val accuracy: 0.344000
lr 3.000000e-08 reg 4.000000e+04 train accuracy: 0.353408 val accuracy: 0.372000
lr 5.000000e-08 reg 2.000000e+04 train accuracy: 0.353429 val accuracy: 0.343000
lr 5.000000e-08 reg 2.500000e+04 train accuracy: 0.369000 val accuracy: 0.362000
lr 5.000000e-08 reg 3.000000e+04 train accuracy: 0.372735 val accuracy: 0.367000
lr 5.000000e-08 reg 3.500000e+04 train accuracy: 0.372980 val accuracy: 0.369000
lr 5.000000e-08 reg 4.000000e+04 train accuracy: 0.376429 val accuracy: 0.386000
lr 7.000000e-08 reg 2.000000e+04 train accuracy: 0.377673 val accuracy: 0.377000
lr 7.000000e-08 reg 2.500000e+04 train accuracy: 0.378020 val accuracy: 0.388000
lr 7.000000e-08 reg 3.000000e+04 train accuracy: 0.380286 val accuracy: 0.379000
lr 7.000000e-08 reg 3.500000e+04 train accuracy: 0.375980 val accuracy: 0.393000
lr 7.000000e-08 reg 4.000000e+04 train accuracy: 0.378408 val accuracy: 0.388000
lr 9.000000e-08 reg 2.000000e+04 train accuracy: 0.383041 val accuracy: 0.391000
lr 9.000000e-08 reg 2.500000e+04 train accuracy: 0.379510 val accuracy: 0.385000
lr 9.000000e-08 reg 3.000000e+04 train accuracy: 0.378020 val accuracy: 0.396000
lr 9.000000e-08 reg 3.500000e+04 train accuracy: 0.376102 val accuracy: 0.396000
lr 9.000000e-08 reg 4.000000e+04 train accuracy: 0.370082 val accuracy: 0.386000
lr 1.100000e-07 reg 2.000000e+04 train accuracy: 0.386755 val accuracy: 0.392000
lr 1.100000e-07 reg 2.500000e+04 train accuracy: 0.375163 val accuracy: 0.370000
lr 1.100000e-07 reg 3.000000e+04 train accuracy: 0.373714 val accuracy: 0.380000
lr 1.100000e-07 reg 3.500000e+04 train accuracy: 0.376878 val accuracy: 0.379000
lr 1.100000e-07 reg 4.000000e+04 train accuracy: 0.373449 val accuracy: 0.373000
best validation accuracy achieved during cross-validation: 0.396000

In [17]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
sz = [results[x][0]*1500 for x in results] # default size of markers is 20
plt.subplot(1,2,1)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
sz = [results[x][1]*1500 for x in results] # default size of markers is 20
plt.subplot(1,2,2)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')


Out[17]:
<matplotlib.text.Text at 0x7f8a6f0b0510>

In [18]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.381000

In [19]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:,:-1] # strip out the bias
w = w.reshape(10, 32, 32, 3)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

Your answer: They look like blurred average images of each class, i.e. learned templates. Training effectively blends all of the images of a class in the training set into a single template, so the weights do not look exactly like any one representative of the class but rather like a generalized version of it.


In [ ]: