Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000L, 32L, 32L, 3L)
Training labels shape:  (50000L,)
Test data shape:  (10000L, 32L, 32L, 3L)
Test labels shape:  (10000L,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000L, 32L, 32L, 3L)
Train labels shape:  (49000L,)
Validation data shape:  (1000L, 32L, 32L, 3L)
Validation labels shape:  (1000L,)
Test data shape:  (1000L, 32L, 32L, 3L)
Test labels shape:  (1000L,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape


Training data shape:  (49000L, 3072L)
Validation data shape:  (1000L, 3072L)
Test data shape:  (1000L, 3072L)
dev data shape:  (500L, 3072L)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape


(49000L, 3073L) (1000L, 3073L) (1000L, 3073L) (500L, 3073L)

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function compute_loss_naive which uses for loops to evaluate the multiclass SVM loss function.


In [10]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss)


loss: 9.506828

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have correctly implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:


In [11]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
print "turn off regularization"
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
print "turn on regularization"
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)


turn off regularization
numerical: 4.599136 analytic: 4.599136, relative error: 3.710232e-11
numerical: -7.390522 analytic: -7.390522, relative error: 5.751318e-11
numerical: 16.330017 analytic: 16.330017, relative error: 1.110619e-11
numerical: -6.109036 analytic: -6.109036, relative error: 1.515201e-11
numerical: 3.063904 analytic: 3.063904, relative error: 5.157800e-11
numerical: 14.934586 analytic: 14.934586, relative error: 1.033247e-11
numerical: -13.372325 analytic: -13.372325, relative error: 2.215743e-11
numerical: -9.755030 analytic: -9.755030, relative error: 3.948335e-11
numerical: 26.150636 analytic: 26.150636, relative error: 1.043392e-11
numerical: -22.330322 analytic: -22.330322, relative error: 6.178976e-12
turn on regularization
numerical: -14.777298 analytic: -14.777298, relative error: 5.687665e-13
numerical: 7.619684 analytic: 7.619684, relative error: 3.466977e-11
numerical: 16.476399 analytic: 16.476399, relative error: 1.974904e-12
numerical: -6.256526 analytic: -6.256526, relative error: 1.697235e-11
numerical: 26.516512 analytic: 26.516512, relative error: 2.326355e-11
numerical: -31.557242 analytic: -31.557242, relative error: 4.677586e-12
numerical: -9.363191 analytic: -9.363191, relative error: 7.284015e-11
numerical: -5.073724 analytic: -5.073724, relative error: 9.169444e-12
numerical: -13.743991 analytic: -13.743991, relative error: 1.278379e-11
numerical: 16.107823 analytic: 16.107823, relative error: 1.187508e-11

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer: It's possible that at some point the loss is near zero, and due to numerial issues the two grad calculation routines treats them differently


In [14]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 9.506828e+00 computed in 0.148000s
Vectorized loss: 9.506828e+00 computed in 0.003000s
difference: 0.000000

In [48]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 0.140000s
Vectorized loss and gradient: computed in 0.005000s
difference: 0.000000

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss, the gradient and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.


In [55]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 794.174436
iteration 100 / 1500: loss 286.404953
iteration 200 / 1500: loss 107.771057
iteration 300 / 1500: loss 41.969605
iteration 400 / 1500: loss 18.849247
iteration 500 / 1500: loss 10.815091
iteration 600 / 1500: loss 7.169613
iteration 700 / 1500: loss 6.299983
iteration 800 / 1500: loss 5.213589
iteration 900 / 1500: loss 5.514441
iteration 1000 / 1500: loss 5.279779
iteration 1100 / 1500: loss 5.768382
iteration 1200 / 1500: loss 5.527183
iteration 1300 / 1500: loss 4.969076
iteration 1400 / 1500: loss 5.357511
That took 8.416000s

In [52]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [56]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.373367
validation accuracy: 0.380000

In [61]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-7, 1e-5, 5e-5, 1e-3]
regularization_strengths = [5e4, 1e5, 5e5]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        loss_history = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                              num_iters=1500, verbose=True)
        y_train_pred = svm.predict(X_train)
        train_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        val_accuracy = np.mean(y_val == y_val_pred)
        results[(lr,reg)] = (train_accuracy, val_accuracy)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        
    
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 1500: loss 779.864474
iteration 100 / 1500: loss 285.851103
iteration 200 / 1500: loss 107.415822
iteration 300 / 1500: loss 42.298389
iteration 400 / 1500: loss 18.691055
iteration 500 / 1500: loss 10.277979
iteration 600 / 1500: loss 6.961604
iteration 700 / 1500: loss 6.109447
iteration 800 / 1500: loss 5.318733
iteration 900 / 1500: loss 6.108740
iteration 1000 / 1500: loss 6.073655
iteration 1100 / 1500: loss 5.254030
iteration 1200 / 1500: loss 5.353874
iteration 1300 / 1500: loss 5.306717
iteration 1400 / 1500: loss 5.519055
iteration 0 / 1500: loss 797.638494
iteration 100 / 1500: loss 291.059130
iteration 200 / 1500: loss 109.267815
iteration 300 / 1500: loss 43.222071
iteration 400 / 1500: loss 19.015789
iteration 500 / 1500: loss 10.162838
iteration 600 / 1500: loss 7.423294
iteration 700 / 1500: loss 6.361624
iteration 800 / 1500: loss 4.839429
iteration 900 / 1500: loss 5.396280
iteration 1000 / 1500: loss 5.125649
iteration 1100 / 1500: loss 5.937268
iteration 1200 / 1500: loss 4.923274
iteration 1300 / 1500: loss 4.766755
iteration 1400 / 1500: loss 5.528824
iteration 0 / 1500: loss 794.144056
iteration 100 / 1500: loss 289.706976
iteration 200 / 1500: loss 108.423708
iteration 300 / 1500: loss 43.312364
iteration 400 / 1500: loss 19.079869
iteration 500 / 1500: loss 9.986725
iteration 600 / 1500: loss 6.664552
iteration 700 / 1500: loss 5.580314
iteration 800 / 1500: loss 5.721751
iteration 900 / 1500: loss 5.631365
iteration 1000 / 1500: loss 5.126255
iteration 1100 / 1500: loss 5.497367
iteration 1200 / 1500: loss 4.990436
iteration 1300 / 1500: loss 5.302846
iteration 1400 / 1500: loss 5.485635
iteration 0 / 1500: loss 787.527822
iteration 100 / 1500: loss 287.282043
iteration 200 / 1500: loss 108.216315
iteration 300 / 1500: loss 42.862272
iteration 400 / 1500: loss 18.714567
iteration 500 / 1500: loss 10.680174
iteration 600 / 1500: loss 7.249856
iteration 700 / 1500: loss 6.355724
iteration 800 / 1500: loss 5.325528
iteration 900 / 1500: loss 5.594215
iteration 1000 / 1500: loss 5.935361
iteration 1100 / 1500: loss 5.678881
iteration 1200 / 1500: loss 5.325544
iteration 1300 / 1500: loss 4.963159
iteration 1400 / 1500: loss 5.382032
iteration 0 / 1500: loss 793.291272
iteration 100 / 1500: loss 287.893808
iteration 200 / 1500: loss 107.725371
iteration 300 / 1500: loss 42.660592
iteration 400 / 1500: loss 18.815985
iteration 500 / 1500: loss 10.401884
iteration 600 / 1500: loss 7.055314
iteration 700 / 1500: loss 6.242539
iteration 800 / 1500: loss 5.637845
iteration 900 / 1500: loss 5.277868
iteration 1000 / 1500: loss 5.559645
iteration 1100 / 1500: loss 5.608663
iteration 1200 / 1500: loss 5.180669
iteration 1300 / 1500: loss 5.242206
iteration 1400 / 1500: loss 5.774718
iteration 0 / 1500: loss 795.759381
iteration 100 / 1500: loss 290.654654
iteration 200 / 1500: loss 109.056789
iteration 300 / 1500: loss 43.167074
iteration 400 / 1500: loss 18.875440
iteration 500 / 1500: loss 10.686692
iteration 600 / 1500: loss 7.099673
iteration 700 / 1500: loss 5.739386
iteration 800 / 1500: loss 5.804567
iteration 900 / 1500: loss 5.517189
iteration 1000 / 1500: loss 5.519215
iteration 1100 / 1500: loss 5.334343
iteration 1200 / 1500: loss 4.727419
iteration 1300 / 1500: loss 5.468537
iteration 1400 / 1500: loss 5.300411
iteration 0 / 1500: loss 801.435208
iteration 100 / 1500: loss 292.045747
iteration 200 / 1500: loss 110.339629
iteration 300 / 1500: loss 42.749193
iteration 400 / 1500: loss 18.937467
iteration 500 / 1500: loss 10.258564
iteration 600 / 1500: loss 7.296504
iteration 700 / 1500: loss 6.160576
iteration 800 / 1500: loss 5.537966
iteration 900 / 1500: loss 5.671629
iteration 1000 / 1500: loss 5.811528
iteration 1100 / 1500: loss 5.446688
iteration 1200 / 1500: loss 4.704418
iteration 1300 / 1500: loss 5.468603
iteration 1400 / 1500: loss 4.718445
iteration 0 / 1500: loss 785.286336
iteration 100 / 1500: loss 287.093789
iteration 200 / 1500: loss 108.067859
iteration 300 / 1500: loss 42.922716
iteration 400 / 1500: loss 19.513349
iteration 500 / 1500: loss 9.938737
iteration 600 / 1500: loss 7.536334
iteration 700 / 1500: loss 5.528312
iteration 800 / 1500: loss 5.684186
iteration 900 / 1500: loss 5.230681
iteration 1000 / 1500: loss 5.393739
iteration 1100 / 1500: loss 5.196058
iteration 1200 / 1500: loss 5.045343
iteration 1300 / 1500: loss 5.226567
iteration 1400 / 1500: loss 5.413227
iteration 0 / 1500: loss 786.608998
iteration 100 / 1500: loss 286.748830
iteration 200 / 1500: loss 107.389306
iteration 300 / 1500: loss 42.610522
iteration 400 / 1500: loss 19.384199
iteration 500 / 1500: loss 10.764890
iteration 600 / 1500: loss 7.360814
iteration 700 / 1500: loss 5.865266
iteration 800 / 1500: loss 6.035735
iteration 900 / 1500: loss 6.049111
iteration 1000 / 1500: loss 5.001075
iteration 1100 / 1500: loss 5.020869
iteration 1200 / 1500: loss 5.501001
iteration 1300 / 1500: loss 5.385101
iteration 1400 / 1500: loss 5.524234
iteration 0 / 1500: loss 794.316240
iteration 100 / 1500: loss 289.693033
iteration 200 / 1500: loss 109.152671
iteration 300 / 1500: loss 42.073886
iteration 400 / 1500: loss 18.689674
iteration 500 / 1500: loss 9.899018
iteration 600 / 1500: loss 7.398940
iteration 700 / 1500: loss 5.965273
iteration 800 / 1500: loss 5.478874
iteration 900 / 1500: loss 5.179760
iteration 1000 / 1500: loss 5.420848
iteration 1100 / 1500: loss 5.326226
iteration 1200 / 1500: loss 5.319255
iteration 1300 / 1500: loss 6.021286
iteration 1400 / 1500: loss 5.454405
iteration 0 / 1500: loss 780.797889
iteration 100 / 1500: loss 285.453285
iteration 200 / 1500: loss 107.069565
iteration 300 / 1500: loss 43.248235
iteration 400 / 1500: loss 19.212525
iteration 500 / 1500: loss 10.097544
iteration 600 / 1500: loss 7.044086
iteration 700 / 1500: loss 6.029626
iteration 800 / 1500: loss 5.322473
iteration 900 / 1500: loss 5.256107
iteration 1000 / 1500: loss 4.890933
iteration 1100 / 1500: loss 5.625393
iteration 1200 / 1500: loss 5.367809
iteration 1300 / 1500: loss 5.389035
iteration 1400 / 1500: loss 5.193167
iteration 0 / 1500: loss 796.119174
iteration 100 / 1500: loss 291.805545
iteration 200 / 1500: loss 108.431287
iteration 300 / 1500: loss 42.517918
iteration 400 / 1500: loss 18.570273
iteration 500 / 1500: loss 9.918849
iteration 600 / 1500: loss 7.222853
iteration 700 / 1500: loss 5.979579
iteration 800 / 1500: loss 5.406795
iteration 900 / 1500: loss 5.311945
iteration 1000 / 1500: loss 5.428996
iteration 1100 / 1500: loss 5.144368
iteration 1200 / 1500: loss 5.632744
iteration 1300 / 1500: loss 5.436334
iteration 1400 / 1500: loss 4.711630
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.371102 val accuracy: 0.391000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.366122 val accuracy: 0.384000
lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.361776 val accuracy: 0.370000
lr 1.000000e-05 reg 5.000000e+04 train accuracy: 0.366204 val accuracy: 0.382000
lr 1.000000e-05 reg 1.000000e+05 train accuracy: 0.364816 val accuracy: 0.373000
lr 1.000000e-05 reg 5.000000e+05 train accuracy: 0.370245 val accuracy: 0.381000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.367286 val accuracy: 0.373000
lr 5.000000e-05 reg 1.000000e+05 train accuracy: 0.367816 val accuracy: 0.371000
lr 5.000000e-05 reg 5.000000e+05 train accuracy: 0.370102 val accuracy: 0.373000
lr 1.000000e-03 reg 5.000000e+04 train accuracy: 0.374020 val accuracy: 0.377000
lr 1.000000e-03 reg 1.000000e+05 train accuracy: 0.372286 val accuracy: 0.379000
lr 1.000000e-03 reg 5.000000e+05 train accuracy: 0.370408 val accuracy: 0.391000
best validation accuracy achieved during cross-validation: 0.391000

In [62]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [63]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.370000

In [64]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look they way that they do.

Your answer: fill this in