Multiclass Support Vector Machine exercise

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights
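
For reference, the loss you will implement is the multiclass SVM (hinge) loss. For a single example $(x_i, y_i)$ with class scores $s = x_i W$,

$$L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right), \qquad \Delta = 1,$$

and the full objective is the mean of the $L_i$ over the training set plus the regularization term $\tfrac{1}{2}\,\mathrm{reg}\sum_{k,l} W_{k,l}^2$ (the factor of $\tfrac{1}{2}$ makes the gradient of the regularizer simply $\mathrm{reg}\cdot W$).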

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
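
With the bias folded into the last row of W, the score function becomes a single matrix multiply. A minimal illustration (W_demo and scores_demo below are throwaway names; the real weight matrix is created in a later cell):

# every row of X now ends in a constant 1, so the last row of W acts as the bias vector
W_demo = np.random.randn(3073, 10) * 0.0001
scores_demo = X_dev.dot(W_demo)   # shape (500, 10): one score per (example, class) pair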

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
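
As a rough sketch of the structure inside that function (variable names here are illustrative and may differ from the starter code):

def svm_loss_naive_sketch(W, X, y, reg):
    """Loop-based multiclass SVM loss (gradient omitted for now)."""
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    for i in xrange(num_train):
        scores = X[i].dot(W)                    # scores for all classes of example i
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1   # delta = 1
            if margin > 0:
                loss += margin
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)    # average, then add regularization
    return loss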


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )


loss: 9.171244

The grad returned from the function above is currently all zero. Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient that you computed. We have provided code that does this for you:
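
For the derivation, note that each violated margin contributes linearly to the loss. Writing $w_j$ for the $j$-th column of $W$ and $\mathbb{1}(\cdot)$ for the indicator function,

$$\frac{\partial L_i}{\partial w_j} = \mathbb{1}\!\left(s_j - s_{y_i} + \Delta > 0\right) x_i^{\top} \quad (j \neq y_i), \qquad \frac{\partial L_i}{\partial w_{y_i}} = -\left(\sum_{j \neq y_i} \mathbb{1}\!\left(s_j - s_{y_i} + \Delta > 0\right)\right) x_i^{\top}.$$

Average these over the training set and add $\mathrm{reg}\cdot W$ for the regularization term.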


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: -0.000051 analytic: -0.000051, relative error: 7.867722e-06
numerical: 21.163164 analytic: 21.163164, relative error: 1.414897e-11
numerical: 14.525160 analytic: 14.525160, relative error: 7.887427e-13
numerical: 2.794479 analytic: 2.794479, relative error: 8.103725e-11
numerical: -3.191612 analytic: -3.191612, relative error: 3.865458e-11
numerical: 17.117920 analytic: 17.117920, relative error: 1.131278e-11
numerical: 17.823594 analytic: 17.823594, relative error: 2.259455e-11
numerical: -3.616170 analytic: -3.616170, relative error: 3.390688e-11
numerical: 28.993560 analytic: 28.993560, relative error: 4.043457e-12
numerical: 14.324524 analytic: 14.324524, relative error: 1.572051e-12
numerical: -19.355812 analytic: -19.355812, relative error: 1.271480e-11
numerical: 4.454825 analytic: 4.454825, relative error: 9.995042e-12
numerical: -35.683108 analytic: -35.683108, relative error: 8.422623e-12
numerical: 12.935572 analytic: 12.935572, relative error: 1.889000e-11
numerical: -17.238368 analytic: -17.238368, relative error: 6.503895e-12
numerical: -1.445337 analytic: -1.468588, relative error: 7.979004e-03
numerical: -19.042456 analytic: -19.042456, relative error: 3.362702e-12
numerical: -35.648920 analytic: -35.648920, relative error: 1.316916e-11
numerical: -16.854318 analytic: -16.854318, relative error: 8.587038e-12
numerical: -19.260851 analytic: -19.260851, relative error: 1.109076e-11

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.

Your Answer: fill this in.


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 9.171244e+00 computed in 0.128827s
Vectorized loss: 9.171244e+00 computed in 0.003361s
difference: 0.000000
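
One possible fully-vectorized formulation of the loss, as a sketch (names are illustrative, and your solution in linear_svm.py may differ):

def svm_loss_vectorized_sketch(W, X, y, reg):
    """Hinge loss with no explicit loops over examples or classes."""
    num_train = X.shape[0]
    scores = X.dot(W)                                          # (N, C)
    correct = scores[np.arange(num_train), y][:, np.newaxis]   # (N, 1) correct-class scores
    margins = np.maximum(0, scores - correct + 1)              # delta = 1
    margins[np.arange(num_train), y] = 0                       # the correct class contributes nothing
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
    return loss, margins                                       # margins are handy for the gradient below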

In [12]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 0.136206s
Vectorized loss and gradient: computed in 0.003979s
difference: 0.000000
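
A matching sketch for the gradient, continuing from the vectorized loss sketch above (it reuses the margins matrix, so num_train, margins, X, y, W and reg are assumed to be in scope): each positive margin contributes the corresponding row of X to one column of dW.

binary = (margins > 0).astype(float)                        # (N, C): 1 where a wrong class violated the margin
binary[np.arange(num_train), y] = -np.sum(binary, axis=1)   # correct class gets minus the violation count
dW = X.T.dot(binary) / num_train + reg * W                  # (D, C); the last term is the regularization gradient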

Stochastic Gradient Descent

We now have vectorized, efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
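
A minimal sketch of the SGD loop you will write inside LinearClassifier.train, shown here against the standalone svm_loss_vectorized for concreteness (inside the class the weights live in self.W and the hyperparameters arrive as arguments; batch_size = 200 below is just an assumed default):

num_train = X_train.shape[0]
batch_size = 200
loss_history = []
for it in xrange(1500):
    # sample a minibatch; sampling with replacement is faster and works fine in practice
    batch_idx = np.random.choice(num_train, batch_size)
    X_batch, y_batch = X_train[batch_idx], y_train[batch_idx]

    # evaluate loss and gradient on the minibatch only
    loss, grad = svm_loss_vectorized(W, X_batch, y_batch, 5e4)
    loss_history.append(loss)

    # step in the direction of the negative gradient
    W -= 1e-7 * grad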


In [13]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 790.874399
iteration 100 / 1500: loss 287.471280
iteration 200 / 1500: loss 108.357853
iteration 300 / 1500: loss 42.599154
iteration 400 / 1500: loss 18.691564
iteration 500 / 1500: loss 10.392634
iteration 600 / 1500: loss 6.608073
iteration 700 / 1500: loss 6.068230
iteration 800 / 1500: loss 5.660794
iteration 900 / 1500: loss 5.366910
iteration 1000 / 1500: loss 5.354727
iteration 1100 / 1500: loss 5.675647
iteration 1200 / 1500: loss 5.159184
iteration 1300 / 1500: loss 5.328340
iteration 1400 / 1500: loss 5.107505
That took 5.328559s

In [14]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [15]:
# Write the LinearSVM.predict function and evaluate its performance on both the
# training and validation sets
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.365265
validation accuracy: 0.379000
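
Prediction for a linear classifier is just an argmax over the class scores; the body of predict amounts to something like the following (shown here on the validation set with the trained svm; inside the class you would use self.W and return the predictions):

scores = X_val.dot(svm.W)               # (N, C) class scores under the trained weights
y_val_pred = np.argmax(scores, axis=1)  # index of the highest score = predicted class label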

In [16]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [5e-8, 1e-7, 5e-5]
regularization_strengths = [3.5e4, 5e4, 1e5]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
# Train an SVM for every (learning rate, regularization strength) pair, record its
# train/val accuracy, and keep the model with the best validation accuracy.
for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate, reg=regularization_strength,
                              num_iters=1500, verbose=True)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        training_accuracy = np.mean(y_train == y_train_pred)
        validation_accuracy = np.mean(y_val == y_val_pred)
        results[(learning_rate, regularization_strength)] = (training_accuracy, validation_accuracy)
        if best_val < validation_accuracy:
            best_val = validation_accuracy
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 1500: loss 566.155019
iteration 100 / 1500: loss 391.901991
iteration 200 / 1500: loss 277.350370
iteration 300 / 1500: loss 197.314456
iteration 400 / 1500: loss 139.668441
iteration 500 / 1500: loss 98.822404
iteration 600 / 1500: loss 71.513056
iteration 700 / 1500: loss 50.998967
iteration 800 / 1500: loss 37.772717
iteration 900 / 1500: loss 27.941215
iteration 1000 / 1500: loss 21.097516
iteration 1100 / 1500: loss 16.241830
iteration 1200 / 1500: loss 13.664102
iteration 1300 / 1500: loss 10.972952
iteration 1400 / 1500: loss 8.759204
iteration 0 / 1500: loss 784.385300
iteration 100 / 1500: loss 473.085438
iteration 200 / 1500: loss 288.539027
iteration 300 / 1500: loss 175.776010
iteration 400 / 1500: loss 107.916615
iteration 500 / 1500: loss 67.092759
iteration 600 / 1500: loss 43.034744
iteration 700 / 1500: loss 27.872031
iteration 800 / 1500: loss 19.209895
iteration 900 / 1500: loss 13.838569
iteration 1000 / 1500: loss 9.644572
iteration 1100 / 1500: loss 8.459474
iteration 1200 / 1500: loss 7.262233
iteration 1300 / 1500: loss 6.394233
iteration 1400 / 1500: loss 6.138331
iteration 0 / 1500: loss 1566.731850
iteration 100 / 1500: loss 572.475839
iteration 200 / 1500: loss 212.651532
iteration 300 / 1500: loss 81.737924
iteration 400 / 1500: loss 33.751680
iteration 500 / 1500: loss 15.493387
iteration 600 / 1500: loss 9.457984
iteration 700 / 1500: loss 6.934639
iteration 800 / 1500: loss 6.085382
iteration 900 / 1500: loss 5.664720
iteration 1000 / 1500: loss 5.610510
iteration 1100 / 1500: loss 5.611987
iteration 1200 / 1500: loss 5.308443
iteration 1300 / 1500: loss 5.228846
iteration 1400 / 1500: loss 5.591309
iteration 0 / 1500: loss 563.491722
iteration 100 / 1500: loss 273.812079
iteration 200 / 1500: loss 137.962714
iteration 300 / 1500: loss 70.255513
iteration 400 / 1500: loss 37.226193
iteration 500 / 1500: loss 21.144231
iteration 600 / 1500: loss 12.894006
iteration 700 / 1500: loss 9.304471
iteration 800 / 1500: loss 7.659542
iteration 900 / 1500: loss 6.380875
iteration 1000 / 1500: loss 5.512303
iteration 1100 / 1500: loss 5.246071
iteration 1200 / 1500: loss 5.064759
iteration 1300 / 1500: loss 5.246632
iteration 1400 / 1500: loss 4.861220
iteration 0 / 1500: loss 799.716783
iteration 100 / 1500: loss 290.155717
iteration 200 / 1500: loss 109.333304
iteration 300 / 1500: loss 42.849722
iteration 400 / 1500: loss 19.231389
iteration 500 / 1500: loss 10.232364
iteration 600 / 1500: loss 6.712971
iteration 700 / 1500: loss 6.222013
iteration 800 / 1500: loss 5.582574
iteration 900 / 1500: loss 4.817785
iteration 1000 / 1500: loss 5.759856
iteration 1100 / 1500: loss 5.525560
iteration 1200 / 1500: loss 5.080675
iteration 1300 / 1500: loss 5.087031
iteration 1400 / 1500: loss 5.190979
iteration 0 / 1500: loss 1571.874899
iteration 100 / 1500: loss 211.627338
iteration 200 / 1500: loss 32.926329
iteration 300 / 1500: loss 8.897025
iteration 400 / 1500: loss 5.724086
iteration 500 / 1500: loss 5.403795
iteration 600 / 1500: loss 5.677281
iteration 700 / 1500: loss 5.143282
iteration 800 / 1500: loss 6.366768
iteration 900 / 1500: loss 5.532893
iteration 1000 / 1500: loss 5.335180
iteration 1100 / 1500: loss 5.606442
iteration 1200 / 1500: loss 5.366788
iteration 1300 / 1500: loss 5.298731
iteration 1400 / 1500: loss 5.928408
iteration 0 / 1500: loss 558.402699
iteration 100 / 1500: loss 7630.382560
iteration 200 / 1500: loss 7657.920181
iteration 300 / 1500: loss 7801.883298
iteration 400 / 1500: loss 7430.029296
iteration 500 / 1500: loss 7702.750377
iteration 600 / 1500: loss 7645.996903
iteration 700 / 1500: loss 6883.863059
iteration 800 / 1500: loss 7401.757114
iteration 900 / 1500: loss 7308.549608
iteration 1000 / 1500: loss 7439.274929
iteration 1100 / 1500: loss 7819.405661
iteration 1200 / 1500: loss 6957.774304
iteration 1300 / 1500: loss 8097.783431
iteration 1400 / 1500: loss 7217.727346
iteration 0 / 1500: loss 791.694784
iteration 100 / 1500: loss 403523846296697090655903703593518104576.000000
iteration 200 / 1500: loss 66699260002166184582787554408675234189127617037337326658644172292694736896.000000
iteration 300 / 1500: loss 11024853489242177069093964025168753945643099152548205102396691827178998330337231325848468651315224248831705088.000000
iteration 400 / 1500: loss 1822319984589153217153631692631788319815235050611524745781457828392691057431421863228599779281410337881441889292783376404284852179649391050620928.000000
iteration 500 / 1500: loss 301214898635472034754075425151114350917680089679004991670046220578727501746633138811832589050035357088108277753859839245173227149076580822963018945739104294966598495968027291418624.000000
iteration 600 / 1500: loss 49788410338063156166629645504489022640221929770534256104389006059721654922315360184974730727125546479439895428606849009233756390258283136081650952114224481973765788895936837152293784006807277718734921983808688357376.000000
iteration 700 / 1500: loss 8229625477427935525020001620133923625580430800873664946966766060047383305188935862189577301293549846984350300340564231608898521432736329309931063220143778983112438886300782210615503588397487877138674320825683708037423594636473232383178553169866129408.000000
iteration 800 / 1500: loss 1360291180997076346596962090431269588538036475497811068600305917640618979304503725237768544013198888161899583137524211609625221721311968498516617835043345361235557977697132019480052258620038777302530359634620059188869498204653926804349590900040465901907859085714745135438503031351541760.000000
cs231n/classifiers/linear_svm.py:123: RuntimeWarning: overflow encountered in double_scalars
  RW = 0.5 * reg * np.sum(W * W)
cs231n/classifiers/linear_svm.py:123: RuntimeWarning: overflow encountered in multiply
  RW = 0.5 * reg * np.sum(W * W)
iteration 900 / 1500: loss inf
iteration 1000 / 1500: loss inf
iteration 1100 / 1500: loss inf
iteration 1200 / 1500: loss inf
iteration 1300 / 1500: loss inf
iteration 1400 / 1500: loss inf
iteration 0 / 1500: loss 1556.489733
iteration 100 / 1500: loss 4310919428721672232395146515490094664716743523558413109760285073800092815392426467130018501827444097281324944331313959141376.000000
iteration 200 / 1500: loss 11131871169259023745728856047113180548975585484214765852219040040066681063272355423249628903536759626988337395941109981795023584644708879583298743269944491983960761548948856801746764531276755346395811278824449550209860276174629922547985207525376.000000
iteration 300 / 1500: loss inf
iteration 400 / 1500: loss inf
iteration 500 / 1500: loss inf
iteration 600 / 1500: loss nan
iteration 700 / 1500: loss nan
iteration 800 / 1500: loss nan
iteration 900 / 1500: loss nan
iteration 1000 / 1500: loss nan
iteration 1100 / 1500: loss nan
iteration 1200 / 1500: loss nan
iteration 1300 / 1500: loss nan
iteration 1400 / 1500: loss nan
lr 5.000000e-08 reg 3.500000e+04 train accuracy: 0.375429 val accuracy: 0.376000
lr 5.000000e-08 reg 5.000000e+04 train accuracy: 0.374082 val accuracy: 0.395000
lr 5.000000e-08 reg 1.000000e+05 train accuracy: 0.357735 val accuracy: 0.366000
lr 1.000000e-07 reg 3.500000e+04 train accuracy: 0.378245 val accuracy: 0.382000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.370143 val accuracy: 0.372000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.354429 val accuracy: 0.377000
lr 5.000000e-05 reg 3.500000e+04 train accuracy: 0.072878 val accuracy: 0.060000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.067449 val accuracy: 0.067000
lr 5.000000e-05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.395000
cs231n/classifiers/linear_svm.py:124: RuntimeWarning: overflow encountered in multiply
  dRW = 0.5 * reg * (2 * W)
cs231n/classifiers/linear_svm.py:108: RuntimeWarning: invalid value encountered in greater
  dmargins[margins>0]               = 1
cs231n/classifiers/linear_svm.py:109: RuntimeWarning: invalid value encountered in greater
  dmargins[np.arange(num_train), y] = -np.sum(margins>0, axis=1)
cs231n/classifiers/linear_classifier.py:71: RuntimeWarning: invalid value encountered in add
  self.W += step

In [17]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [18]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.372000

In [19]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

Your answer: fill this in