Multiclass Support Vector Machine exercise

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details, see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights
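
For reference, the loss you will be implementing is the multiclass SVM (hinge) loss. With class scores $s = x_i W$ for a training example $x_i$ with correct label $y_i$, margin $\Delta = 1$, and regularization strength $\lambda$ (reg in the code), the full loss over $N$ examples is

$$
L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right) \;+\; \lambda \sum_{k,l} W_{k,l}^{2}
$$

(depending on the convention used in the starter code, the regularization term may carry an extra factor of 1/2).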

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image.shape
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


(3072,)
[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
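
With the column of ones appended, the bias no longer needs to be handled separately: the score function $s = xW + b$ becomes $s = x'W'$, where $x' = [x,\ 1]$ is the 3073-dimensional augmented input and $W'$ is the $3073 \times 10$ weight matrix whose last row plays the role of the bias vector $b$.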

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
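
To make the structure concrete, here is a minimal loop-based sketch of that computation (loss only; it assumes numpy is imported as np as in the setup cell, and the actual svm_loss_naive also returns the gradient dW; the regularization term may include a factor of 0.5 depending on the starter code):

def svm_loss_naive_sketch(W, X, y, reg):
    # W: (D, C) weights, X: (N, D) data, y: (N,) labels in 0..C-1
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)                              # class scores for example i
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue                                  # the correct class is skipped
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
    loss /= num_train                                     # average over the training set
    loss += reg * np.sum(W * W)                           # L2 regularization penalty
    return loss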


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )


loss: 9.051161

The grad returned from the function above is currently all zero. Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code with the existing code in that function.
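
A minimal sketch of how the gradient accumulation can be interleaved with the existing inner loop, assuming a dW = np.zeros_like(W) accumulator and the (D, C) / (N, D) shapes used above (illustrative only, not the required solution):

# inside the inner loop, whenever margin > 0:
loss += margin
dW[:, j] += X[i]          # a violated margin pushes the score of class j up
dW[:, y[i]] -= X[i]       # ...and the score of the correct class down
# after the loops: dW /= num_train, then add the regularization gradient,
# e.g. dW += 2 * reg * W (or reg * W if the loss uses the 0.5 * reg convention)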

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient that you computed. We have provided code that does this for you:
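
The numerical estimate is just a centered finite difference along individual coordinates of W. A minimal sketch of the idea behind the provided checker (grad_check_sparse samples several random coordinates and reports the relative error; the helper name below is made up for illustration):

def numeric_grad_at(f, W, ix, h=1e-5):
    # f maps the full weight matrix to a scalar loss; ix indexes one entry of W
    oldval = W[ix]
    W[ix] = oldval + h
    fxph = f(W)                        # loss with W[ix] nudged up
    W[ix] = oldval - h
    fxmh = f(W)                        # loss with W[ix] nudged down
    W[ix] = oldval                     # restore the original value
    return (fxph - fxmh) / (2 * h)     # centered difference approximation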


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: -2.495144 analytic: -2.495144, relative error: 6.941736e-11
numerical: -22.304320 analytic: -22.304320, relative error: 6.214213e-12
numerical: 4.146493 analytic: 4.146493, relative error: 1.758731e-11
numerical: 0.349295 analytic: 0.349295, relative error: 3.550620e-11
numerical: -2.100415 analytic: -2.100415, relative error: 3.153795e-11
numerical: -13.984143 analytic: -13.984143, relative error: 6.369363e-12
numerical: 21.250213 analytic: 21.250213, relative error: 9.040437e-12
numerical: 8.724775 analytic: 8.724775, relative error: 3.554841e-12
numerical: -8.022198 analytic: -8.022198, relative error: 3.591221e-11
numerical: -6.405024 analytic: -6.405024, relative error: 5.375089e-11
numerical: 5.555533 analytic: 5.555533, relative error: 3.914509e-11
numerical: 2.893185 analytic: 2.893185, relative error: 4.892375e-11
numerical: -11.789664 analytic: -11.789664, relative error: 2.276785e-12
numerical: -17.449637 analytic: -17.449637, relative error: 9.119772e-12
numerical: 5.924505 analytic: 5.924505, relative error: 4.745780e-11
numerical: 5.764991 analytic: 5.764991, relative error: 1.750469e-11
numerical: -7.727453 analytic: -7.743791, relative error: 1.056029e-03
numerical: -7.653441 analytic: -7.653441, relative error: 7.453283e-13
numerical: 8.282169 analytic: 8.282169, relative error: 2.362333e-11
numerical: 5.811707 analytic: 5.811707, relative error: 2.446992e-11

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.

Your Answer: fill this in.


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 9.051161e+00 computed in 0.079774s
Vectorized loss: 9.051161e+00 computed in 0.004407s
difference: 0.000000
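
For reference, a fully vectorized loss can be written with no Python loops at all. A minimal sketch under the same shape assumptions as above (loss only; regularization convention as in the naive sketch):

def svm_loss_vectorized_sketch(W, X, y, reg):
    num_train = X.shape[0]
    scores = X.dot(W)                                              # (N, C) all scores at once
    correct = scores[np.arange(num_train), y]                      # (N,) correct-class scores
    margins = np.maximum(0, scores - correct[:, np.newaxis] + 1)   # delta = 1
    margins[np.arange(num_train), y] = 0                           # do not count j == y_i
    loss = np.sum(margins) / num_train
    loss += reg * np.sum(W * W)
    return loss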

In [12]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 0.094812s
Vectorized loss and gradient: computed in 0.003343s
difference: 0.000000
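
The vectorized gradient can be built from the same margins matrix. A minimal sketch, continuing from the quantities computed inside the vectorized loss sketch above (illustrative only): each example contributes its row x_i once for every violated margin, and minus that count times x_i to its correct class.

binary = (margins > 0).astype(float)             # (N, C) indicator of violated margins
row_counts = np.sum(binary, axis=1)              # number of violations per example
binary[np.arange(num_train), y] = -row_counts    # correct class gets -count * x_i
dW = X.T.dot(binary) / num_train                 # (D, C), averaged over examples
dW += 2 * reg * W                                # or reg * W, matching the loss convention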

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
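
In broad strokes, the training loop to be written in LinearClassifier.train() samples a random minibatch, evaluates the vectorized loss and gradient on it, and takes a step downhill. A minimal sketch (variable names are illustrative; the real method also records a loss history and exposes batch_size and verbose arguments):

num_train = X_train.shape[0]
batch_size = 200                                     # a typical minibatch size
for it in range(num_iters):
    idx = np.random.choice(num_train, batch_size)    # sample a minibatch (with replacement)
    X_batch, y_batch = X_train[idx], y_train[idx]
    loss, grad = svm_loss_vectorized(W, X_batch, y_batch, reg)
    W -= learning_rate * grad                        # vanilla SGD parameter update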


In [13]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 788.547209
iteration 100 / 1500: loss 286.760887
iteration 200 / 1500: loss 108.035358
iteration 300 / 1500: loss 42.475163
iteration 400 / 1500: loss 19.146189
iteration 500 / 1500: loss 10.016574
iteration 600 / 1500: loss 7.219889
iteration 700 / 1500: loss 6.206562
iteration 800 / 1500: loss 5.522409
iteration 900 / 1500: loss 6.175363
iteration 1000 / 1500: loss 5.189749
iteration 1100 / 1500: loss 5.265122
iteration 1200 / 1500: loss 5.431770
iteration 1300 / 1500: loss 5.611469
iteration 1400 / 1500: loss 5.030703
That took 3.318122s

In [14]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [17]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.369102
validation accuracy: 0.362000
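
Prediction for a linear SVM is just an argmax over class scores. A minimal sketch of what LinearSVM.predict computes (the real method uses self.W; the standalone function below is illustrative):

def predict_sketch(W, X):
    scores = X.dot(W)                   # (N, C) class scores
    return np.argmax(scores, axis=1)    # predicted label = highest-scoring class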

In [21]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-8, 1e-7, 2e-7]
regularization_strengths = [1e4, 2e4, 3e4, 4e4, 5e4, 6e4, 7e4, 8e4, 9e4, 1e5]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation accuracy.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=reg,
                              num_iters=1500, verbose=True)
        y_train_pred = svm.predict(X_train)
        training_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        validation_accuracy = np.mean(y_val == y_val_pred)
        results[(lr, reg)] = (training_accuracy, validation_accuracy)
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 1500: loss 176.120983
iteration 100 / 1500: loss 169.685257
iteration 200 / 1500: loss 161.632401
iteration 300 / 1500: loss 158.634342
iteration 400 / 1500: loss 155.237401
iteration 500 / 1500: loss 150.002179
iteration 600 / 1500: loss 149.238158
iteration 700 / 1500: loss 145.939968
iteration 800 / 1500: loss 140.119240
iteration 900 / 1500: loss 138.851965
iteration 1000 / 1500: loss 137.019864
iteration 1100 / 1500: loss 133.539412
iteration 1200 / 1500: loss 129.194157
iteration 1300 / 1500: loss 128.946585
iteration 1400 / 1500: loss 124.776311
iteration 0 / 1500: loss 328.013999
iteration 100 / 1500: loss 312.661926
iteration 200 / 1500: loss 296.999375
iteration 300 / 1500: loss 283.538568
iteration 400 / 1500: loss 272.461575
iteration 500 / 1500: loss 261.584414
iteration 600 / 1500: loss 249.821764
iteration 700 / 1500: loss 241.221036
iteration 800 / 1500: loss 229.772467
iteration 900 / 1500: loss 222.456696
iteration 1000 / 1500: loss 212.554677
iteration 1100 / 1500: loss 202.519501
iteration 1200 / 1500: loss 195.956494
iteration 1300 / 1500: loss 188.663408
iteration 1400 / 1500: loss 181.205729
iteration 0 / 1500: loss 492.787661
iteration 100 / 1500: loss 458.525495
iteration 200 / 1500: loss 430.418090
iteration 300 / 1500: loss 407.320238
iteration 400 / 1500: loss 381.472619
iteration 500 / 1500: loss 360.017589
iteration 600 / 1500: loss 338.394263
iteration 700 / 1500: loss 318.200442
iteration 800 / 1500: loss 300.069584
iteration 900 / 1500: loss 284.000137
iteration 1000 / 1500: loss 266.092504
iteration 1100 / 1500: loss 251.269542
iteration 1200 / 1500: loss 237.013468
iteration 1300 / 1500: loss 223.387286
iteration 1400 / 1500: loss 210.603406
iteration 0 / 1500: loss 640.483252
iteration 100 / 1500: loss 585.293889
iteration 200 / 1500: loss 537.115914
iteration 300 / 1500: loss 497.157341
iteration 400 / 1500: loss 459.081702
iteration 500 / 1500: loss 422.617782
iteration 600 / 1500: loss 389.722805
iteration 700 / 1500: loss 360.109414
iteration 800 / 1500: loss 331.680923
iteration 900 / 1500: loss 307.077356
iteration 1000 / 1500: loss 283.694813
iteration 1100 / 1500: loss 262.046126
iteration 1200 / 1500: loss 242.143422
iteration 1300 / 1500: loss 223.693272
iteration 1400 / 1500: loss 206.995765
iteration 0 / 1500: loss 799.703069
iteration 100 / 1500: loss 720.256898
iteration 200 / 1500: loss 651.152918
iteration 300 / 1500: loss 588.257558
iteration 400 / 1500: loss 531.457757
iteration 500 / 1500: loss 482.005953
iteration 600 / 1500: loss 436.548754
iteration 700 / 1500: loss 395.833944
iteration 800 / 1500: loss 357.846756
iteration 900 / 1500: loss 322.689204
iteration 1000 / 1500: loss 293.855333
iteration 1100 / 1500: loss 265.099371
iteration 1200 / 1500: loss 239.792404
iteration 1300 / 1500: loss 217.429966
iteration 1400 / 1500: loss 197.910959
iteration 0 / 1500: loss 935.705205
iteration 100 / 1500: loss 824.438286
iteration 200 / 1500: loss 729.229289
iteration 300 / 1500: loss 647.775922
iteration 400 / 1500: loss 575.480126
iteration 500 / 1500: loss 510.288861
iteration 600 / 1500: loss 452.403324
iteration 700 / 1500: loss 400.570770
iteration 800 / 1500: loss 355.289410
iteration 900 / 1500: loss 315.672759
iteration 1000 / 1500: loss 279.611683
iteration 1100 / 1500: loss 248.789841
iteration 1200 / 1500: loss 221.127728
iteration 1300 / 1500: loss 196.162480
iteration 1400 / 1500: loss 174.529892
iteration 0 / 1500: loss 1112.982968
iteration 100 / 1500: loss 961.540837
iteration 200 / 1500: loss 834.488783
iteration 300 / 1500: loss 725.162448
iteration 400 / 1500: loss 630.244932
iteration 500 / 1500: loss 547.843886
iteration 600 / 1500: loss 476.254008
iteration 700 / 1500: loss 414.717333
iteration 800 / 1500: loss 360.612834
iteration 900 / 1500: loss 313.134938
iteration 1000 / 1500: loss 273.505960
iteration 1100 / 1500: loss 238.260919
iteration 1200 / 1500: loss 207.668998
iteration 1300 / 1500: loss 181.359103
iteration 1400 / 1500: loss 157.295801
iteration 0 / 1500: loss 1262.846385
iteration 100 / 1500: loss 1072.112281
iteration 200 / 1500: loss 913.260690
iteration 300 / 1500: loss 776.681710
iteration 400 / 1500: loss 662.980503
iteration 500 / 1500: loss 564.782676
iteration 600 / 1500: loss 482.319290
iteration 700 / 1500: loss 410.365808
iteration 800 / 1500: loss 350.341156
iteration 900 / 1500: loss 298.674313
iteration 1000 / 1500: loss 254.634359
iteration 1100 / 1500: loss 217.535560
iteration 1200 / 1500: loss 186.715401
iteration 1300 / 1500: loss 159.228234
iteration 1400 / 1500: loss 136.247437
iteration 0 / 1500: loss 1414.762097
iteration 100 / 1500: loss 1178.798992
iteration 200 / 1500: loss 983.134581
iteration 300 / 1500: loss 820.414663
iteration 400 / 1500: loss 686.374155
iteration 500 / 1500: loss 573.966387
iteration 600 / 1500: loss 479.420265
iteration 700 / 1500: loss 401.413206
iteration 800 / 1500: loss 335.487523
iteration 900 / 1500: loss 281.705069
iteration 1000 / 1500: loss 235.263595
iteration 1100 / 1500: loss 197.424309
iteration 1200 / 1500: loss 165.467252
iteration 1300 / 1500: loss 139.148557
iteration 1400 / 1500: loss 117.190579
iteration 0 / 1500: loss 1580.007512
iteration 100 / 1500: loss 1288.890841
iteration 200 / 1500: loss 1053.729928
iteration 300 / 1500: loss 861.038990
iteration 400 / 1500: loss 705.481976
iteration 500 / 1500: loss 578.821566
iteration 600 / 1500: loss 473.253926
iteration 700 / 1500: loss 388.251232
iteration 800 / 1500: loss 319.261592
iteration 900 / 1500: loss 261.396728
iteration 1000 / 1500: loss 214.811398
iteration 1100 / 1500: loss 177.113227
iteration 1200 / 1500: loss 145.888573
iteration 1300 / 1500: loss 120.004485
iteration 1400 / 1500: loss 99.344141
iteration 0 / 1500: loss 177.553924
iteration 100 / 1500: loss 137.849664
iteration 200 / 1500: loss 110.067465
iteration 300 / 1500: loss 91.090561
iteration 400 / 1500: loss 73.944942
iteration 500 / 1500: loss 61.915645
iteration 600 / 1500: loss 51.185400
iteration 700 / 1500: loss 42.846864
iteration 800 / 1500: loss 35.973217
iteration 900 / 1500: loss 29.772764
iteration 1000 / 1500: loss 25.232604
iteration 1100 / 1500: loss 21.517225
iteration 1200 / 1500: loss 18.316968
iteration 1300 / 1500: loss 15.300411
iteration 1400 / 1500: loss 14.207536
iteration 0 / 1500: loss 328.361067
iteration 100 / 1500: loss 215.944218
iteration 200 / 1500: loss 145.455492
iteration 300 / 1500: loss 96.380065
iteration 400 / 1500: loss 66.476663
iteration 500 / 1500: loss 45.691944
iteration 600 / 1500: loss 32.473700
iteration 700 / 1500: loss 23.525910
iteration 800 / 1500: loss 16.758071
iteration 900 / 1500: loss 13.179873
iteration 1000 / 1500: loss 9.906775
iteration 1100 / 1500: loss 8.358244
iteration 1200 / 1500: loss 7.129248
iteration 1300 / 1500: loss 6.347188
iteration 1400 / 1500: loss 6.459734
iteration 0 / 1500: loss 484.053097
iteration 100 / 1500: loss 261.215812
iteration 200 / 1500: loss 144.012168
iteration 300 / 1500: loss 81.642002
iteration 400 / 1500: loss 47.051179
iteration 500 / 1500: loss 27.362890
iteration 600 / 1500: loss 17.578336
iteration 700 / 1500: loss 12.132635
iteration 800 / 1500: loss 8.477293
iteration 900 / 1500: loss 7.091597
iteration 1000 / 1500: loss 5.711062
iteration 1100 / 1500: loss 5.546180
iteration 1200 / 1500: loss 5.266987
iteration 1300 / 1500: loss 5.109723
iteration 1400 / 1500: loss 5.230221
iteration 0 / 1500: loss 636.809024
iteration 100 / 1500: loss 282.532967
iteration 200 / 1500: loss 128.328397
iteration 300 / 1500: loss 60.292910
iteration 400 / 1500: loss 30.294637
iteration 500 / 1500: loss 16.301536
iteration 600 / 1500: loss 10.075493
iteration 700 / 1500: loss 7.286424
iteration 800 / 1500: loss 6.645830
iteration 900 / 1500: loss 6.154378
iteration 1000 / 1500: loss 5.075401
iteration 1100 / 1500: loss 5.760521
iteration 1200 / 1500: loss 5.067139
iteration 1300 / 1500: loss 4.656306
iteration 1400 / 1500: loss 5.175984
iteration 0 / 1500: loss 781.347507
iteration 100 / 1500: loss 286.672815
iteration 200 / 1500: loss 107.454762
iteration 300 / 1500: loss 42.296004
iteration 400 / 1500: loss 19.053853
iteration 500 / 1500: loss 10.596659
iteration 600 / 1500: loss 7.347544
iteration 700 / 1500: loss 5.759106
iteration 800 / 1500: loss 5.281825
iteration 900 / 1500: loss 5.952305
iteration 1000 / 1500: loss 5.040624
iteration 1100 / 1500: loss 5.398812
iteration 1200 / 1500: loss 5.534688
iteration 1300 / 1500: loss 5.276669
iteration 1400 / 1500: loss 4.748151
iteration 0 / 1500: loss 939.794740
iteration 100 / 1500: loss 280.490961
iteration 200 / 1500: loss 87.437495
iteration 300 / 1500: loss 30.450372
iteration 400 / 1500: loss 12.723677
iteration 500 / 1500: loss 7.442071
iteration 600 / 1500: loss 6.315760
iteration 700 / 1500: loss 5.762072
iteration 800 / 1500: loss 5.991873
iteration 900 / 1500: loss 5.083493
iteration 1000 / 1500: loss 5.395866
iteration 1100 / 1500: loss 5.197711
iteration 1200 / 1500: loss 5.261455
iteration 1300 / 1500: loss 5.263247
iteration 1400 / 1500: loss 5.728991
iteration 0 / 1500: loss 1106.866375
iteration 100 / 1500: loss 270.946825
iteration 200 / 1500: loss 69.997203
iteration 300 / 1500: loss 21.038512
iteration 400 / 1500: loss 9.021746
iteration 500 / 1500: loss 6.165296
iteration 600 / 1500: loss 5.358720
iteration 700 / 1500: loss 5.644313
iteration 800 / 1500: loss 5.896816
iteration 900 / 1500: loss 5.395449
iteration 1000 / 1500: loss 4.811775
iteration 1100 / 1500: loss 5.626286
iteration 1200 / 1500: loss 5.147768
iteration 1300 / 1500: loss 5.401375
iteration 1400 / 1500: loss 5.775746
iteration 0 / 1500: loss 1247.687716
iteration 100 / 1500: loss 251.525596
iteration 200 / 1500: loss 54.375141
iteration 300 / 1500: loss 15.616296
iteration 400 / 1500: loss 7.174472
iteration 500 / 1500: loss 6.112924
iteration 600 / 1500: loss 5.921699
iteration 700 / 1500: loss 6.198691
iteration 800 / 1500: loss 5.209567
iteration 900 / 1500: loss 5.533387
iteration 1000 / 1500: loss 6.411384
iteration 1100 / 1500: loss 5.947705
iteration 1200 / 1500: loss 6.191201
iteration 1300 / 1500: loss 6.065384
iteration 1400 / 1500: loss 5.099074
iteration 0 / 1500: loss 1400.302395
iteration 100 / 1500: loss 231.365736
iteration 200 / 1500: loss 41.980542
iteration 300 / 1500: loss 11.669036
iteration 400 / 1500: loss 6.440091
iteration 500 / 1500: loss 5.393678
iteration 600 / 1500: loss 5.497438
iteration 700 / 1500: loss 6.107789
iteration 800 / 1500: loss 5.018149
iteration 900 / 1500: loss 5.653927
iteration 1000 / 1500: loss 5.285722
iteration 1100 / 1500: loss 5.554749
iteration 1200 / 1500: loss 6.109031
iteration 1300 / 1500: loss 6.006883
iteration 1400 / 1500: loss 5.291361
iteration 0 / 1500: loss 1548.910556
iteration 100 / 1500: loss 209.162627
iteration 200 / 1500: loss 32.569024
iteration 300 / 1500: loss 9.295619
iteration 400 / 1500: loss 6.034522
iteration 500 / 1500: loss 5.804483
iteration 600 / 1500: loss 5.780843
iteration 700 / 1500: loss 5.485966
iteration 800 / 1500: loss 5.340102
iteration 900 / 1500: loss 5.630342
iteration 1000 / 1500: loss 5.984599
iteration 1100 / 1500: loss 5.521604
iteration 1200 / 1500: loss 5.621409
iteration 1300 / 1500: loss 5.847182
iteration 1400 / 1500: loss 5.720156
iteration 0 / 1500: loss 180.128259
iteration 100 / 1500: loss 111.090500
iteration 200 / 1500: loss 74.328971
iteration 300 / 1500: loss 51.592031
iteration 400 / 1500: loss 35.412229
iteration 500 / 1500: loss 25.231528
iteration 600 / 1500: loss 18.587356
iteration 700 / 1500: loss 13.996705
iteration 800 / 1500: loss 10.802498
iteration 900 / 1500: loss 8.896283
iteration 1000 / 1500: loss 7.069470
iteration 1100 / 1500: loss 6.995631
iteration 1200 / 1500: loss 6.633333
iteration 1300 / 1500: loss 5.820007
iteration 1400 / 1500: loss 5.265941
iteration 0 / 1500: loss 332.201435
iteration 100 / 1500: loss 142.809481
iteration 200 / 1500: loss 66.245248
iteration 300 / 1500: loss 32.006369
iteration 400 / 1500: loss 17.202041
iteration 500 / 1500: loss 10.026863
iteration 600 / 1500: loss 7.648875
iteration 700 / 1500: loss 5.908842
iteration 800 / 1500: loss 5.120011
iteration 900 / 1500: loss 5.112144
iteration 1000 / 1500: loss 5.083110
iteration 1100 / 1500: loss 5.538273
iteration 1200 / 1500: loss 5.941043
iteration 1300 / 1500: loss 5.402443
iteration 1400 / 1500: loss 4.799985
iteration 0 / 1500: loss 481.060835
iteration 100 / 1500: loss 144.267666
iteration 200 / 1500: loss 45.836457
iteration 300 / 1500: loss 17.530832
iteration 400 / 1500: loss 9.341019
iteration 500 / 1500: loss 6.115951
iteration 600 / 1500: loss 5.612066
iteration 700 / 1500: loss 5.568188
iteration 800 / 1500: loss 5.397262
iteration 900 / 1500: loss 5.084852
iteration 1000 / 1500: loss 5.435457
iteration 1100 / 1500: loss 5.030245
iteration 1200 / 1500: loss 5.539864
iteration 1300 / 1500: loss 4.993139
iteration 1400 / 1500: loss 5.208256
iteration 0 / 1500: loss 622.805553
iteration 100 / 1500: loss 126.319147
iteration 200 / 1500: loss 29.272921
iteration 300 / 1500: loss 10.257683
iteration 400 / 1500: loss 6.560122
iteration 500 / 1500: loss 5.894140
iteration 600 / 1500: loss 5.454735
iteration 700 / 1500: loss 5.544993
iteration 800 / 1500: loss 5.455217
iteration 900 / 1500: loss 5.703175
iteration 1000 / 1500: loss 5.644603
iteration 1100 / 1500: loss 5.498679
iteration 1200 / 1500: loss 5.317610
iteration 1300 / 1500: loss 5.187199
iteration 1400 / 1500: loss 5.139568
iteration 0 / 1500: loss 802.359403
iteration 100 / 1500: loss 109.344829
iteration 200 / 1500: loss 18.734086
iteration 300 / 1500: loss 7.590997
iteration 400 / 1500: loss 5.365448
iteration 500 / 1500: loss 5.009096
iteration 600 / 1500: loss 5.138644
iteration 700 / 1500: loss 5.708924
iteration 800 / 1500: loss 5.728193
iteration 900 / 1500: loss 4.910279
iteration 1000 / 1500: loss 5.273518
iteration 1100 / 1500: loss 5.095637
iteration 1200 / 1500: loss 5.192948
iteration 1300 / 1500: loss 5.375208
iteration 1400 / 1500: loss 4.980010
iteration 0 / 1500: loss 944.997779
iteration 100 / 1500: loss 87.279923
iteration 200 / 1500: loss 12.889225
iteration 300 / 1500: loss 6.404995
iteration 400 / 1500: loss 5.492938
iteration 500 / 1500: loss 5.524193
iteration 600 / 1500: loss 5.439333
iteration 700 / 1500: loss 5.248099
iteration 800 / 1500: loss 5.459192
iteration 900 / 1500: loss 5.044554
iteration 1000 / 1500: loss 5.575539
iteration 1100 / 1500: loss 5.229299
iteration 1200 / 1500: loss 6.143764
iteration 1300 / 1500: loss 5.543644
iteration 1400 / 1500: loss 5.412907
iteration 0 / 1500: loss 1103.621685
iteration 100 / 1500: loss 69.549848
iteration 200 / 1500: loss 8.914633
iteration 300 / 1500: loss 5.760557
iteration 400 / 1500: loss 6.033440
iteration 500 / 1500: loss 4.977725
iteration 600 / 1500: loss 6.166196
iteration 700 / 1500: loss 5.729241
iteration 800 / 1500: loss 5.266661
iteration 900 / 1500: loss 4.831147
iteration 1000 / 1500: loss 5.429612
iteration 1100 / 1500: loss 5.147813
iteration 1200 / 1500: loss 5.107640
iteration 1300 / 1500: loss 5.632759
iteration 1400 / 1500: loss 6.284937
iteration 0 / 1500: loss 1253.782118
iteration 100 / 1500: loss 53.839271
iteration 200 / 1500: loss 7.381531
iteration 300 / 1500: loss 5.381197
iteration 400 / 1500: loss 5.540454
iteration 500 / 1500: loss 5.987198
iteration 600 / 1500: loss 5.404652
iteration 700 / 1500: loss 5.720546
iteration 800 / 1500: loss 5.644315
iteration 900 / 1500: loss 5.403384
iteration 1000 / 1500: loss 5.520004
iteration 1100 / 1500: loss 5.597001
iteration 1200 / 1500: loss 5.860383
iteration 1300 / 1500: loss 6.170345
iteration 1400 / 1500: loss 5.433047
iteration 0 / 1500: loss 1400.073142
iteration 100 / 1500: loss 41.411141
iteration 200 / 1500: loss 7.024557
iteration 300 / 1500: loss 5.938772
iteration 400 / 1500: loss 5.888664
iteration 500 / 1500: loss 6.474928
iteration 600 / 1500: loss 5.467244
iteration 700 / 1500: loss 5.976998
iteration 800 / 1500: loss 6.142909
iteration 900 / 1500: loss 5.502061
iteration 1000 / 1500: loss 5.469548
iteration 1100 / 1500: loss 5.460923
iteration 1200 / 1500: loss 5.689661
iteration 1300 / 1500: loss 5.392410
iteration 1400 / 1500: loss 5.351436
iteration 0 / 1500: loss 1567.738415
iteration 100 / 1500: loss 32.854742
iteration 200 / 1500: loss 6.295576
iteration 300 / 1500: loss 6.283173
iteration 400 / 1500: loss 5.975514
iteration 500 / 1500: loss 5.621556
iteration 600 / 1500: loss 5.059684
iteration 700 / 1500: loss 6.137941
iteration 800 / 1500: loss 6.118096
iteration 900 / 1500: loss 5.852718
iteration 1000 / 1500: loss 6.030805
iteration 1100 / 1500: loss 5.873834
iteration 1200 / 1500: loss 6.003965
iteration 1300 / 1500: loss 5.944714
iteration 1400 / 1500: loss 5.519338
lr 1.000000e-08 reg 1.000000e+04 train accuracy: 0.229082 val accuracy: 0.216000
lr 1.000000e-08 reg 2.000000e+04 train accuracy: 0.234102 val accuracy: 0.242000
lr 1.000000e-08 reg 3.000000e+04 train accuracy: 0.244714 val accuracy: 0.247000
lr 1.000000e-08 reg 4.000000e+04 train accuracy: 0.255510 val accuracy: 0.254000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.259367 val accuracy: 0.269000
lr 1.000000e-08 reg 6.000000e+04 train accuracy: 0.265286 val accuracy: 0.274000
lr 1.000000e-08 reg 7.000000e+04 train accuracy: 0.267857 val accuracy: 0.273000
lr 1.000000e-08 reg 8.000000e+04 train accuracy: 0.281449 val accuracy: 0.267000
lr 1.000000e-08 reg 9.000000e+04 train accuracy: 0.287327 val accuracy: 0.292000
lr 1.000000e-08 reg 1.000000e+05 train accuracy: 0.295571 val accuracy: 0.314000
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.369041 val accuracy: 0.374000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.383612 val accuracy: 0.385000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.374388 val accuracy: 0.365000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.373224 val accuracy: 0.389000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.368878 val accuracy: 0.381000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.361653 val accuracy: 0.381000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.366306 val accuracy: 0.382000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.362490 val accuracy: 0.370000
lr 1.000000e-07 reg 9.000000e+04 train accuracy: 0.359469 val accuracy: 0.367000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.361000 val accuracy: 0.353000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.393510 val accuracy: 0.393000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.373673 val accuracy: 0.388000
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.371388 val accuracy: 0.389000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.371286 val accuracy: 0.362000
lr 2.000000e-07 reg 5.000000e+04 train accuracy: 0.364796 val accuracy: 0.384000
lr 2.000000e-07 reg 6.000000e+04 train accuracy: 0.360388 val accuracy: 0.366000
lr 2.000000e-07 reg 7.000000e+04 train accuracy: 0.354429 val accuracy: 0.361000
lr 2.000000e-07 reg 8.000000e+04 train accuracy: 0.351735 val accuracy: 0.372000
lr 2.000000e-07 reg 9.000000e+04 train accuracy: 0.350755 val accuracy: 0.366000
lr 2.000000e-07 reg 1.000000e+05 train accuracy: 0.351102 val accuracy: 0.353000
best validation accuracy achieved during cross-validation: 0.393000

In [22]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [23]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.369000

In [24]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline Question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

Your Answer: fill this in.