Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

from __future__ import print_function

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function compute_loss_naive which uses for loops to evaluate the multiclass SVM loss function.


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))


loss: 8.937335

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have correctly implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: 18.714627 analytic: 18.714627, relative error: 2.454202e-12
numerical: 18.883725 analytic: 18.883725, relative error: 6.841011e-12
numerical: -4.048733 analytic: -4.048733, relative error: 8.064880e-12
numerical: 9.950261 analytic: 9.950261, relative error: 2.157726e-12
numerical: -2.978959 analytic: -2.978959, relative error: 9.038844e-11
numerical: -6.964474 analytic: -6.964474, relative error: 2.054519e-11
numerical: -26.987591 analytic: -26.987591, relative error: 2.798719e-12
numerical: -16.076682 analytic: -16.076682, relative error: 1.522822e-11
numerical: 5.979325 analytic: 5.979325, relative error: 6.288910e-11
numerical: 2.363990 analytic: 2.363990, relative error: 8.216933e-11
numerical: 17.111617 analytic: 17.111617, relative error: 1.204258e-11
numerical: 16.635418 analytic: 16.635418, relative error: 3.052886e-12
numerical: 2.419817 analytic: 2.419817, relative error: 4.581043e-11
numerical: -18.583962 analytic: -18.583962, relative error: 7.152470e-12
numerical: 2.517953 analytic: 2.517953, relative error: 4.455817e-11
numerical: 15.624926 analytic: 15.624926, relative error: 6.867058e-12
numerical: 5.261285 analytic: 5.261285, relative error: 2.094475e-12
numerical: -20.515539 analytic: -20.515539, relative error: 2.652387e-12
numerical: 11.288701 analytic: 11.288701, relative error: 2.896550e-11
numerical: -22.827748 analytic: -22.827748, relative error: 1.389080e-11

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer: fill this in.


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))


Naive loss: 8.937335e+00 computed in 0.223071s
[[ 1.  0.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  0.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 ..., 
 [ 1.  1.  0. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  1.]
 [ 1.  1.  1. ...,  1.  1.  0.]]
Vectorized loss: 8.937335e+00 computed in 0.024923s
difference: -0.000000

In [16]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)


Naive loss and gradient: computed in 0.305631s
Vectorized loss and gradient: computed in 0.022191s
difference: 0.000000

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss, the gradient and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.


In [17]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))


iteration 0 / 1500: loss 786.327077
iteration 100 / 1500: loss 287.641516
iteration 200 / 1500: loss 107.392319
iteration 300 / 1500: loss 42.064975
iteration 400 / 1500: loss 19.061164
iteration 500 / 1500: loss 9.851646
iteration 600 / 1500: loss 7.360374
iteration 700 / 1500: loss 6.211405
iteration 800 / 1500: loss 5.625353
iteration 900 / 1500: loss 5.078105
iteration 1000 / 1500: loss 5.662972
iteration 1100 / 1500: loss 4.782022
iteration 1200 / 1500: loss 5.559518
iteration 1300 / 1500: loss 4.819758
iteration 1400 / 1500: loss 5.464011
That took 10.675910s

In [18]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [19]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))


training accuracy: 0.373918
validation accuracy: 0.385000

In [38]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [3e-9]
regularization_strengths = [4.5e4]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=rs,
                      num_iters=30000, verbose=True)
        y_val_pred = svm.predict(X_val)
        ta = np.mean(y_train == y_train_pred)
        va = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (ta, va)
        if(va > best_val):
            best_val = va
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)


iteration 0 / 30000: loss 1404.224196
iteration 100 / 30000: loss 1325.465836
iteration 200 / 30000: loss 1256.707455
iteration 300 / 30000: loss 1189.105717
iteration 400 / 30000: loss 1127.901394
iteration 500 / 30000: loss 1067.625819
iteration 600 / 30000: loss 1009.215481
iteration 700 / 30000: loss 957.361554
iteration 800 / 30000: loss 908.334460
iteration 900 / 30000: loss 859.748805
iteration 1000 / 30000: loss 813.421032
iteration 1100 / 30000: loss 770.545131
iteration 1200 / 30000: loss 729.517574
iteration 1300 / 30000: loss 693.549199
iteration 1400 / 30000: loss 657.535028
iteration 1500 / 30000: loss 621.912843
iteration 1600 / 30000: loss 589.610143
iteration 1700 / 30000: loss 558.364903
iteration 1800 / 30000: loss 527.356758
iteration 1900 / 30000: loss 502.140150
iteration 2000 / 30000: loss 476.107554
iteration 2100 / 30000: loss 450.368327
iteration 2200 / 30000: loss 426.426876
iteration 2300 / 30000: loss 404.533909
iteration 2400 / 30000: loss 383.824895
iteration 2500 / 30000: loss 363.865654
iteration 2600 / 30000: loss 344.981346
iteration 2700 / 30000: loss 325.679479
iteration 2800 / 30000: loss 309.622797
iteration 2900 / 30000: loss 293.783570
iteration 3000 / 30000: loss 277.782182
iteration 3100 / 30000: loss 263.660927
iteration 3200 / 30000: loss 250.355387
iteration 3300 / 30000: loss 236.989040
iteration 3400 / 30000: loss 225.000529
iteration 3500 / 30000: loss 213.068077
iteration 3600 / 30000: loss 201.826185
iteration 3700 / 30000: loss 192.778760
iteration 3800 / 30000: loss 181.680516
iteration 3900 / 30000: loss 171.733377
iteration 4000 / 30000: loss 164.225890
iteration 4100 / 30000: loss 156.215517
iteration 4200 / 30000: loss 147.574538
iteration 4300 / 30000: loss 139.923078
iteration 4400 / 30000: loss 132.587379
iteration 4500 / 30000: loss 125.778871
iteration 4600 / 30000: loss 119.396254
iteration 4700 / 30000: loss 114.367631
iteration 4800 / 30000: loss 108.560480
iteration 4900 / 30000: loss 103.223625
iteration 5000 / 30000: loss 97.206546
iteration 5100 / 30000: loss 92.992245
iteration 5200 / 30000: loss 88.358167
iteration 5300 / 30000: loss 83.573127
iteration 5400 / 30000: loss 79.374873
iteration 5500 / 30000: loss 75.602787
iteration 5600 / 30000: loss 72.441744
iteration 5700 / 30000: loss 68.428014
iteration 5800 / 30000: loss 64.817584
iteration 5900 / 30000: loss 61.671517
iteration 6000 / 30000: loss 58.519089
iteration 6100 / 30000: loss 55.931588
iteration 6200 / 30000: loss 53.253735
iteration 6300 / 30000: loss 50.378604
iteration 6400 / 30000: loss 48.672032
iteration 6500 / 30000: loss 46.582874
iteration 6600 / 30000: loss 44.193609
iteration 6700 / 30000: loss 42.026204
iteration 6800 / 30000: loss 39.685740
iteration 6900 / 30000: loss 37.991758
iteration 7000 / 30000: loss 36.600362
iteration 7100 / 30000: loss 33.762753
iteration 7200 / 30000: loss 33.455771
iteration 7300 / 30000: loss 31.483326
iteration 7400 / 30000: loss 30.501236
iteration 7500 / 30000: loss 28.938734
iteration 7600 / 30000: loss 27.838760
iteration 7700 / 30000: loss 26.442849
iteration 7800 / 30000: loss 25.360175
iteration 7900 / 30000: loss 24.283666
iteration 8000 / 30000: loss 23.787885
iteration 8100 / 30000: loss 21.527317
iteration 8200 / 30000: loss 21.716364
iteration 8300 / 30000: loss 20.951389
iteration 8400 / 30000: loss 19.996228
iteration 8500 / 30000: loss 19.081782
iteration 8600 / 30000: loss 18.714228
iteration 8700 / 30000: loss 17.822235
iteration 8800 / 30000: loss 17.470747
iteration 8900 / 30000: loss 16.541006
iteration 9000 / 30000: loss 15.856727
iteration 9100 / 30000: loss 16.050165
iteration 9200 / 30000: loss 14.883342
iteration 9300 / 30000: loss 14.661609
iteration 9400 / 30000: loss 13.987473
iteration 9500 / 30000: loss 13.333182
iteration 9600 / 30000: loss 13.065812
iteration 9700 / 30000: loss 13.182626
iteration 9800 / 30000: loss 12.367291
iteration 9900 / 30000: loss 11.649040
iteration 10000 / 30000: loss 11.897585
iteration 10100 / 30000: loss 11.594326
iteration 10200 / 30000: loss 11.306106
iteration 10300 / 30000: loss 10.585921
iteration 10400 / 30000: loss 10.323731
iteration 10500 / 30000: loss 9.822646
iteration 10600 / 30000: loss 9.773480
iteration 10700 / 30000: loss 9.536621
iteration 10800 / 30000: loss 9.361730
iteration 10900 / 30000: loss 9.199381
iteration 11000 / 30000: loss 9.286418
iteration 11100 / 30000: loss 9.199095
iteration 11200 / 30000: loss 8.752491
iteration 11300 / 30000: loss 8.324151
iteration 11400 / 30000: loss 8.373774
iteration 11500 / 30000: loss 8.355778
iteration 11600 / 30000: loss 8.173512
iteration 11700 / 30000: loss 7.705555
iteration 11800 / 30000: loss 7.434838
iteration 11900 / 30000: loss 8.395416
iteration 12000 / 30000: loss 7.784568
iteration 12100 / 30000: loss 7.738551
iteration 12200 / 30000: loss 7.517788
iteration 12300 / 30000: loss 7.414824
iteration 12400 / 30000: loss 7.133425
iteration 12500 / 30000: loss 7.061746
iteration 12600 / 30000: loss 6.377210
iteration 12700 / 30000: loss 6.773735
iteration 12800 / 30000: loss 6.680209
iteration 12900 / 30000: loss 6.892904
iteration 13000 / 30000: loss 6.810908
iteration 13100 / 30000: loss 7.242150
iteration 13200 / 30000: loss 6.254054
iteration 13300 / 30000: loss 6.638321
iteration 13400 / 30000: loss 6.430021
iteration 13500 / 30000: loss 6.745448
iteration 13600 / 30000: loss 6.655286
iteration 13700 / 30000: loss 6.368221
iteration 13800 / 30000: loss 6.071089
iteration 13900 / 30000: loss 6.625605
iteration 14000 / 30000: loss 6.886652
iteration 14100 / 30000: loss 5.943412
iteration 14200 / 30000: loss 5.277189
iteration 14300 / 30000: loss 6.770334
iteration 14400 / 30000: loss 6.152201
iteration 14500 / 30000: loss 6.263002
iteration 14600 / 30000: loss 6.232310
iteration 14700 / 30000: loss 6.240565
iteration 14800 / 30000: loss 6.351982
iteration 14900 / 30000: loss 5.914667
iteration 15000 / 30000: loss 5.540364
iteration 15100 / 30000: loss 6.194938
iteration 15200 / 30000: loss 6.465589
iteration 15300 / 30000: loss 6.339280
iteration 15400 / 30000: loss 5.955120
iteration 15500 / 30000: loss 6.448060
iteration 15600 / 30000: loss 5.487729
iteration 15700 / 30000: loss 5.660769
iteration 15800 / 30000: loss 5.681768
iteration 15900 / 30000: loss 5.805741
iteration 16000 / 30000: loss 5.911992
iteration 16100 / 30000: loss 5.786426
iteration 16200 / 30000: loss 5.551419
iteration 16300 / 30000: loss 5.874848
iteration 16400 / 30000: loss 6.417772
iteration 16500 / 30000: loss 5.149053
iteration 16600 / 30000: loss 5.141812
iteration 16700 / 30000: loss 5.647614
iteration 16800 / 30000: loss 5.407120
iteration 16900 / 30000: loss 5.644038
iteration 17000 / 30000: loss 5.170737
iteration 17100 / 30000: loss 5.563980
iteration 17200 / 30000: loss 5.376150
iteration 17300 / 30000: loss 6.212843
iteration 17400 / 30000: loss 5.526980
iteration 17500 / 30000: loss 5.652061
iteration 17600 / 30000: loss 5.745742
iteration 17700 / 30000: loss 5.658586
iteration 17800 / 30000: loss 5.329260
iteration 17900 / 30000: loss 5.322897
iteration 18000 / 30000: loss 6.013590
iteration 18100 / 30000: loss 5.505053
iteration 18200 / 30000: loss 5.286164
iteration 18300 / 30000: loss 5.653892
iteration 18400 / 30000: loss 5.930467
iteration 18500 / 30000: loss 5.340879
iteration 18600 / 30000: loss 5.256674
iteration 18700 / 30000: loss 5.719189
iteration 18800 / 30000: loss 5.272033
iteration 18900 / 30000: loss 5.738947
iteration 19000 / 30000: loss 5.893257
iteration 19100 / 30000: loss 5.382069
iteration 19200 / 30000: loss 4.970196
iteration 19300 / 30000: loss 5.234620
iteration 19400 / 30000: loss 6.164412
iteration 19500 / 30000: loss 5.725931
iteration 19600 / 30000: loss 5.765768
iteration 19700 / 30000: loss 5.323568
iteration 19800 / 30000: loss 5.428694
iteration 19900 / 30000: loss 6.117456
iteration 20000 / 30000: loss 5.746019
iteration 20100 / 30000: loss 5.584957
iteration 20200 / 30000: loss 5.680148
iteration 20300 / 30000: loss 5.204039
iteration 20400 / 30000: loss 5.651018
iteration 20500 / 30000: loss 5.606275
iteration 20600 / 30000: loss 5.651781
iteration 20700 / 30000: loss 6.257462
iteration 20800 / 30000: loss 5.539931
iteration 20900 / 30000: loss 5.336910
iteration 21000 / 30000: loss 5.818779
iteration 21100 / 30000: loss 5.770666
iteration 21200 / 30000: loss 5.503790
iteration 21300 / 30000: loss 5.549561
iteration 21400 / 30000: loss 5.583492
iteration 21500 / 30000: loss 5.157266
iteration 21600 / 30000: loss 5.366464
iteration 21700 / 30000: loss 5.948794
iteration 21800 / 30000: loss 5.753630
iteration 21900 / 30000: loss 5.501991
iteration 22000 / 30000: loss 5.617345
iteration 22100 / 30000: loss 5.633187
iteration 22200 / 30000: loss 5.305477
iteration 22300 / 30000: loss 5.617382
iteration 22400 / 30000: loss 5.316661
iteration 22500 / 30000: loss 5.475664
iteration 22600 / 30000: loss 5.418139
iteration 22700 / 30000: loss 5.423570
iteration 22800 / 30000: loss 5.424148
iteration 22900 / 30000: loss 5.419514
iteration 23000 / 30000: loss 5.641321
iteration 23100 / 30000: loss 5.369419
iteration 23200 / 30000: loss 5.845765
iteration 23300 / 30000: loss 5.422731
iteration 23400 / 30000: loss 5.170207
iteration 23500 / 30000: loss 5.559642
iteration 23600 / 30000: loss 5.002227
iteration 23700 / 30000: loss 5.756336
iteration 23800 / 30000: loss 5.805381
iteration 23900 / 30000: loss 5.512909
iteration 24000 / 30000: loss 5.242164
iteration 24100 / 30000: loss 5.350572
iteration 24200 / 30000: loss 5.121405
iteration 24300 / 30000: loss 5.845536
iteration 24400 / 30000: loss 5.836125
iteration 24500 / 30000: loss 5.939575
iteration 24600 / 30000: loss 5.830713
iteration 24700 / 30000: loss 5.626402
iteration 24800 / 30000: loss 5.276547
iteration 24900 / 30000: loss 4.916160
iteration 25000 / 30000: loss 5.423396
iteration 25100 / 30000: loss 5.520658
iteration 25200 / 30000: loss 5.217485
iteration 25300 / 30000: loss 5.636796
iteration 25400 / 30000: loss 5.190347
iteration 25500 / 30000: loss 5.940212
iteration 25600 / 30000: loss 5.831482
iteration 25700 / 30000: loss 5.589014
iteration 25800 / 30000: loss 5.402094
iteration 25900 / 30000: loss 5.603937
iteration 26000 / 30000: loss 5.546927
iteration 26100 / 30000: loss 5.279221
iteration 26200 / 30000: loss 6.083441
iteration 26300 / 30000: loss 5.715660
iteration 26400 / 30000: loss 5.498751
iteration 26500 / 30000: loss 5.896742
iteration 26600 / 30000: loss 5.334038
iteration 26700 / 30000: loss 5.827416
iteration 26800 / 30000: loss 5.222656
iteration 26900 / 30000: loss 5.059927
iteration 27000 / 30000: loss 5.751460
iteration 27100 / 30000: loss 5.756909
iteration 27200 / 30000: loss 5.299595
iteration 27300 / 30000: loss 5.226472
iteration 27400 / 30000: loss 5.317117
iteration 27500 / 30000: loss 5.719688
iteration 27600 / 30000: loss 5.210231
iteration 27700 / 30000: loss 5.547768
iteration 27800 / 30000: loss 5.256751
iteration 27900 / 30000: loss 5.324070
iteration 28000 / 30000: loss 5.553829
iteration 28100 / 30000: loss 5.526153
iteration 28200 / 30000: loss 5.308591
iteration 28300 / 30000: loss 5.556497
iteration 28400 / 30000: loss 5.309953
iteration 28500 / 30000: loss 5.064526
iteration 28600 / 30000: loss 5.655393
iteration 28700 / 30000: loss 5.559661
iteration 28800 / 30000: loss 5.528537
iteration 28900 / 30000: loss 5.566410
iteration 29000 / 30000: loss 5.468199
iteration 29100 / 30000: loss 5.377371
iteration 29200 / 30000: loss 5.502049
iteration 29300 / 30000: loss 5.091518
iteration 29400 / 30000: loss 5.673834
iteration 29500 / 30000: loss 5.148942
iteration 29600 / 30000: loss 5.691004
iteration 29700 / 30000: loss 5.768481
iteration 29800 / 30000: loss 5.178486
iteration 29900 / 30000: loss 4.995982
lr 3.000000e-09 reg 4.500000e+04 train accuracy: 0.373918 val accuracy: 0.384000
best validation accuracy achieved during cross-validation: 0.384000

In [39]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [40]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)


linear SVM on raw pixels final test set accuracy: 0.361000

In [41]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
      
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look they way that they do.

Your answer: fill this in