Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

from __future__ import print_function

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function compute_loss_naive which uses for loops to evaluate the multiclass SVM loss function.


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))


loss: 9.401892

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have correctly implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: -19.331308 analytic: -19.331308, relative error: 1.708431e-11
numerical: 2.834324 analytic: 2.834324, relative error: 4.275734e-11
numerical: -11.011608 analytic: -11.011608, relative error: 3.332441e-11
numerical: 8.558583 analytic: 8.587888, relative error: 1.709088e-03
numerical: 17.184664 analytic: 17.184664, relative error: 8.086742e-12
numerical: 8.170305 analytic: 8.170305, relative error: 4.901246e-11
numerical: -20.199335 analytic: -20.199335, relative error: 1.021509e-11
numerical: -18.448037 analytic: -18.448037, relative error: 1.426398e-11
numerical: 2.229220 analytic: 2.229220, relative error: 6.608228e-11
numerical: -0.161192 analytic: -0.148740, relative error: 4.017421e-02
numerical: 23.871422 analytic: 23.871422, relative error: 4.925804e-12
numerical: -22.968019 analytic: -22.986735, relative error: 4.072795e-04
numerical: 28.657145 analytic: 28.658441, relative error: 2.260773e-05
numerical: 2.274754 analytic: 2.274754, relative error: 1.613450e-10
numerical: 13.319137 analytic: 13.319137, relative error: 2.581806e-11
numerical: 7.858475 analytic: 7.858475, relative error: 3.172175e-11
numerical: -0.177730 analytic: -0.177730, relative error: 1.451967e-09
numerical: 9.285806 analytic: 9.274643, relative error: 6.014556e-04
numerical: 13.901215 analytic: 13.901215, relative error: 6.189168e-12
numerical: 18.502242 analytic: 18.502242, relative error: 1.901144e-12

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer: This shouldn't be a concern and can be caused by a choice of h too large.


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))


Naive loss: 9.401892e+00 computed in 0.152883s
Vectorized loss: 9.401892e+00 computed in 0.008423s
difference: -0.000000

In [12]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)


Naive loss and gradient: computed in 0.207229s
Vectorized loss and gradient: computed in 0.013046s
difference: 0.000000

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss, the gradient and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.


In [13]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))


iteration 0 / 1500: loss 784.247658
iteration 100 / 1500: loss 469.162769
iteration 200 / 1500: loss 284.795163
iteration 300 / 1500: loss 173.571443
iteration 400 / 1500: loss 106.142147
iteration 500 / 1500: loss 66.366560
iteration 600 / 1500: loss 42.106997
iteration 700 / 1500: loss 27.959855
iteration 800 / 1500: loss 18.656635
iteration 900 / 1500: loss 12.742883
iteration 1000 / 1500: loss 10.185770
iteration 1100 / 1500: loss 8.451680
iteration 1200 / 1500: loss 6.985449
iteration 1300 / 1500: loss 6.774249
iteration 1400 / 1500: loss 6.218362
That took 9.754735s

In [14]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [15]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))


training accuracy: 0.382857
validation accuracy: 0.390000

In [16]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = list(map(lambda x: x*1e-7, np.arange(0.9, 2, 0.1)))
regularization_strengths = list(map(lambda x: x*1e4, np.arange(1, 10)))

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1  # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.
iters = 2000

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        print('Training with lr={0}, reg={1}'.format(lr, reg))
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=iters)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        train_accuracy = np.mean(y_train == y_train_pred)
        validation_accuracy = np.mean(y_val == y_val_pred)
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_svm = svm
        results[(lr, reg)] = (validation_accuracy, train_accuracy)
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)


Training with lr=9e-08, reg=10000.0
Training with lr=9e-08, reg=20000.0
Training with lr=9e-08, reg=30000.0
Training with lr=9e-08, reg=40000.0
Training with lr=9e-08, reg=50000.0
Training with lr=9e-08, reg=60000.0
Training with lr=9e-08, reg=70000.0
Training with lr=9e-08, reg=80000.0
Training with lr=9e-08, reg=90000.0
Training with lr=1e-07, reg=10000.0
Training with lr=1e-07, reg=20000.0
Training with lr=1e-07, reg=30000.0
Training with lr=1e-07, reg=40000.0
Training with lr=1e-07, reg=50000.0
Training with lr=1e-07, reg=60000.0
Training with lr=1e-07, reg=70000.0
Training with lr=1e-07, reg=80000.0
Training with lr=1e-07, reg=90000.0
Training with lr=1.1e-07, reg=10000.0
Training with lr=1.1e-07, reg=20000.0
Training with lr=1.1e-07, reg=30000.0
Training with lr=1.1e-07, reg=40000.0
Training with lr=1.1e-07, reg=50000.0
Training with lr=1.1e-07, reg=60000.0
Training with lr=1.1e-07, reg=70000.0
Training with lr=1.1e-07, reg=80000.0
Training with lr=1.1e-07, reg=90000.0
Training with lr=1.2e-07, reg=10000.0
Training with lr=1.2e-07, reg=20000.0
Training with lr=1.2e-07, reg=30000.0
Training with lr=1.2e-07, reg=40000.0
Training with lr=1.2e-07, reg=50000.0
Training with lr=1.2e-07, reg=60000.0
Training with lr=1.2e-07, reg=70000.0
Training with lr=1.2e-07, reg=80000.0
Training with lr=1.2e-07, reg=90000.0
Training with lr=1.2999999999999997e-07, reg=10000.0
Training with lr=1.2999999999999997e-07, reg=20000.0
Training with lr=1.2999999999999997e-07, reg=30000.0
Training with lr=1.2999999999999997e-07, reg=40000.0
Training with lr=1.2999999999999997e-07, reg=50000.0
Training with lr=1.2999999999999997e-07, reg=60000.0
Training with lr=1.2999999999999997e-07, reg=70000.0
Training with lr=1.2999999999999997e-07, reg=80000.0
Training with lr=1.2999999999999997e-07, reg=90000.0
Training with lr=1.3999999999999998e-07, reg=10000.0
Training with lr=1.3999999999999998e-07, reg=20000.0
Training with lr=1.3999999999999998e-07, reg=30000.0
Training with lr=1.3999999999999998e-07, reg=40000.0
Training with lr=1.3999999999999998e-07, reg=50000.0
Training with lr=1.3999999999999998e-07, reg=60000.0
Training with lr=1.3999999999999998e-07, reg=70000.0
Training with lr=1.3999999999999998e-07, reg=80000.0
Training with lr=1.3999999999999998e-07, reg=90000.0
Training with lr=1.5e-07, reg=10000.0
Training with lr=1.5e-07, reg=20000.0
Training with lr=1.5e-07, reg=30000.0
Training with lr=1.5e-07, reg=40000.0
Training with lr=1.5e-07, reg=50000.0
Training with lr=1.5e-07, reg=60000.0
Training with lr=1.5e-07, reg=70000.0
Training with lr=1.5e-07, reg=80000.0
Training with lr=1.5e-07, reg=90000.0
Training with lr=1.5999999999999998e-07, reg=10000.0
Training with lr=1.5999999999999998e-07, reg=20000.0
Training with lr=1.5999999999999998e-07, reg=30000.0
Training with lr=1.5999999999999998e-07, reg=40000.0
Training with lr=1.5999999999999998e-07, reg=50000.0
Training with lr=1.5999999999999998e-07, reg=60000.0
Training with lr=1.5999999999999998e-07, reg=70000.0
Training with lr=1.5999999999999998e-07, reg=80000.0
Training with lr=1.5999999999999998e-07, reg=90000.0
Training with lr=1.6999999999999996e-07, reg=10000.0
Training with lr=1.6999999999999996e-07, reg=20000.0
Training with lr=1.6999999999999996e-07, reg=30000.0
Training with lr=1.6999999999999996e-07, reg=40000.0
Training with lr=1.6999999999999996e-07, reg=50000.0
Training with lr=1.6999999999999996e-07, reg=60000.0
Training with lr=1.6999999999999996e-07, reg=70000.0
Training with lr=1.6999999999999996e-07, reg=80000.0
Training with lr=1.6999999999999996e-07, reg=90000.0
Training with lr=1.7999999999999997e-07, reg=10000.0
Training with lr=1.7999999999999997e-07, reg=20000.0
Training with lr=1.7999999999999997e-07, reg=30000.0
Training with lr=1.7999999999999997e-07, reg=40000.0
Training with lr=1.7999999999999997e-07, reg=50000.0
Training with lr=1.7999999999999997e-07, reg=60000.0
Training with lr=1.7999999999999997e-07, reg=70000.0
Training with lr=1.7999999999999997e-07, reg=80000.0
Training with lr=1.7999999999999997e-07, reg=90000.0
Training with lr=1.8999999999999998e-07, reg=10000.0
Training with lr=1.8999999999999998e-07, reg=20000.0
Training with lr=1.8999999999999998e-07, reg=30000.0
Training with lr=1.8999999999999998e-07, reg=40000.0
Training with lr=1.8999999999999998e-07, reg=50000.0
Training with lr=1.8999999999999998e-07, reg=60000.0
Training with lr=1.8999999999999998e-07, reg=70000.0
Training with lr=1.8999999999999998e-07, reg=80000.0
Training with lr=1.8999999999999998e-07, reg=90000.0
lr 9.000000e-08 reg 1.000000e+04 train accuracy: 0.395000 val accuracy: 0.386694
lr 9.000000e-08 reg 2.000000e+04 train accuracy: 0.381000 val accuracy: 0.385020
lr 9.000000e-08 reg 3.000000e+04 train accuracy: 0.392000 val accuracy: 0.376286
lr 9.000000e-08 reg 4.000000e+04 train accuracy: 0.384000 val accuracy: 0.373245
lr 9.000000e-08 reg 5.000000e+04 train accuracy: 0.396000 val accuracy: 0.375857
lr 9.000000e-08 reg 6.000000e+04 train accuracy: 0.373000 val accuracy: 0.363388
lr 9.000000e-08 reg 7.000000e+04 train accuracy: 0.372000 val accuracy: 0.363673
lr 9.000000e-08 reg 8.000000e+04 train accuracy: 0.363000 val accuracy: 0.361551
lr 9.000000e-08 reg 9.000000e+04 train accuracy: 0.382000 val accuracy: 0.364857
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.380000 val accuracy: 0.391673
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.385000 val accuracy: 0.384735
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.384000 val accuracy: 0.379592
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.381000 val accuracy: 0.370347
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.383000 val accuracy: 0.369796
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.383000 val accuracy: 0.361714
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.381000 val accuracy: 0.362388
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.371000 val accuracy: 0.365388
lr 1.000000e-07 reg 9.000000e+04 train accuracy: 0.365000 val accuracy: 0.359469
lr 1.100000e-07 reg 1.000000e+04 train accuracy: 0.382000 val accuracy: 0.391898
lr 1.100000e-07 reg 2.000000e+04 train accuracy: 0.385000 val accuracy: 0.384102
lr 1.100000e-07 reg 3.000000e+04 train accuracy: 0.388000 val accuracy: 0.376673
lr 1.100000e-07 reg 4.000000e+04 train accuracy: 0.377000 val accuracy: 0.375143
lr 1.100000e-07 reg 5.000000e+04 train accuracy: 0.366000 val accuracy: 0.367837
lr 1.100000e-07 reg 6.000000e+04 train accuracy: 0.375000 val accuracy: 0.364429
lr 1.100000e-07 reg 7.000000e+04 train accuracy: 0.380000 val accuracy: 0.363531
lr 1.100000e-07 reg 8.000000e+04 train accuracy: 0.370000 val accuracy: 0.365469
lr 1.100000e-07 reg 9.000000e+04 train accuracy: 0.358000 val accuracy: 0.358245
lr 1.200000e-07 reg 1.000000e+04 train accuracy: 0.381000 val accuracy: 0.391592
lr 1.200000e-07 reg 2.000000e+04 train accuracy: 0.386000 val accuracy: 0.386898
lr 1.200000e-07 reg 3.000000e+04 train accuracy: 0.372000 val accuracy: 0.373286
lr 1.200000e-07 reg 4.000000e+04 train accuracy: 0.368000 val accuracy: 0.374367
lr 1.200000e-07 reg 5.000000e+04 train accuracy: 0.385000 val accuracy: 0.369694
lr 1.200000e-07 reg 6.000000e+04 train accuracy: 0.377000 val accuracy: 0.365796
lr 1.200000e-07 reg 7.000000e+04 train accuracy: 0.375000 val accuracy: 0.365571
lr 1.200000e-07 reg 8.000000e+04 train accuracy: 0.382000 val accuracy: 0.371694
lr 1.200000e-07 reg 9.000000e+04 train accuracy: 0.372000 val accuracy: 0.353714
lr 1.300000e-07 reg 1.000000e+04 train accuracy: 0.390000 val accuracy: 0.390837
lr 1.300000e-07 reg 2.000000e+04 train accuracy: 0.386000 val accuracy: 0.379531
lr 1.300000e-07 reg 3.000000e+04 train accuracy: 0.379000 val accuracy: 0.377918
lr 1.300000e-07 reg 4.000000e+04 train accuracy: 0.389000 val accuracy: 0.365327
lr 1.300000e-07 reg 5.000000e+04 train accuracy: 0.372000 val accuracy: 0.365735
lr 1.300000e-07 reg 6.000000e+04 train accuracy: 0.376000 val accuracy: 0.361122
lr 1.300000e-07 reg 7.000000e+04 train accuracy: 0.370000 val accuracy: 0.360122
lr 1.300000e-07 reg 8.000000e+04 train accuracy: 0.367000 val accuracy: 0.355510
lr 1.300000e-07 reg 9.000000e+04 train accuracy: 0.374000 val accuracy: 0.354776
lr 1.400000e-07 reg 1.000000e+04 train accuracy: 0.402000 val accuracy: 0.393102
lr 1.400000e-07 reg 2.000000e+04 train accuracy: 0.397000 val accuracy: 0.387878
lr 1.400000e-07 reg 3.000000e+04 train accuracy: 0.383000 val accuracy: 0.366061
lr 1.400000e-07 reg 4.000000e+04 train accuracy: 0.379000 val accuracy: 0.367347
lr 1.400000e-07 reg 5.000000e+04 train accuracy: 0.376000 val accuracy: 0.369102
lr 1.400000e-07 reg 6.000000e+04 train accuracy: 0.372000 val accuracy: 0.365551
lr 1.400000e-07 reg 7.000000e+04 train accuracy: 0.370000 val accuracy: 0.359816
lr 1.400000e-07 reg 8.000000e+04 train accuracy: 0.370000 val accuracy: 0.356816
lr 1.400000e-07 reg 9.000000e+04 train accuracy: 0.367000 val accuracy: 0.345184
lr 1.500000e-07 reg 1.000000e+04 train accuracy: 0.389000 val accuracy: 0.387776
lr 1.500000e-07 reg 2.000000e+04 train accuracy: 0.376000 val accuracy: 0.379184
lr 1.500000e-07 reg 3.000000e+04 train accuracy: 0.379000 val accuracy: 0.376143
lr 1.500000e-07 reg 4.000000e+04 train accuracy: 0.374000 val accuracy: 0.371000
lr 1.500000e-07 reg 5.000000e+04 train accuracy: 0.376000 val accuracy: 0.365939
lr 1.500000e-07 reg 6.000000e+04 train accuracy: 0.367000 val accuracy: 0.363531
lr 1.500000e-07 reg 7.000000e+04 train accuracy: 0.380000 val accuracy: 0.358143
lr 1.500000e-07 reg 8.000000e+04 train accuracy: 0.366000 val accuracy: 0.353469
lr 1.500000e-07 reg 9.000000e+04 train accuracy: 0.358000 val accuracy: 0.354163
lr 1.600000e-07 reg 1.000000e+04 train accuracy: 0.391000 val accuracy: 0.375878
lr 1.600000e-07 reg 2.000000e+04 train accuracy: 0.378000 val accuracy: 0.381776
lr 1.600000e-07 reg 3.000000e+04 train accuracy: 0.374000 val accuracy: 0.374531
lr 1.600000e-07 reg 4.000000e+04 train accuracy: 0.370000 val accuracy: 0.369980
lr 1.600000e-07 reg 5.000000e+04 train accuracy: 0.355000 val accuracy: 0.360449
lr 1.600000e-07 reg 6.000000e+04 train accuracy: 0.380000 val accuracy: 0.367490
lr 1.600000e-07 reg 7.000000e+04 train accuracy: 0.365000 val accuracy: 0.356857
lr 1.600000e-07 reg 8.000000e+04 train accuracy: 0.358000 val accuracy: 0.356714
lr 1.600000e-07 reg 9.000000e+04 train accuracy: 0.357000 val accuracy: 0.353714
lr 1.700000e-07 reg 1.000000e+04 train accuracy: 0.380000 val accuracy: 0.391878
lr 1.700000e-07 reg 2.000000e+04 train accuracy: 0.391000 val accuracy: 0.381000
lr 1.700000e-07 reg 3.000000e+04 train accuracy: 0.366000 val accuracy: 0.363082
lr 1.700000e-07 reg 4.000000e+04 train accuracy: 0.368000 val accuracy: 0.367878
lr 1.700000e-07 reg 5.000000e+04 train accuracy: 0.368000 val accuracy: 0.356816
lr 1.700000e-07 reg 6.000000e+04 train accuracy: 0.363000 val accuracy: 0.352490
lr 1.700000e-07 reg 7.000000e+04 train accuracy: 0.361000 val accuracy: 0.364694
lr 1.700000e-07 reg 8.000000e+04 train accuracy: 0.359000 val accuracy: 0.345776
lr 1.700000e-07 reg 9.000000e+04 train accuracy: 0.364000 val accuracy: 0.359469
lr 1.800000e-07 reg 1.000000e+04 train accuracy: 0.381000 val accuracy: 0.389857
lr 1.800000e-07 reg 2.000000e+04 train accuracy: 0.382000 val accuracy: 0.377367
lr 1.800000e-07 reg 3.000000e+04 train accuracy: 0.377000 val accuracy: 0.368388
lr 1.800000e-07 reg 4.000000e+04 train accuracy: 0.370000 val accuracy: 0.368143
lr 1.800000e-07 reg 5.000000e+04 train accuracy: 0.377000 val accuracy: 0.363959
lr 1.800000e-07 reg 6.000000e+04 train accuracy: 0.368000 val accuracy: 0.363857
lr 1.800000e-07 reg 7.000000e+04 train accuracy: 0.371000 val accuracy: 0.359531
lr 1.800000e-07 reg 8.000000e+04 train accuracy: 0.369000 val accuracy: 0.353898
lr 1.800000e-07 reg 9.000000e+04 train accuracy: 0.362000 val accuracy: 0.358469
lr 1.900000e-07 reg 1.000000e+04 train accuracy: 0.400000 val accuracy: 0.388673
lr 1.900000e-07 reg 2.000000e+04 train accuracy: 0.370000 val accuracy: 0.375082
lr 1.900000e-07 reg 3.000000e+04 train accuracy: 0.373000 val accuracy: 0.375816
lr 1.900000e-07 reg 4.000000e+04 train accuracy: 0.386000 val accuracy: 0.367041
lr 1.900000e-07 reg 5.000000e+04 train accuracy: 0.366000 val accuracy: 0.350592
lr 1.900000e-07 reg 6.000000e+04 train accuracy: 0.372000 val accuracy: 0.363429
lr 1.900000e-07 reg 7.000000e+04 train accuracy: 0.372000 val accuracy: 0.353551
lr 1.900000e-07 reg 8.000000e+04 train accuracy: 0.376000 val accuracy: 0.354286
lr 1.900000e-07 reg 9.000000e+04 train accuracy: 0.357000 val accuracy: 0.354122
best validation accuracy achieved during cross-validation: 0.402000

In [17]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [18]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)


linear SVM on raw pixels final test set accuracy: 0.385000

In [19]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
      
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look they way that they do.

Your answer: each one seems to give a very low resoltuion representation of the object that they represent and the context in which the object usually appears. For example, the frog looks like some green object on a more less blue/green background. For the car, the shape of the car is more less visible over a road. Similarly for the reset of the images.