Multiclass Support Vector Machine exercise

Complete this worksheet and hand it in (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM (the loss is written out just below for reference)
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

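For reference, the loss you will implement is the multiclass SVM (hinge) loss. For a single example $(x_i, y_i)$ with class scores $s = x_i W$, it is

$$L_i = \sum_{j \neq y_i} \max\bigl(0,\; s_j - s_{y_i} + \Delta\bigr),$$

and the full objective averages $L_i$ over the training set and adds an L2 regularization penalty $\lambda \sum_{k,l} W_{k,l}^2$. The margin $\Delta$ is typically set to 1; the exact constant in front of the regularization term follows whatever convention the starter code uses.
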
In [81]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

CIFAR-10 Data Loading and Preprocessing


In [82]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [83]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [84]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [ ]:


In [85]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [86]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [87]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [88]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
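
The appended column of ones is what makes the bias trick work: the linear score function

$$s = xW + b = \begin{bmatrix} x & 1 \end{bmatrix} \begin{bmatrix} W \\ b \end{bmatrix},$$

so a 1 appended to every input turns the bias into one extra row of the weight matrix, which is why W is 3073 x 10 rather than 3072 x 10.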

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.

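For orientation, a looped implementation in the spirit of the prefilled svm_loss_naive might look roughly like the sketch below (the actual code in cs231n/classifiers/linear_svm.py may differ in details such as the regularization constant, and it also returns the gradient):

import numpy as np

def svm_loss_naive_sketch(W, X, y, reg):
    """Multiclass SVM loss computed with explicit loops (loss only).
    W: (D, C) weights, X: (N, D) data, y: (N,) labels, reg: regularization strength."""
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)                     # class scores for example i, shape (C,)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1   # margin delta = 1
            if margin > 0:
                loss += margin
    loss /= num_train                            # average over training examples
    loss += reg * np.sum(W * W)                  # L2 regularization penalty
    return loss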

In [40]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )


loss: 8.784320

The grad returned from the function above is currently all zero. Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code with the existing function.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:

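Conceptually, the numerical check estimates individual partial derivatives with a centered difference and reports a relative error; a minimal sketch of the idea (the provided grad_check_sparse helper in cs231n/gradient_check.py additionally samples random dimensions and handles the bookkeeping):

import numpy as np

def numeric_partial(f, W, idx, h=1e-5):
    """Centered-difference estimate of dL/dW[idx] for a scalar-valued f(W)."""
    old = W[idx]
    W[idx] = old + h
    fxph = f(W)                   # f(W + h * e_idx)
    W[idx] = old - h
    fxmh = f(W)                   # f(W - h * e_idx)
    W[idx] = old                  # restore the original entry
    return (fxph - fxmh) / (2.0 * h)

def relative_error(a, b):
    """Relative error used to compare a numeric and an analytic derivative."""
    return abs(a - b) / max(abs(a), abs(b))   # assumes a and b are not both zero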

In [41]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: -4.971991 analytic: -5.006649, relative error: 3.473183e-03
numerical: -57.593034 analytic: -57.593034, relative error: 1.309420e-12
numerical: 8.074197 analytic: 8.074197, relative error: 5.161220e-11
numerical: 28.841661 analytic: 28.841661, relative error: 3.569446e-12
numerical: 1.068141 analytic: 1.068141, relative error: 1.649973e-10
numerical: -14.054093 analytic: -14.054093, relative error: 2.693336e-12
numerical: 7.724415 analytic: 7.724415, relative error: 3.692566e-11
numerical: 18.631085 analytic: 18.631085, relative error: 1.832611e-11
numerical: -11.843006 analytic: -11.843006, relative error: 3.007033e-11
numerical: 2.954647 analytic: 2.954647, relative error: 3.009256e-11
numerical: 10.918609 analytic: 10.919197, relative error: 2.692318e-05
numerical: -4.189513 analytic: -4.190614, relative error: 1.313696e-04
numerical: 10.982343 analytic: 10.987693, relative error: 2.435197e-04
numerical: 32.254719 analytic: 32.261003, relative error: 9.740233e-05
numerical: -2.138424 analytic: -2.136829, relative error: 3.729803e-04
numerical: -10.728144 analytic: -10.720884, relative error: 3.385168e-04
numerical: 3.974980 analytic: 3.961713, relative error: 1.671639e-03
numerical: 4.098251 analytic: 4.096430, relative error: 2.221938e-04
numerical: 4.414965 analytic: 4.410097, relative error: 5.515985e-04
numerical: 18.888732 analytic: 18.903887, relative error: 4.010191e-04

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.

Your Answer: Yes, this can happen. The SVM loss is built from max(0, ·) terms, so it is not differentiable at points where a margin is exactly zero (a kink). Near such a kink the finite-difference estimate averages the slopes on both sides, while the analytic code returns one particular subgradient, so the two can disagree. It is not a cause for concern as long as it happens only occasionally and the remaining relative errors are tiny. A simple one-dimensional example: f(x) = max(0, x), checked at a point just below x = 0 with a step h large enough to cross the kink.

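A tiny numerical illustration of that one-dimensional example (illustrative only, not part of the assignment code):

# Gradient check failing at a kink of f(x) = max(0, x).
f = lambda x: max(0.0, x)

x = -1e-6     # just left of the kink; the analytic (sub)gradient here is 0
h = 1e-4      # the finite-difference step is large enough to cross the kink at 0
numeric = (f(x + h) - f(x - h)) / (2 * h)     # ~0.495 instead of 0
print('numeric: %f  analytic: %f' % (numeric, 0.0))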

In [50]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 8.784320e+00 computed in 0.155350s
Vectorized loss: 8.784320e+00 computed in 0.005456s
difference: -0.000000

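One common way to vectorize the loss is to form the full (N, C) score matrix and apply the hinge to all margins at once; a sketch of that approach (your svm_loss_vectorized may be organized differently):

import numpy as np

def svm_loss_vectorized_sketch(W, X, y, reg):
    """Fully vectorized multiclass SVM loss (loss only)."""
    num_train = X.shape[0]
    scores = X.dot(W)                                         # (N, C) all scores at once
    correct = scores[np.arange(num_train), y]                 # (N,) correct-class scores
    margins = np.maximum(0, scores - correct[:, np.newaxis] + 1)   # delta = 1
    margins[np.arange(num_train), y] = 0                      # don't count the correct class
    loss = np.sum(margins) / num_train + reg * np.sum(W * W)
    return loss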
In [ ]:


In [52]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 0.366322s
Vectorized loss and gradient: computed in 0.013466s
difference: 0.000000
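
A common way to vectorize the gradient is to build an (N, C) coefficient matrix from the margin indicators and multiply by X once; a sketch under the same conventions as the loss sketch above (delta = 1, regularization reg * sum(W * W), so its gradient contribution is 2 * reg * W):

import numpy as np

def svm_grad_vectorized_sketch(W, X, y, reg):
    """Fully vectorized gradient of the multiclass SVM loss."""
    num_train = X.shape[0]
    scores = X.dot(W)                                 # (N, C)
    correct = scores[np.arange(num_train), y]         # (N,)
    margins = scores - correct[:, np.newaxis] + 1     # delta = 1
    margins[np.arange(num_train), y] = 0

    # coeff[i, j] = 1 where class j violated the margin for example i;
    # coeff[i, y_i] = -(number of classes that violated the margin for example i).
    coeff = (margins > 0).astype(float)               # (N, C)
    coeff[np.arange(num_train), y] = -np.sum(coeff, axis=1)

    dW = X.T.dot(coeff) / num_train + 2 * reg * W     # (D, C), including regularization
    return dW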

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.

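For reference, the core of train() is a plain minibatch SGD loop. A minimal sketch, reusing X_train, y_train and svm_loss_vectorized from the cells above (batch_size = 200 is an illustrative choice; the starter code's defaults may differ):

# Simplified minibatch SGD loop; the real LinearClassifier.train() also handles
# weight initialization, verbose printing, a configurable number of classes, etc.
num_train = X_train.shape[0]
batch_size = 200
learning_rate = 1e-7
reg = 5e4

W = np.random.randn(X_train.shape[1], 10) * 0.001
loss_history = []
for it in range(1500):
    # sample a random minibatch of data and labels
    batch_idx = np.random.choice(num_train, batch_size)
    X_batch, y_batch = X_train[batch_idx], y_train[batch_idx]

    # evaluate loss and gradient on the minibatch, then take a step downhill
    loss, grad = svm_loss_vectorized(W, X_batch, y_batch, reg)
    loss_history.append(loss)
    W -= learning_rate * grad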

In [89]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 793.762020
iteration 100 / 1500: loss 108.095626
iteration 200 / 1500: loss 19.188314
iteration 300 / 1500: loss 6.716104
iteration 400 / 1500: loss 5.534378
iteration 500 / 1500: loss 5.361041
iteration 600 / 1500: loss 5.555698
iteration 700 / 1500: loss 5.571662
iteration 800 / 1500: loss 5.570262
iteration 900 / 1500: loss 5.646202
iteration 1000 / 1500: loss 5.254488
iteration 1100 / 1500: loss 5.321432
iteration 1200 / 1500: loss 6.132455
iteration 1300 / 1500: loss 5.325005
iteration 1400 / 1500: loss 5.683714
That took 7.513692s

In [90]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [91]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.353020
validation accuracy: 0.371000
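
Prediction itself is just an argmax over the class scores; a minimal sketch of what LinearSVM.predict amounts to, assuming the trained weights are stored in svm.W (the weight-visualization cell at the end of this notebook relies on the same attribute via best_svm.W):

# Sketch: the predicted label is the class with the highest score under the learned W.
scores = X_val.dot(svm.W)                        # (num_validation, 10)
y_val_pred_sketch = np.argmax(scores, axis=1)
print('validation accuracy (sketch): %f' % np.mean(y_val == y_val_pred_sketch))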

In [96]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-7, 5e-5]
regularization_strengths = [5e4, 1e5]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation accuracy.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################

# Refine the search: interpolate a few extra values between the given endpoints.
learning_rates = np.linspace(learning_rates[0], learning_rates[1], num=5)
regularization_strengths = np.linspace(regularization_strengths[0], regularization_strengths[1], num=5)

# Note: the larger learning rates make the loss blow up (-> inf); see the output below.
for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=learning_rate, reg=regularization_strength,
                      num_iters=400, verbose=True)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        current_val = np.mean(y_val == y_val_pred)
        if current_val > best_val:
            best_val = current_val
            best_svm = svm
        results[(learning_rate, regularization_strength)] = (np.mean(y_train == y_train_pred),  current_val)



################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 400: loss 786.893079
iteration 100 / 400: loss 107.424530
iteration 200 / 400: loss 19.106815
iteration 300 / 400: loss 7.608106
iteration 0 / 400: loss 997.032673
iteration 100 / 400: loss 84.570925
iteration 200 / 400: loss 11.578979
iteration 300 / 400: loss 6.109057
iteration 0 / 400: loss 1173.404239
iteration 100 / 400: loss 61.892767
iteration 200 / 400: loss 8.070629
iteration 300 / 400: loss 5.531039
iteration 0 / 400: loss 1362.511567
iteration 100 / 400: loss 45.813034
iteration 200 / 400: loss 6.484383
iteration 300 / 400: loss 5.503638
iteration 0 / 400: loss 1563.963579
iteration 100 / 400: loss 32.213971
iteration 200 / 400: loss 6.162687
iteration 300 / 400: loss 5.270969
iteration 0 / 400: loss 798.189646
iteration 100 / 400: loss 159.039223
iteration 200 / 400: loss 188.300224
iteration 300 / 400: loss 198.341054
iteration 0 / 400: loss 970.054937
iteration 100 / 400: loss 404.619155
iteration 200 / 400: loss 505.244579
iteration 300 / 400: loss 528.707436
iteration 0 / 400: loss 1167.519140
iteration 100 / 400: loss 4838.967547
iteration 200 / 400: loss 4592.258604
iteration 300 / 400: loss 4766.060887
iteration 0 / 400: loss 1346.462583
iteration 100 / 400: loss 18649566899577966592.000000
iteration 200 / 400: loss 141953202752842807108798027837472768.000000
iteration 300 / 400: loss 1080492201503936968036616888960348300848577281785856.000000
iteration 0 / 400: loss 1535.594994
iteration 100 / 400: loss 2108379056703988647794152137837909114880.000000
iteration 200 / 400: loss 2549618849375723117236034624542464763865909430750166282212418144489189670912.000000
iteration 300 / 400: loss 3083200934112033339227767181453575466535791974807497831254756335342286723525358084630219097662436416089917227008.000000
iteration 0 / 400: loss 801.602970
iteration 100 / 400: loss 385338568247170465068866583711813992448.000000
iteration 200 / 400: loss 123920302633085074867434888800683926295021858241301773534240147282702893056.000000
iteration 300 / 400: loss 39851296158928286609471491191436986926220552419638946132016630261881370104193818438402974085172466774624436224.000000
iteration 0 / 400: loss 983.658454
iteration 100 / 400: loss 570921610062383414686997702347186058956769092093800550232096826720256.000000
iteration 200 / 400: loss 304407780934838363984678207507564191333769832022405912408951439981056277260959301470892209791615653054544722710083453763618430940872704.000000
iteration 300 / 400: loss 162306165085513800933498004788841693724038339957976523792833038068295930393726983359475516690196763367502277913273799033263630025244147643597103232823460606024717520966833185468174947891306500634705920.000000
iteration 0 / 400: loss 1187.664890
iteration 100 / 400: loss 15605992341970500325590312673502256798958706036404196674244374990531923763963002666979688448.000000
iteration 200 / 400: loss 197877134173482245834260389708858001248162168626042352342570430761011877035981029916760965176375931519619008631190625652516519260642905990513408470650134209433523284004158817959936.000000
iteration 300 / 400: loss 2508995222521448384354732819138530512211307199197599741432320339479578992151587115338756354008502145282051206036222429813826611524849056758183482818434653773255929678646639216380390361233035819133818361573394355759806274899239518800487884173545954076923711070714986496.000000
iteration 0 / 400: loss 1380.644129
iteration 100 / 400: loss 10748578679529345844114015825939205687765530272450211341100175175258896841563455544938166801834788330168909824.000000
iteration 200 / 400: loss 81471328016800775813813778653485906034730935940733611552675684069346059091988047810314837183879901812032310085514468758067097848524400290359944063014205107959162687072014319086627897189902432611336305024512525598720.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1558.054533
iteration 100 / 400: loss 6718712093550009306703367886288648857019027244411847907557678819826035163671250096435970683727762670683761724872603195146240.000000
iteration 200 / 400: loss 28586471682751846894353247090225781216331545826388510516212962492493820752250879927375389717373932938385563106833274911490578093086746616837465565444659859267784777140080450141717970048125449840618818379087990476626803327188003730107850106601472.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 791.756529
iteration 100 / 400: loss 7627684479642058802342923413983942046722849324408993000861062441478351525587804180573061120.000000
iteration 200 / 400: loss 67275703651905376877990628558613982942271130492958827031700600616758092389584773144413979754502521228560446533157951613962035163253690455101901420830088408942973488165260237996032.000000
iteration 300 / 400: loss 593367530334891985255661386213195408254706955101186192167573081102896436590445828252756018971547249157534315326828375536767385261209550542831209100567728287044342194145672140142894570489899319782320112540243640765009193791559404656952528039929546729261859083311382528.000000
iteration 0 / 400: loss 983.050454
iteration 100 / 400: loss 264522946271477941479366588218882784955744316114919553579703356051272579675394697967793964017693960501543951575023616.000000
iteration 200 / 400: loss 69574037986843788178992028636927179301117440151894247764316662069089407362101087111818212581037293583339988064902911426324551188362169810616711733705403302371092512028006323646598515258293838895608924391316493554774832037226348544.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1177.715481
iteration 100 / 400: loss 14875553276455418634132976009048674662877296084484264560922818877271725124274160200645368573746703350979172957322989224403385302755835904.000000
iteration 200 / 400: loss 184169682419618131208589509765015245803224795199290332339056949053067434224384670350943757062012553050808013400799656393518601426507305266928249027304082280490253561409435533612105446231934604038601854502272250434063068502093541646427920735565052471367165767354951925760.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1373.396179
iteration 100 / 400: loss 183070538328558363279377714485262110042493156168679924838615608691718604214137598233763187344215484597035627347978715953584137566082812423291642734182400.000000
iteration 200 / 400: loss 24261206288051711509591467029576603927624101210662861211886011446834220895776002170618266706488494586077084105904476898036988425195756028356055023514812650941862394250942710295836542314744035922867448455970610448830466656145788260427127872485923766715537183157214859508125682857413769174544665146294272.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1569.853365
iteration 100 / 400: loss 7030375978099262536015007753159706259579840811290846768241852220466847379943675565271368342884018109669934911693567678808463940410010573190762875134191249927815823360.000000
iteration 200 / 400: loss inf
iteration 300 / 400: loss inf
iteration 0 / 400: loss 780.315331
iteration 100 / 400: loss 2074575703641374225480580519796320313606602924083287251401687742562031638753383333271329410480029958877874840679272316665856.000000
iteration 200 / 400: loss 5357072857810001198037481512360725742762149370687549780293519085611732291148612392072890083523335385326353156817012423745878414737136678084955721326695071012618727916597778549879032529661453595364340821128769813337757382116281025795933765369856.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 972.394660
iteration 100 / 400: loss 1058981567302432102514020944980076439739937727124561551851575352461158450439720685282741341981428063979048139748269298225303933308067734848094076928.000000
iteration 200 / 400: loss 1139591187088160429692386496420458450450311760808546014131397515215892466813035654438383807451898712013635775656180746764315402785095172691439745304371050320644332006305022251958244911944469580929590376536976474061173647470818875889575780306145348043183248252451788226057237351292687133704192.000000
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1167.233264
iteration 100 / 400: loss 4492628850428192730116215872761316127163889224106055227288555476365858470486138057937165590555289285616590743390000073910725445608411001832067680701964345209803243520.000000
iteration 200 / 400: loss inf
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1372.474282
iteration 100 / 400: loss 10033266628346330600906794034850149203466604058578068869020398432985394035249769953084791207964001542440229907223043255403361396281094182113647648235876929145809514738946160967286784.000000
iteration 200 / 400: loss inf
iteration 300 / 400: loss inf
iteration 0 / 400: loss 1548.576884
iteration 100 / 400: loss 109321064336806221716516668937575783970116226015549131694091778202644170816138410567245481164760314202204567080228707240310778359945724922669748287355775933838450510038655243013918274613485240320.000000
iteration 200 / 400: loss inf
iteration 300 / 400: loss inf
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.354837 val accuracy: 0.368000
lr 1.000000e-07 reg 6.250000e+04 train accuracy: 0.348510 val accuracy: 0.365000
lr 1.000000e-07 reg 7.500000e+04 train accuracy: 0.346408 val accuracy: 0.362000
lr 1.000000e-07 reg 8.750000e+04 train accuracy: 0.345755 val accuracy: 0.354000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.341939 val accuracy: 0.359000
lr 1.257500e-05 reg 5.000000e+04 train accuracy: 0.169224 val accuracy: 0.190000
lr 1.257500e-05 reg 6.250000e+04 train accuracy: 0.112408 val accuracy: 0.119000
lr 1.257500e-05 reg 7.500000e+04 train accuracy: 0.146245 val accuracy: 0.138000
lr 1.257500e-05 reg 8.750000e+04 train accuracy: 0.083592 val accuracy: 0.089000
lr 1.257500e-05 reg 1.000000e+05 train accuracy: 0.049143 val accuracy: 0.048000
lr 2.505000e-05 reg 5.000000e+04 train accuracy: 0.101796 val accuracy: 0.092000
lr 2.505000e-05 reg 6.250000e+04 train accuracy: 0.050449 val accuracy: 0.055000
lr 2.505000e-05 reg 7.500000e+04 train accuracy: 0.049510 val accuracy: 0.050000
lr 2.505000e-05 reg 8.750000e+04 train accuracy: 0.052367 val accuracy: 0.052000
lr 2.505000e-05 reg 1.000000e+05 train accuracy: 0.064388 val accuracy: 0.074000
lr 3.752500e-05 reg 5.000000e+04 train accuracy: 0.051347 val accuracy: 0.048000
lr 3.752500e-05 reg 6.250000e+04 train accuracy: 0.056490 val accuracy: 0.051000
lr 3.752500e-05 reg 7.500000e+04 train accuracy: 0.050224 val accuracy: 0.046000
lr 3.752500e-05 reg 8.750000e+04 train accuracy: 0.076224 val accuracy: 0.066000
lr 3.752500e-05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.054857 val accuracy: 0.065000
lr 5.000000e-05 reg 6.250000e+04 train accuracy: 0.055714 val accuracy: 0.062000
lr 5.000000e-05 reg 7.500000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 8.750000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.368000

In [97]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [98]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.357000

In [99]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

Your answer: For some classes (e.g. horse) a rough outline of the object is visible. Each column of W acts as a template that must match every training image of its class at once, so the visualized weights look like blurred, averaged versions of the class; for example, the horse template appears to have two heads because the training set contains horses facing both directions.