# Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

• implement a fully-vectorized loss function for the SVM
• implement the fully-vectorized expression for its analytic gradient
• check your implementation using a numerical gradient
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with SGD
• visualize the final learned weights
``````

In [1]:

# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

``````

``````

In [2]:

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape

``````
``````

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

``````
``````

In [3]:

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

``````
``````

In [4]:

# Subsample the data for more efficient code execution in this exercise.
num_training = 49000
num_validation = 1000
num_test = 1000

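# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]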

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape

``````
``````

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

``````
``````

In [5]:

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape

``````
``````

Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)

``````
``````

In [6]:

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image

``````
``````

[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

Out[6]:

<matplotlib.image.AxesImage at 0x10d922250>

``````
``````

In [7]:

# second: subtract the mean image from the train, validation, and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

``````
``````

In [8]:

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
# Also, let's transform the data matrices so that each image is a column.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))]).T
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))]).T
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]).T

print X_train.shape, X_val.shape, X_test.shape

``````
``````

(3073, 49000) (3073, 1000) (3073, 1000)

``````
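
The bias trick used in the cell above can be checked on a toy example. The following is a minimal sketch with made-up shapes (3 classes, 4 features, 5 examples); the names `W`, `b`, `X`, `W_ext`, and `X_ext` are illustrative and are not the notebook's variables. It suggests that folding the bias into the weight matrix and appending a row of ones to a column-per-example data matrix gives the same scores as the explicit affine form.

```python
import numpy as np

rng = np.random.RandomState(0)
num_classes, dim, num_examples = 3, 4, 5

W = rng.randn(num_classes, dim)      # weights, one row per class
b = rng.randn(num_classes, 1)        # one bias per class
X = rng.randn(dim, num_examples)     # each column is one example

# Explicit affine scores: W x + b for every example.
scores_affine = W.dot(X) + b

# Bias trick: append b to W as an extra column and a row of ones to X.
W_ext = np.hstack([W, b])                            # shape (3, 5)
X_ext = np.vstack([X, np.ones((1, num_examples))])   # shape (5, 5)
scores_trick = W_ext.dot(X_ext)

bias_trick_matches = np.allclose(scores_affine, scores_trick)
print(bias_trick_matches)
```

With the extra row of ones in place, only a single matrix has to be optimized, which is why the data matrices above have 3073 rows instead of 3072.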

## SVM Classifier

Your code for this section will all be written inside `cs231n/classifiers/linear_svm.py`.

As you can see, we have prefilled the function `svm_loss_naive`, which uses for loops to evaluate the multiclass SVM loss function.
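
For orientation, here is a rough sketch of what a loop-based multiclass SVM loss can look like. It is not the course's `svm_loss_naive`: the function name `svm_loss_loops`, the margin `delta=1.0`, and the `0.5 * reg * sum(W * W)` form of the regularizer are assumptions made for illustration. Data is laid out with one example per column, as in this notebook.

```python
import numpy as np

def svm_loss_loops(W, X, y, reg, delta=1.0):
    """Loop-based multiclass SVM loss (illustrative sketch only).

    W: (C, D) weights, X: (D, N) data with one example per column,
    y: (N,) integer labels, reg: L2 regularization strength.
    """
    num_classes = W.shape[0]
    num_examples = X.shape[1]
    loss = 0.0
    for i in range(num_examples):
        scores = W.dot(X[:, i])
        correct_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue                      # the true class has no margin term
            margin = scores[j] - correct_score + delta
            if margin > 0:
                loss += margin
    loss /= num_examples
    loss += 0.5 * reg * np.sum(W * W)         # assumed form of the L2 penalty
    return loss

# Tiny hand-checkable case: 3 classes, 2 features, 2 examples.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
X = np.array([[1.0, 0.0],
              [2.0, 1.0]])      # columns are the examples (1, 2) and (0, 1)
y = np.array([0, 2])
toy_loss = svm_loss_loops(W, X, y, reg=0.0)
print('toy loss: %f' % toy_loss)
```

In this toy case the first example contributes margins 2 and 3, and the second contributes a single margin of 1, so the averaged loss is (5 + 1) / 2 = 3.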

``````

In [9]:

# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(10, 3073) * 0.0001
loss, grad = svm_loss_naive(W, X_train, y_train, 0.00001)
print 'loss: %f' % (loss, )

``````
``````

loss: 9.412323

``````
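
The printed loss of about 9.4 is close to 9, which is a useful sanity check. As a minimal sketch, assuming a margin of 1 and ignoring regularization: when all class scores are equal, each of the 9 incorrect classes contributes a margin of exactly 1, so the data loss equals the number of classes minus one. The shapes and random data below are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
num_classes, dim, num_examples = 10, 50, 200

W = np.zeros((num_classes, dim))                  # all scores are exactly zero
X = rng.randn(dim, num_examples)                  # one example per column
y = rng.randint(num_classes, size=num_examples)
cols = np.arange(num_examples)

scores = W.dot(X)                                 # shape (10, 200)
correct = scores[y, cols]                         # score of the true class
margins = np.maximum(0.0, scores - correct + 1.0)
margins[y, cols] = 0.0                            # the true class is skipped
zero_weight_loss = margins.sum() / num_examples
print('loss with zero weights: %f' % zero_weight_loss)
```

A loss far from 9 at initialization with very small weights would hint at a bug in the loss computation.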

The `grad` returned from the function above is currently all zeros. Derive the gradient of the SVM cost function and implement it inline inside the function `svm_loss_naive`. You will find it helpful to interleave your new code inside the existing function.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient that you computed. We have provided code that does this for you:
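
To make the idea concrete, here is a minimal sketch of a gradient check based on centered differences. It is not the provided course code: the helpers `numeric_grad_at` and `relative_error` are hypothetical names, and the toy function `f(x) = sum(x**2)` is chosen only because its gradient `2x` is known exactly.

```python
import numpy as np

def numeric_grad_at(f, x, index, h=1e-5):
    """Centered-difference estimate of df/dx[index] (hypothetical helper)."""
    old = x[index]
    x[index] = old + h
    f_plus = f(x)
    x[index] = old - h
    f_minus = f(x)
    x[index] = old                 # restore the original entry
    return (f_plus - f_minus) / (2.0 * h)

def relative_error(a, b):
    """|a - b| / (|a| + |b|), guarded against division by zero."""
    return abs(a - b) / max(abs(a) + abs(b), 1e-12)

# Toy function with a known gradient: f(x) = sum(x**2), so df/dx = 2x.
rng = np.random.RandomState(0)
x = rng.randn(4, 3)
f = lambda v: np.sum(v ** 2)
analytic = 2.0 * x

max_rel_error = 0.0
for _ in range(5):
    # Check a few randomly chosen coordinates rather than every entry.
    index = tuple(rng.randint(n) for n in x.shape)
    numeric = numeric_grad_at(f, x, index)
    max_rel_error = max(max_rel_error, relative_error(numeric, analytic[index]))
print('max relative error: %e' % max_rel_error)
```

Checking only a handful of random coordinates keeps the cost low, since each numerical estimate needs two full evaluations of the loss.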

``````

In [10]:

# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_train, y_train, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
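from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_train, y_train, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)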

``````
``````

numerical: 14.764403 analytic: 14.765492, relative error: 3.687769e-05
numerical: 0.165788 analytic: 0.163933, relative error: 5.623352e-03
numerical: 16.241365 analytic: 16.242634, relative error: 3.905905e-05
numerical: -6.092678 analytic: -6.091747, relative error: 7.641090e-05
numerical: 19.380562 analytic: 19.379989, relative error: 1.479997e-05
numerical: 10.216694 analytic: 10.217574, relative error: 4.309118e-05
numerical: -26.268705 analytic: -26.272736, relative error: 7.672983e-05
numerical: 8.286611 analytic: 8.285782, relative error: 5.004147e-05
numerical: 20.309228 analytic: 20.310692, relative error: 3.604600e-05
numerical: 17.586527 analytic: 17.586901, relative error: 1.064673e-05

``````

### Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.
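
One possible one-dimensional illustration, offered as a sketch rather than as the expected answer: the function `max(0, x)` has a kink at `x = 0`. If the point being checked is closer to the kink than the finite-difference step `h`, the centered difference straddles the kink and disagrees with the analytic gradient. The helper names `hinge` and `hinge_grad` are made up for this example.

```python
def hinge(x):
    return max(0.0, x)

def hinge_grad(x):
    # Analytic gradient away from the kink at x = 0.
    return 1.0 if x > 0 else 0.0

h = 1e-5
x = 1e-6                       # closer to the kink than the step size h
numeric = (hinge(x + h) - hinge(x - h)) / (2.0 * h)
analytic = hinge_grad(x)
print('numerical: %f analytic: %f' % (numeric, analytic))
```

Here the numerical estimate comes out near 0.55 while the analytic gradient is 1, even though the analytic gradient is correct at that point.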

``````

In [11]:

# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)

``````
``````

Naive loss: 9.412323e+00 computed in 4.908729s
Vectorized loss: 9.412323e+00 computed in 0.198058s
difference: -0.000000

``````
``````

In [12]:

# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_train, y_train, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference

``````
``````

Naive loss and gradient: computed in 5.215318s
Vectorized loss and gradient: computed in 0.197088s
difference: 0.000000

``````
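
As a rough sketch of how the loss and gradient compared above can be vectorized, the following computes both without explicit Python loops. It is not the course's `svm_loss_vectorized`: the function name, the margin `delta=1.0`, and the `0.5 * reg * sum(W * W)` regularizer are assumptions, and data is stored with one example per column.

```python
import numpy as np

def svm_loss_vectorized_sketch(W, X, y, reg, delta=1.0):
    """Vectorized multiclass SVM loss and gradient (illustrative sketch).

    W: (C, D) weights, X: (D, N) data, y: (N,) integer labels.
    """
    num_examples = X.shape[1]
    cols = np.arange(num_examples)

    scores = W.dot(X)                                        # (C, N)
    margins = np.maximum(0.0, scores - scores[y, cols] + delta)
    margins[y, cols] = 0.0                                   # skip true class
    loss = margins.sum() / num_examples + 0.5 * reg * np.sum(W * W)

    # Each positive margin adds +x_i to the row of the offending class and
    # -x_i to the row of the true class of example i.
    indicator = (margins > 0).astype(float)                  # (C, N)
    indicator[y, cols] = -indicator.sum(axis=0)
    dW = indicator.dot(X.T) / num_examples + reg * W
    return loss, dW

# Tiny hand-checkable case: 3 classes, 2 features, 2 examples.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
X = np.array([[1.0, 0.0],
              [2.0, 1.0]])      # columns are the examples (1, 2) and (0, 1)
y = np.array([0, 2])
loss, dW = svm_loss_vectorized_sketch(W, X, y, reg=0.0)
print('loss: %f' % loss)
print(dW)
```

Working the toy case by hand gives a loss of 3 and gradient rows `(-1, -2)`, `(0.5, 1.5)`, and `(0.5, 0.5)`, which is a convenient check before timing anything on the full dataset.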

### Stochastic Gradient Descent

We now have efficient, vectorized expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
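
The general shape of minibatch SGD can be sketched as follows. This is an illustration under stated assumptions, not the `LinearSVM.train` implementation: the functions `svm_loss_and_grad` and `sgd_sketch`, the toy three-blob dataset, and all hyperparameter values are made up for the example.

```python
import numpy as np

def svm_loss_and_grad(W, X, y, reg):
    """Compact vectorized SVM loss and gradient (margin of 1 assumed)."""
    n = X.shape[1]
    cols = np.arange(n)
    scores = W.dot(X)
    margins = np.maximum(0.0, scores - scores[y, cols] + 1.0)
    margins[y, cols] = 0.0
    loss = margins.sum() / n + 0.5 * reg * np.sum(W * W)
    indicator = (margins > 0).astype(float)
    indicator[y, cols] = -indicator.sum(axis=0)
    return loss, indicator.dot(X.T) / n + reg * W

def sgd_sketch(W, X, y, learning_rate, reg, num_iters, batch_size, rng):
    """Repeat: sample a minibatch, evaluate loss and gradient, step downhill."""
    loss_history = []
    for _ in range(num_iters):
        batch = rng.choice(X.shape[1], batch_size, replace=True)
        loss, grad = svm_loss_and_grad(W, X[:, batch], y[batch], reg)
        loss_history.append(loss)
        W = W - learning_rate * grad
    return W, loss_history

# Toy data: three well-separated 2-D blobs, plus a row of ones for the bias.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
y = rng.randint(3, size=300)
points = centers[y] + rng.randn(300, 2)
X = np.vstack([points.T, np.ones((1, 300))])       # shape (3, 300)

W0 = rng.randn(3, 3) * 0.001
W, loss_history = sgd_sketch(W0, X, y, learning_rate=1e-2, reg=1e-3,
                             num_iters=200, batch_size=32, rng=rng)
train_accuracy = np.mean(np.argmax(W.dot(X), axis=0) == y)
print('first loss: %f, last loss: %f' % (loss_history[0], loss_history[-1]))
print('training accuracy: %f' % train_accuracy)
```

Sampling a minibatch makes each step much cheaper than a full pass over the data, at the cost of a noisier loss curve.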

``````

In [13]:

# Now implement SGD in the LinearSVM.train() function and run it with the code below
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)

``````
``````

iteration 0 / 1500: loss 786.861178
iteration 100 / 1500: loss 287.908364
iteration 200 / 1500: loss 107.463909
iteration 300 / 1500: loss 42.860696
iteration 400 / 1500: loss 18.823382
iteration 500 / 1500: loss 10.279445
iteration 600 / 1500: loss 7.615086
iteration 700 / 1500: loss 5.743896
iteration 800 / 1500: loss 5.836987
iteration 900 / 1500: loss 5.865234
iteration 1000 / 1500: loss 5.370779
iteration 1100 / 1500: loss 5.340555
iteration 1200 / 1500: loss 5.342801
iteration 1300 / 1500: loss 5.469155
iteration 1400 / 1500: loss 5.381996
That took 4.697144s

``````
``````

In [14]:

# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')

``````
``````

Out[14]:

<matplotlib.text.Text at 0x10d6fd690>

``````
``````

In [15]:

# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )

``````
``````

training accuracy: 0.366571
validation accuracy: 0.384000

``````
``````

In [16]:

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
#learning_rates = [1e-7, 5e-5]
#regularization_strengths = [5e4, 1e5]
learning_rates = [4e-8, 4.5e-8, 5e-8, 1e-7]
regularization_strengths = [3.5e4, 4e4, 4.2e4, 4.5e4, 5e4]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate, reg=regularization_strength,
                              num_iters=1500, verbose=True)
        y_train_pred = svm.predict(X_train)
        y_val_pred = svm.predict(X_val)
        training_accuracy = np.mean(y_train == y_train_pred)
        validation_accuracy = np.mean(y_val == y_val_pred)
        results[(learning_rate, regularization_strength)] = (training_accuracy, validation_accuracy)
        if best_val < validation_accuracy:
            best_val = validation_accuracy
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val

``````
``````

iteration 0 / 1500: loss 559.303376
iteration 100 / 1500: loss 416.005681
iteration 200 / 1500: loss 313.892579
iteration 300 / 1500: loss 238.193532
iteration 400 / 1500: loss 179.930548
iteration 500 / 1500: loss 137.239734
iteration 600 / 1500: loss 103.553186
iteration 700 / 1500: loss 79.961055
iteration 800 / 1500: loss 61.998010
iteration 900 / 1500: loss 47.762448
iteration 1000 / 1500: loss 37.191940
iteration 1100 / 1500: loss 29.251723
iteration 1200 / 1500: loss 23.701944
iteration 1300 / 1500: loss 18.772466
iteration 1400 / 1500: loss 15.662789
iteration 0 / 1500: loss 633.745639
iteration 100 / 1500: loss 456.685443
iteration 200 / 1500: loss 330.890071
iteration 300 / 1500: loss 240.974210
iteration 400 / 1500: loss 175.018154
iteration 500 / 1500: loss 128.331503
iteration 600 / 1500: loss 95.315227
iteration 700 / 1500: loss 68.992148
iteration 800 / 1500: loss 51.945108
iteration 900 / 1500: loss 39.160615
iteration 1000 / 1500: loss 29.364996
iteration 1100 / 1500: loss 22.452034
iteration 1200 / 1500: loss 18.032934
iteration 1300 / 1500: loss 14.738744
iteration 1400 / 1500: loss 12.286064
iteration 0 / 1500: loss 678.823579
iteration 100 / 1500: loss 477.257247
iteration 200 / 1500: loss 342.162912
iteration 300 / 1500: loss 244.437698
iteration 400 / 1500: loss 174.737571
iteration 500 / 1500: loss 125.744242
iteration 600 / 1500: loss 91.199187
iteration 700 / 1500: loss 67.545525
iteration 800 / 1500: loss 49.096932
iteration 900 / 1500: loss 35.845373
iteration 1000 / 1500: loss 26.776839
iteration 1100 / 1500: loss 21.022118
iteration 1200 / 1500: loss 16.473183
iteration 1300 / 1500: loss 12.875678
iteration 1400 / 1500: loss 11.156501
iteration 0 / 1500: loss 697.962006
iteration 100 / 1500: loss 484.264416
iteration 200 / 1500: loss 337.343366
iteration 300 / 1500: loss 237.063589
iteration 400 / 1500: loss 166.321456
iteration 500 / 1500: loss 116.637815
iteration 600 / 1500: loss 83.781756
iteration 700 / 1500: loss 59.398412
iteration 800 / 1500: loss 43.154076
iteration 900 / 1500: loss 31.675554
iteration 1000 / 1500: loss 23.057201
iteration 1100 / 1500: loss 17.575928
iteration 1200 / 1500: loss 14.088401
iteration 1300 / 1500: loss 11.669125
iteration 1400 / 1500: loss 9.461645
iteration 0 / 1500: loss 787.387296
iteration 100 / 1500: loss 523.193480
iteration 200 / 1500: loss 350.928115
iteration 300 / 1500: loss 236.161473
iteration 400 / 1500: loss 159.877632
iteration 500 / 1500: loss 108.654974
iteration 600 / 1500: loss 74.000015
iteration 700 / 1500: loss 51.337414
iteration 800 / 1500: loss 36.024754
iteration 900 / 1500: loss 25.825661
iteration 1000 / 1500: loss 18.678307
iteration 1100 / 1500: loss 14.186439
iteration 1200 / 1500: loss 11.296756
iteration 1300 / 1500: loss 9.294398
iteration 1400 / 1500: loss 8.892369
iteration 0 / 1500: loss 554.771944
iteration 100 / 1500: loss 401.465465
iteration 200 / 1500: loss 293.017027
iteration 300 / 1500: loss 213.378283
iteration 400 / 1500: loss 157.469484
iteration 500 / 1500: loss 115.395834
iteration 600 / 1500: loss 84.978190
iteration 700 / 1500: loss 62.611958
iteration 800 / 1500: loss 46.841990
iteration 900 / 1500: loss 35.527049
iteration 1000 / 1500: loss 27.602928
iteration 1100 / 1500: loss 21.154499
iteration 1200 / 1500: loss 16.727642
iteration 1300 / 1500: loss 14.030852
iteration 1400 / 1500: loss 11.131477
iteration 0 / 1500: loss 634.996256
iteration 100 / 1500: loss 439.160751
iteration 200 / 1500: loss 305.480920
iteration 300 / 1500: loss 213.597972
iteration 400 / 1500: loss 151.051934
iteration 500 / 1500: loss 106.122318
iteration 600 / 1500: loss 75.674901
iteration 700 / 1500: loss 54.137957
iteration 800 / 1500: loss 39.161591
iteration 900 / 1500: loss 28.265731
iteration 1000 / 1500: loss 21.542265
iteration 1100 / 1500: loss 16.602367
iteration 1200 / 1500: loss 12.557957
iteration 1300 / 1500: loss 10.086332
iteration 1400 / 1500: loss 8.703447
iteration 0 / 1500: loss 677.770327
iteration 100 / 1500: loss 454.235958
iteration 200 / 1500: loss 311.289987
iteration 300 / 1500: loss 214.157071
iteration 400 / 1500: loss 147.235833
iteration 500 / 1500: loss 103.020126
iteration 600 / 1500: loss 71.870653
iteration 700 / 1500: loss 50.394997
iteration 800 / 1500: loss 36.334690
iteration 900 / 1500: loss 26.189660
iteration 1000 / 1500: loss 19.381889
iteration 1100 / 1500: loss 15.358749
iteration 1200 / 1500: loss 12.286280
iteration 1300 / 1500: loss 10.000576
iteration 1400 / 1500: loss 8.513729
iteration 0 / 1500: loss 705.485304
iteration 100 / 1500: loss 464.135056
iteration 200 / 1500: loss 309.150363
iteration 300 / 1500: loss 207.948614
iteration 400 / 1500: loss 139.969177
iteration 500 / 1500: loss 94.807176
iteration 600 / 1500: loss 64.708922
iteration 700 / 1500: loss 44.857045
iteration 800 / 1500: loss 31.449983
iteration 900 / 1500: loss 22.939133
iteration 1000 / 1500: loss 16.988621
iteration 1100 / 1500: loss 13.676860
iteration 1200 / 1500: loss 10.312034
iteration 1300 / 1500: loss 9.085627
iteration 1400 / 1500: loss 7.489684
iteration 0 / 1500: loss 787.329372
iteration 100 / 1500: loss 497.185990
iteration 200 / 1500: loss 316.878608
iteration 300 / 1500: loss 203.428353
iteration 400 / 1500: loss 131.362743
iteration 500 / 1500: loss 84.222996
iteration 600 / 1500: loss 55.610410
iteration 700 / 1500: loss 37.864800
iteration 800 / 1500: loss 25.906750
iteration 900 / 1500: loss 17.717565
iteration 1000 / 1500: loss 12.704571
iteration 1100 / 1500: loss 11.249249
iteration 1200 / 1500: loss 8.508504
iteration 1300 / 1500: loss 7.436471
iteration 1400 / 1500: loss 6.257996
iteration 0 / 1500: loss 558.954056
iteration 100 / 1500: loss 389.433442
iteration 200 / 1500: loss 274.303358
iteration 300 / 1500: loss 193.353751
iteration 400 / 1500: loss 138.337466
iteration 500 / 1500: loss 97.976358
iteration 600 / 1500: loss 70.601718
iteration 700 / 1500: loss 50.797455
iteration 800 / 1500: loss 37.475812
iteration 900 / 1500: loss 27.272471
iteration 1000 / 1500: loss 21.065088
iteration 1100 / 1500: loss 16.017097
iteration 1200 / 1500: loss 13.445468
iteration 1300 / 1500: loss 10.762730
iteration 1400 / 1500: loss 9.474508
iteration 0 / 1500: loss 643.993673
iteration 100 / 1500: loss 424.941648
iteration 200 / 1500: loss 285.487786
iteration 300 / 1500: loss 191.759734
iteration 400 / 1500: loss 129.335787
iteration 500 / 1500: loss 88.158461
iteration 600 / 1500: loss 60.514963
iteration 700 / 1500: loss 42.472514
iteration 800 / 1500: loss 30.172704
iteration 900 / 1500: loss 21.357810
iteration 1000 / 1500: loss 15.872091
iteration 1100 / 1500: loss 11.816851
iteration 1200 / 1500: loss 9.796439
iteration 1300 / 1500: loss 8.863475
iteration 1400 / 1500: loss 7.651424
iteration 0 / 1500: loss 662.773981
iteration 100 / 1500: loss 431.087636
iteration 200 / 1500: loss 282.620740
iteration 300 / 1500: loss 186.906412
iteration 400 / 1500: loss 124.062454
iteration 500 / 1500: loss 83.457289
iteration 600 / 1500: loss 56.118060
iteration 700 / 1500: loss 38.837832
iteration 800 / 1500: loss 26.754141
iteration 900 / 1500: loss 19.771358
iteration 1000 / 1500: loss 14.487688
iteration 1100 / 1500: loss 11.228882
iteration 1200 / 1500: loss 9.482285
iteration 1300 / 1500: loss 7.760125
iteration 1400 / 1500: loss 7.246904
iteration 0 / 1500: loss 710.884195
iteration 100 / 1500: loss 448.217685
iteration 200 / 1500: loss 284.872847
iteration 300 / 1500: loss 184.182285
iteration 400 / 1500: loss 117.969752
iteration 500 / 1500: loss 75.789619
iteration 600 / 1500: loss 50.389173
iteration 700 / 1500: loss 33.980194
iteration 800 / 1500: loss 22.965417
iteration 900 / 1500: loss 16.874277
iteration 1000 / 1500: loss 12.497870
iteration 1100 / 1500: loss 10.311296
iteration 1200 / 1500: loss 8.076751
iteration 1300 / 1500: loss 7.056721
iteration 1400 / 1500: loss 6.496265
iteration 0 / 1500: loss 796.357750
iteration 100 / 1500: loss 477.740407
iteration 200 / 1500: loss 291.397173
iteration 300 / 1500: loss 177.533041
iteration 400 / 1500: loss 109.080533
iteration 500 / 1500: loss 67.793709
iteration 600 / 1500: loss 43.311710
iteration 700 / 1500: loss 28.015904
iteration 800 / 1500: loss 18.947429
iteration 900 / 1500: loss 13.080747
iteration 1000 / 1500: loss 10.420865
iteration 1100 / 1500: loss 8.470709
iteration 1200 / 1500: loss 6.680575
iteration 1300 / 1500: loss 6.414165
iteration 1400 / 1500: loss 5.947784
iteration 0 / 1500: loss 557.053580
iteration 100 / 1500: loss 273.151318
iteration 200 / 1500: loss 137.625396
iteration 300 / 1500: loss 69.906353
iteration 400 / 1500: loss 37.177643
iteration 500 / 1500: loss 21.535254
iteration 600 / 1500: loss 13.713506
iteration 700 / 1500: loss 9.069877
iteration 800 / 1500: loss 6.898965
iteration 900 / 1500: loss 5.540126
iteration 1000 / 1500: loss 5.444497
iteration 1100 / 1500: loss 4.658507
iteration 1200 / 1500: loss 5.896886
iteration 1300 / 1500: loss 5.409469
iteration 1400 / 1500: loss 5.170397
iteration 0 / 1500: loss 635.640367
iteration 100 / 1500: loss 281.031259
iteration 200 / 1500: loss 127.416730
iteration 300 / 1500: loss 59.678418
iteration 400 / 1500: loss 28.898294
iteration 500 / 1500: loss 15.670362
iteration 600 / 1500: loss 9.671593
iteration 700 / 1500: loss 7.585156
iteration 800 / 1500: loss 6.385190
iteration 900 / 1500: loss 5.715509
iteration 1000 / 1500: loss 5.226349
iteration 1100 / 1500: loss 5.659751
iteration 1200 / 1500: loss 5.405594
iteration 1300 / 1500: loss 5.336888
iteration 1400 / 1500: loss 4.973955
iteration 0 / 1500: loss 659.300538
iteration 100 / 1500: loss 281.788739
iteration 200 / 1500: loss 122.876973
iteration 300 / 1500: loss 56.015033
iteration 400 / 1500: loss 27.155486
iteration 500 / 1500: loss 14.863941
iteration 600 / 1500: loss 9.636187
iteration 700 / 1500: loss 6.965402
iteration 800 / 1500: loss 6.444806
iteration 900 / 1500: loss 5.481245
iteration 1000 / 1500: loss 5.810821
iteration 1100 / 1500: loss 5.100915
iteration 1200 / 1500: loss 4.993043
iteration 1300 / 1500: loss 5.299969
iteration 1400 / 1500: loss 5.272552
iteration 0 / 1500: loss 712.811602
iteration 100 / 1500: loss 287.412675
iteration 200 / 1500: loss 118.361118
iteration 300 / 1500: loss 50.537931
iteration 400 / 1500: loss 23.501743
iteration 500 / 1500: loss 13.207205
iteration 600 / 1500: loss 8.283343
iteration 700 / 1500: loss 6.857469
iteration 800 / 1500: loss 5.976765
iteration 900 / 1500: loss 5.690310
iteration 1000 / 1500: loss 5.288955
iteration 1100 / 1500: loss 5.432422
iteration 1200 / 1500: loss 5.553799
iteration 1300 / 1500: loss 5.029507
iteration 1400 / 1500: loss 5.672829
iteration 0 / 1500: loss 781.952032
iteration 100 / 1500: loss 284.165691
iteration 200 / 1500: loss 106.919176
iteration 300 / 1500: loss 41.554046
iteration 400 / 1500: loss 18.657291
iteration 500 / 1500: loss 10.555227
iteration 600 / 1500: loss 6.980123
iteration 700 / 1500: loss 5.690448
iteration 800 / 1500: loss 5.707124
iteration 900 / 1500: loss 6.091408
iteration 1000 / 1500: loss 5.124471
iteration 1100 / 1500: loss 5.354539
iteration 1200 / 1500: loss 5.469390
iteration 1300 / 1500: loss 5.184152
iteration 1400 / 1500: loss 5.241452
lr 4.000000e-08 reg 3.500000e+04 train accuracy: 0.367673 val accuracy: 0.386000
lr 4.000000e-08 reg 4.000000e+04 train accuracy: 0.370939 val accuracy: 0.383000
lr 4.000000e-08 reg 4.200000e+04 train accuracy: 0.369429 val accuracy: 0.386000
lr 4.000000e-08 reg 4.500000e+04 train accuracy: 0.368939 val accuracy: 0.378000
lr 4.000000e-08 reg 5.000000e+04 train accuracy: 0.374224 val accuracy: 0.383000
lr 4.500000e-08 reg 3.500000e+04 train accuracy: 0.369816 val accuracy: 0.380000
lr 4.500000e-08 reg 4.000000e+04 train accuracy: 0.374816 val accuracy: 0.381000
lr 4.500000e-08 reg 4.200000e+04 train accuracy: 0.372020 val accuracy: 0.378000
lr 4.500000e-08 reg 4.500000e+04 train accuracy: 0.374633 val accuracy: 0.369000
lr 4.500000e-08 reg 5.000000e+04 train accuracy: 0.371245 val accuracy: 0.379000
lr 5.000000e-08 reg 3.500000e+04 train accuracy: 0.373796 val accuracy: 0.391000
lr 5.000000e-08 reg 4.000000e+04 train accuracy: 0.374367 val accuracy: 0.366000
lr 5.000000e-08 reg 4.200000e+04 train accuracy: 0.373306 val accuracy: 0.381000
lr 5.000000e-08 reg 4.500000e+04 train accuracy: 0.367653 val accuracy: 0.383000
lr 5.000000e-08 reg 5.000000e+04 train accuracy: 0.373408 val accuracy: 0.390000
lr 1.000000e-07 reg 3.500000e+04 train accuracy: 0.374306 val accuracy: 0.379000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.368449 val accuracy: 0.361000
lr 1.000000e-07 reg 4.200000e+04 train accuracy: 0.373000 val accuracy: 0.381000
lr 1.000000e-07 reg 4.500000e+04 train accuracy: 0.360857 val accuracy: 0.380000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.367878 val accuracy: 0.383000
best validation accuracy achieved during cross-validation: 0.391000

``````
``````

In [17]:

# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
sz = [results[x][0]*1500 for x in results] # default size of markers is 20
plt.subplot(1,2,1)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
sz = [results[x][1]*1500 for x in results] # default size of markers is 20
plt.subplot(1,2,2)
plt.scatter(x_scatter, y_scatter, sz)
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')

``````
``````

Out[17]:

<matplotlib.text.Text at 0x1114470d0>

``````
``````

In [18]:

# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy

``````
``````

linear SVM on raw pixels final test set accuracy: 0.361000

``````
``````

In [19]:

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:,:-1] # strip out the bias
w = w.reshape(10, 32, 32, 3)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

``````

### Inline Question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.
