Multiclass Support Vector Machine exercise

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [1]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing


In [2]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()



In [4]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape


Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()


[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print X_train.shape, X_val.shape, X_test.shape, X_dev.shape


(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
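For reference, the bias trick works because appending a constant 1 to each input folds the bias into the weight matrix: with the augmented row vector $\tilde{x} = [x,\ 1]$ and the augmented weights $\tilde{W} = \begin{bmatrix} W \\ b^{\top} \end{bmatrix}$,

$$\tilde{x}\,\tilde{W} = xW + b^{\top},$$

which is the usual affine score map. This is why the feature dimension above grows from 3072 to 3073.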

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
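For reference, the multiclass SVM loss being evaluated here (with margin $\Delta = 1$ and the $0.5 \cdot \mathrm{reg}$ L2 penalty convention used in this notebook) is, for per-example scores $s = x_i W$,

$$L = \frac{1}{N}\sum_{i}\sum_{j \neq y_i} \max\bigl(0,\; s_j - s_{y_i} + 1\bigr) \;+\; \frac{\mathrm{reg}}{2}\sum_{k,l} W_{k,l}^2.$$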


In [9]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )


loss: 9.229963

The grad returned from the function above is currently all zero. Derive the gradient of the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.
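As a rough guide, every example with a violated margin contributes $+x_i$ to the column of the offending class and $-x_i$ to the column of the correct class. Below is a minimal sketch of one way the interleaving can look, under the $\Delta = 1$ and $0.5 \cdot \mathrm{reg}$ conventions above (illustrative only, not necessarily identical to the code in linear_svm.py; the function name is hypothetical):

import numpy as np

def svm_loss_naive_sketch(W, X, y, reg):
    # W: (D, C) weights, X: (N, D) data, y: (N,) labels, reg: regularization strength.
    dW = np.zeros(W.shape)
    num_train, num_classes = X.shape[0], W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]       # each violated margin adds x_i to the wrong class column
                dW[:, y[i]] -= X[i]    # ...and subtracts x_i from the correct class column
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W      # don't forget the regularization gradient
    return loss, dW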

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient you computed. We have provided code that does this for you:


In [10]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)


numerical: -4.993348 analytic: -4.993348, relative error: 5.627871e-11
numerical: -36.570101 analytic: -36.570101, relative error: 1.054396e-11
numerical: 4.039022 analytic: 4.039022, relative error: 2.542092e-11
numerical: 22.658329 analytic: 22.658329, relative error: 1.329583e-11
numerical: 13.711751 analytic: 13.711751, relative error: 8.934478e-12
numerical: 8.246012 analytic: 8.246012, relative error: 2.403399e-11
numerical: 24.136380 analytic: 24.136380, relative error: 2.968963e-12
numerical: -36.570101 analytic: -36.570101, relative error: 1.054396e-11
numerical: 22.760731 analytic: 22.760731, relative error: 2.016756e-12
numerical: -39.483707 analytic: -39.483707, relative error: 1.012986e-11
numerical: 8.345275 analytic: 8.345275, relative error: 8.670188e-11
numerical: -10.370353 analytic: -10.370353, relative error: 2.660111e-11
numerical: 37.593599 analytic: 37.593599, relative error: 6.740721e-12
numerical: 44.207399 analytic: 44.207399, relative error: 5.730002e-14
numerical: -51.043851 analytic: -51.043851, relative error: 2.877940e-12
numerical: 8.852396 analytic: 8.852396, relative error: 3.862280e-11
numerical: 12.138416 analytic: 12.138416, relative error: 2.001757e-11
numerical: -9.184341 analytic: -9.184341, relative error: 2.285205e-11
numerical: 14.561821 analytic: 14.561821, relative error: 1.893358e-11
numerical: 31.620198 analytic: 31.620198, relative error: 8.955266e-12
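For intuition, the sparse check above amounts to a centered-difference estimate of the gradient at a handful of random coordinates of W. Below is a minimal sketch of the idea (illustrative only; grad_check_sparse is the provided implementation, and numerical_grad_check_sketch is a hypothetical stand-in):

import numpy as np

def numerical_grad_check_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
    # f: scalar loss as a function of W; analytic_grad: gradient to verify.
    for _ in range(num_checks):
        ix = tuple(np.random.randint(m) for m in W.shape)  # a random coordinate of W
        oldval = W[ix]
        W[ix] = oldval + h
        fxph = f(W)                     # loss at W + h (in this coordinate)
        W[ix] = oldval - h
        fxmh = f(W)                     # loss at W - h
        W[ix] = oldval                  # restore W
        grad_numerical = (fxph - fxmh) / (2.0 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / max(1e-12, abs(grad_numerical) + abs(grad_analytic))
        print('numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error))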

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could cause such a discrepancy? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not, strictly speaking, differentiable.

Your Answer: fill this in.


In [11]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)


Naive loss: 9.229963e+00 computed in 0.199664s
Vectorized loss: 9.229963e+00 computed in 0.047812s
difference: -0.000000
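A minimal sketch of a loop-free loss computation, assuming the same shapes and conventions as above (one possible approach, not necessarily the one expected in linear_svm.py; the function name is hypothetical):

import numpy as np

def svm_loss_vectorized_loss_only_sketch(W, X, y, reg):
    # Compute all N x C scores at once and apply the hinge row-wise.
    num_train = X.shape[0]
    scores = X.dot(W)                                        # (N, C)
    correct_class_scores = scores[np.arange(num_train), y]   # (N,)
    margins = np.maximum(0, scores - correct_class_scores[:, None] + 1)
    margins[np.arange(num_train), y] = 0                     # correct class contributes no loss
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)
    return loss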

In [12]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference


Naive loss and gradient: computed in 0.193300s
Vectorized loss and gradient: computed in 0.007072s
difference: 0.000000
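The vectorized gradient can reuse the margin matrix from the loss: an indicator matrix with a 1 for every violated margin (and minus the per-row count of violations in the correct-class column) turns the whole gradient into a single matrix product. A minimal sketch, assuming margins has already been computed and zeroed at the correct class as in the loss sketch above:

import numpy as np

def svm_grad_vectorized_sketch(W, X, y, reg, margins):
    # margins: (N, C) hinge margins, zeroed at the correct class.
    num_train = X.shape[0]
    binary = (margins > 0).astype(float)                         # 1 wherever a margin is violated
    binary[np.arange(num_train), y] = -np.sum(binary, axis=1)    # correct class gets -(# violations)
    dW = X.T.dot(binary) / num_train                             # (D, N) x (N, C) -> (D, C)
    dW += reg * W                                                # regularization gradient
    return dW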

Stochastic Gradient Descent

We now have vectorized, efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to use SGD to minimize the loss.
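The training loop you need to write in LinearClassifier.train() boils down to: sample a minibatch, evaluate the loss and gradient on it, and take a step in the negative gradient direction. A minimal sketch, written as a standalone function; the batch size of 200 and sampling with replacement are assumptions for illustration, not requirements:

import numpy as np

def sgd_train_sketch(W, X, y, loss_and_grad, learning_rate=1e-7, reg=5e4,
                     num_iters=1500, batch_size=200, verbose=False):
    # loss_and_grad(W, X_batch, y_batch, reg) -> (loss, dW), e.g. svm_loss_vectorized.
    num_train = X.shape[0]
    loss_history = []
    for it in range(num_iters):
        batch_idx = np.random.choice(num_train, batch_size, replace=True)
        loss, grad = loss_and_grad(W, X[batch_idx], y[batch_idx], reg)
        loss_history.append(loss)
        W -= learning_rate * grad                  # vanilla SGD update
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return W, loss_history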


In [14]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)


iteration 0 / 1500: loss 779.944348
iteration 100 / 1500: loss 284.043164
iteration 200 / 1500: loss 107.576681
iteration 300 / 1500: loss 42.545806
iteration 400 / 1500: loss 18.458919
iteration 500 / 1500: loss 10.201366
iteration 600 / 1500: loss 6.940790
iteration 700 / 1500: loss 6.149391
iteration 800 / 1500: loss 5.783997
iteration 900 / 1500: loss 5.207000
iteration 1000 / 1500: loss 5.592372
iteration 1100 / 1500: loss 5.553865
iteration 1200 / 1500: loss 5.226478
iteration 1300 / 1500: loss 4.733123
iteration 1400 / 1500: loss 5.102931
That took 6.957608s

In [15]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()



In [16]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )


training accuracy: 0.366429
validation accuracy: 0.379000
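For reference, LinearSVM.predict reduces to taking the argmax of the class scores for each example; a minimal sketch assuming a trained weight matrix W (not the actual LinearSVM.predict code):

import numpy as np

def predict_sketch(W, X):
    # Scores are (N, C); the predicted label is the highest-scoring class per row.
    return np.argmax(X.dot(W), axis=1)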

In [18]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-7, 2e-7, 3e-7, 5e-5, 8e-7]
regularization_strengths = [1e4, 2e4, 3e4, 4e4, 5e4, 6e4, 7e4, 8e4, 1e5]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=2000, verbose=True)
        train_accuracy = np.mean(y_train == svm.predict(X_train))
        val_accuracy = np.mean(y_val == svm.predict(X_val))
        results[(lr, rs)] = (train_accuracy, val_accuracy)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 2000: loss 172.896973
iteration 100 / 2000: loss 134.432869
iteration 200 / 2000: loss 110.668074
iteration 300 / 2000: loss 90.316146
iteration 400 / 2000: loss 74.564921
iteration 500 / 2000: loss 62.105886
iteration 600 / 2000: loss 51.444907
iteration 700 / 2000: loss 42.769136
iteration 800 / 2000: loss 35.724220
iteration 900 / 2000: loss 29.929987
iteration 1000 / 2000: loss 25.189405
iteration 1100 / 2000: loss 21.599907
iteration 1200 / 2000: loss 18.428614
iteration 1300 / 2000: loss 15.872154
iteration 1400 / 2000: loss 13.720776
iteration 1500 / 2000: loss 11.615432
iteration 1600 / 2000: loss 10.461508
iteration 1700 / 2000: loss 9.469544
iteration 1800 / 2000: loss 8.957387
iteration 1900 / 2000: loss 7.876502
iteration 0 / 2000: loss 326.968424
iteration 100 / 2000: loss 214.932675
iteration 200 / 2000: loss 144.335895
iteration 300 / 2000: loss 98.814036
iteration 400 / 2000: loss 66.407740
iteration 500 / 2000: loss 46.187012
iteration 600 / 2000: loss 32.087010
iteration 700 / 2000: loss 23.144325
iteration 800 / 2000: loss 16.643969
iteration 900 / 2000: loss 12.096660
iteration 1000 / 2000: loss 10.196563
iteration 1100 / 2000: loss 8.504757
iteration 1200 / 2000: loss 6.955389
iteration 1300 / 2000: loss 6.319580
iteration 1400 / 2000: loss 6.276571
iteration 1500 / 2000: loss 5.200926
iteration 1600 / 2000: loss 5.008029
iteration 1700 / 2000: loss 5.143802
iteration 1800 / 2000: loss 5.311964
iteration 1900 / 2000: loss 4.363526
iteration 0 / 2000: loss 475.408270
iteration 100 / 2000: loss 256.340132
iteration 200 / 2000: loss 142.137225
iteration 300 / 2000: loss 79.702002
iteration 400 / 2000: loss 45.812405
iteration 500 / 2000: loss 26.896466
iteration 600 / 2000: loss 17.364256
iteration 700 / 2000: loss 11.328711
iteration 800 / 2000: loss 8.636754
iteration 900 / 2000: loss 7.009739
iteration 1000 / 2000: loss 6.575672
iteration 1100 / 2000: loss 5.816622
iteration 1200 / 2000: loss 5.499847
iteration 1300 / 2000: loss 5.506267
iteration 1400 / 2000: loss 5.162973
iteration 1500 / 2000: loss 5.592010
iteration 1600 / 2000: loss 4.815629
iteration 1700 / 2000: loss 5.040621
iteration 1800 / 2000: loss 4.990934
iteration 1900 / 2000: loss 5.220673
iteration 0 / 2000: loss 626.487957
iteration 100 / 2000: loss 277.721479
iteration 200 / 2000: loss 126.274414
iteration 300 / 2000: loss 59.153967
iteration 400 / 2000: loss 29.487873
iteration 500 / 2000: loss 15.515276
iteration 600 / 2000: loss 9.672121
iteration 700 / 2000: loss 7.674430
iteration 800 / 2000: loss 6.077070
iteration 900 / 2000: loss 5.539737
iteration 1000 / 2000: loss 4.750750
iteration 1100 / 2000: loss 5.346731
iteration 1200 / 2000: loss 5.143237
iteration 1300 / 2000: loss 5.341186
iteration 1400 / 2000: loss 5.623238
iteration 1500 / 2000: loss 5.592225
iteration 1600 / 2000: loss 5.785487
iteration 1700 / 2000: loss 5.027811
iteration 1800 / 2000: loss 5.324659
iteration 1900 / 2000: loss 5.254040
iteration 0 / 2000: loss 792.626554
iteration 100 / 2000: loss 288.578127
iteration 200 / 2000: loss 109.009159
iteration 300 / 2000: loss 42.119308
iteration 400 / 2000: loss 18.811963
iteration 500 / 2000: loss 9.577460
iteration 600 / 2000: loss 7.966576
iteration 700 / 2000: loss 5.866986
iteration 800 / 2000: loss 5.623822
iteration 900 / 2000: loss 5.233873
iteration 1000 / 2000: loss 5.230725
iteration 1100 / 2000: loss 5.332421
iteration 1200 / 2000: loss 5.008694
iteration 1300 / 2000: loss 5.526711
iteration 1400 / 2000: loss 5.404666
iteration 1500 / 2000: loss 5.391669
iteration 1600 / 2000: loss 5.582650
iteration 1700 / 2000: loss 5.383716
iteration 1800 / 2000: loss 5.469272
iteration 1900 / 2000: loss 5.281695
iteration 0 / 2000: loss 940.185480
iteration 100 / 2000: loss 281.100721
iteration 200 / 2000: loss 87.067158
iteration 300 / 2000: loss 29.533107
iteration 400 / 2000: loss 12.466131
iteration 500 / 2000: loss 7.154661
iteration 600 / 2000: loss 6.224177
iteration 700 / 2000: loss 5.201932
iteration 800 / 2000: loss 5.198218
iteration 900 / 2000: loss 5.740315
iteration 1000 / 2000: loss 5.827007
iteration 1100 / 2000: loss 5.370228
iteration 1200 / 2000: loss 5.834143
iteration 1300 / 2000: loss 5.574865
iteration 1400 / 2000: loss 5.898814
iteration 1500 / 2000: loss 5.748186
iteration 1600 / 2000: loss 5.554839
iteration 1700 / 2000: loss 5.579266
iteration 1800 / 2000: loss 5.676684
iteration 1900 / 2000: loss 5.613332
iteration 0 / 2000: loss 1092.856423
iteration 100 / 2000: loss 267.760944
iteration 200 / 2000: loss 69.391800
iteration 300 / 2000: loss 20.910541
iteration 400 / 2000: loss 8.835968
iteration 500 / 2000: loss 6.739813
iteration 600 / 2000: loss 6.363295
iteration 700 / 2000: loss 5.087427
iteration 800 / 2000: loss 5.694421
iteration 900 / 2000: loss 5.393089
iteration 1000 / 2000: loss 5.630716
iteration 1100 / 2000: loss 6.026007
iteration 1200 / 2000: loss 4.801178
iteration 1300 / 2000: loss 5.443795
iteration 1400 / 2000: loss 5.699370
iteration 1500 / 2000: loss 5.750261
iteration 1600 / 2000: loss 5.655400
iteration 1700 / 2000: loss 5.178623
iteration 1800 / 2000: loss 5.358200
iteration 1900 / 2000: loss 5.559058
iteration 0 / 2000: loss 1250.614880
iteration 100 / 2000: loss 250.469997
iteration 200 / 2000: loss 53.859126
iteration 300 / 2000: loss 15.114467
iteration 400 / 2000: loss 7.296348
iteration 500 / 2000: loss 6.374242
iteration 600 / 2000: loss 5.580239
iteration 700 / 2000: loss 6.388365
iteration 800 / 2000: loss 5.974861
iteration 900 / 2000: loss 5.694204
iteration 1000 / 2000: loss 5.207846
iteration 1100 / 2000: loss 5.618363
iteration 1200 / 2000: loss 5.802286
iteration 1300 / 2000: loss 5.275415
iteration 1400 / 2000: loss 6.376112
iteration 1500 / 2000: loss 5.118300
iteration 1600 / 2000: loss 5.698176
iteration 1700 / 2000: loss 5.161125
iteration 1800 / 2000: loss 5.438395
iteration 1900 / 2000: loss 5.469756
iteration 0 / 2000: loss 1550.589117
iteration 100 / 2000: loss 209.828393
iteration 200 / 2000: loss 33.219160
iteration 300 / 2000: loss 8.956647
iteration 400 / 2000: loss 5.591171
iteration 500 / 2000: loss 5.565076
iteration 600 / 2000: loss 5.262998
iteration 700 / 2000: loss 5.429268
iteration 800 / 2000: loss 5.592916
iteration 900 / 2000: loss 5.391928
iteration 1000 / 2000: loss 5.646755
iteration 1100 / 2000: loss 5.241269
iteration 1200 / 2000: loss 5.487419
iteration 1300 / 2000: loss 5.692707
iteration 1400 / 2000: loss 5.968674
iteration 1500 / 2000: loss 5.808020
iteration 1600 / 2000: loss 5.751925
iteration 1700 / 2000: loss 6.370336
iteration 1800 / 2000: loss 5.771116
iteration 1900 / 2000: loss 5.228094
iteration 0 / 2000: loss 177.448994
iteration 100 / 2000: loss 112.772905
iteration 200 / 2000: loss 74.476904
iteration 300 / 2000: loss 51.645203
iteration 400 / 2000: loss 36.519458
iteration 500 / 2000: loss 25.028796
iteration 600 / 2000: loss 19.633941
iteration 700 / 2000: loss 13.679071
iteration 800 / 2000: loss 11.032384
iteration 900 / 2000: loss 8.798424
iteration 1000 / 2000: loss 7.153928
iteration 1100 / 2000: loss 6.843930
iteration 1200 / 2000: loss 5.840942
iteration 1300 / 2000: loss 5.520296
iteration 1400 / 2000: loss 5.420579
iteration 1500 / 2000: loss 4.277846
iteration 1600 / 2000: loss 5.406242
iteration 1700 / 2000: loss 4.789527
iteration 1800 / 2000: loss 5.099745
iteration 1900 / 2000: loss 4.399656
iteration 0 / 2000: loss 333.400373
iteration 100 / 2000: loss 142.514570
iteration 200 / 2000: loss 66.111091
iteration 300 / 2000: loss 31.918383
iteration 400 / 2000: loss 17.169469
iteration 500 / 2000: loss 10.644329
iteration 600 / 2000: loss 6.908382
iteration 700 / 2000: loss 5.624355
iteration 800 / 2000: loss 5.568043
iteration 900 / 2000: loss 5.129265
iteration 1000 / 2000: loss 5.250476
iteration 1100 / 2000: loss 5.384524
iteration 1200 / 2000: loss 5.186343
iteration 1300 / 2000: loss 4.826588
iteration 1400 / 2000: loss 5.062177
iteration 1500 / 2000: loss 4.773652
iteration 1600 / 2000: loss 4.941873
iteration 1700 / 2000: loss 4.556731
iteration 1800 / 2000: loss 5.641061
iteration 1900 / 2000: loss 5.358117
iteration 0 / 2000: loss 485.627672
iteration 100 / 2000: loss 142.879917
iteration 200 / 2000: loss 46.049923
iteration 300 / 2000: loss 17.282666
iteration 400 / 2000: loss 8.481746
iteration 500 / 2000: loss 6.407639
iteration 600 / 2000: loss 5.205529
iteration 700 / 2000: loss 5.045789
iteration 800 / 2000: loss 4.869735
iteration 900 / 2000: loss 4.730392
iteration 1000 / 2000: loss 5.508883
iteration 1100 / 2000: loss 4.877095
iteration 1200 / 2000: loss 5.581256
iteration 1300 / 2000: loss 5.299078
iteration 1400 / 2000: loss 5.764324
iteration 1500 / 2000: loss 5.090882
iteration 1600 / 2000: loss 6.091145
iteration 1700 / 2000: loss 4.984272
iteration 1800 / 2000: loss 4.848872
iteration 1900 / 2000: loss 5.321631
iteration 0 / 2000: loss 627.630709
iteration 100 / 2000: loss 127.218370
iteration 200 / 2000: loss 29.077785
iteration 300 / 2000: loss 10.157147
iteration 400 / 2000: loss 6.242827
iteration 500 / 2000: loss 5.869745
iteration 600 / 2000: loss 4.982241
iteration 700 / 2000: loss 4.941677
iteration 800 / 2000: loss 5.500538
iteration 900 / 2000: loss 4.917475
iteration 1000 / 2000: loss 5.236854
iteration 1100 / 2000: loss 4.864875
iteration 1200 / 2000: loss 5.791631
iteration 1300 / 2000: loss 5.060718
iteration 1400 / 2000: loss 4.721264
iteration 1500 / 2000: loss 4.812337
iteration 1600 / 2000: loss 5.231560
iteration 1700 / 2000: loss 5.032827
iteration 1800 / 2000: loss 5.096111
iteration 1900 / 2000: loss 5.318513
iteration 0 / 2000: loss 782.634159
iteration 100 / 2000: loss 106.418069
iteration 200 / 2000: loss 18.443821
iteration 300 / 2000: loss 7.340541
iteration 400 / 2000: loss 5.179851
iteration 500 / 2000: loss 5.627744
iteration 600 / 2000: loss 5.371608
iteration 700 / 2000: loss 5.349556
iteration 800 / 2000: loss 5.259811
iteration 900 / 2000: loss 5.504790
iteration 1000 / 2000: loss 4.835267
iteration 1100 / 2000: loss 5.307626
iteration 1200 / 2000: loss 5.523106
iteration 1300 / 2000: loss 5.718040
iteration 1400 / 2000: loss 5.294046
iteration 1500 / 2000: loss 4.994396
iteration 1600 / 2000: loss 5.356729
iteration 1700 / 2000: loss 5.351089
iteration 1800 / 2000: loss 5.086211
iteration 1900 / 2000: loss 5.502960
iteration 0 / 2000: loss 934.948321
iteration 100 / 2000: loss 86.612748
iteration 200 / 2000: loss 12.486820
iteration 300 / 2000: loss 5.983814
iteration 400 / 2000: loss 5.315699
iteration 500 / 2000: loss 5.474425
iteration 600 / 2000: loss 5.912576
iteration 700 / 2000: loss 5.701217
iteration 800 / 2000: loss 5.419671
iteration 900 / 2000: loss 5.046295
iteration 1000 / 2000: loss 5.312710
iteration 1100 / 2000: loss 5.656854
iteration 1200 / 2000: loss 5.823485
iteration 1300 / 2000: loss 6.037468
iteration 1400 / 2000: loss 5.389626
iteration 1500 / 2000: loss 5.981961
iteration 1600 / 2000: loss 4.779463
iteration 1700 / 2000: loss 5.171796
iteration 1800 / 2000: loss 6.046677
iteration 1900 / 2000: loss 5.877114
iteration 0 / 2000: loss 1093.551373
iteration 100 / 2000: loss 68.733500
iteration 200 / 2000: loss 9.099286
iteration 300 / 2000: loss 5.645886
iteration 400 / 2000: loss 5.481962
iteration 500 / 2000: loss 5.291999
iteration 600 / 2000: loss 5.234950
iteration 700 / 2000: loss 5.845590
iteration 800 / 2000: loss 5.860647
iteration 900 / 2000: loss 5.177874
iteration 1000 / 2000: loss 5.655824
iteration 1100 / 2000: loss 5.428767
iteration 1200 / 2000: loss 6.078190
iteration 1300 / 2000: loss 5.763387
iteration 1400 / 2000: loss 5.600996
iteration 1500 / 2000: loss 5.365527
iteration 1600 / 2000: loss 5.587716
iteration 1700 / 2000: loss 4.982773
iteration 1800 / 2000: loss 5.382374
iteration 1900 / 2000: loss 5.273429
iteration 0 / 2000: loss 1244.805456
iteration 100 / 2000: loss 53.568953
iteration 200 / 2000: loss 6.799987
iteration 300 / 2000: loss 5.296315
iteration 400 / 2000: loss 5.470760
iteration 500 / 2000: loss 5.557290
iteration 600 / 2000: loss 5.461127
iteration 700 / 2000: loss 5.358640
iteration 800 / 2000: loss 5.382618
iteration 900 / 2000: loss 5.531626
iteration 1000 / 2000: loss 5.285312
iteration 1100 / 2000: loss 5.587889
iteration 1200 / 2000: loss 5.490078
iteration 1300 / 2000: loss 5.536446
iteration 1400 / 2000: loss 5.362931
iteration 1500 / 2000: loss 6.017304
iteration 1600 / 2000: loss 5.229478
iteration 1700 / 2000: loss 5.525424
iteration 1800 / 2000: loss 5.554954
iteration 1900 / 2000: loss 6.023170
iteration 0 / 2000: loss 1563.199629
iteration 100 / 2000: loss 32.550275
iteration 200 / 2000: loss 6.080864
iteration 300 / 2000: loss 5.506387
iteration 400 / 2000: loss 5.963115
iteration 500 / 2000: loss 5.244128
iteration 600 / 2000: loss 5.694828
iteration 700 / 2000: loss 5.424837
iteration 800 / 2000: loss 5.510264
iteration 900 / 2000: loss 5.721810
iteration 1000 / 2000: loss 5.810768
iteration 1100 / 2000: loss 6.104173
iteration 1200 / 2000: loss 5.908364
iteration 1300 / 2000: loss 6.129623
iteration 1400 / 2000: loss 5.790143
iteration 1500 / 2000: loss 5.878835
iteration 1600 / 2000: loss 5.671972
iteration 1700 / 2000: loss 5.971552
iteration 1800 / 2000: loss 5.695610
iteration 1900 / 2000: loss 5.686303
iteration 0 / 2000: loss 175.042222
iteration 100 / 2000: loss 91.651396
iteration 200 / 2000: loss 51.493294
iteration 300 / 2000: loss 29.630683
iteration 400 / 2000: loss 17.983668
iteration 500 / 2000: loss 12.125590
iteration 600 / 2000: loss 8.664828
iteration 700 / 2000: loss 6.360742
iteration 800 / 2000: loss 6.195568
iteration 900 / 2000: loss 5.453249
iteration 1000 / 2000: loss 4.624299
iteration 1100 / 2000: loss 4.975385
iteration 1200 / 2000: loss 5.274965
iteration 1300 / 2000: loss 4.873362
iteration 1400 / 2000: loss 5.173989
iteration 1500 / 2000: loss 4.478314
iteration 1600 / 2000: loss 5.170376
iteration 1700 / 2000: loss 4.512948
iteration 1800 / 2000: loss 5.289302
iteration 1900 / 2000: loss 5.124049
iteration 0 / 2000: loss 331.998796
iteration 100 / 2000: loss 97.743551
iteration 200 / 2000: loss 32.721419
iteration 300 / 2000: loss 13.048985
iteration 400 / 2000: loss 7.806119
iteration 500 / 2000: loss 6.192099
iteration 600 / 2000: loss 5.480958
iteration 700 / 2000: loss 5.266316
iteration 800 / 2000: loss 4.746604
iteration 900 / 2000: loss 5.114151
iteration 1000 / 2000: loss 5.252001
iteration 1100 / 2000: loss 5.029254
iteration 1200 / 2000: loss 4.758241
iteration 1300 / 2000: loss 4.781408
iteration 1400 / 2000: loss 5.800520
iteration 1500 / 2000: loss 4.258104
iteration 1600 / 2000: loss 4.886581
iteration 1700 / 2000: loss 5.087703
iteration 1800 / 2000: loss 5.669396
iteration 1900 / 2000: loss 4.854180
iteration 0 / 2000: loss 480.526881
iteration 100 / 2000: loss 80.214405
iteration 200 / 2000: loss 17.620312
iteration 300 / 2000: loss 6.869956
iteration 400 / 2000: loss 5.935694
iteration 500 / 2000: loss 5.119147
iteration 600 / 2000: loss 5.449134
iteration 700 / 2000: loss 5.165636
iteration 800 / 2000: loss 4.769330
iteration 900 / 2000: loss 5.034217
iteration 1000 / 2000: loss 5.676286
iteration 1100 / 2000: loss 5.083310
iteration 1200 / 2000: loss 5.094056
iteration 1300 / 2000: loss 5.299983
iteration 1400 / 2000: loss 5.217546
iteration 1500 / 2000: loss 6.181078
iteration 1600 / 2000: loss 5.314310
iteration 1700 / 2000: loss 5.279560
iteration 1800 / 2000: loss 5.245941
iteration 1900 / 2000: loss 5.004309
iteration 0 / 2000: loss 637.757972
iteration 100 / 2000: loss 59.873087
iteration 200 / 2000: loss 10.548888
iteration 300 / 2000: loss 5.498387
iteration 400 / 2000: loss 5.135702
iteration 500 / 2000: loss 5.339261
iteration 600 / 2000: loss 5.694129
iteration 700 / 2000: loss 5.864000
iteration 800 / 2000: loss 5.757772
iteration 900 / 2000: loss 5.152000
iteration 1000 / 2000: loss 5.136921
iteration 1100 / 2000: loss 5.583724
iteration 1200 / 2000: loss 5.466046
iteration 1300 / 2000: loss 5.669260
iteration 1400 / 2000: loss 5.726149
iteration 1500 / 2000: loss 5.313287
iteration 1600 / 2000: loss 5.580135
iteration 1700 / 2000: loss 5.208672
iteration 1800 / 2000: loss 5.368757
iteration 1900 / 2000: loss 5.682848
iteration 0 / 2000: loss 795.767287
iteration 100 / 2000: loss 42.540993
iteration 200 / 2000: loss 6.997062
iteration 300 / 2000: loss 4.997390
iteration 400 / 2000: loss 5.870765
iteration 500 / 2000: loss 5.593953
iteration 600 / 2000: loss 5.511012
iteration 700 / 2000: loss 5.617143
iteration 800 / 2000: loss 5.733043
iteration 900 / 2000: loss 5.655669
iteration 1000 / 2000: loss 5.240021
iteration 1100 / 2000: loss 5.811073
iteration 1200 / 2000: loss 6.209830
iteration 1300 / 2000: loss 4.713314
iteration 1400 / 2000: loss 5.843977
iteration 1500 / 2000: loss 5.115616
iteration 1600 / 2000: loss 5.561892
iteration 1700 / 2000: loss 5.329966
iteration 1800 / 2000: loss 5.044552
iteration 1900 / 2000: loss 5.425612
iteration 0 / 2000: loss 944.500821
iteration 100 / 2000: loss 29.873627
iteration 200 / 2000: loss 6.171588
iteration 300 / 2000: loss 5.070434
iteration 400 / 2000: loss 5.475237
iteration 500 / 2000: loss 5.273248
iteration 600 / 2000: loss 5.039628
iteration 700 / 2000: loss 6.053193
iteration 800 / 2000: loss 5.769083
iteration 900 / 2000: loss 5.646855
iteration 1000 / 2000: loss 5.729383
iteration 1100 / 2000: loss 5.890617
iteration 1200 / 2000: loss 5.803104
iteration 1300 / 2000: loss 5.796169
iteration 1400 / 2000: loss 5.339391
iteration 1500 / 2000: loss 5.128995
iteration 1600 / 2000: loss 5.329072
iteration 1700 / 2000: loss 5.869028
iteration 1800 / 2000: loss 5.570978
iteration 1900 / 2000: loss 5.325881
iteration 0 / 2000: loss 1088.032750
iteration 100 / 2000: loss 20.897003
iteration 200 / 2000: loss 6.143261
iteration 300 / 2000: loss 5.813653
iteration 400 / 2000: loss 5.672845
iteration 500 / 2000: loss 5.033688
iteration 600 / 2000: loss 5.213662
iteration 700 / 2000: loss 5.909844
iteration 800 / 2000: loss 5.530017
iteration 900 / 2000: loss 5.336250
iteration 1000 / 2000: loss 5.791522
iteration 1100 / 2000: loss 5.597058
iteration 1200 / 2000: loss 5.310236
iteration 1300 / 2000: loss 5.556316
iteration 1400 / 2000: loss 5.100752
iteration 1500 / 2000: loss 5.273065
iteration 1600 / 2000: loss 5.278268
iteration 1700 / 2000: loss 5.684228
iteration 1800 / 2000: loss 5.711322
iteration 1900 / 2000: loss 5.460710
iteration 0 / 2000: loss 1234.980570
iteration 100 / 2000: loss 14.843332
iteration 200 / 2000: loss 5.729096
iteration 300 / 2000: loss 5.762867
iteration 400 / 2000: loss 5.686184
iteration 500 / 2000: loss 5.757040
iteration 600 / 2000: loss 5.626032
iteration 700 / 2000: loss 5.599962
iteration 800 / 2000: loss 5.992452
iteration 900 / 2000: loss 5.660717
iteration 1000 / 2000: loss 5.450823
iteration 1100 / 2000: loss 5.589898
iteration 1200 / 2000: loss 5.868794
iteration 1300 / 2000: loss 5.239661
iteration 1400 / 2000: loss 5.402339
iteration 1500 / 2000: loss 5.748235
iteration 1600 / 2000: loss 5.988135
iteration 1700 / 2000: loss 6.066850
iteration 1800 / 2000: loss 5.730659
iteration 1900 / 2000: loss 5.782011
iteration 0 / 2000: loss 1584.330689
iteration 100 / 2000: loss 9.066417
iteration 200 / 2000: loss 6.501657
iteration 300 / 2000: loss 5.735633
iteration 400 / 2000: loss 5.416980
iteration 500 / 2000: loss 5.672152
iteration 600 / 2000: loss 6.382546
iteration 700 / 2000: loss 5.736080
iteration 800 / 2000: loss 6.352848
iteration 900 / 2000: loss 6.274732
iteration 1000 / 2000: loss 5.532928
iteration 1100 / 2000: loss 5.902536
iteration 1200 / 2000: loss 5.881916
iteration 1300 / 2000: loss 5.439567
iteration 1400 / 2000: loss 6.188214
iteration 1500 / 2000: loss 5.615466
iteration 1600 / 2000: loss 5.693457
iteration 1700 / 2000: loss 6.149999
iteration 1800 / 2000: loss 5.925210
iteration 1900 / 2000: loss 6.064336
iteration 0 / 2000: loss 175.101907
iteration 100 / 2000: loss 214.909220
iteration 200 / 2000: loss 216.187148
iteration 300 / 2000: loss 316.639300
iteration 400 / 2000: loss 305.338256
iteration 500 / 2000: loss 251.308103
iteration 600 / 2000: loss 312.303165
iteration 700 / 2000: loss 247.069624
iteration 800 / 2000: loss 268.088820
iteration 900 / 2000: loss 194.299008
iteration 1000 / 2000: loss 270.935959
iteration 1100 / 2000: loss 222.634237
iteration 1200 / 2000: loss 201.650342
iteration 1300 / 2000: loss 217.743996
iteration 1400 / 2000: loss 127.553153
iteration 1500 / 2000: loss 259.770398
iteration 1600 / 2000: loss 246.222046
iteration 1700 / 2000: loss 278.331381
iteration 1800 / 2000: loss 331.915890
iteration 1900 / 2000: loss 285.584869
iteration 0 / 2000: loss 330.933560
iteration 100 / 2000: loss 524.839109
iteration 200 / 2000: loss 679.845616
iteration 300 / 2000: loss 496.125042
iteration 400 / 2000: loss 513.600413
iteration 500 / 2000: loss 642.380767
iteration 600 / 2000: loss 611.386158
iteration 700 / 2000: loss 609.769254
iteration 800 / 2000: loss 534.427959
iteration 900 / 2000: loss 568.914693
iteration 1000 / 2000: loss 515.725270
iteration 1100 / 2000: loss 572.961988
iteration 1200 / 2000: loss 586.513126
iteration 1300 / 2000: loss 547.397672
iteration 1400 / 2000: loss 441.045295
iteration 1500 / 2000: loss 538.580871
iteration 1600 / 2000: loss 530.757617
iteration 1700 / 2000: loss 655.013156
iteration 1800 / 2000: loss 557.913559
iteration 1900 / 2000: loss 365.728011
iteration 0 / 2000: loss 476.923216
iteration 100 / 2000: loss 1940.988542
iteration 200 / 2000: loss 2420.304540
iteration 300 / 2000: loss 1764.256415
iteration 400 / 2000: loss 2302.956778
iteration 500 / 2000: loss 1854.608621
iteration 600 / 2000: loss 1960.790856
iteration 700 / 2000: loss 2071.259177
iteration 800 / 2000: loss 1835.149840
iteration 900 / 2000: loss 1661.363886
iteration 1000 / 2000: loss 2076.276440
iteration 1100 / 2000: loss 2273.937113
iteration 1200 / 2000: loss 2200.307351
iteration 1300 / 2000: loss 1956.316845
iteration 1400 / 2000: loss 2030.938026
iteration 1500 / 2000: loss 1929.684860
iteration 1600 / 2000: loss 1780.966174
iteration 1700 / 2000: loss 2096.645147
iteration 1800 / 2000: loss 2276.681166
iteration 1900 / 2000: loss 1837.450763
iteration 0 / 2000: loss 635.344597
iteration 100 / 2000: loss 3945785.113740
iteration 200 / 2000: loss 15677897.982145
iteration 300 / 2000: loss 35352159.074771
iteration 400 / 2000: loss 63062362.063484
iteration 500 / 2000: loss 98314490.490589
iteration 600 / 2000: loss 142325127.641480
iteration 700 / 2000: loss 194100413.411808
iteration 800 / 2000: loss 253659498.858688
iteration 900 / 2000: loss 321723102.830869
iteration 1000 / 2000: loss 398201449.716653
iteration 1100 / 2000: loss 481637740.348903
iteration 1200 / 2000: loss 573872800.685704
iteration 1300 / 2000: loss 674297288.648116
iteration 1400 / 2000: loss 780691615.205039
iteration 1500 / 2000: loss 895507300.257710
iteration 1600 / 2000: loss 1018225452.147115
iteration 1700 / 2000: loss 1149616903.288821
iteration 1800 / 2000: loss 1289462669.349022
iteration 1900 / 2000: loss 1438409338.957622
iteration 0 / 2000: loss 787.021779
iteration 100 / 2000: loss 406712384975353085760761554470495584256.000000
iteration 200 / 2000: loss 67226299908993087280089422618799054898520502612256575756073500159791595520.000000
iteration 300 / 2000: loss 11111968964819589007763385598186878956478266359174561533602957962861867585688346012683813881089197842211274752.000000
iteration 400 / 2000: loss 1836719475001123203540812342719018669770263345218161077791882607256334738140723762391325605065180932264143999144089868910595780777039729537318912.000000
iteration 500 / 2000: loss 303595019076186908379943432986010890626962874111408564386224496757768321619745368195720400921825195747933221561285109238774271023490571163423870059377776462188783157382537367519232.000000
iteration 600 / 2000: loss 50181825184716315718256447985789892034540135752227854687404453316732768231761744564440823264936074976385872656677472960539726069220155422207870283855710482117391098992040243759551990277926694452793638692614412697600.000000
iteration 700 / 2000: loss 8294653800751205424511670772791008867320095852267680254519921226474662717972133332002186843259342523788199465195335571964442780208890807395233082842866513536167538630665669392008332707332269408949971852166004288635567373719019815389859432418190032896.000000
iteration 800 / 2000: loss 1371039842035697380988034929371132228897908036897059259706207866451999119598297075059149265559500739009305925538092638125659595940589924582131547175244598772125056083286015109831537515453473957840770214079967440766374682530881303354936142772656890576622727797324678253508458716206202880.000000
cs231n/classifiers/linear_svm.py:95: RuntimeWarning: overflow encountered in double_scalars
  loss += 0.5 * reg * np.sum(W * W)
cs231n/classifiers/linear_svm.py:95: RuntimeWarning: overflow encountered in multiply
  loss += 0.5 * reg * np.sum(W * W)
iteration 900 / 2000: loss inf
iteration 1000 / 2000: loss inf
iteration 1100 / 2000: loss inf
iteration 1200 / 2000: loss inf
iteration 1300 / 2000: loss inf
iteration 1400 / 2000: loss inf
iteration 1500 / 2000: loss inf
iteration 1600 / 2000: loss inf
iteration 1700 / 2000: loss inf
cs231n/classifiers/linear_svm.py:84: RuntimeWarning: overflow encountered in subtract
  margin = scores - correct_class_score[:, None] + 1 # (500, 10)
cs231n/classifiers/linear_svm.py:84: RuntimeWarning: invalid value encountered in subtract
  margin = scores - correct_class_score[:, None] + 1 # (500, 10)
cs231n/classifiers/linear_svm.py:90: RuntimeWarning: invalid value encountered in less
  margin[margin < 0] = 0
cs231n/classifiers/linear_svm.py:111: RuntimeWarning: invalid value encountered in greater
  binary[binary > 0] = 1 # (500, 10)
cs231n/classifiers/linear_svm.py:130: RuntimeWarning: overflow encountered in multiply
  dW += reg*W
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 942.166017
iteration 100 / 2000: loss 2330244737358079680215729557270456519545231845885573021070000128.000000
iteration 200 / 2000: loss 3744558920894997015530586096169725011571324065011678277860306973267841938170811934471736862339133768900493692445475238576128.000000
iteration 300 / 2000: loss 6017274188955561580228265154392038127691456547095568623060422427224373094335492730770986190680524526842488540652609076755720896189923841142421056057798331602065931932860145943999152128.000000
iteration 400 / 2000: loss 9669386816970352028862694846791681084643936899373481960686386105698657418832882414137480830442084538566450975604813908448080697125944915607341069028063755136389927758388947501871104577388337330723589908071784763700330290918425979892091351728128.000000
iteration 500 / 2000: loss 15538105540846000651301017651455567212051543028176282209337640726730617508981424427729370588626287129803719080184806349225204737281524791806076748421971766926143477248984727209149658237474550035661636301350501948461009430362298676286530307107909772686601180065261240259009434264114895301306389581296304128.000000
iteration 600 / 2000: loss inf
iteration 700 / 2000: loss inf
iteration 800 / 2000: loss inf
iteration 900 / 2000: loss inf
iteration 1000 / 2000: loss inf
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 1094.106332
iteration 100 / 2000: loss 51172617842149141031546417875623231856683666524595087816639808347173456838379175936.000000
iteration 200 / 2000: loss 1981706661171834168247232694726024505667382339626887153106115297078731507529038064796524564481645374449131338886650049103195173259596238985545952471238517775663104.000000
iteration 300 / 2000: loss 76743411936571867330778738341287988257684475573963116583483272463649836953704034891412239645443916358497003568084722944834870560810880927790473071437816607156458884083455547384597152245570934398328708983963587821012211780557249187434047995904.000000
iteration 400 / 2000: loss inf
iteration 500 / 2000: loss inf
iteration 600 / 2000: loss inf
iteration 700 / 2000: loss inf
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 1241.459833
iteration 100 / 2000: loss 391289451045944183908339030267352116787522578975926287298592551180999858345733245212797878755393536.000000
iteration 200 / 2000: loss 103931951897365795428208335274108222910478636727218203896300169048034254261136199012392467460628949552427459608338251669369823507674789337358995406459596281326916693249958208891199094815843155968.000000
iteration 300 / 2000: loss 27605780315115192768655590149218804814516772272150776997692126797105375015974887826632091016179128117760705043009992801886074284143453275501813665617501537322877590866895895189676023528593918463652923282622503863567569846080204653439334412904960601206319915658265168678113715765453277626368.000000
iteration 400 / 2000: loss inf
iteration 500 / 2000: loss inf
iteration 600 / 2000: loss inf
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 1544.381238
iteration 100 / 2000: loss 4208686213676437863604420202511290572486811817387373709860363810448636010527982234936791205379052974477105209561316050599936.000000
iteration 200 / 2000: loss 10867879462172034587902024401962690221763589459516231553700194397680108787715857223570725253502342286414633916338723303243213844366679735183122852149222516165146277897505670174807351716153919783611198338161721025678502069925931593859724986023936.000000
iteration 300 / 2000: loss inf
iteration 400 / 2000: loss inf
iteration 500 / 2000: loss inf
iteration 600 / 2000: loss nan
iteration 700 / 2000: loss nan
iteration 800 / 2000: loss nan
iteration 900 / 2000: loss nan
iteration 1000 / 2000: loss nan
iteration 1100 / 2000: loss nan
iteration 1200 / 2000: loss nan
iteration 1300 / 2000: loss nan
iteration 1400 / 2000: loss nan
iteration 1500 / 2000: loss nan
iteration 1600 / 2000: loss nan
iteration 1700 / 2000: loss nan
iteration 1800 / 2000: loss nan
iteration 1900 / 2000: loss nan
iteration 0 / 2000: loss 170.533915
iteration 100 / 2000: loss 35.881045
iteration 200 / 2000: loss 11.391623
iteration 300 / 2000: loss 5.732061
iteration 400 / 2000: loss 5.912371
iteration 500 / 2000: loss 5.278174
iteration 600 / 2000: loss 4.863043
iteration 700 / 2000: loss 5.884557
iteration 800 / 2000: loss 5.778189
iteration 900 / 2000: loss 4.948151
iteration 1000 / 2000: loss 5.415226
iteration 1100 / 2000: loss 5.617073
iteration 1200 / 2000: loss 5.197404
iteration 1300 / 2000: loss 4.923185
iteration 1400 / 2000: loss 5.823097
iteration 1500 / 2000: loss 5.510972
iteration 1600 / 2000: loss 5.421964
iteration 1700 / 2000: loss 5.179905
iteration 1800 / 2000: loss 5.143603
iteration 1900 / 2000: loss 4.926058
iteration 0 / 2000: loss 330.071293
iteration 100 / 2000: loss 17.546240
iteration 200 / 2000: loss 6.292885
iteration 300 / 2000: loss 5.800811
iteration 400 / 2000: loss 5.279648
iteration 500 / 2000: loss 5.618375
iteration 600 / 2000: loss 6.378680
iteration 700 / 2000: loss 5.303304
iteration 800 / 2000: loss 5.471146
iteration 900 / 2000: loss 6.136221
iteration 1000 / 2000: loss 5.806043
iteration 1100 / 2000: loss 5.364964
iteration 1200 / 2000: loss 6.029710
iteration 1300 / 2000: loss 6.055906
iteration 1400 / 2000: loss 6.661690
iteration 1500 / 2000: loss 6.141282
iteration 1600 / 2000: loss 5.465093
iteration 1700 / 2000: loss 5.443296
iteration 1800 / 2000: loss 6.381360
iteration 1900 / 2000: loss 5.871106
iteration 0 / 2000: loss 480.728364
iteration 100 / 2000: loss 9.007399
iteration 200 / 2000: loss 5.261420
iteration 300 / 2000: loss 5.920937
iteration 400 / 2000: loss 5.661759
iteration 500 / 2000: loss 6.040306
iteration 600 / 2000: loss 6.008069
iteration 700 / 2000: loss 5.810481
iteration 800 / 2000: loss 5.671051
iteration 900 / 2000: loss 5.601871
iteration 1000 / 2000: loss 6.458834
iteration 1100 / 2000: loss 5.563797
iteration 1200 / 2000: loss 5.278096
iteration 1300 / 2000: loss 6.659194
iteration 1400 / 2000: loss 5.497619
iteration 1500 / 2000: loss 6.104186
iteration 1600 / 2000: loss 7.209037
iteration 1700 / 2000: loss 6.436102
iteration 1800 / 2000: loss 6.425208
iteration 1900 / 2000: loss 5.292565
iteration 0 / 2000: loss 645.167411
iteration 100 / 2000: loss 7.452815
iteration 200 / 2000: loss 5.781177
iteration 300 / 2000: loss 5.847212
iteration 400 / 2000: loss 7.112724
iteration 500 / 2000: loss 6.579869
iteration 600 / 2000: loss 6.378128
iteration 700 / 2000: loss 7.578419
iteration 800 / 2000: loss 6.049548
iteration 900 / 2000: loss 5.031802
iteration 1000 / 2000: loss 6.210720
iteration 1100 / 2000: loss 5.341493
iteration 1200 / 2000: loss 6.149364
iteration 1300 / 2000: loss 6.531442
iteration 1400 / 2000: loss 5.923199
iteration 1500 / 2000: loss 6.236699
iteration 1600 / 2000: loss 5.536322
iteration 1700 / 2000: loss 5.565055
iteration 1800 / 2000: loss 6.457776
iteration 1900 / 2000: loss 6.277189
iteration 0 / 2000: loss 788.473493
iteration 100 / 2000: loss 6.399697
iteration 200 / 2000: loss 6.341939
iteration 300 / 2000: loss 6.385611
iteration 400 / 2000: loss 5.741864
iteration 500 / 2000: loss 5.926609
iteration 600 / 2000: loss 6.585833
iteration 700 / 2000: loss 6.127218
iteration 800 / 2000: loss 6.229681
iteration 900 / 2000: loss 5.770981
iteration 1000 / 2000: loss 6.030352
iteration 1100 / 2000: loss 5.298414
iteration 1200 / 2000: loss 7.110127
iteration 1300 / 2000: loss 6.044270
iteration 1400 / 2000: loss 6.619136
iteration 1500 / 2000: loss 6.325554
iteration 1600 / 2000: loss 6.526095
iteration 1700 / 2000: loss 5.455893
iteration 1800 / 2000: loss 5.801539
iteration 1900 / 2000: loss 6.589365
iteration 0 / 2000: loss 940.268202
iteration 100 / 2000: loss 7.205635
iteration 200 / 2000: loss 6.237630
iteration 300 / 2000: loss 6.881908
iteration 400 / 2000: loss 5.842260
iteration 500 / 2000: loss 6.182730
iteration 600 / 2000: loss 5.820889
iteration 700 / 2000: loss 6.803298
iteration 800 / 2000: loss 6.394169
iteration 900 / 2000: loss 6.122674
iteration 1000 / 2000: loss 7.521858
iteration 1100 / 2000: loss 5.975702
iteration 1200 / 2000: loss 5.852145
iteration 1300 / 2000: loss 5.923618
iteration 1400 / 2000: loss 6.169015
iteration 1500 / 2000: loss 6.518453
iteration 1600 / 2000: loss 6.562681
iteration 1700 / 2000: loss 6.476888
iteration 1800 / 2000: loss 7.099156
iteration 1900 / 2000: loss 6.525728
iteration 0 / 2000: loss 1101.717968
iteration 100 / 2000: loss 6.079356
iteration 200 / 2000: loss 6.367110
iteration 300 / 2000: loss 6.680942
iteration 400 / 2000: loss 7.033647
iteration 500 / 2000: loss 6.011012
iteration 600 / 2000: loss 5.531439
iteration 700 / 2000: loss 6.259532
iteration 800 / 2000: loss 7.554317
iteration 900 / 2000: loss 7.160292
iteration 1000 / 2000: loss 6.240054
iteration 1100 / 2000: loss 6.653245
iteration 1200 / 2000: loss 6.385453
iteration 1300 / 2000: loss 6.046964
iteration 1400 / 2000: loss 6.309649
iteration 1500 / 2000: loss 6.176947
iteration 1600 / 2000: loss 6.914073
iteration 1700 / 2000: loss 6.461597
iteration 1800 / 2000: loss 6.364832
iteration 1900 / 2000: loss 6.538632
iteration 0 / 2000: loss 1254.373779
iteration 100 / 2000: loss 7.106340
iteration 200 / 2000: loss 6.187885
iteration 300 / 2000: loss 6.526955
iteration 400 / 2000: loss 7.226965
iteration 500 / 2000: loss 6.654286
iteration 600 / 2000: loss 6.062488
iteration 700 / 2000: loss 6.529577
iteration 800 / 2000: loss 6.087311
iteration 900 / 2000: loss 6.798635
iteration 1000 / 2000: loss 6.317358
iteration 1100 / 2000: loss 6.663275
iteration 1200 / 2000: loss 5.917112
iteration 1300 / 2000: loss 6.097815
iteration 1400 / 2000: loss 5.820000
iteration 1500 / 2000: loss 6.282774
iteration 1600 / 2000: loss 6.555558
iteration 1700 / 2000: loss 6.386176
iteration 1800 / 2000: loss 6.944238
iteration 1900 / 2000: loss 6.225370
iteration 0 / 2000: loss 1552.962830
iteration 100 / 2000: loss 6.616761
iteration 200 / 2000: loss 6.589412
iteration 300 / 2000: loss 7.347998
iteration 400 / 2000: loss 6.838567
iteration 500 / 2000: loss 6.527761
iteration 600 / 2000: loss 6.928134
iteration 700 / 2000: loss 6.914308
iteration 800 / 2000: loss 6.692904
iteration 900 / 2000: loss 6.233480
iteration 1000 / 2000: loss 6.336893
iteration 1100 / 2000: loss 6.440131
iteration 1200 / 2000: loss 6.957904
iteration 1300 / 2000: loss 6.719182
iteration 1400 / 2000: loss 6.841457
iteration 1500 / 2000: loss 6.974108
iteration 1600 / 2000: loss 6.032128
iteration 1700 / 2000: loss 7.059088
iteration 1800 / 2000: loss 6.275041
iteration 1900 / 2000: loss 7.732075
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.385980 val accuracy: 0.393000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.385082 val accuracy: 0.385000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.379918 val accuracy: 0.389000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.372143 val accuracy: 0.368000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.370898 val accuracy: 0.386000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.363551 val accuracy: 0.381000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.365816 val accuracy: 0.373000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.357224 val accuracy: 0.368000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.355245 val accuracy: 0.371000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.378857 val accuracy: 0.385000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.378510 val accuracy: 0.368000
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.371918 val accuracy: 0.370000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.367796 val accuracy: 0.378000
lr 2.000000e-07 reg 5.000000e+04 train accuracy: 0.362653 val accuracy: 0.370000
lr 2.000000e-07 reg 6.000000e+04 train accuracy: 0.366388 val accuracy: 0.378000
lr 2.000000e-07 reg 7.000000e+04 train accuracy: 0.352122 val accuracy: 0.352000
lr 2.000000e-07 reg 8.000000e+04 train accuracy: 0.360102 val accuracy: 0.354000
lr 2.000000e-07 reg 1.000000e+05 train accuracy: 0.342347 val accuracy: 0.342000
lr 3.000000e-07 reg 1.000000e+04 train accuracy: 0.369245 val accuracy: 0.367000
lr 3.000000e-07 reg 2.000000e+04 train accuracy: 0.371408 val accuracy: 0.378000
lr 3.000000e-07 reg 3.000000e+04 train accuracy: 0.364082 val accuracy: 0.367000
lr 3.000000e-07 reg 4.000000e+04 train accuracy: 0.360490 val accuracy: 0.388000
lr 3.000000e-07 reg 5.000000e+04 train accuracy: 0.350102 val accuracy: 0.346000
lr 3.000000e-07 reg 6.000000e+04 train accuracy: 0.347367 val accuracy: 0.354000
lr 3.000000e-07 reg 7.000000e+04 train accuracy: 0.337633 val accuracy: 0.339000
lr 3.000000e-07 reg 8.000000e+04 train accuracy: 0.331959 val accuracy: 0.345000
lr 3.000000e-07 reg 1.000000e+05 train accuracy: 0.343367 val accuracy: 0.350000
lr 8.000000e-07 reg 1.000000e+04 train accuracy: 0.326490 val accuracy: 0.325000
lr 8.000000e-07 reg 2.000000e+04 train accuracy: 0.326571 val accuracy: 0.341000
lr 8.000000e-07 reg 3.000000e+04 train accuracy: 0.324469 val accuracy: 0.350000
lr 8.000000e-07 reg 4.000000e+04 train accuracy: 0.315673 val accuracy: 0.333000
lr 8.000000e-07 reg 5.000000e+04 train accuracy: 0.292673 val accuracy: 0.310000
lr 8.000000e-07 reg 6.000000e+04 train accuracy: 0.311367 val accuracy: 0.332000
lr 8.000000e-07 reg 7.000000e+04 train accuracy: 0.336918 val accuracy: 0.347000
lr 8.000000e-07 reg 8.000000e+04 train accuracy: 0.287224 val accuracy: 0.296000
lr 8.000000e-07 reg 1.000000e+05 train accuracy: 0.309776 val accuracy: 0.342000
lr 5.000000e-05 reg 1.000000e+04 train accuracy: 0.160898 val accuracy: 0.153000
lr 5.000000e-05 reg 2.000000e+04 train accuracy: 0.153224 val accuracy: 0.156000
lr 5.000000e-05 reg 3.000000e+04 train accuracy: 0.167429 val accuracy: 0.175000
lr 5.000000e-05 reg 4.000000e+04 train accuracy: 0.049673 val accuracy: 0.046000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 6.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 7.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 8.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.393000

In [20]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()



In [21]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy


linear SVM on raw pixels final test set accuracy: 0.372000

In [22]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
    
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


Inline Question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

Your answer: fill this in