Softmax exercise

Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [1]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

In [2]:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
  """
  Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
  it for the linear classifier. These are the same steps as we used for the
  SVM, but condensed to a single function.  
  """
  # Load the raw CIFAR-10 data
  cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
  X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
  
  # subsample the data
  mask = range(num_training, num_training + num_validation)
  X_val = X_train[mask]
  y_val = y_train[mask]
  mask = range(num_training)
  X_train = X_train[mask]
  y_train = y_train[mask]
  mask = range(num_test)
  X_test = X_test[mask]
  y_test = y_test[mask]
  mask = np.random.choice(num_training, num_dev, replace=False)
  X_dev = X_train[mask]
  y_dev = y_train[mask]
  
  # Preprocessing: reshape the image data into rows
  X_train = np.reshape(X_train, (X_train.shape[0], -1))
  X_val = np.reshape(X_val, (X_val.shape[0], -1))
  X_test = np.reshape(X_test, (X_test.shape[0], -1))
  X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
  
  # Normalize the data: subtract the mean image
  mean_image = np.mean(X_train, axis = 0)
  X_train -= mean_image
  X_val -= mean_image
  X_test -= mean_image
  X_dev -= mean_image
  
  # add bias dimension and transform into columns
  X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
  X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
  X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
  X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
  
  return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
print 'dev data shape: ', X_dev.shape
print 'dev labels shape: ', y_dev.shape


Train data shape:  (49000L, 3073L)
Train labels shape:  (49000L,)
Validation data shape:  (1000L, 3073L)
Validation labels shape:  (1000L,)
Test data shape:  (1000L, 3073L)
Test labels shape:  (1000L,)
dev data shape:  (500L, 3073L)
dev labels shape:  (500L,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.


In [10]:
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))


loss: 2.374535
sanity check: 2.302585

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer:

There are 10 classes, and with small random weights the classifier assigns each class a probability of roughly 1/10 = 0.1, so the expected cross-entropy loss is about -log(0.1).
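
As a quick illustration of the intuition above (not part of the required code), the snippet below scores synthetic data with tiny random weights and confirms that the class probabilities come out nearly uniform; the shapes mirror the dev set, but the data itself is made up for this check.

# Illustrative only: with very small weights the scores are near zero, so the
# softmax probabilities are close to uniform and the loss is about -log(1/10).
N, D, C = 500, 3073, 10                      # same shapes as the dev set above
W_tiny = np.random.randn(D, C) * 0.0001
X_rand = np.random.randn(N, D)               # synthetic data, just for this check
scores = X_rand.dot(W_tiny)
scores -= scores.max(axis=1, keepdims=True)  # shift for numeric stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print 'mean class probability:', probs.mean()   # ~0.1
print 'expected loss:', -np.log(1.0 / C)        # ~2.3026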


In [11]:
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)


numerical: -2.157108 analytic: -2.157108, relative error: 5.541218e-09
numerical: -0.599390 analytic: -0.599390, relative error: 8.755965e-09
numerical: 2.839557 analytic: 2.839557, relative error: 3.628772e-08
numerical: -1.914480 analytic: -1.914480, relative error: 8.217373e-09
numerical: -1.363861 analytic: -1.363861, relative error: 7.694770e-09
numerical: -0.838559 analytic: -0.838559, relative error: 1.000825e-07
numerical: -2.653345 analytic: -2.653345, relative error: 7.485435e-10
numerical: 0.361926 analytic: 0.361926, relative error: 7.845319e-09
numerical: 3.034460 analytic: 3.034460, relative error: 2.725330e-09
numerical: 1.007739 analytic: 1.007739, relative error: 3.466655e-09
numerical: 4.396009 analytic: 4.396846, relative error: 9.510850e-05
numerical: -0.569818 analytic: -0.569258, relative error: 4.912697e-04
numerical: 1.424515 analytic: 1.428835, relative error: 1.514132e-03
numerical: -1.167395 analytic: -1.175531, relative error: 3.472728e-03
numerical: -0.101537 analytic: -0.086950, relative error: 7.739157e-02
numerical: 2.732516 analytic: 2.730698, relative error: 3.328181e-04
numerical: -0.966960 analytic: -0.972946, relative error: 3.086097e-03
numerical: 2.390397 analytic: 2.403653, relative error: 2.765064e-03
numerical: -2.811307 analytic: -2.815803, relative error: 7.988267e-04
numerical: 3.084682 analytic: 3.076607, relative error: 1.310615e-03
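
For reference, one possible shape of softmax_loss_naive (loss plus gradient, with explicit loops) is sketched below. It assumes W is (D, C), X is (N, D), y holds integer labels, and the same 0.5 * reg * sum(W*W) regularization convention as the SVM exercise; the graded implementation still belongs in cs231n/classifiers/softmax.py. Note that grad_check_sparse compares the analytic gradient against a numerical estimate at a few randomly chosen entries of W, which is why the relative errors above should be tiny.

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss and gradient, naive version with explicit loops (sketch).
  W: (D, C) weights; X: (N, D) data; y: (N,) integer labels in [0, C); reg: L2 strength.
  """
  dW = np.zeros_like(W)
  num_train = X.shape[0]
  num_classes = W.shape[1]
  loss = 0.0
  for i in xrange(num_train):
    scores = X[i].dot(W)
    scores -= np.max(scores)                 # shift scores for numeric stability
    probs = np.exp(scores) / np.sum(np.exp(scores))
    loss += -np.log(probs[y[i]])
    for j in xrange(num_classes):
      dW[:, j] += (probs[j] - (j == y[i])) * X[i]
  loss = loss / num_train + 0.5 * reg * np.sum(W * W)   # assumed 0.5*reg convention
  dW = dW / num_train + reg * W
  return loss, dW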

In [12]:
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference


naive loss: 2.374535e+00 computed in 0.058000s
vectorized loss: 2.374535e+00 computed in 0.005000s
Loss difference: 0.000000
Gradient difference: 0.000000
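
A matching vectorized sketch, under the same interface and regularization assumptions as the naive version above, replaces the per-example loops with matrix operations; the key step is subtracting 1 from the probability of the correct class before back-propagating through X.T.

def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss and gradient with no explicit loops over examples or classes (sketch).
  Same interface and (assumed) regularization convention as the naive sketch.
  """
  num_train = X.shape[0]
  scores = X.dot(W)                                    # (N, C)
  scores -= np.max(scores, axis=1, keepdims=True)      # numeric stability
  probs = np.exp(scores)
  probs /= np.sum(probs, axis=1, keepdims=True)        # row-wise softmax
  loss = -np.sum(np.log(probs[np.arange(num_train), y])) / num_train
  loss += 0.5 * reg * np.sum(W * W)
  dscores = probs.copy()
  dscores[np.arange(num_train), y] -= 1                # d(loss)/d(scores)
  dW = X.T.dot(dscores) / num_train + reg * W
  return loss, dW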

In [14]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 2e-7, 3e-7, 5e-5, 8e-7]
regularization_strengths = [1e4, 2e4, 3e4, 4e4, 5e4, 6e4, 7e4, 8e4, 1e5]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        softmax_classifier = Softmax()
        softmax_classifier.train(X_train, y_train, learning_rate=learning_rate,
                                 reg=regularization_strength, num_iters=1500)
        y_train_predict = softmax_classifier.predict(X_train)
        y_val_predict = softmax_classifier.predict(X_val)
        accuracy_train = np.mean(y_train_predict == y_train)
        accuracy_validation = np.mean(y_val_predict == y_val)
        results[(learning_rate,regularization_strength)] = (accuracy_train,accuracy_validation)
        if accuracy_validation > best_val:
            best_val = accuracy_validation
            best_softmax = softmax_classifier

################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


cs231n\classifiers\softmax.py:99: RuntimeWarning: divide by zero encountered in log
cs231n\classifiers\softmax.py:101: RuntimeWarning: overflow encountered in double_scalars
cs231n\classifiers\softmax.py:101: RuntimeWarning: overflow encountered in multiply
cs231n\classifiers\softmax.py:95: RuntimeWarning: overflow encountered in exp
cs231n\classifiers\softmax.py:93: RuntimeWarning: overflow encountered in subtract
cs231n\classifiers\softmax.py:93: RuntimeWarning: invalid value encountered in subtract
cs231n\classifiers\softmax.py:109: RuntimeWarning: overflow encountered in multiply
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.336592 val accuracy: 0.338000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.354612 val accuracy: 0.361000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.343245 val accuracy: 0.358000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.336776 val accuracy: 0.348000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.333224 val accuracy: 0.341000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.326980 val accuracy: 0.340000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.313490 val accuracy: 0.335000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.320592 val accuracy: 0.334000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.310041 val accuracy: 0.329000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.373020 val accuracy: 0.384000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.357286 val accuracy: 0.372000
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.347510 val accuracy: 0.358000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.339265 val accuracy: 0.361000
lr 2.000000e-07 reg 5.000000e+04 train accuracy: 0.333918 val accuracy: 0.350000
lr 2.000000e-07 reg 6.000000e+04 train accuracy: 0.316755 val accuracy: 0.331000
lr 2.000000e-07 reg 7.000000e+04 train accuracy: 0.310347 val accuracy: 0.335000
lr 2.000000e-07 reg 8.000000e+04 train accuracy: 0.308408 val accuracy: 0.328000
lr 2.000000e-07 reg 1.000000e+05 train accuracy: 0.307082 val accuracy: 0.320000
lr 3.000000e-07 reg 1.000000e+04 train accuracy: 0.374490 val accuracy: 0.391000
lr 3.000000e-07 reg 2.000000e+04 train accuracy: 0.358939 val accuracy: 0.376000
lr 3.000000e-07 reg 3.000000e+04 train accuracy: 0.344327 val accuracy: 0.346000
lr 3.000000e-07 reg 4.000000e+04 train accuracy: 0.336469 val accuracy: 0.347000
lr 3.000000e-07 reg 5.000000e+04 train accuracy: 0.337469 val accuracy: 0.349000
lr 3.000000e-07 reg 6.000000e+04 train accuracy: 0.318143 val accuracy: 0.333000
lr 3.000000e-07 reg 7.000000e+04 train accuracy: 0.311286 val accuracy: 0.328000
lr 3.000000e-07 reg 8.000000e+04 train accuracy: 0.305102 val accuracy: 0.324000
lr 3.000000e-07 reg 1.000000e+05 train accuracy: 0.303429 val accuracy: 0.318000
lr 8.000000e-07 reg 1.000000e+04 train accuracy: 0.366184 val accuracy: 0.373000
lr 8.000000e-07 reg 2.000000e+04 train accuracy: 0.353918 val accuracy: 0.373000
lr 8.000000e-07 reg 3.000000e+04 train accuracy: 0.346000 val accuracy: 0.350000
lr 8.000000e-07 reg 4.000000e+04 train accuracy: 0.322449 val accuracy: 0.347000
lr 8.000000e-07 reg 5.000000e+04 train accuracy: 0.319837 val accuracy: 0.332000
lr 8.000000e-07 reg 6.000000e+04 train accuracy: 0.310122 val accuracy: 0.322000
lr 8.000000e-07 reg 7.000000e+04 train accuracy: 0.305367 val accuracy: 0.322000
lr 8.000000e-07 reg 8.000000e+04 train accuracy: 0.299367 val accuracy: 0.306000
lr 8.000000e-07 reg 1.000000e+05 train accuracy: 0.297020 val accuracy: 0.310000
lr 5.000000e-05 reg 1.000000e+04 train accuracy: 0.144041 val accuracy: 0.120000
lr 5.000000e-05 reg 2.000000e+04 train accuracy: 0.066429 val accuracy: 0.061000
lr 5.000000e-05 reg 3.000000e+04 train accuracy: 0.096653 val accuracy: 0.086000
lr 5.000000e-05 reg 4.000000e+04 train accuracy: 0.075980 val accuracy: 0.087000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.081449 val accuracy: 0.088000
lr 5.000000e-05 reg 6.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 7.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 8.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.000000e-05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.391000
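
For completeness, the kind of SGD loop that Softmax.train is expected to run (sample a random minibatch, evaluate the vectorized loss and gradient, take a step) can be sketched as below. The batch size of 200 and the 0.001 initialization scale are illustrative assumptions, not values taken from the assignment code.

def sgd_train(X, y, learning_rate, reg, num_iters, batch_size=200):
  """
  Minimal minibatch SGD loop of the kind the linear classifier runs internally (sketch).
  batch_size and the initialization scale are assumptions for illustration.
  """
  num_train, dim = X.shape
  num_classes = np.max(y) + 1
  W = 0.001 * np.random.randn(dim, num_classes)
  for it in xrange(num_iters):
    idx = np.random.choice(num_train, batch_size, replace=True)   # sample a minibatch
    loss, grad = softmax_loss_vectorized(W, X[idx], y[idx], reg)
    W -= learning_rate * grad                                     # vanilla SGD step
  return W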

In [15]:
# evaluate on test set
# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )


softmax on raw pixels final test set accuracy: 0.374000

In [16]:
# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
  
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


 
This time the class templates look much better and the shapes are somewhat more clearly visible.

In [ ]: