Softmax exercise

Complete this worksheet and hand it in (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details, see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

  • implement a fully-vectorized loss function for the Softmax classifier (the loss is written out just after this list)
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights
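
For reference, the loss you will be implementing is the Softmax cross-entropy loss (a standard formulation; the exact regularization convention should match whatever your SVM code used). For an example $x_i$ with label $y_i$ and class scores $f = W x_i$:

$$L_i = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}, \qquad L = \frac{1}{N}\sum_{i=1}^{N} L_i + \lambda \sum_{k,l} W_{k,l}^2$$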

In [1]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

In [2]:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
  """
  Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
  it for the linear classifier. These are the same steps as we used for the
  SVM, but condensed to a single function.  
  """
  # Load the raw CIFAR-10 data
  cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
  X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
  
  # subsample the data
  mask = range(num_training, num_training + num_validation)
  X_val = X_train[mask]
  y_val = y_train[mask]
  mask = range(num_training)
  X_train = X_train[mask]
  y_train = y_train[mask]
  mask = range(num_test)
  X_test = X_test[mask]
  y_test = y_test[mask]
  
  # Preprocessing: reshape the image data into rows
  X_train = np.reshape(X_train, (X_train.shape[0], -1))
  X_val = np.reshape(X_val, (X_val.shape[0], -1))
  X_test = np.reshape(X_test, (X_test.shape[0], -1))
  
  # Normalize the data: subtract the mean image
  mean_image = np.mean(X_train, axis = 0)
  X_train -= mean_image
  X_val -= mean_image
  X_test -= mean_image
  
  # add bias dimension and transform into columns
  X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))]).T
  X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))]).T
  X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))]).T
  
  return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (3073, 49000)
Train labels shape:  (49000,)
Validation data shape:  (3073, 1000)
Validation labels shape:  (1000,)
Test data shape:  (3073, 1000)
Test labels shape:  (1000,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.

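Before diving into softmax.py, it may help to see the overall shape of a loop-based implementation. The sketch below is not the reference solution: it assumes the data layout used in this notebook (W of shape (C, D) = (10, 3073), X with examples as columns), and the function name and the 0.5 * reg regularization convention are illustrative and should be adapted to your own code.

import numpy as np

def softmax_loss_naive_sketch(W, X, y, reg):
  """
  Loop-based softmax loss and gradient (illustrative sketch only).
  W: (C, D) weights, X: (D, N) data columns, y: (N,) labels, reg: regularization strength.
  Returns (loss, dW).
  """
  dW = np.zeros_like(W)
  num_classes, num_train = W.shape[0], X.shape[1]
  loss = 0.0
  for i in xrange(num_train):
    f = W.dot(X[:, i])                   # class scores for example i
    f -= np.max(f)                       # shift scores for numerical stability
    p = np.exp(f) / np.sum(np.exp(f))    # class probabilities
    loss += -np.log(p[y[i]])
    for c in xrange(num_classes):
      # dL/df_c = p_c - 1{c == y_i}; chain rule through f = W x gives the weight gradient
      dW[c, :] += (p[c] - (c == y[i])) * X[:, i]
  loss = loss / num_train + 0.5 * reg * np.sum(W * W)   # 0.5 factor is a convention choice
  dW = dW / num_train + reg * W
  return loss, dW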

In [3]:
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(10, 3073) * 0.0001
loss, grad = softmax_loss_naive(W, X_train, y_train, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))


loss: 2.326010
sanity check: 2.302585

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: With 10 classes and small random weights, the scores are roughly equal, so the softmax assigns each class a probability of about 1/10. The cross-entropy loss is the negative log-probability of the correct class, so we expect roughly -log(1/10) = log(10) ≈ 2.303.


In [4]:
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_train, y_train, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_train, y_train, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)


numerical: 2.277232 analytic: 2.277232, relative error: 1.402498e-08
numerical: -2.255556 analytic: -2.255556, relative error: 9.135667e-10
numerical: 0.804586 analytic: 0.804586, relative error: 9.264338e-08
numerical: -0.545294 analytic: -0.545294, relative error: 4.645225e-08
numerical: -0.300083 analytic: -0.300083, relative error: 8.303649e-08
numerical: 0.868097 analytic: 0.868097, relative error: 9.635123e-09
numerical: -0.748742 analytic: -0.748742, relative error: 2.846529e-08
numerical: -0.489089 analytic: -0.489089, relative error: 1.674468e-09
numerical: -0.754321 analytic: -0.754321, relative error: 2.906739e-08
numerical: -0.039133 analytic: -0.039133, relative error: 9.652433e-07
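
For reference, a sparse numeric gradient check boils down to a centered difference at a handful of randomly chosen coordinates of W. The sketch below only illustrates the idea and is not the implementation of cs231n.gradient_check.grad_check_sparse (the function name and step size h are illustrative); use the provided utility for the actual check.

import numpy as np

def numeric_grad_check_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
  """Centered-difference check at a few random coordinates of W (sketch only)."""
  for _ in xrange(num_checks):
    ix = tuple([np.random.randint(m) for m in W.shape])  # pick a random coordinate
    old = W[ix]
    W[ix] = old + h; fxph = f(W)   # f(W + h)
    W[ix] = old - h; fxmh = f(W)   # f(W - h)
    W[ix] = old                    # restore the original value
    grad_numeric = (fxph - fxmh) / (2.0 * h)
    grad_analytic = analytic_grad[ix]
    rel_error = abs(grad_numeric - grad_analytic) / (abs(grad_numeric) + abs(grad_analytic))
    print 'numerical: %f analytic: %f, relative error: %e' % (grad_numeric, grad_analytic, rel_error)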

In [5]:
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_train, y_train, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_train, y_train, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference


naive loss: 2.326010e+00 computed in 10.461693s
vectorized loss: 1.313062e+01 computed in 0.300150s
Loss difference: 10.804607
Gradient difference: 0.000000
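
A vectorized implementation computes all N score columns at once and reads off the correct-class probabilities with integer indexing. The sketch below makes the same layout assumptions as the naive sketch above (W is (C, D), X is (D, N)) and is not the reference solution; note the per-column max subtraction, which keeps np.exp from overflowing.

import numpy as np

def softmax_loss_vectorized_sketch(W, X, y, reg):
  """Fully vectorized softmax loss and gradient (illustrative sketch only)."""
  num_train = X.shape[1]
  f = W.dot(X)                                # (C, N) scores
  f -= np.max(f, axis=0)                      # stabilize: subtract per-column max
  p = np.exp(f) / np.sum(np.exp(f), axis=0)   # (C, N) probabilities
  loss = -np.mean(np.log(p[y, np.arange(num_train)]))
  loss += 0.5 * reg * np.sum(W * W)           # 0.5 factor is a convention choice
  # Gradient: dL/df = p - one_hot(y), averaged over the batch
  p[y, np.arange(num_train)] -= 1.0
  dW = p.dot(X.T) / num_train + reg * W
  return loss, dW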

In [6]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = np.logspace(-10, 10, 10)
regularization_strengths = np.logspace(-3, 6, 10)  # note: a np.logspace(-5, 5, 8) sweep caused numeric issues

################################################################################
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
iters = 2000  # number of SGD iterations per hyperparameter setting
for lr in learning_rates:
    for rs in regularization_strengths:
        softmax = Softmax()
        softmax.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=iters)
        
        y_train_pred = softmax.predict(X_train)
        acc_train = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        acc_val = np.mean(y_val == y_val_pred)
        
        results[(lr, rs)] = (acc_train, acc_val)
        
        if best_val < acc_val:
            best_val = acc_val
            best_softmax = softmax
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


lr 1.000000e-10 reg 1.000000e-03 train accuracy: 0.117408 val accuracy: 0.116000
lr 1.000000e-10 reg 1.000000e-02 train accuracy: 0.068204 val accuracy: 0.069000
lr 1.000000e-10 reg 1.000000e-01 train accuracy: 0.129490 val accuracy: 0.138000
lr 1.000000e-10 reg 1.000000e+00 train accuracy: 0.100714 val accuracy: 0.108000
lr 1.000000e-10 reg 1.000000e+01 train accuracy: 0.073959 val accuracy: 0.075000
lr 1.000000e-10 reg 1.000000e+02 train accuracy: 0.099204 val accuracy: 0.115000
lr 1.000000e-10 reg 1.000000e+03 train accuracy: 0.123265 val accuracy: 0.135000
lr 1.000000e-10 reg 1.000000e+04 train accuracy: 0.094735 val accuracy: 0.107000
lr 1.000000e-10 reg 1.000000e+05 train accuracy: 0.091612 val accuracy: 0.090000
lr 1.000000e-10 reg 1.000000e+06 train accuracy: 0.083327 val accuracy: 0.080000
lr 1.668101e-08 reg 1.000000e-03 train accuracy: 0.185653 val accuracy: 0.217000
lr 1.668101e-08 reg 1.000000e-02 train accuracy: 0.191673 val accuracy: 0.194000
lr 1.668101e-08 reg 1.000000e-01 train accuracy: 0.200306 val accuracy: 0.225000
lr 1.668101e-08 reg 1.000000e+00 train accuracy: 0.176510 val accuracy: 0.179000
lr 1.668101e-08 reg 1.000000e+01 train accuracy: 0.174143 val accuracy: 0.180000
lr 1.668101e-08 reg 1.000000e+02 train accuracy: 0.182143 val accuracy: 0.178000
lr 1.668101e-08 reg 1.000000e+03 train accuracy: 0.183714 val accuracy: 0.181000
lr 1.668101e-08 reg 1.000000e+04 train accuracy: 0.198776 val accuracy: 0.212000
lr 1.668101e-08 reg 1.000000e+05 train accuracy: 0.298020 val accuracy: 0.311000
lr 1.668101e-08 reg 1.000000e+06 train accuracy: 0.251367 val accuracy: 0.261000
lr 2.782559e-06 reg 1.000000e-03 train accuracy: 0.390571 val accuracy: 0.368000
lr 2.782559e-06 reg 1.000000e-02 train accuracy: 0.398082 val accuracy: 0.362000
lr 2.782559e-06 reg 1.000000e-01 train accuracy: 0.390714 val accuracy: 0.360000
lr 2.782559e-06 reg 1.000000e+00 train accuracy: 0.385694 val accuracy: 0.340000
lr 2.782559e-06 reg 1.000000e+01 train accuracy: 0.400653 val accuracy: 0.380000
lr 2.782559e-06 reg 1.000000e+02 train accuracy: 0.417082 val accuracy: 0.387000
lr 2.782559e-06 reg 1.000000e+03 train accuracy: 0.394020 val accuracy: 0.379000
lr 2.782559e-06 reg 1.000000e+04 train accuracy: 0.351980 val accuracy: 0.360000
lr 2.782559e-06 reg 1.000000e+05 train accuracy: 0.274388 val accuracy: 0.277000
lr 2.782559e-06 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 4.641589e-04 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 7.742637e-02 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.291550e+01 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 2.154435e+03 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 3.593814e+05 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 5.994843e+07 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e-03 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e-02 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e-01 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+00 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+01 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+02 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+03 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+05 train accuracy: 0.100265 val accuracy: 0.087000
lr 1.000000e+10 reg 1.000000e+06 train accuracy: 0.100265 val accuracy: 0.087000
best validation accuracy achieved during cross-validation: 0.387000
cs231n/classifiers/softmax.py:95: RuntimeWarning: divide by zero encountered in log
  loss = -np.mean( np.log(np.exp(f_correct)/np.sum(np.exp(f))) )
cs231n/classifiers/softmax.py:98: RuntimeWarning: invalid value encountered in divide
  p = np.exp(f)/np.sum(np.exp(f), axis=0)
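
The Softmax.train calls above optimize the loss with minibatch SGD, as listed in the goals at the top of this worksheet. A minimal sketch of that kind of training loop, under the same column-major data layout, is shown below; the actual loop lives in the course's LinearClassifier and may differ in details such as batch size and sampling, so treat the names and defaults here as illustrative.

import numpy as np

def sgd_train_sketch(W, X, y, loss_fn, learning_rate, reg, num_iters, batch_size=200):
  """Minibatch SGD for a linear classifier (illustrative sketch only).
  loss_fn is a function like softmax_loss_vectorized returning (loss, grad)."""
  num_train = X.shape[1]
  loss_history = []
  for it in xrange(num_iters):
    # Sample a minibatch of columns (sampling with replacement is common here)
    idx = np.random.choice(num_train, batch_size)
    X_batch, y_batch = X[:, idx], y[idx]
    loss, grad = loss_fn(W, X_batch, y_batch, reg)
    loss_history.append(loss)
    W -= learning_rate * grad          # vanilla gradient descent step
  return W, loss_history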

In [9]:
# Evaluate the best softmax classifier on the test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )


softmax on raw pixels final test set accuracy: 0.384000

In [10]:
# Visualize the learned weights for each class
w = best_softmax.W[:,:-1] # strip out the bias
w = w.reshape(10, 32, 32, 3)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
  
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])


