Softmax exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation with numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [3]:
from __future__ import print_function

import random
import numpy as np
import matplotlib.pyplot as plt

from cs231n.data_utils import load_CIFAR10

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [4]:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]
    
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    
    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
    
    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)


Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.
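
As a quick refresher before implementing it: for one example x_i with correct class y_i and class scores f = x_i.dot(W), the softmax (cross-entropy) loss is

    L_i = -log( exp(f[y_i]) / sum_j exp(f[j]) )

and the full loss averages L_i over the examples and adds an L2 regularization term on W (use the same regularization convention as in your SVM code). In practice the scores are shifted by their maximum before exponentiating; this leaves the probabilities unchanged but prevents numerical overflow when computing exp.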


In [5]:
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))


loss: 2.398292
sanity check: 2.302585

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: with small random initial weights, all class scores are roughly equal, so the predicted probability of each of the 10 classes is about 1/10 = 0.1. The softmax loss is the negative log probability of the correct class, so we expect it to be close to -log(0.1).
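
A quick numeric illustration of that answer (an optional snippet, not part of the assignment): when the class scores are all roughly equal, the softmax probabilities are close to uniform, and the loss for any single example is close to -log(0.1).

import numpy as np

scores = np.zeros(10)                              # 10 roughly equal (here: zero) class scores
probs = np.exp(scores) / np.sum(np.exp(scores))    # softmax probabilities
print(probs[0])            # 0.1
print(-np.log(probs[0]))   # 2.302585..., matching the sanity check above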


In [15]:
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# Similar to the SVM case, do another gradient check with regularization turned on
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)


numerical: -3.858420 analytic: -3.858420, relative error: 6.329539e-09
numerical: 0.073088 analytic: 0.073088, relative error: 5.330130e-08
numerical: 3.714374 analytic: 3.714373, relative error: 8.440832e-09
numerical: 4.350307 analytic: 4.350307, relative error: 1.940116e-08
numerical: 3.138300 analytic: 3.138300, relative error: 7.742988e-09
numerical: 2.309959 analytic: 2.309958, relative error: 4.247350e-08
numerical: 1.651914 analytic: 1.651914, relative error: 9.894849e-09
numerical: 1.424464 analytic: 1.424464, relative error: 3.024481e-08
numerical: 0.476611 analytic: 0.476611, relative error: 3.511041e-08
numerical: 1.799707 analytic: 1.799707, relative error: 2.685741e-08
numerical: -0.223454 analytic: -0.223454, relative error: 4.910353e-08
numerical: -1.809537 analytic: -1.809536, relative error: 7.672139e-09
numerical: -3.255154 analytic: -3.255154, relative error: 1.119983e-08
numerical: 0.635473 analytic: 0.635473, relative error: 1.352214e-08
numerical: 0.733640 analytic: 0.733640, relative error: 1.073086e-08
numerical: 0.774970 analytic: 0.774970, relative error: 2.118625e-08
numerical: 1.456021 analytic: 1.456020, relative error: 5.152788e-08
numerical: -2.439088 analytic: -2.439088, relative error: 1.186395e-08
numerical: -4.385681 analytic: -4.385681, relative error: 2.557418e-09
numerical: -1.333349 analytic: -1.333349, relative error: 1.659926e-08
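
For reference, the analytic gradient being checked above follows from differentiating the per-example loss with respect to the scores: with p_j = exp(f_j) / sum_k exp(f_k),

    dL_i / df_j = p_j - 1(j == y_i),

so example i contributes (p_j - 1(j == y_i)) * x_i to column j of dW, plus the gradient of the regularization term. A loop-based implementation might look roughly like the sketch below; it is only illustrative (your softmax_loss_naive may differ, and the regularization scaling is an assumption that should match your SVM code), assuming X has shape (N, D), W has shape (D, C), and y holds integer labels:

import numpy as np

def softmax_loss_naive_sketch(W, X, y, reg):
    """Illustrative loop-based softmax loss and gradient (not the reference solution)."""
    dW = np.zeros_like(W)
    num_train, num_classes = X.shape[0], W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)             # class scores for one example, shape (C,)
        scores -= np.max(scores)         # shift by the max for numerical stability
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[y[i]])
        for j in range(num_classes):
            # dL_i/df_j = p_j - 1(j == y_i); chain rule through f_j = x_i . W[:, j]
            dW[:, j] += (probs[j] - (j == y[i])) * X[i]
    loss = loss / num_train + reg * np.sum(W * W)   # regularization scaling: an assumption
    dW = dW / num_train + 2 * reg * W
    return loss, dW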

In [29]:
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient difference: %f' % grad_difference)


naive loss: 2.398292e+00 computed in 0.226072s
vectorized loss: 2.398292e+00 computed in 0.009082s
Loss difference: 0.000000
Gradient difference: 0.000000
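
For comparison, a vectorized version computes all N x C scores with a single matrix multiply and applies the same p_j - 1(j == y_i) trick to get the gradient without explicit loops. The sketch below is illustrative only (your softmax_loss_vectorized may differ; the regularization scaling is again an assumption), under the same shape assumptions as before:

import numpy as np

def softmax_loss_vectorized_sketch(W, X, y, reg):
    """Illustrative fully-vectorized softmax loss and gradient (not the reference solution)."""
    num_train = X.shape[0]
    scores = X.dot(W)                                  # (N, C) class scores
    scores -= np.max(scores, axis=1, keepdims=True)    # row-wise shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)   # (N, C) probabilities

    loss = np.mean(-np.log(probs[np.arange(num_train), y]))
    loss += reg * np.sum(W * W)                        # regularization scaling: an assumption

    dscores = probs.copy()
    dscores[np.arange(num_train), y] -= 1              # p_j - 1(j == y_i), all examples at once
    dW = X.T.dot(dscores) / num_train + 2 * reg * W
    return loss, dW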

In [33]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
iters = 2000

learning_rates = list(map(lambda x: x*1e-7, np.arange(0.9, 2, 0.1)))
regularization_strengths = list(map(lambda x: x*1e4, np.arange(1, 10)))

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        print('Training with lr={0}, reg={1}'.format(lr, reg))
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=lr, reg=reg, num_iters=iters)
        y_train_pred = softmax.predict(X_train)
        y_val_pred = softmax.predict(X_val)
        train_accuracy = np.mean(y_train == y_train_pred)
        validation_accuracy = np.mean(y_val == y_val_pred)
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_softmax = softmax
        results[(lr, reg)] = (train_accuracy, validation_accuracy)
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)


Training with lr=9e-08, reg=10000.0
Training with lr=9e-08, reg=20000.0
Training with lr=9e-08, reg=30000.0
Training with lr=9e-08, reg=40000.0
Training with lr=9e-08, reg=50000.0
Training with lr=9e-08, reg=60000.0
Training with lr=9e-08, reg=70000.0
Training with lr=9e-08, reg=80000.0
Training with lr=9e-08, reg=90000.0
Training with lr=1e-07, reg=10000.0
Training with lr=1e-07, reg=20000.0
Training with lr=1e-07, reg=30000.0
Training with lr=1e-07, reg=40000.0
Training with lr=1e-07, reg=50000.0
Training with lr=1e-07, reg=60000.0
Training with lr=1e-07, reg=70000.0
Training with lr=1e-07, reg=80000.0
Training with lr=1e-07, reg=90000.0
Training with lr=1.1e-07, reg=10000.0
Training with lr=1.1e-07, reg=20000.0
Training with lr=1.1e-07, reg=30000.0
Training with lr=1.1e-07, reg=40000.0
Training with lr=1.1e-07, reg=50000.0
Training with lr=1.1e-07, reg=60000.0
Training with lr=1.1e-07, reg=70000.0
Training with lr=1.1e-07, reg=80000.0
Training with lr=1.1e-07, reg=90000.0
Training with lr=1.2e-07, reg=10000.0
Training with lr=1.2e-07, reg=20000.0
Training with lr=1.2e-07, reg=30000.0
Training with lr=1.2e-07, reg=40000.0
Training with lr=1.2e-07, reg=50000.0
Training with lr=1.2e-07, reg=60000.0
Training with lr=1.2e-07, reg=70000.0
Training with lr=1.2e-07, reg=80000.0
Training with lr=1.2e-07, reg=90000.0
Training with lr=1.2999999999999997e-07, reg=10000.0
Training with lr=1.2999999999999997e-07, reg=20000.0
Training with lr=1.2999999999999997e-07, reg=30000.0
Training with lr=1.2999999999999997e-07, reg=40000.0
Training with lr=1.2999999999999997e-07, reg=50000.0
Training with lr=1.2999999999999997e-07, reg=60000.0
Training with lr=1.2999999999999997e-07, reg=70000.0
Training with lr=1.2999999999999997e-07, reg=80000.0
Training with lr=1.2999999999999997e-07, reg=90000.0
Training with lr=1.3999999999999998e-07, reg=10000.0
Training with lr=1.3999999999999998e-07, reg=20000.0
Training with lr=1.3999999999999998e-07, reg=30000.0
Training with lr=1.3999999999999998e-07, reg=40000.0
Training with lr=1.3999999999999998e-07, reg=50000.0
Training with lr=1.3999999999999998e-07, reg=60000.0
Training with lr=1.3999999999999998e-07, reg=70000.0
Training with lr=1.3999999999999998e-07, reg=80000.0
Training with lr=1.3999999999999998e-07, reg=90000.0
Training with lr=1.5e-07, reg=10000.0
Training with lr=1.5e-07, reg=20000.0
Training with lr=1.5e-07, reg=30000.0
Training with lr=1.5e-07, reg=40000.0
Training with lr=1.5e-07, reg=50000.0
Training with lr=1.5e-07, reg=60000.0
Training with lr=1.5e-07, reg=70000.0
Training with lr=1.5e-07, reg=80000.0
Training with lr=1.5e-07, reg=90000.0
Training with lr=1.5999999999999998e-07, reg=10000.0
Training with lr=1.5999999999999998e-07, reg=20000.0
Training with lr=1.5999999999999998e-07, reg=30000.0
Training with lr=1.5999999999999998e-07, reg=40000.0
Training with lr=1.5999999999999998e-07, reg=50000.0
Training with lr=1.5999999999999998e-07, reg=60000.0
Training with lr=1.5999999999999998e-07, reg=70000.0
Training with lr=1.5999999999999998e-07, reg=80000.0
Training with lr=1.5999999999999998e-07, reg=90000.0
Training with lr=1.6999999999999996e-07, reg=10000.0
Training with lr=1.6999999999999996e-07, reg=20000.0
Training with lr=1.6999999999999996e-07, reg=30000.0
Training with lr=1.6999999999999996e-07, reg=40000.0
Training with lr=1.6999999999999996e-07, reg=50000.0
Training with lr=1.6999999999999996e-07, reg=60000.0
Training with lr=1.6999999999999996e-07, reg=70000.0
Training with lr=1.6999999999999996e-07, reg=80000.0
Training with lr=1.6999999999999996e-07, reg=90000.0
Training with lr=1.7999999999999997e-07, reg=10000.0
Training with lr=1.7999999999999997e-07, reg=20000.0
Training with lr=1.7999999999999997e-07, reg=30000.0
Training with lr=1.7999999999999997e-07, reg=40000.0
Training with lr=1.7999999999999997e-07, reg=50000.0
Training with lr=1.7999999999999997e-07, reg=60000.0
Training with lr=1.7999999999999997e-07, reg=70000.0
Training with lr=1.7999999999999997e-07, reg=80000.0
Training with lr=1.7999999999999997e-07, reg=90000.0
Training with lr=1.8999999999999998e-07, reg=10000.0
Training with lr=1.8999999999999998e-07, reg=20000.0
Training with lr=1.8999999999999998e-07, reg=30000.0
Training with lr=1.8999999999999998e-07, reg=40000.0
Training with lr=1.8999999999999998e-07, reg=50000.0
Training with lr=1.8999999999999998e-07, reg=60000.0
Training with lr=1.8999999999999998e-07, reg=70000.0
Training with lr=1.8999999999999998e-07, reg=80000.0
Training with lr=1.8999999999999998e-07, reg=90000.0
lr 9.000000e-08 reg 1.000000e+04 train accuracy: 0.351449 val accuracy: 0.361000
lr 9.000000e-08 reg 2.000000e+04 train accuracy: 0.355490 val accuracy: 0.362000
lr 9.000000e-08 reg 3.000000e+04 train accuracy: 0.347694 val accuracy: 0.356000
lr 9.000000e-08 reg 4.000000e+04 train accuracy: 0.339388 val accuracy: 0.355000
lr 9.000000e-08 reg 5.000000e+04 train accuracy: 0.333429 val accuracy: 0.350000
lr 9.000000e-08 reg 6.000000e+04 train accuracy: 0.317735 val accuracy: 0.339000
lr 9.000000e-08 reg 7.000000e+04 train accuracy: 0.323020 val accuracy: 0.334000
lr 9.000000e-08 reg 8.000000e+04 train accuracy: 0.311796 val accuracy: 0.326000
lr 9.000000e-08 reg 9.000000e+04 train accuracy: 0.307061 val accuracy: 0.323000
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.355694 val accuracy: 0.362000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.357653 val accuracy: 0.375000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.343959 val accuracy: 0.355000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.333857 val accuracy: 0.353000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.320714 val accuracy: 0.329000
lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.323837 val accuracy: 0.338000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.317878 val accuracy: 0.329000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.307959 val accuracy: 0.326000
lr 1.000000e-07 reg 9.000000e+04 train accuracy: 0.311837 val accuracy: 0.330000
lr 1.100000e-07 reg 1.000000e+04 train accuracy: 0.360143 val accuracy: 0.366000
lr 1.100000e-07 reg 2.000000e+04 train accuracy: 0.362306 val accuracy: 0.369000
lr 1.100000e-07 reg 3.000000e+04 train accuracy: 0.348367 val accuracy: 0.360000
lr 1.100000e-07 reg 4.000000e+04 train accuracy: 0.334612 val accuracy: 0.350000
lr 1.100000e-07 reg 5.000000e+04 train accuracy: 0.328490 val accuracy: 0.346000
lr 1.100000e-07 reg 6.000000e+04 train accuracy: 0.326245 val accuracy: 0.342000
lr 1.100000e-07 reg 7.000000e+04 train accuracy: 0.322735 val accuracy: 0.330000
lr 1.100000e-07 reg 8.000000e+04 train accuracy: 0.317857 val accuracy: 0.323000
lr 1.100000e-07 reg 9.000000e+04 train accuracy: 0.311755 val accuracy: 0.327000
lr 1.200000e-07 reg 1.000000e+04 train accuracy: 0.367837 val accuracy: 0.374000
lr 1.200000e-07 reg 2.000000e+04 train accuracy: 0.357082 val accuracy: 0.375000
lr 1.200000e-07 reg 3.000000e+04 train accuracy: 0.350082 val accuracy: 0.367000
lr 1.200000e-07 reg 4.000000e+04 train accuracy: 0.332204 val accuracy: 0.345000
lr 1.200000e-07 reg 5.000000e+04 train accuracy: 0.331816 val accuracy: 0.351000
lr 1.200000e-07 reg 6.000000e+04 train accuracy: 0.328143 val accuracy: 0.335000
lr 1.200000e-07 reg 7.000000e+04 train accuracy: 0.314531 val accuracy: 0.328000
lr 1.200000e-07 reg 8.000000e+04 train accuracy: 0.316122 val accuracy: 0.337000
lr 1.200000e-07 reg 9.000000e+04 train accuracy: 0.313735 val accuracy: 0.333000
lr 1.300000e-07 reg 1.000000e+04 train accuracy: 0.370020 val accuracy: 0.384000
lr 1.300000e-07 reg 2.000000e+04 train accuracy: 0.353490 val accuracy: 0.367000
lr 1.300000e-07 reg 3.000000e+04 train accuracy: 0.348041 val accuracy: 0.365000
lr 1.300000e-07 reg 4.000000e+04 train accuracy: 0.337490 val accuracy: 0.353000
lr 1.300000e-07 reg 5.000000e+04 train accuracy: 0.330653 val accuracy: 0.349000
lr 1.300000e-07 reg 6.000000e+04 train accuracy: 0.325878 val accuracy: 0.344000
lr 1.300000e-07 reg 7.000000e+04 train accuracy: 0.320041 val accuracy: 0.332000
lr 1.300000e-07 reg 8.000000e+04 train accuracy: 0.314224 val accuracy: 0.330000
lr 1.300000e-07 reg 9.000000e+04 train accuracy: 0.311857 val accuracy: 0.329000
lr 1.400000e-07 reg 1.000000e+04 train accuracy: 0.370531 val accuracy: 0.380000
lr 1.400000e-07 reg 2.000000e+04 train accuracy: 0.359776 val accuracy: 0.368000
lr 1.400000e-07 reg 3.000000e+04 train accuracy: 0.348898 val accuracy: 0.370000
lr 1.400000e-07 reg 4.000000e+04 train accuracy: 0.336980 val accuracy: 0.346000
lr 1.400000e-07 reg 5.000000e+04 train accuracy: 0.327531 val accuracy: 0.345000
lr 1.400000e-07 reg 6.000000e+04 train accuracy: 0.314388 val accuracy: 0.332000
lr 1.400000e-07 reg 7.000000e+04 train accuracy: 0.313429 val accuracy: 0.328000
lr 1.400000e-07 reg 8.000000e+04 train accuracy: 0.319347 val accuracy: 0.324000
lr 1.400000e-07 reg 9.000000e+04 train accuracy: 0.311122 val accuracy: 0.329000
lr 1.500000e-07 reg 1.000000e+04 train accuracy: 0.371306 val accuracy: 0.384000
lr 1.500000e-07 reg 2.000000e+04 train accuracy: 0.356265 val accuracy: 0.375000
lr 1.500000e-07 reg 3.000000e+04 train accuracy: 0.341061 val accuracy: 0.356000
lr 1.500000e-07 reg 4.000000e+04 train accuracy: 0.337388 val accuracy: 0.348000
lr 1.500000e-07 reg 5.000000e+04 train accuracy: 0.330673 val accuracy: 0.347000
lr 1.500000e-07 reg 6.000000e+04 train accuracy: 0.323959 val accuracy: 0.340000
lr 1.500000e-07 reg 7.000000e+04 train accuracy: 0.307061 val accuracy: 0.316000
lr 1.500000e-07 reg 8.000000e+04 train accuracy: 0.315082 val accuracy: 0.332000
lr 1.500000e-07 reg 9.000000e+04 train accuracy: 0.313306 val accuracy: 0.326000
lr 1.600000e-07 reg 1.000000e+04 train accuracy: 0.372735 val accuracy: 0.380000
lr 1.600000e-07 reg 2.000000e+04 train accuracy: 0.353633 val accuracy: 0.363000
lr 1.600000e-07 reg 3.000000e+04 train accuracy: 0.349776 val accuracy: 0.365000
lr 1.600000e-07 reg 4.000000e+04 train accuracy: 0.331224 val accuracy: 0.342000
lr 1.600000e-07 reg 5.000000e+04 train accuracy: 0.325204 val accuracy: 0.340000
lr 1.600000e-07 reg 6.000000e+04 train accuracy: 0.319388 val accuracy: 0.336000
lr 1.600000e-07 reg 7.000000e+04 train accuracy: 0.320939 val accuracy: 0.332000
lr 1.600000e-07 reg 8.000000e+04 train accuracy: 0.317367 val accuracy: 0.326000
lr 1.600000e-07 reg 9.000000e+04 train accuracy: 0.314490 val accuracy: 0.327000
lr 1.700000e-07 reg 1.000000e+04 train accuracy: 0.373571 val accuracy: 0.389000
lr 1.700000e-07 reg 2.000000e+04 train accuracy: 0.359286 val accuracy: 0.379000
lr 1.700000e-07 reg 3.000000e+04 train accuracy: 0.348776 val accuracy: 0.361000
lr 1.700000e-07 reg 4.000000e+04 train accuracy: 0.334714 val accuracy: 0.341000
lr 1.700000e-07 reg 5.000000e+04 train accuracy: 0.324306 val accuracy: 0.344000
lr 1.700000e-07 reg 6.000000e+04 train accuracy: 0.316245 val accuracy: 0.329000
lr 1.700000e-07 reg 7.000000e+04 train accuracy: 0.315408 val accuracy: 0.333000
lr 1.700000e-07 reg 8.000000e+04 train accuracy: 0.319061 val accuracy: 0.329000
lr 1.700000e-07 reg 9.000000e+04 train accuracy: 0.312878 val accuracy: 0.341000
lr 1.800000e-07 reg 1.000000e+04 train accuracy: 0.371082 val accuracy: 0.392000
lr 1.800000e-07 reg 2.000000e+04 train accuracy: 0.355327 val accuracy: 0.378000
lr 1.800000e-07 reg 3.000000e+04 train accuracy: 0.346388 val accuracy: 0.354000
lr 1.800000e-07 reg 4.000000e+04 train accuracy: 0.337592 val accuracy: 0.359000
lr 1.800000e-07 reg 5.000000e+04 train accuracy: 0.329265 val accuracy: 0.340000
lr 1.800000e-07 reg 6.000000e+04 train accuracy: 0.322898 val accuracy: 0.332000
lr 1.800000e-07 reg 7.000000e+04 train accuracy: 0.311612 val accuracy: 0.326000
lr 1.800000e-07 reg 8.000000e+04 train accuracy: 0.312878 val accuracy: 0.332000
lr 1.800000e-07 reg 9.000000e+04 train accuracy: 0.304592 val accuracy: 0.321000
lr 1.900000e-07 reg 1.000000e+04 train accuracy: 0.374939 val accuracy: 0.386000
lr 1.900000e-07 reg 2.000000e+04 train accuracy: 0.352469 val accuracy: 0.371000
lr 1.900000e-07 reg 3.000000e+04 train accuracy: 0.341939 val accuracy: 0.353000
lr 1.900000e-07 reg 4.000000e+04 train accuracy: 0.337388 val accuracy: 0.337000
lr 1.900000e-07 reg 5.000000e+04 train accuracy: 0.328143 val accuracy: 0.335000
lr 1.900000e-07 reg 6.000000e+04 train accuracy: 0.313959 val accuracy: 0.338000
lr 1.900000e-07 reg 7.000000e+04 train accuracy: 0.327449 val accuracy: 0.336000
lr 1.900000e-07 reg 8.000000e+04 train accuracy: 0.308082 val accuracy: 0.324000
lr 1.900000e-07 reg 9.000000e+04 train accuracy: 0.310245 val accuracy: 0.322000
best validation accuracy achieved during cross-validation: 0.392000
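
For context, each softmax.train call above runs minibatch SGD on the vectorized loss. The core of that loop is roughly the sketch below; it is illustrative only (the batch size, sampling strategy, and update rule are assumptions, not the exact course code in cs231n/classifiers/linear_classifier.py):

import numpy as np

def sgd_train_sketch(loss_fn, W, X, y, learning_rate, reg, num_iters, batch_size=200):
    """Illustrative minibatch SGD loop; loss_fn returns (loss, gradient)."""
    num_train = X.shape[0]
    loss_history = []
    for it in range(num_iters):
        # Sample a minibatch of examples (with replacement, for speed).
        idx = np.random.choice(num_train, batch_size)
        X_batch, y_batch = X[idx], y[idx]

        # Evaluate loss and gradient on the minibatch, then take a vanilla SGD step.
        loss, grad = loss_fn(W, X_batch, y_batch, reg)
        loss_history.append(loss)
        W -= learning_rate * grad
    return W, loss_history

For example, sgd_train_sketch(softmax_loss_vectorized, 0.0001 * np.random.randn(3073, 10), X_train, y_train, 1e-7, 2e4, 2000) would mimic roughly what one (lr, reg) setting of the grid above does, minus the bookkeeping that the Softmax class provides.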

In [34]:
# Evaluate the best softmax classifier on the test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % (test_accuracy, ))


softmax on raw pixels final test set accuracy: 0.371000

In [35]:
# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])