Softmax exercise

Complete this worksheet and hand it in (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

  • implement a fully-vectorized loss function for the Softmax classifier
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation against the numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

In [7]:
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [8]:
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
  """
  Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
  it for the linear classifier. These are the same steps as we used for the
  SVM, but condensed to a single function.  
  """
  # Load the raw CIFAR-10 data
  cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
  X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
  
  # subsample the data
  mask = range(num_training, num_training + num_validation)
  X_val = X_train[mask]
  y_val = y_train[mask]
  mask = range(num_training)
  X_train = X_train[mask]
  y_train = y_train[mask]
  mask = range(num_test)
  X_test = X_test[mask]
  y_test = y_test[mask]
  mask = np.random.choice(num_training, num_dev, replace=False)
  X_dev = X_train[mask]
  y_dev = y_train[mask]
  
  # Preprocessing: reshape the image data into rows
  X_train = np.reshape(X_train, (X_train.shape[0], -1))
  X_val = np.reshape(X_val, (X_val.shape[0], -1))
  X_test = np.reshape(X_test, (X_test.shape[0], -1))
  X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
  
  # Normalize the data: subtract the mean image
  mean_image = np.mean(X_train, axis = 0)
  X_train -= mean_image
  X_val -= mean_image
  X_test -= mean_image
  X_dev -= mean_image
  
  # add a bias dimension (a column of ones) so the bias is folded into W
  X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
  X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
  X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
  X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
  
  return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
print 'dev data shape: ', X_dev.shape
print 'dev labels shape: ', y_dev.shape


Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)
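
The preprocessing above folds the bias into the data by appending a column of ones, so the classifier only has to learn a single matrix W. As a quick standalone sketch of why this works (not part of the assignment code; the toy sizes and names are made up):

import numpy as np

N, D, C = 5, 3072, 10                          # a few fake examples (hypothetical sizes)
X_toy = np.random.randn(N, D)
W_toy = np.random.randn(D, C) * 0.0001
b_toy = np.random.randn(C) * 0.0001

scores_explicit = X_toy.dot(W_toy) + b_toy     # weights and bias kept separate

X_toy_aug = np.hstack([X_toy, np.ones((N, 1))])      # append the bias dimension: (N, D+1)
W_toy_aug = np.vstack([W_toy, b_toy.reshape(1, C)])  # bias becomes the last row: (D+1, C)
scores_trick = X_toy_aug.dot(W_toy_aug)

print(np.allclose(scores_explicit, scores_trick))    # True: the last row of W acts as b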

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.
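
Before diving into softmax.py, here is a hedged, loop-based sketch of the softmax (cross-entropy) loss and its gradient, just to fix the conventions used below: X of shape (N, D) with the bias column already appended, W of shape (D, C), and labels y in 0..C-1. The helper name softmax_loss_sketch and the 0.5 * reg regularization convention are illustrative assumptions, not the starter code:

import numpy as np

def softmax_loss_sketch(W, X, y, reg):
    # Loop-based sketch: W is (D, C), X is (N, D), y holds integer labels in 0..C-1.
    N = X.shape[0]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(N):
        scores = X[i].dot(W)               # (C,) unnormalized log-probabilities
        scores -= np.max(scores)           # shift for numerical stability
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[y[i]])
        dscores = probs.copy()             # d(-log p_y)/d scores = p - one_hot(y)
        dscores[y[i]] -= 1.0
        dW += np.outer(X[i], dscores)
    # NOTE: the 0.5 * reg * sum(W*W) convention is an assumption; the starter code
    # may instead use reg * sum(W*W) with gradient 2 * reg * W.
    loss = loss / N + 0.5 * reg * np.sum(W * W)
    dW = dW / N + reg * W
    return loss, dW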


In [29]:
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))


loss: 2.352936
sanity check: 2.302585

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: W is initialized with small random values, so every class score is close to zero and the softmax probabilities are roughly uniform over the 10 classes. The cross-entropy loss of the correct class is therefore about -log(1/10) = -log(0.1) ≈ 2.3.


In [30]:
# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to the SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)


numerical: -3.526817 analytic: -3.526818, relative error: 2.606384e-08
numerical: -1.787475 analytic: -1.787475, relative error: 4.351633e-08
numerical: 0.572813 analytic: 0.572813, relative error: 1.522215e-07
numerical: 1.221848 analytic: 1.221848, relative error: 2.559102e-09
numerical: -0.001186 analytic: -0.001186, relative error: 1.730477e-05
numerical: -1.764124 analytic: -1.764124, relative error: 2.566734e-10
numerical: 1.130933 analytic: 1.130933, relative error: 5.629979e-08
numerical: -0.384060 analytic: -0.384060, relative error: 6.643542e-08
numerical: -1.018997 analytic: -1.018997, relative error: 1.970066e-08
numerical: -0.062565 analytic: -0.062565, relative error: 5.702677e-07
numerical: -2.314452 analytic: -2.314453, relative error: 3.361752e-08
numerical: 3.190563 analytic: 3.190563, relative error: 2.609034e-08
numerical: 4.848757 analytic: 4.848757, relative error: 1.241718e-08
numerical: 0.478147 analytic: 0.478146, relative error: 2.146887e-07
numerical: 0.905092 analytic: 0.905092, relative error: 3.255442e-08
numerical: 3.312822 analytic: 3.312822, relative error: 1.452412e-08
numerical: -3.595134 analytic: -3.595135, relative error: 1.740275e-08
numerical: 4.025925 analytic: 4.025925, relative error: 3.484822e-08
numerical: -1.759930 analytic: -1.759930, relative error: 1.564894e-08
numerical: -4.337332 analytic: -4.337332, relative error: 3.945069e-09
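
For reference, the numeric gradient used above is a centered finite difference, (f(W + h*e_i) - f(W - h*e_i)) / (2h), evaluated at a handful of randomly sampled coordinates. A minimal sketch of that idea (the helper numeric_grad_at is hypothetical; grad_check_sparse in cs231n.gradient_check does something along these lines):

import numpy as np

def numeric_grad_at(f, W, ix, h=1e-5):
    # Centered difference at a single coordinate ix, perturbing W in place.
    old = W[ix]
    W[ix] = old + h
    fxph = f(W)              # f evaluated at W + h*e_ix
    W[ix] = old - h
    fxmh = f(W)              # f evaluated at W - h*e_ix
    W[ix] = old              # restore the original value
    return (fxph - fxmh) / (2.0 * h)

# Hypothetical usage against the analytic gradient computed above:
# ix = tuple(np.random.randint(d) for d in W.shape)
# num = numeric_grad_at(lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0], W, ix)
# rel_err = abs(num - grad[ix]) / max(abs(num), abs(grad[ix]))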

In [32]:
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference


naive loss: 2.352936e+00 computed in 0.057744s
vectorized loss: 2.352936e+00 computed in 0.005741s
Loss difference: 0.000000
Gradient difference: 0.000000
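
For intuition about what the vectorized version computes, here is a hedged sketch that processes all N examples at once with a row-wise softmax and a single matrix multiply for the gradient. The name softmax_loss_vectorized_sketch and the regularization convention are assumptions carried over from the loop sketch above, not the assignment's softmax_loss_vectorized:

import numpy as np

def softmax_loss_vectorized_sketch(W, X, y, reg):
    # Fully vectorized sketch: same shape conventions and reg choice as the loop sketch.
    N = X.shape[0]
    scores = X.dot(W)                                   # (N, C)
    scores -= np.max(scores, axis=1, keepdims=True)     # stabilize each row
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # (N, C)

    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    loss += 0.5 * reg * np.sum(W * W)

    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0                     # p - one_hot(y), row by row
    dW = X.T.dot(dscores) / N + reg * W
    return loss, dW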

In [33]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-8, 1e-7, 5e-7]
regularization_strengths = [1e4, 2e4, 4e4, 5e4, 7e4, 8e4, 1e5, 5e5, 9e5]
################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        softmax.train(X_train, y_train, learning_rate=lr, reg=reg,
                      num_iters=1500, verbose=True)
        y_train_pred = softmax.predict(X_train)
        train_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        val_accuracy = np.mean(y_val == y_val_pred)
        results[(lr, reg)] = (train_accuracy, val_accuracy)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)
    
print 'best validation accuracy achieved during cross-validation: %f' % best_val


iteration 0 / 1500: loss 159.205425
iteration 100 / 1500: loss 155.433778
iteration 200 / 1500: loss 152.166464
iteration 300 / 1500: loss 149.355417
iteration 400 / 1500: loss 145.888098
iteration 500 / 1500: loss 143.240661
iteration 600 / 1500: loss 140.018775
iteration 700 / 1500: loss 137.010135
iteration 800 / 1500: loss 134.410951
iteration 900 / 1500: loss 131.811636
iteration 1000 / 1500: loss 128.712677
iteration 1100 / 1500: loss 126.408886
iteration 1200 / 1500: loss 124.079169
iteration 1300 / 1500: loss 121.232097
iteration 1400 / 1500: loss 118.921965
iteration 0 / 1500: loss 309.135203
iteration 100 / 1500: loss 297.075757
iteration 200 / 1500: loss 284.974500
iteration 300 / 1500: loss 273.777336
iteration 400 / 1500: loss 262.743252
iteration 500 / 1500: loss 252.485089
iteration 600 / 1500: loss 242.552700
iteration 700 / 1500: loss 232.825034
iteration 800 / 1500: loss 223.793820
iteration 900 / 1500: loss 214.981709
iteration 1000 / 1500: loss 206.603389
iteration 1100 / 1500: loss 198.645996
iteration 1200 / 1500: loss 190.664796
iteration 1300 / 1500: loss 183.027012
iteration 1400 / 1500: loss 176.312630
iteration 0 / 1500: loss 615.375500
iteration 100 / 1500: loss 567.852578
iteration 200 / 1500: loss 524.459296
iteration 300 / 1500: loss 483.925550
iteration 400 / 1500: loss 446.806275
iteration 500 / 1500: loss 412.475512
iteration 600 / 1500: loss 380.923836
iteration 700 / 1500: loss 351.751573
iteration 800 / 1500: loss 324.408869
iteration 900 / 1500: loss 299.423943
iteration 1000 / 1500: loss 276.727795
iteration 1100 / 1500: loss 255.359514
iteration 1200 / 1500: loss 235.812699
iteration 1300 / 1500: loss 217.573256
iteration 1400 / 1500: loss 201.035480
iteration 0 / 1500: loss 770.792696
iteration 100 / 1500: loss 697.196882
iteration 200 / 1500: loss 630.929962
iteration 300 / 1500: loss 570.323613
iteration 400 / 1500: loss 516.255414
iteration 500 / 1500: loss 467.196651
iteration 600 / 1500: loss 422.376244
iteration 700 / 1500: loss 382.513423
iteration 800 / 1500: loss 346.051918
iteration 900 / 1500: loss 313.312574
iteration 1000 / 1500: loss 283.556352
iteration 1100 / 1500: loss 256.636968
iteration 1200 / 1500: loss 232.565512
iteration 1300 / 1500: loss 210.658908
iteration 1400 / 1500: loss 190.607516
iteration 0 / 1500: loss 1084.283951
iteration 100 / 1500: loss 941.780685
iteration 200 / 1500: loss 818.990309
iteration 300 / 1500: loss 711.903049
iteration 400 / 1500: loss 618.683559
iteration 500 / 1500: loss 537.901980
iteration 600 / 1500: loss 467.670180
iteration 700 / 1500: loss 406.528585
iteration 800 / 1500: loss 353.523032
iteration 900 / 1500: loss 307.659161
iteration 1000 / 1500: loss 267.625024
iteration 1100 / 1500: loss 232.656382
iteration 1200 / 1500: loss 202.498304
iteration 1300 / 1500: loss 176.325342
iteration 1400 / 1500: loss 153.620673
iteration 0 / 1500: loss 1241.257923
iteration 100 / 1500: loss 1057.909774
iteration 200 / 1500: loss 900.666895
iteration 300 / 1500: loss 767.603010
iteration 400 / 1500: loss 654.247729
iteration 500 / 1500: loss 557.609817
iteration 600 / 1500: loss 475.059579
iteration 700 / 1500: loss 404.878086
iteration 800 / 1500: loss 345.135568
iteration 900 / 1500: loss 294.278896
iteration 1000 / 1500: loss 251.027337
iteration 1100 / 1500: loss 214.013640
iteration 1200 / 1500: loss 182.721004
iteration 1300 / 1500: loss 155.919197
iteration 1400 / 1500: loss 133.148153
iteration 0 / 1500: loss 1544.519634
iteration 100 / 1500: loss 1264.235243
iteration 200 / 1500: loss 1035.132208
iteration 300 / 1500: loss 847.652590
iteration 400 / 1500: loss 694.008702
iteration 500 / 1500: loss 568.338165
iteration 600 / 1500: loss 465.718045
iteration 700 / 1500: loss 381.328666
iteration 800 / 1500: loss 312.473531
iteration 900 / 1500: loss 256.165894
iteration 1000 / 1500: loss 209.892949
iteration 1100 / 1500: loss 172.243309
iteration 1200 / 1500: loss 141.512518
iteration 1300 / 1500: loss 116.063904
iteration 1400 / 1500: loss 95.423835
iteration 0 / 1500: loss 7710.793120
iteration 100 / 1500: loss 2830.216281
iteration 200 / 1500: loss 1039.506509
iteration 300 / 1500: loss 382.764571
iteration 400 / 1500: loss 141.788013
iteration 500 / 1500: loss 53.430930
iteration 600 / 1500: loss 21.041891
iteration 700 / 1500: loss 9.135996
iteration 800 / 1500: loss 4.756922
iteration 900 / 1500: loss 3.161991
iteration 1000 / 1500: loss 2.574077
iteration 1100 / 1500: loss 2.347307
iteration 1200 / 1500: loss 2.271683
iteration 1300 / 1500: loss 2.265957
iteration 1400 / 1500: loss 2.239610
iteration 0 / 1500: loss 13732.687493
iteration 100 / 1500: loss 2252.732708
iteration 200 / 1500: loss 371.088124
iteration 300 / 1500: loss 62.682690
iteration 400 / 1500: loss 12.168478
iteration 500 / 1500: loss 3.874697
iteration 600 / 1500: loss 2.534513
iteration 700 / 1500: loss 2.308658
iteration 800 / 1500: loss 2.265235
iteration 900 / 1500: loss 2.248112
iteration 1000 / 1500: loss 2.260053
iteration 1100 / 1500: loss 2.262487
iteration 1200 / 1500: loss 2.265760
iteration 1300 / 1500: loss 2.236564
iteration 1400 / 1500: loss 2.274046
iteration 0 / 1500: loss 159.994315
iteration 100 / 1500: loss 130.176992
iteration 200 / 1500: loss 106.263889
iteration 300 / 1500: loss 87.047652
iteration 400 / 1500: loss 71.706599
iteration 500 / 1500: loss 58.778192
iteration 600 / 1500: loss 48.391164
iteration 700 / 1500: loss 39.839249
iteration 800 / 1500: loss 32.863524
iteration 900 / 1500: loss 27.276918
iteration 1000 / 1500: loss 22.717243
iteration 1100 / 1500: loss 18.906190
iteration 1200 / 1500: loss 15.762623
iteration 1300 / 1500: loss 13.267202
iteration 1400 / 1500: loss 11.115246
iteration 0 / 1500: loss 315.152177
iteration 100 / 1500: loss 210.635754
iteration 200 / 1500: loss 141.111579
iteration 300 / 1500: loss 95.252043
iteration 400 / 1500: loss 64.138708
iteration 500 / 1500: loss 43.539737
iteration 600 / 1500: loss 29.844271
iteration 700 / 1500: loss 20.534024
iteration 800 / 1500: loss 14.412860
iteration 900 / 1500: loss 10.270425
iteration 1000 / 1500: loss 7.558812
iteration 1100 / 1500: loss 5.758345
iteration 1200 / 1500: loss 4.490360
iteration 1300 / 1500: loss 3.714864
iteration 1400 / 1500: loss 3.117840
iteration 0 / 1500: loss 622.639999
iteration 100 / 1500: loss 278.889155
iteration 200 / 1500: loss 125.769788
iteration 300 / 1500: loss 57.341702
iteration 400 / 1500: loss 26.813333
iteration 500 / 1500: loss 13.148192
iteration 600 / 1500: loss 7.054771
iteration 700 / 1500: loss 4.304510
iteration 800 / 1500: loss 3.042950
iteration 900 / 1500: loss 2.495801
iteration 1000 / 1500: loss 2.332903
iteration 1100 / 1500: loss 2.125242
iteration 1200 / 1500: loss 2.140071
iteration 1300 / 1500: loss 2.134718
iteration 1400 / 1500: loss 2.091882
iteration 0 / 1500: loss 769.964876
iteration 100 / 1500: loss 282.412874
iteration 200 / 1500: loss 104.708933
iteration 300 / 1500: loss 39.703116
iteration 400 / 1500: loss 15.837832
iteration 500 / 1500: loss 7.134351
iteration 600 / 1500: loss 3.879752
iteration 700 / 1500: loss 2.753676
iteration 800 / 1500: loss 2.300140
iteration 900 / 1500: loss 2.140734
iteration 1000 / 1500: loss 2.153199
iteration 1100 / 1500: loss 2.051594
iteration 1200 / 1500: loss 2.103493
iteration 1300 / 1500: loss 2.049923
iteration 1400 / 1500: loss 2.112760
iteration 0 / 1500: loss 1084.957036
iteration 100 / 1500: loss 267.218628
iteration 200 / 1500: loss 66.999712
iteration 300 / 1500: loss 17.998862
iteration 400 / 1500: loss 5.989598
iteration 500 / 1500: loss 3.126416
iteration 600 / 1500: loss 2.299893
iteration 700 / 1500: loss 2.156408
iteration 800 / 1500: loss 2.159880
iteration 900 / 1500: loss 2.147064
iteration 1000 / 1500: loss 2.101964
iteration 1100 / 1500: loss 2.126116
iteration 1200 / 1500: loss 2.093875
iteration 1300 / 1500: loss 2.126641
iteration 1400 / 1500: loss 2.127250
iteration 0 / 1500: loss 1239.185750
iteration 100 / 1500: loss 249.303824
iteration 200 / 1500: loss 51.598861
iteration 300 / 1500: loss 12.032171
iteration 400 / 1500: loss 4.133626
iteration 500 / 1500: loss 2.493720
iteration 600 / 1500: loss 2.274049
iteration 700 / 1500: loss 2.120226
iteration 800 / 1500: loss 2.128837
iteration 900 / 1500: loss 2.187552
iteration 1000 / 1500: loss 2.094202
iteration 1100 / 1500: loss 2.174428
iteration 1200 / 1500: loss 2.109132
iteration 1300 / 1500: loss 2.101276
iteration 1400 / 1500: loss 2.201819
iteration 0 / 1500: loss 1545.815261
iteration 100 / 1500: loss 208.383313
iteration 200 / 1500: loss 29.679054
iteration 300 / 1500: loss 5.816233
iteration 400 / 1500: loss 2.623350
iteration 500 / 1500: loss 2.173627
iteration 600 / 1500: loss 2.109599
iteration 700 / 1500: loss 2.113380
iteration 800 / 1500: loss 2.155737
iteration 900 / 1500: loss 2.170100
iteration 1000 / 1500: loss 2.145141
iteration 1100 / 1500: loss 2.130463
iteration 1200 / 1500: loss 2.160911
iteration 1300 / 1500: loss 2.158564
iteration 1400 / 1500: loss 2.162456
iteration 0 / 1500: loss 7764.867606
iteration 100 / 1500: loss 2.515367
iteration 200 / 1500: loss 2.214917
iteration 300 / 1500: loss 2.245443
iteration 400 / 1500: loss 2.242726
iteration 500 / 1500: loss 2.252723
iteration 600 / 1500: loss 2.247719
iteration 700 / 1500: loss 2.238744
iteration 800 / 1500: loss 2.262130
iteration 900 / 1500: loss 2.248127
iteration 1000 / 1500: loss 2.254710
iteration 1100 / 1500: loss 2.270015
iteration 1200 / 1500: loss 2.263871
iteration 1300 / 1500: loss 2.254361
iteration 1400 / 1500: loss 2.276497
iteration 0 / 1500: loss 13973.215166
iteration 100 / 1500: loss 2.270913
iteration 200 / 1500: loss 2.264900
iteration 300 / 1500: loss 2.253899
iteration 400 / 1500: loss 2.273594
iteration 500 / 1500: loss 2.287884
iteration 600 / 1500: loss 2.269746
iteration 700 / 1500: loss 2.278998
iteration 800 / 1500: loss 2.256806
iteration 900 / 1500: loss 2.251959
iteration 1000 / 1500: loss 2.252563
iteration 1100 / 1500: loss 2.263518
iteration 1200 / 1500: loss 2.271828
iteration 1300 / 1500: loss 2.244810
iteration 1400 / 1500: loss 2.276376
iteration 0 / 1500: loss 160.049470
iteration 100 / 1500: loss 58.608440
iteration 200 / 1500: loss 22.564711
iteration 300 / 1500: loss 9.465723
iteration 400 / 1500: loss 4.639770
iteration 500 / 1500: loss 2.898959
iteration 600 / 1500: loss 2.339073
iteration 700 / 1500: loss 2.112580
iteration 800 / 1500: loss 2.089002
iteration 900 / 1500: loss 1.972827
iteration 1000 / 1500: loss 1.986978
iteration 1100 / 1500: loss 1.935220
iteration 1200 / 1500: loss 1.920282
iteration 1300 / 1500: loss 1.893546
iteration 1400 / 1500: loss 1.984326
iteration 0 / 1500: loss 308.804355
iteration 100 / 1500: loss 42.422420
iteration 200 / 1500: loss 7.286356
iteration 300 / 1500: loss 2.679014
iteration 400 / 1500: loss 2.122769
iteration 500 / 1500: loss 1.950609
iteration 600 / 1500: loss 2.115000
iteration 700 / 1500: loss 2.059215
iteration 800 / 1500: loss 2.014663
iteration 900 / 1500: loss 1.980800
iteration 1000 / 1500: loss 1.939608
iteration 1100 / 1500: loss 2.098412
iteration 1200 / 1500: loss 2.014520
iteration 1300 / 1500: loss 2.080888
iteration 1400 / 1500: loss 2.068928
iteration 0 / 1500: loss 622.157982
iteration 100 / 1500: loss 12.761975
iteration 200 / 1500: loss 2.218163
iteration 300 / 1500: loss 2.057839
iteration 400 / 1500: loss 2.068165
iteration 500 / 1500: loss 2.027080
iteration 600 / 1500: loss 2.004811
iteration 700 / 1500: loss 2.045854
iteration 800 / 1500: loss 2.115432
iteration 900 / 1500: loss 2.076386
iteration 1000 / 1500: loss 2.034054
iteration 1100 / 1500: loss 2.064267
iteration 1200 / 1500: loss 2.075759
iteration 1300 / 1500: loss 2.106084
iteration 1400 / 1500: loss 2.097296
iteration 0 / 1500: loss 781.455687
iteration 100 / 1500: loss 6.921494
iteration 200 / 1500: loss 2.134059
iteration 300 / 1500: loss 2.066791
iteration 400 / 1500: loss 2.116358
iteration 500 / 1500: loss 2.135433
iteration 600 / 1500: loss 2.126751
iteration 700 / 1500: loss 2.117793
iteration 800 / 1500: loss 2.117208
iteration 900 / 1500: loss 2.110075
iteration 1000 / 1500: loss 2.109842
iteration 1100 / 1500: loss 2.142552
iteration 1200 / 1500: loss 2.087974
iteration 1300 / 1500: loss 2.143957
iteration 1400 / 1500: loss 2.020720
iteration 0 / 1500: loss 1070.218728
iteration 100 / 1500: loss 2.945513
iteration 200 / 1500: loss 2.107378
iteration 300 / 1500: loss 2.085094
iteration 400 / 1500: loss 2.057311
iteration 500 / 1500: loss 2.173754
iteration 600 / 1500: loss 2.070272
iteration 700 / 1500: loss 2.117827
iteration 800 / 1500: loss 2.125816
iteration 900 / 1500: loss 2.119686
iteration 1000 / 1500: loss 2.076408
iteration 1100 / 1500: loss 2.105236
iteration 1200 / 1500: loss 2.126231
iteration 1300 / 1500: loss 2.125897
iteration 1400 / 1500: loss 2.141845
iteration 0 / 1500: loss 1241.697559
iteration 100 / 1500: loss 2.458240
iteration 200 / 1500: loss 2.120202
iteration 300 / 1500: loss 2.135582
iteration 400 / 1500: loss 2.195644
iteration 500 / 1500: loss 2.132111
iteration 600 / 1500: loss 2.093153
iteration 700 / 1500: loss 2.149607
iteration 800 / 1500: loss 2.107424
iteration 900 / 1500: loss 2.115212
iteration 1000 / 1500: loss 2.141821
iteration 1100 / 1500: loss 2.143306
iteration 1200 / 1500: loss 2.153681
iteration 1300 / 1500: loss 2.128703
iteration 1400 / 1500: loss 2.140697
iteration 0 / 1500: loss 1547.628076
iteration 100 / 1500: loss 2.157834
iteration 200 / 1500: loss 2.104486
iteration 300 / 1500: loss 2.154733
iteration 400 / 1500: loss 2.157894
iteration 500 / 1500: loss 2.168180
iteration 600 / 1500: loss 2.156256
iteration 700 / 1500: loss 2.172735
iteration 800 / 1500: loss 2.165624
iteration 900 / 1500: loss 2.122681
iteration 1000 / 1500: loss 2.151093
iteration 1100 / 1500: loss 2.222942
iteration 1200 / 1500: loss 2.119937
iteration 1300 / 1500: loss 2.144201
iteration 1400 / 1500: loss 2.157420
iteration 0 / 1500: loss 7627.912842
iteration 100 / 1500: loss 2.255520
iteration 200 / 1500: loss 2.243902
iteration 300 / 1500: loss 2.231339
iteration 400 / 1500: loss 2.251074
iteration 500 / 1500: loss 2.242980
iteration 600 / 1500: loss 2.237043
iteration 700 / 1500: loss 2.250524
iteration 800 / 1500: loss 2.250289
iteration 900 / 1500: loss 2.241106
iteration 1000 / 1500: loss 2.238557
iteration 1100 / 1500: loss 2.251839
iteration 1200 / 1500: loss 2.265126
iteration 1300 / 1500: loss 2.250693
iteration 1400 / 1500: loss 2.217953
iteration 0 / 1500: loss 13790.917683
iteration 100 / 1500: loss 2.282027
iteration 200 / 1500: loss 2.270903
iteration 300 / 1500: loss 2.266035
iteration 400 / 1500: loss 2.264666
iteration 500 / 1500: loss 2.279512
iteration 600 / 1500: loss 2.277880
iteration 700 / 1500: loss 2.274390
iteration 800 / 1500: loss 2.268928
iteration 900 / 1500: loss 2.263280
iteration 1000 / 1500: loss 2.266723
iteration 1100 / 1500: loss 2.284131
iteration 1200 / 1500: loss 2.281562
iteration 1300 / 1500: loss 2.268039
iteration 1400 / 1500: loss 2.285796
lr 1.000000e-08 reg 1.000000e+04 train accuracy: 0.143367 val accuracy: 0.146000
lr 1.000000e-08 reg 2.000000e+04 train accuracy: 0.163510 val accuracy: 0.167000
lr 1.000000e-08 reg 4.000000e+04 train accuracy: 0.168837 val accuracy: 0.160000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.159653 val accuracy: 0.150000
lr 1.000000e-08 reg 7.000000e+04 train accuracy: 0.174939 val accuracy: 0.184000
lr 1.000000e-08 reg 8.000000e+04 train accuracy: 0.200224 val accuracy: 0.206000
lr 1.000000e-08 reg 1.000000e+05 train accuracy: 0.214776 val accuracy: 0.225000
lr 1.000000e-08 reg 5.000000e+05 train accuracy: 0.266878 val accuracy: 0.279000
lr 1.000000e-08 reg 9.000000e+05 train accuracy: 0.255959 val accuracy: 0.267000
lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.333020 val accuracy: 0.320000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.353224 val accuracy: 0.365000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.340163 val accuracy: 0.354000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.326265 val accuracy: 0.336000
lr 1.000000e-07 reg 7.000000e+04 train accuracy: 0.317592 val accuracy: 0.333000
lr 1.000000e-07 reg 8.000000e+04 train accuracy: 0.312939 val accuracy: 0.337000
lr 1.000000e-07 reg 1.000000e+05 train accuracy: 0.302429 val accuracy: 0.317000
lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.260449 val accuracy: 0.279000
lr 1.000000e-07 reg 9.000000e+05 train accuracy: 0.264306 val accuracy: 0.275000
lr 5.000000e-07 reg 1.000000e+04 train accuracy: 0.375347 val accuracy: 0.370000
lr 5.000000e-07 reg 2.000000e+04 train accuracy: 0.354755 val accuracy: 0.363000
lr 5.000000e-07 reg 4.000000e+04 train accuracy: 0.331469 val accuracy: 0.353000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.332082 val accuracy: 0.348000
lr 5.000000e-07 reg 7.000000e+04 train accuracy: 0.312102 val accuracy: 0.327000
lr 5.000000e-07 reg 8.000000e+04 train accuracy: 0.307898 val accuracy: 0.322000
lr 5.000000e-07 reg 1.000000e+05 train accuracy: 0.302531 val accuracy: 0.324000
lr 5.000000e-07 reg 5.000000e+05 train accuracy: 0.232204 val accuracy: 0.232000
lr 5.000000e-07 reg 9.000000e+05 train accuracy: 0.250939 val accuracy: 0.262000
best validation accuracy achieved during cross-validation: 0.370000
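
The Softmax classifier above is trained with minibatch SGD (the shared LinearClassifier.train loop, just as for the SVM). Below is a rough sketch of such a loop, under the assumption of a loss function with the (W, X, y, reg) signature used in this worksheet; sgd_train_sketch is a hypothetical helper, not the starter code:

import numpy as np

def sgd_train_sketch(loss_fn, W, X, y, learning_rate=1e-7, reg=2e4,
                     num_iters=1500, batch_size=200):
    # Vanilla minibatch SGD in the spirit of LinearClassifier.train.
    num_train = X.shape[0]
    loss_history = []
    for it in range(num_iters):
        batch_idx = np.random.choice(num_train, batch_size)   # sample a minibatch (with replacement)
        X_batch, y_batch = X[batch_idx], y[batch_idx]

        loss, grad = loss_fn(W, X_batch, y_batch, reg)
        loss_history.append(loss)
        W -= learning_rate * grad                              # vanilla gradient descent step
    return W, loss_history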

In [34]:
# Evaluate the best softmax on the test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )


softmax on raw pixels final test set accuracy: 0.371000

In [35]:
# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
  plt.subplot(2, 5, i + 1)
  
  # Rescale the weights to be between 0 and 255
  wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
  plt.imshow(wimg.astype('uint8'))
  plt.axis('off')
  plt.title(classes[i])