Train a ConvNet!

We now have a generic solver and a bunch of modularized layers. It's time to put it all together, and train a ConvNet to recognize the classes in CIFAR-10. In this notebook we will walk you through training a simple two-layer ConvNet and then set you free to build the best net that you can to perform well on CIFAR-10.

Open up the file cs231n/classifiers/convnet.py; you will see that the two_layer_convnet function computes the loss and gradients for a two-layer ConvNet. Note that this function uses the "sandwich" layers defined in cs231n/layer_utils.py.
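
As a quick refresher, a "sandwich" layer simply chains the forward passes of its components and unrolls the corresponding backward passes in reverse order. The sketch below illustrates the idea for the conv-relu-pool sandwich; it assumes the naive layer functions and (out, cache) conventions from cs231n/layers.py (the actual helpers in layer_utils.py may use the fast conv/pool implementations instead):

from cs231n.layers import (conv_forward_naive, conv_backward_naive,
                           relu_forward, relu_backward,
                           max_pool_forward_naive, max_pool_backward_naive)

def conv_relu_pool_forward_sketch(x, w, b, conv_param, pool_param):
    # Chain the three forward passes and remember every cache.
    a, conv_cache = conv_forward_naive(x, w, b, conv_param)
    s, relu_cache = relu_forward(a)
    out, pool_cache = max_pool_forward_naive(s, pool_param)
    return out, (conv_cache, relu_cache, pool_cache)

def conv_relu_pool_backward_sketch(dout, cache):
    # Unroll the caches in reverse order: pool, then relu, then conv.
    conv_cache, relu_cache, pool_cache = cache
    ds = max_pool_backward_naive(dout, pool_cache)
    da = relu_backward(ds, relu_cache)
    dx, dw, db = conv_backward_naive(da, conv_cache)
    return dx, dw, db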


In [2]:
# As usual, a bit of setup

import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifier_trainer import ClassifierTrainer
from cs231n.gradient_check import eval_numerical_gradient
from cs231n.classifiers.convnet import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-18, np.abs(x) + np.abs(y))))

In [40]:
from cs231n.data_utils import load_CIFAR10
# If you run into memory issues, modify load_CIFAR10 to read fewer batches
# (e.g. only batches 1, 2 and 3) and call this function with smaller values,
# e.g. get_CIFAR10_data(num_training=29000, num_validation=1000, num_test=1000).
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    
    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 3, 32, 32)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3, 32, 32)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3, 32, 32)
Test labels shape:  (1000,)

Sanity check loss

After you build a new network, one of the first things you should do is sanity check the loss. When we use the softmax loss, we expect the loss for random weights (and no regularization) to be about log(C) for C classes. When we add regularization this should go up.
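
As a quick standalone illustration (not part of the assignment code) of why log(C) is the expected value: with random weights the class scores carry no signal, so the softmax probabilities are roughly uniform and the average cross-entropy is about -log(1/C) = log(C).

import numpy as np

N, C = 1000, 10
scores = 0.01 * np.random.randn(N, C)          # small random scores, ~no signal
probs = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
y = np.random.randint(C, size=N)
loss = -np.mean(np.log(probs[np.arange(N), y]))
print 'empirical loss:', loss, ' log(C):', np.log(C)   # both close to 2.3026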


In [4]:
model = init_two_layer_convnet()

X = np.random.randn(100, 3, 32, 32)
y = np.random.randint(10, size=100)

loss, _ = two_layer_convnet(X, model, y, reg=0)

# Sanity check: Loss should be about log(10) = 2.3026
print 'Sanity check loss (no regularization): ', loss

# Sanity check: Loss should go up when you add regularization
loss, _ = two_layer_convnet(X, model, y, reg=1)
print 'Sanity check loss (with regularization): ', loss


Sanity check loss (no regularization):  2.30242366817
Sanity check loss (with regularization):  2.34476238327

Gradient check

After the loss looks reasonable, you should always use numeric gradient checking to make sure that your backward pass is correct. When you use numeric gradient checking you should use a small amount of artificial data and a small number of neurons at each layer.
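
The eval_numerical_gradient function used below is based on finite differences. A minimal standalone version of the idea, using centered differences, looks roughly like this (the assignment's implementation may differ in details, e.g. one-sided vs. centered differences):

import numpy as np

def numeric_gradient(f, x, h=1e-6):
    """Centered-difference estimate of the gradient of the scalar function f
    with respect to the array x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old_value = x[ix]
        x[ix] = old_value + h
        fxph = f(x)                   # f(x + h)
        x[ix] = old_value - h
        fxmh = f(x)                   # f(x - h)
        x[ix] = old_value             # restore the original value
        grad[ix] = (fxph - fxmh) / (2.0 * h)
        it.iternext()
    return grad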


In [5]:
num_inputs = 2
input_shape = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_shape)
y = np.random.randint(num_classes, size=num_inputs)

model = init_two_layer_convnet(num_filters=3, filter_size=3, input_shape=input_shape)
loss, grads = two_layer_convnet(X, model, y)
for param_name in sorted(grads):
    f = lambda _: two_layer_convnet(X, model, y)[0]
    param_grad_num = eval_numerical_gradient(f, model[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print '%s max relative error: %e' % (param_name, e)


W1 max relative error: 6.530260e-07
W2 max relative error: 9.622474e-06
b1 max relative error: 9.775334e-08
b2 max relative error: 2.268415e-07

Overfit small data

A nice trick is to train your model with just a few training samples. You should be able to overfit small datasets, which will result in very high training accuracy and comparatively low validation accuracy.


In [6]:
# Use a two-layer ConvNet to overfit 50 training examples.

model = init_two_layer_convnet()
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train[:50], y_train[:50], X_val, y_val, model, two_layer_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=10, num_epochs=10,
          verbose=True)


starting iteration  0
Finished epoch 0 / 10: cost 2.305419, train: 0.120000, val 0.145000, lr 1.000000e-04
Finished epoch 1 / 10: cost 2.278910, train: 0.300000, val 0.139000, lr 9.500000e-05
Finished epoch 2 / 10: cost 1.974345, train: 0.340000, val 0.161000, lr 9.025000e-05
starting iteration  10
Finished epoch 3 / 10: cost 2.066883, train: 0.400000, val 0.157000, lr 8.573750e-05
Finished epoch 4 / 10: cost 1.068187, train: 0.520000, val 0.187000, lr 8.145062e-05
starting iteration  20
Finished epoch 5 / 10: cost 1.962163, train: 0.620000, val 0.190000, lr 7.737809e-05
Finished epoch 6 / 10: cost 0.925579, train: 0.480000, val 0.136000, lr 7.350919e-05
starting iteration  30
Finished epoch 7 / 10: cost 0.967514, train: 0.840000, val 0.202000, lr 6.983373e-05
Finished epoch 8 / 10: cost 0.360342, train: 0.900000, val 0.150000, lr 6.634204e-05
starting iteration  40
Finished epoch 9 / 10: cost 0.171145, train: 0.860000, val 0.141000, lr 6.302494e-05
Finished epoch 10 / 10: cost 0.267698, train: 0.980000, val 0.159000, lr 5.987369e-05
finished optimization. best validation accuracy: 0.202000

Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:


In [7]:
plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.xlabel('iteration')
plt.ylabel('loss')

plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()


Train the net

Once the above works, training the net is the next thing to try. You can set the acc_frequency parameter to change the frequency at which the training and validation set accuracies are tested. If your parameters are set properly, you should see the training and validation accuracy start to improve within a hundred iterations, and you should be able to train a reasonable model with just one epoch.

Using the parameters below you should be able to get around 50% accuracy on the validation set.


In [9]:
model = init_two_layer_convnet(filter_size=7)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, two_layer_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=50, num_epochs=1,
          acc_frequency=50, verbose=True)


starting iteration  0
Finished epoch 0 / 1: cost 2.306209, train: 0.114000, val 0.117000, lr 1.000000e-04
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 1: cost 1.965767, train: 0.341000, val 0.364000, lr 1.000000e-04
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 1: cost 1.887858, train: 0.384000, val 0.378000, lr 1.000000e-04
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 1: cost 1.621384, train: 0.402000, val 0.392000, lr 1.000000e-04
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 1: cost 1.228478, train: 0.428000, val 0.450000, lr 1.000000e-04
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 1: cost 1.971386, train: 0.407000, val 0.392000, lr 1.000000e-04
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 1: cost 1.627092, train: 0.395000, val 0.381000, lr 1.000000e-04
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 1: cost 1.299318, train: 0.460000, val 0.467000, lr 1.000000e-04
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 1: cost 1.832007, train: 0.459000, val 0.436000, lr 1.000000e-04
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 1: cost 1.478251, train: 0.443000, val 0.457000, lr 1.000000e-04
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 1: cost 1.431446, train: 0.408000, val 0.414000, lr 1.000000e-04
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 1: cost 1.537848, train: 0.444000, val 0.451000, lr 1.000000e-04
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 1: cost 1.912046, train: 0.464000, val 0.444000, lr 1.000000e-04
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 1: cost 1.362984, train: 0.452000, val 0.446000, lr 1.000000e-04
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 1: cost 1.914334, train: 0.444000, val 0.449000, lr 1.000000e-04
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 1: cost 1.363243, train: 0.443000, val 0.439000, lr 1.000000e-04
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 1: cost 1.462557, train: 0.493000, val 0.463000, lr 1.000000e-04
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 1: cost 1.585638, train: 0.506000, val 0.486000, lr 1.000000e-04
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 1: cost 1.672146, train: 0.502000, val 0.489000, lr 1.000000e-04
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 1: cost 1.588818, train: 0.529000, val 0.451000, lr 1.000000e-04
starting iteration  960
starting iteration  970
Finished epoch 1 / 1: cost 1.509724, train: 0.486000, val 0.451000, lr 9.500000e-05
finished optimization. best validation accuracy: 0.489000

Visualize weights

We can visualize the convolutional weights from the first layer. If everything worked properly, these will usually be edges and blobs of various colors and orientations.


In [10]:
from cs231n.vis_utils import visualize_grid

grid = visualize_grid(best_model['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))


Out[10]:
<matplotlib.image.AxesImage at 0x7fc98ad20710>

Experiment!

Experiment and try to get the best performance that you can on CIFAR-10 using a ConvNet. Here are some ideas to get you started:

Things you should try:

  • Filter size: Above we used 7x7; this makes pretty pictures but smaller filters may be more efficient
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Network depth: The network above has two layers of trainable parameters. Can you do better with a deeper network? You can implement alternative architectures in the file cs231n/classifiers/convnet.py. Some good architectures to try include:
    • [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]
    • [conv-relu-pool]xN - [affine]xM - [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]

Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all (a minimal random-search sketch follows this list).
  • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
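
As a concrete example of the coarse stage, a minimal random-search sketch could look like the following. It reuses the ClassifierTrainer call from the cells above; the sampling ranges, the 5000-example subset, and the single epoch are just illustrative assumptions to keep each trial cheap:

import numpy as np

results = {}
for trial in xrange(10):
    # Sample learning rate and regularization strength log-uniformly.
    lr = 10 ** np.random.uniform(-5, -3)
    reg = 10 ** np.random.uniform(-4, -1)
    model = init_two_layer_convnet(filter_size=7)
    trainer = ClassifierTrainer()
    _, _, _, val_acc_history = trainer.train(
        X_train[:5000], y_train[:5000], X_val, y_val, model, two_layer_convnet,
        reg=reg, momentum=0.9, learning_rate=lr, batch_size=50, num_epochs=1,
        acc_frequency=50, verbose=False)
    results[(lr, reg)] = max(val_acc_history)

# Rank the configurations by their best validation accuracy.
for (lr, reg), acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print 'lr %.2e reg %.2e -> best val acc %.3f' % (lr, reg, acc)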

Going above and beyond

If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are not required to implement any of these; however they would be good things to try for extra credit.

  • Alternative update steps: For the assignment we implemented SGD+momentum and RMSprop; you could try alternatives like AdaGrad or AdaDelta (a standalone AdaGrad sketch follows this list).
  • Other forms of regularization such as L1 or Dropout
  • Alternative activation functions such as leaky ReLU or maxout
  • Model ensembles
  • Data augmentation
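
For example, an AdaGrad update keeps a running sum of squared gradients per parameter and scales each step by its inverse square root. This is only a standalone sketch, not something wired into ClassifierTrainer:

import numpy as np

def adagrad_update(w, dw, cache=None, learning_rate=1e-3, eps=1e-8):
    # Accumulate squared gradients and take a per-parameter scaled step.
    if cache is None:
        cache = np.zeros_like(w)
    cache += dw ** 2
    w -= learning_rate * dw / (np.sqrt(cache) + eps)
    return w, cache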

What we expect

At the very least, you should be able to train a ConvNet that gets at least 65% accuracy on the validation set. This is just a lower bound - if you are careful it should be possible to get accuracies much higher than that! Extra credit points will be awarded for particularly high-scoring models or unique approaches.

You should use the space below to experiment and train your network. The final cell in this notebook should contain the training, validation, and test set accuracies for your final trained network. In this notebook you should also write an explanation of what you did, any additional features that you implemented, and any visualizations or graphs that you make in the process of training and evaluating your network.

Have fun and happy training!

Let's first implement a convnet for [conv-relu-pool]xN - [affine]xM - [softmax or SVM], in which we can specify N and M, and test it with N=M=1 to make sure that it's working; it should produce results similar to the two-layer ConvNet above. A rough sketch of the structure is shown below.
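
My actual implementation lives in cs231n/classifiers/convnet.py as init_my_convnet / my_convnet. The function below is only a rough sketch of the forward/backward structure such a network could have; it assumes sandwich and loss helpers named conv_relu_pool_forward, conv_relu_pool_backward, affine_forward, affine_backward and softmax_loss (as in the assignment layers), parameters stored as model['W0'], model['b0'], ..., and 2x2 max pooling with "same" convolutions:

import numpy as np
from cs231n.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward
from cs231n.layers import affine_forward, affine_backward, softmax_loss

def my_convnet_sketch(X, model, y=None, reg=0.0, Ncrp=1, Naff=1):
    conv_param = {'stride': 1, 'pad': (model['W0'].shape[2] - 1) / 2}
    pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

    # Forward pass: N conv-relu-pool sandwiches followed by M affine layers.
    a, caches = X, []
    for i in xrange(Ncrp):
        a, cache = conv_relu_pool_forward(a, model['W%d' % i], model['b%d' % i],
                                          conv_param, pool_param)
        caches.append(cache)
    for i in xrange(Ncrp, Ncrp + Naff):
        a, cache = affine_forward(a, model['W%d' % i], model['b%d' % i])
        caches.append(cache)
    scores = a
    if y is None:
        return scores

    # Softmax loss plus L2 regularization on all weight matrices.
    loss, dscores = softmax_loss(scores, y)
    loss += 0.5 * reg * sum(np.sum(model['W%d' % i] ** 2)
                            for i in xrange(Ncrp + Naff))

    # Backward pass: unroll the caches in reverse order.
    grads, da = {}, dscores
    for i in reversed(xrange(Ncrp, Ncrp + Naff)):
        da, dW, db = affine_backward(da, caches[i])
        grads['W%d' % i] = dW + reg * model['W%d' % i]
        grads['b%d' % i] = db
    for i in reversed(xrange(Ncrp)):
        da, dW, db = conv_relu_pool_backward(da, caches[i])
        grads['W%d' % i] = dW + reg * model['W%d' % i]
        grads['b%d' % i] = db
    return loss, grads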


In [28]:
model = init_my_convnet(filter_size=5)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, my_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=50, num_epochs=1,
          acc_frequency=50, verbose=True)


starting iteration  0
Finished epoch 0 / 1: cost 2.306267, train: 0.091000, val 0.087000, lr 1.000000e-04
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 1: cost 1.939814, train: 0.297000, val 0.318000, lr 1.000000e-04
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 1: cost 2.021238, train: 0.369000, val 0.366000, lr 1.000000e-04
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 1: cost 1.520605, train: 0.401000, val 0.408000, lr 1.000000e-04
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 1: cost 1.676531, train: 0.435000, val 0.433000, lr 1.000000e-04
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 1: cost 1.428057, train: 0.454000, val 0.460000, lr 1.000000e-04
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 1: cost 1.686054, train: 0.459000, val 0.452000, lr 1.000000e-04
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 1: cost 1.878882, train: 0.495000, val 0.469000, lr 1.000000e-04
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 1: cost 1.475378, train: 0.514000, val 0.495000, lr 1.000000e-04
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 1: cost 1.621550, train: 0.468000, val 0.470000, lr 1.000000e-04
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 1: cost 1.560145, train: 0.502000, val 0.498000, lr 1.000000e-04
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 1: cost 1.361562, train: 0.455000, val 0.464000, lr 1.000000e-04
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 1: cost 1.287574, train: 0.524000, val 0.501000, lr 1.000000e-04
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 1: cost 1.273405, train: 0.504000, val 0.506000, lr 1.000000e-04
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 1: cost 1.218164, train: 0.504000, val 0.534000, lr 1.000000e-04
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 1: cost 1.829692, train: 0.522000, val 0.527000, lr 1.000000e-04
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 1: cost 1.633394, train: 0.538000, val 0.536000, lr 1.000000e-04
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 1: cost 1.233817, train: 0.514000, val 0.535000, lr 1.000000e-04
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 1: cost 1.288067, train: 0.470000, val 0.517000, lr 1.000000e-04
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 1: cost 1.694094, train: 0.537000, val 0.515000, lr 1.000000e-04
starting iteration  960
starting iteration  970
Finished epoch 1 / 1: cost 1.748975, train: 0.548000, val 0.526000, lr 9.500000e-05
finished optimization. best validation accuracy: 0.536000

Now let's try it with two conv-relu-pool layers followed by one affine layer and a softmax loss.


In [35]:
model = init_my_convnet(filter_size=5, Naff=1, Ncrp=2)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, my_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0005, batch_size=50, num_epochs=1,
          acc_frequency=50, verbose=True)


starting iteration  0
Finished epoch 0 / 1: cost 2.302609, train: 0.103000, val 0.085000, lr 5.000000e-04
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 1: cost 2.302107, train: 0.213000, val 0.227000, lr 5.000000e-04
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 1: cost 2.284233, train: 0.171000, val 0.189000, lr 5.000000e-04
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 1: cost 1.965318, train: 0.224000, val 0.235000, lr 5.000000e-04
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 1: cost 2.008021, train: 0.272000, val 0.318000, lr 5.000000e-04
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 1: cost 1.979000, train: 0.331000, val 0.361000, lr 5.000000e-04
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 1: cost 1.588286, train: 0.361000, val 0.391000, lr 5.000000e-04
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 1: cost 1.710140, train: 0.391000, val 0.421000, lr 5.000000e-04
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 1: cost 1.489501, train: 0.444000, val 0.445000, lr 5.000000e-04
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 1: cost 1.744067, train: 0.442000, val 0.447000, lr 5.000000e-04
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 1: cost 1.305959, train: 0.516000, val 0.478000, lr 5.000000e-04
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 1: cost 1.485334, train: 0.492000, val 0.499000, lr 5.000000e-04
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 1: cost 1.400453, train: 0.480000, val 0.502000, lr 5.000000e-04
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 1: cost 1.588982, train: 0.527000, val 0.518000, lr 5.000000e-04
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 1: cost 1.590275, train: 0.473000, val 0.526000, lr 5.000000e-04
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 1: cost 1.309673, train: 0.526000, val 0.536000, lr 5.000000e-04
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 1: cost 1.589883, train: 0.487000, val 0.532000, lr 5.000000e-04
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 1: cost 1.045143, train: 0.528000, val 0.523000, lr 5.000000e-04
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 1: cost 1.238712, train: 0.544000, val 0.554000, lr 5.000000e-04
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 1: cost 1.250409, train: 0.545000, val 0.566000, lr 5.000000e-04
starting iteration  960
starting iteration  970
Finished epoch 1 / 1: cost 1.203590, train: 0.576000, val 0.571000, lr 4.750000e-04
finished optimization. best validation accuracy: 0.571000

It seems to be working with these parameters; let's train for more epochs to see if we can get better accuracy on the validation set.


In [41]:
model = init_my_convnet(filter_size=5, Naff=1, Ncrp=2)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, my_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0005, batch_size=50, num_epochs=5,
          acc_frequency=50, verbose=True)


starting iteration  0
Finished epoch 0 / 5: cost 2.302306, train: 0.093000, val 0.115000, lr 5.000000e-04
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 5: cost 2.301951, train: 0.198000, val 0.189000, lr 5.000000e-04
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 5: cost 2.040745, train: 0.243000, val 0.224000, lr 5.000000e-04
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 5: cost 2.175487, train: 0.307000, val 0.300000, lr 5.000000e-04
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 5: cost 1.733647, train: 0.326000, val 0.329000, lr 5.000000e-04
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 5: cost 1.774182, train: 0.386000, val 0.392000, lr 5.000000e-04
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 5: cost 1.640344, train: 0.410000, val 0.413000, lr 5.000000e-04
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 5: cost 1.424809, train: 0.431000, val 0.458000, lr 5.000000e-04
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 5: cost 1.639034, train: 0.486000, val 0.471000, lr 5.000000e-04
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 5: cost 1.408898, train: 0.483000, val 0.513000, lr 5.000000e-04
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 5: cost 1.601467, train: 0.496000, val 0.510000, lr 5.000000e-04
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 5: cost 1.239099, train: 0.471000, val 0.491000, lr 5.000000e-04
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 5: cost 1.203183, train: 0.487000, val 0.511000, lr 5.000000e-04
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 5: cost 1.491997, train: 0.514000, val 0.561000, lr 5.000000e-04
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 5: cost 1.350475, train: 0.538000, val 0.565000, lr 5.000000e-04
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 5: cost 1.401679, train: 0.562000, val 0.547000, lr 5.000000e-04
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 5: cost 1.070257, train: 0.604000, val 0.569000, lr 5.000000e-04
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 5: cost 0.992967, train: 0.562000, val 0.561000, lr 5.000000e-04
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 5: cost 1.285767, train: 0.585000, val 0.581000, lr 5.000000e-04
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 5: cost 1.285270, train: 0.606000, val 0.577000, lr 5.000000e-04
starting iteration  960
starting iteration  970
Finished epoch 1 / 5: cost 1.257491, train: 0.605000, val 0.565000, lr 4.750000e-04
starting iteration  980
starting iteration  990
starting iteration  1000
Finished epoch 1 / 5: cost 1.104038, train: 0.570000, val 0.564000, lr 4.750000e-04
starting iteration  1010
starting iteration  1020
starting iteration  1030
starting iteration  1040
starting iteration  1050
Finished epoch 1 / 5: cost 1.091238, train: 0.618000, val 0.579000, lr 4.750000e-04
starting iteration  1060
starting iteration  1070
starting iteration  1080
starting iteration  1090
starting iteration  1100
Finished epoch 1 / 5: cost 0.865487, train: 0.642000, val 0.584000, lr 4.750000e-04
starting iteration  1110
starting iteration  1120
starting iteration  1130
starting iteration  1140
starting iteration  1150
Finished epoch 1 / 5: cost 0.966514, train: 0.600000, val 0.596000, lr 4.750000e-04
starting iteration  1160
starting iteration  1170
starting iteration  1180
starting iteration  1190
starting iteration  1200
Finished epoch 1 / 5: cost 1.145190, train: 0.622000, val 0.591000, lr 4.750000e-04
starting iteration  1210
starting iteration  1220
starting iteration  1230
starting iteration  1240
starting iteration  1250
Finished epoch 1 / 5: cost 1.066363, train: 0.661000, val 0.597000, lr 4.750000e-04
starting iteration  1260
starting iteration  1270
starting iteration  1280
starting iteration  1290
starting iteration  1300
Finished epoch 1 / 5: cost 1.252574, train: 0.668000, val 0.606000, lr 4.750000e-04
starting iteration  1310
starting iteration  1320
starting iteration  1330
starting iteration  1340
starting iteration  1350
Finished epoch 1 / 5: cost 1.034007, train: 0.637000, val 0.602000, lr 4.750000e-04
starting iteration  1360
starting iteration  1370
starting iteration  1380
starting iteration  1390
starting iteration  1400
Finished epoch 1 / 5: cost 1.406098, train: 0.673000, val 0.614000, lr 4.750000e-04
starting iteration  1410
starting iteration  1420
starting iteration  1430
starting iteration  1440
starting iteration  1450
Finished epoch 1 / 5: cost 1.447156, train: 0.673000, val 0.626000, lr 4.750000e-04
starting iteration  1460
starting iteration  1470
starting iteration  1480
starting iteration  1490
starting iteration  1500
Finished epoch 1 / 5: cost 1.075828, train: 0.645000, val 0.624000, lr 4.750000e-04
starting iteration  1510
starting iteration  1520
starting iteration  1530
starting iteration  1540
starting iteration  1550
Finished epoch 1 / 5: cost 1.181671, train: 0.617000, val 0.590000, lr 4.750000e-04
starting iteration  1560
starting iteration  1570
starting iteration  1580
starting iteration  1590
starting iteration  1600
Finished epoch 1 / 5: cost 1.082535, train: 0.651000, val 0.608000, lr 4.750000e-04
starting iteration  1610
starting iteration  1620
starting iteration  1630
starting iteration  1640
starting iteration  1650
Finished epoch 1 / 5: cost 1.126464, train: 0.648000, val 0.614000, lr 4.750000e-04
starting iteration  1660
starting iteration  1670
starting iteration  1680
starting iteration  1690
starting iteration  1700
Finished epoch 1 / 5: cost 0.952312, train: 0.666000, val 0.640000, lr 4.750000e-04
starting iteration  1710
starting iteration  1720
starting iteration  1730
starting iteration  1740
starting iteration  1750
Finished epoch 1 / 5: cost 0.995959, train: 0.620000, val 0.610000, lr 4.750000e-04
starting iteration  1760
starting iteration  1770
starting iteration  1780
starting iteration  1790
starting iteration  1800
Finished epoch 1 / 5: cost 1.113103, train: 0.631000, val 0.611000, lr 4.750000e-04
starting iteration  1810
starting iteration  1820
starting iteration  1830
starting iteration  1840
starting iteration  1850
Finished epoch 1 / 5: cost 0.946668, train: 0.671000, val 0.626000, lr 4.750000e-04
starting iteration  1860
starting iteration  1870
starting iteration  1880
starting iteration  1890
starting iteration  1900
Finished epoch 1 / 5: cost 0.792992, train: 0.651000, val 0.613000, lr 4.750000e-04
starting iteration  1910
starting iteration  1920
starting iteration  1930
starting iteration  1940
starting iteration  1950
Finished epoch 1 / 5: cost 0.807186, train: 0.664000, val 0.640000, lr 4.750000e-04
Finished epoch 2 / 5: cost 1.028781, train: 0.675000, val 0.631000, lr 4.512500e-04
starting iteration  1960
starting iteration  1970
starting iteration  1980
starting iteration  1990
starting iteration  2000
Finished epoch 2 / 5: cost 0.827177, train: 0.663000, val 0.625000, lr 4.512500e-04
starting iteration  2010
starting iteration  2020
starting iteration  2030
starting iteration  2040
starting iteration  2050
Finished epoch 2 / 5: cost 0.813317, train: 0.677000, val 0.652000, lr 4.512500e-04
starting iteration  2060
starting iteration  2070
starting iteration  2080
starting iteration  2090
starting iteration  2100
Finished epoch 2 / 5: cost 0.757115, train: 0.682000, val 0.612000, lr 4.512500e-04
starting iteration  2110
starting iteration  2120
starting iteration  2130
starting iteration  2140
starting iteration  2150
Finished epoch 2 / 5: cost 1.125730, train: 0.677000, val 0.643000, lr 4.512500e-04
starting iteration  2160
starting iteration  2170
starting iteration  2180
starting iteration  2190
starting iteration  2200
Finished epoch 2 / 5: cost 1.118925, train: 0.667000, val 0.630000, lr 4.512500e-04
starting iteration  2210
starting iteration  2220
starting iteration  2230
starting iteration  2240
starting iteration  2250
Finished epoch 2 / 5: cost 0.898691, train: 0.686000, val 0.643000, lr 4.512500e-04
starting iteration  2260
starting iteration  2270
starting iteration  2280
starting iteration  2290
starting iteration  2300
Finished epoch 2 / 5: cost 0.743897, train: 0.675000, val 0.658000, lr 4.512500e-04
starting iteration  2310
starting iteration  2320
starting iteration  2330
starting iteration  2340
starting iteration  2350
Finished epoch 2 / 5: cost 0.842466, train: 0.693000, val 0.640000, lr 4.512500e-04
starting iteration  2360
starting iteration  2370
starting iteration  2380
starting iteration  2390
starting iteration  2400
Finished epoch 2 / 5: cost 1.073443, train: 0.686000, val 0.636000, lr 4.512500e-04
starting iteration  2410
starting iteration  2420
starting iteration  2430
starting iteration  2440
starting iteration  2450
Finished epoch 2 / 5: cost 0.847409, train: 0.697000, val 0.665000, lr 4.512500e-04
starting iteration  2460
starting iteration  2470
starting iteration  2480
starting iteration  2490
starting iteration  2500
Finished epoch 2 / 5: cost 0.795601, train: 0.695000, val 0.651000, lr 4.512500e-04
starting iteration  2510
starting iteration  2520
starting iteration  2530
starting iteration  2540
starting iteration  2550
Finished epoch 2 / 5: cost 0.881913, train: 0.695000, val 0.658000, lr 4.512500e-04
starting iteration  2560
starting iteration  2570
starting iteration  2580
starting iteration  2590
starting iteration  2600
Finished epoch 2 / 5: cost 0.732662, train: 0.694000, val 0.644000, lr 4.512500e-04
starting iteration  2610
starting iteration  2620
starting iteration  2630
starting iteration  2640
starting iteration  2650
Finished epoch 2 / 5: cost 0.950619, train: 0.702000, val 0.659000, lr 4.512500e-04
starting iteration  2660
starting iteration  2670
starting iteration  2680
starting iteration  2690
starting iteration  2700
Finished epoch 2 / 5: cost 1.062324, train: 0.717000, val 0.660000, lr 4.512500e-04
starting iteration  2710
starting iteration  2720
starting iteration  2730
starting iteration  2740
starting iteration  2750
Finished epoch 2 / 5: cost 0.696166, train: 0.724000, val 0.646000, lr 4.512500e-04
starting iteration  2760
starting iteration  2770
starting iteration  2780
starting iteration  2790
starting iteration  2800
Finished epoch 2 / 5: cost 0.771235, train: 0.704000, val 0.635000, lr 4.512500e-04
starting iteration  2810
starting iteration  2820
starting iteration  2830
starting iteration  2840
starting iteration  2850
Finished epoch 2 / 5: cost 0.696522, train: 0.723000, val 0.639000, lr 4.512500e-04
starting iteration  2860
starting iteration  2870
starting iteration  2880
starting iteration  2890
starting iteration  2900
Finished epoch 2 / 5: cost 0.869971, train: 0.697000, val 0.644000, lr 4.512500e-04
starting iteration  2910
starting iteration  2920
starting iteration  2930
Finished epoch 3 / 5: cost 0.810473, train: 0.685000, val 0.639000, lr 4.286875e-04
starting iteration  2940
starting iteration  2950
Finished epoch 3 / 5: cost 0.659876, train: 0.715000, val 0.661000, lr 4.286875e-04
starting iteration  2960
starting iteration  2970
starting iteration  2980
starting iteration  2990
starting iteration  3000
Finished epoch 3 / 5: cost 0.745437, train: 0.697000, val 0.650000, lr 4.286875e-04
starting iteration  3010
starting iteration  3020
starting iteration  3030
starting iteration  3040
starting iteration  3050
Finished epoch 3 / 5: cost 0.920292, train: 0.731000, val 0.675000, lr 4.286875e-04
starting iteration  3060
starting iteration  3070
starting iteration  3080
starting iteration  3090
starting iteration  3100
Finished epoch 3 / 5: cost 0.738827, train: 0.738000, val 0.689000, lr 4.286875e-04
starting iteration  3110
starting iteration  3120
starting iteration  3130
starting iteration  3140
starting iteration  3150
Finished epoch 3 / 5: cost 0.962501, train: 0.733000, val 0.638000, lr 4.286875e-04
starting iteration  3160
starting iteration  3170
starting iteration  3180
starting iteration  3190
starting iteration  3200
Finished epoch 3 / 5: cost 1.006064, train: 0.737000, val 0.657000, lr 4.286875e-04
starting iteration  3210
starting iteration  3220
starting iteration  3230
starting iteration  3240
starting iteration  3250
Finished epoch 3 / 5: cost 0.795208, train: 0.746000, val 0.654000, lr 4.286875e-04
starting iteration  3260
starting iteration  3270
starting iteration  3280
starting iteration  3290
starting iteration  3300
Finished epoch 3 / 5: cost 0.607481, train: 0.711000, val 0.669000, lr 4.286875e-04
starting iteration  3310
starting iteration  3320
starting iteration  3330
starting iteration  3340
starting iteration  3350
Finished epoch 3 / 5: cost 0.772631, train: 0.754000, val 0.678000, lr 4.286875e-04
starting iteration  3360
starting iteration  3370
starting iteration  3380
starting iteration  3390
starting iteration  3400
Finished epoch 3 / 5: cost 0.855887, train: 0.732000, val 0.659000, lr 4.286875e-04
starting iteration  3410
starting iteration  3420
starting iteration  3430
starting iteration  3440
starting iteration  3450
Finished epoch 3 / 5: cost 0.992182, train: 0.714000, val 0.654000, lr 4.286875e-04
starting iteration  3460
starting iteration  3470
starting iteration  3480
starting iteration  3490
starting iteration  3500
Finished epoch 3 / 5: cost 0.500986, train: 0.737000, val 0.667000, lr 4.286875e-04
starting iteration  3510
starting iteration  3520
starting iteration  3530
starting iteration  3540
starting iteration  3550
Finished epoch 3 / 5: cost 0.980792, train: 0.731000, val 0.667000, lr 4.286875e-04
starting iteration  3560
starting iteration  3570
starting iteration  3580
starting iteration  3590
starting iteration  3600
Finished epoch 3 / 5: cost 0.875331, train: 0.740000, val 0.679000, lr 4.286875e-04
starting iteration  3610
starting iteration  3620
starting iteration  3630
starting iteration  3640
starting iteration  3650
Finished epoch 3 / 5: cost 0.755349, train: 0.739000, val 0.687000, lr 4.286875e-04
starting iteration  3660
starting iteration  3670
starting iteration  3680
starting iteration  3690
starting iteration  3700
Finished epoch 3 / 5: cost 0.918076, train: 0.731000, val 0.676000, lr 4.286875e-04
starting iteration  3710
starting iteration  3720
starting iteration  3730
starting iteration  3740
starting iteration  3750
Finished epoch 3 / 5: cost 1.175447, train: 0.720000, val 0.655000, lr 4.286875e-04
starting iteration  3760
starting iteration  3770
starting iteration  3780
starting iteration  3790
starting iteration  3800
Finished epoch 3 / 5: cost 0.448092, train: 0.742000, val 0.686000, lr 4.286875e-04
starting iteration  3810
starting iteration  3820
starting iteration  3830
starting iteration  3840
starting iteration  3850
Finished epoch 3 / 5: cost 0.899348, train: 0.764000, val 0.687000, lr 4.286875e-04
starting iteration  3860
starting iteration  3870
starting iteration  3880
starting iteration  3890
starting iteration  3900
Finished epoch 3 / 5: cost 0.825107, train: 0.745000, val 0.659000, lr 4.286875e-04
starting iteration  3910
Finished epoch 4 / 5: cost 0.649122, train: 0.744000, val 0.676000, lr 4.072531e-04
starting iteration  3920
starting iteration  3930
starting iteration  3940
starting iteration  3950
Finished epoch 4 / 5: cost 0.422034, train: 0.746000, val 0.687000, lr 4.072531e-04
starting iteration  3960
starting iteration  3970
starting iteration  3980
starting iteration  3990
starting iteration  4000
Finished epoch 4 / 5: cost 0.839759, train: 0.780000, val 0.681000, lr 4.072531e-04
starting iteration  4010
starting iteration  4020
starting iteration  4030
starting iteration  4040
starting iteration  4050
Finished epoch 4 / 5: cost 0.959996, train: 0.757000, val 0.683000, lr 4.072531e-04
starting iteration  4060
starting iteration  4070
starting iteration  4080
starting iteration  4090
starting iteration  4100
Finished epoch 4 / 5: cost 0.370451, train: 0.750000, val 0.701000, lr 4.072531e-04
starting iteration  4110
starting iteration  4120
starting iteration  4130
starting iteration  4140
starting iteration  4150
Finished epoch 4 / 5: cost 0.797105, train: 0.733000, val 0.677000, lr 4.072531e-04
starting iteration  4160
starting iteration  4170
starting iteration  4180
starting iteration  4190
starting iteration  4200
Finished epoch 4 / 5: cost 0.855882, train: 0.765000, val 0.671000, lr 4.072531e-04
starting iteration  4210
starting iteration  4220
starting iteration  4230
starting iteration  4240
starting iteration  4250
Finished epoch 4 / 5: cost 0.828370, train: 0.755000, val 0.673000, lr 4.072531e-04
starting iteration  4260
starting iteration  4270
starting iteration  4280
starting iteration  4290
starting iteration  4300
Finished epoch 4 / 5: cost 0.887464, train: 0.752000, val 0.686000, lr 4.072531e-04
starting iteration  4310
starting iteration  4320
starting iteration  4330
starting iteration  4340
starting iteration  4350
Finished epoch 4 / 5: cost 1.084007, train: 0.733000, val 0.684000, lr 4.072531e-04
starting iteration  4360
starting iteration  4370
starting iteration  4380
starting iteration  4390
starting iteration  4400
Finished epoch 4 / 5: cost 0.484843, train: 0.762000, val 0.672000, lr 4.072531e-04
starting iteration  4410
starting iteration  4420
starting iteration  4430
starting iteration  4440
starting iteration  4450
Finished epoch 4 / 5: cost 0.556473, train: 0.761000, val 0.687000, lr 4.072531e-04
starting iteration  4460
starting iteration  4470
starting iteration  4480
starting iteration  4490
starting iteration  4500
Finished epoch 4 / 5: cost 0.736674, train: 0.779000, val 0.691000, lr 4.072531e-04
starting iteration  4510
starting iteration  4520
starting iteration  4530
starting iteration  4540
starting iteration  4550
Finished epoch 4 / 5: cost 0.638501, train: 0.774000, val 0.697000, lr 4.072531e-04
starting iteration  4560
starting iteration  4570
starting iteration  4580
starting iteration  4590
starting iteration  4600
Finished epoch 4 / 5: cost 0.746381, train: 0.784000, val 0.694000, lr 4.072531e-04
starting iteration  4610
starting iteration  4620
starting iteration  4630
starting iteration  4640
starting iteration  4650
Finished epoch 4 / 5: cost 0.721345, train: 0.767000, val 0.676000, lr 4.072531e-04
starting iteration  4660
starting iteration  4670
starting iteration  4680
starting iteration  4690
starting iteration  4700
Finished epoch 4 / 5: cost 0.868371, train: 0.726000, val 0.680000, lr 4.072531e-04
starting iteration  4710
starting iteration  4720
starting iteration  4730
starting iteration  4740
starting iteration  4750
Finished epoch 4 / 5: cost 0.392138, train: 0.764000, val 0.686000, lr 4.072531e-04
starting iteration  4760
starting iteration  4770
starting iteration  4780
starting iteration  4790
starting iteration  4800
Finished epoch 4 / 5: cost 0.438474, train: 0.790000, val 0.694000, lr 4.072531e-04
starting iteration  4810
starting iteration  4820
starting iteration  4830
starting iteration  4840
starting iteration  4850
Finished epoch 4 / 5: cost 0.738330, train: 0.760000, val 0.691000, lr 4.072531e-04
starting iteration  4860
starting iteration  4870
starting iteration  4880
starting iteration  4890
Finished epoch 5 / 5: cost 0.645431, train: 0.762000, val 0.683000, lr 3.868905e-04
finished optimization. best validation accuracy: 0.701000

Let's fine-tune the model with a lower learning rate.


In [ ]:
base = best_model.copy()
model = base.copy()
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, my_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.00001, batch_size=50, num_epochs=3,
          acc_frequency=50, verbose=True)

In [44]:
from cs231n.vis_utils import visualize_grid

grid = visualize_grid(best_model['W0'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))


Out[44]:
<matplotlib.image.AxesImage at 0x7f9a2afde550>

In [45]:
mask = np.random.choice(len(X_train), 1000)
pred_train = my_convnet(X_train[mask], model).argmax(axis=1)
pred_val = my_convnet(X_val, model).argmax(axis=1)
pred_test = my_convnet(X_test, model).argmax(axis=1)

acc_train = np.mean(y_train[mask]==pred_train)
acc_val = np.mean(y_val==pred_val)
acc_test = np.mean(y_test==pred_test)
print acc_train, acc_val, acc_test


0.964 0.774 0.755

Due to the limited computational power of my machine, I didn't get a chance to test dropout or alternative ReLU variants (though that's no excuse; I should have started earlier, and it would have been much easier if we had IPython notebook installed on ineks). I think dropout in particular would help: the training accuracy is much higher than the validation and test accuracies, which suggests overfitting, and a stochastic regularizer like dropout should reduce it.
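
As a pointer for future work, here is a minimal standalone sketch of inverted dropout (not wired into my convnet): at training time we randomly zero activations and rescale the survivors by 1/p, so that expected activations match test time; this is the stochastic regularization described above.

import numpy as np

def dropout_forward(a, p=0.5, train=True):
    # Keep each unit with probability p and rescale the survivors by 1/p.
    if not train:
        return a, None
    mask = (np.random.rand(*a.shape) < p) / p
    return a * mask, mask

def dropout_backward(da, mask):
    # Gradients only flow through the units that were kept.
    return da * mask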