Train a ConvNet!

We now have a generic solver and a bunch of modularized layers. It's time to put it all together, and train a ConvNet to recognize the classes in CIFAR-10. In this notebook we will walk you through training a simple two-layer ConvNet and then set you free to build the best net that you can to perform well on CIFAR-10.

Open up the file cs231n/classifiers/convnet.py; you will see that the two_layer_convnet function computes the loss and gradients for a two-layer ConvNet. Note that this function uses the "sandwich" layers defined in cs231n/layer_utils.py.
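
For reference, a "sandwich" layer just chains a few primitive layers together and caches whatever the backward pass needs. Below is a minimal sketch of what a conv-relu-pool sandwich looks like, assuming the conv/pool primitives from cs231n/fast_layers.py and the ReLU from cs231n/layers.py; the exact names and signatures in your copy of layer_utils.py may differ slightly.

from cs231n.fast_layers import conv_forward_fast, conv_backward_fast
from cs231n.fast_layers import max_pool_forward_fast, max_pool_backward_fast
from cs231n.layers import relu_forward, relu_backward

def conv_relu_pool_forward_sketch(x, w, b, conv_param, pool_param):
    # Convenience layer: conv -> ReLU -> max pool, caching all intermediates.
    a, conv_cache = conv_forward_fast(x, w, b, conv_param)
    s, relu_cache = relu_forward(a)
    out, pool_cache = max_pool_forward_fast(s, pool_param)
    return out, (conv_cache, relu_cache, pool_cache)

def conv_relu_pool_backward_sketch(dout, cache):
    # Backward pass: unpack the caches and apply the chain rule in reverse.
    conv_cache, relu_cache, pool_cache = cache
    ds = max_pool_backward_fast(dout, pool_cache)
    da = relu_backward(ds, relu_cache)
    dx, dw, db = conv_backward_fast(da, conv_cache)
    return dx, dw, db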


In [3]:
# As usual, a bit of setup

import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifier_trainer import ClassifierTrainer
from cs231n.gradient_check import eval_numerical_gradient
from cs231n.classifiers.convnet import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-18, np.abs(x) + np.abs(y))))

In [15]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    
    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 3, 32, 32)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3, 32, 32)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3, 32, 32)
Test labels shape:  (1000,)

Sanity check loss

After you build a new network, one of the first things you should do is sanity check the loss. When we use the softmax loss, we expect the loss for random weights (and no regularization) to be about log(C) for C classes. When we add regularization this should go up.
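
As a reminder of where the log(C) figure comes from: with small random weights the class scores are roughly equal, so the softmax probability of the correct class is about 1/C and the loss is about -log(1/C) = log(C). A quick standalone check (plain numpy, independent of the assignment code):

import numpy as np

C = 10  # number of CIFAR-10 classes
print 'Expected initial softmax loss:', np.log(C)        # ~2.3026

# Equivalently, the loss when the predicted distribution is uniform:
probs = np.ones(C) / C
print 'Loss at uniform probabilities: ', -np.log(probs[0])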


In [3]:
model = init_two_layer_convnet()

X = np.random.randn(100, 3, 32, 32)
y = np.random.randint(10, size=100)

loss, _ = two_layer_convnet(X, model, y, reg=0)

# Sanity check: Loss should be about log(10) = 2.3026
print 'Sanity check loss (no regularization): ', loss

# Sanity check: Loss should go up when you add regularization
loss, _ = two_layer_convnet(X, model, y, reg=1)
print 'Sanity check loss (with regularization): ', loss


Sanity check loss (no regularization):  2.30261475792
Sanity check loss (with regularization):  2.34427064636

Gradient check

After the loss looks reasonable, you should always use numeric gradient checking to make sure that your backward pass is correct. When you use numeric gradient checking you should use a small amount of artificial data and a small number of neurons at each layer.
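
If you are curious what eval_numerical_gradient is doing, it is essentially a centered-difference approximation evaluated at every entry of a parameter array. Here is a rough sketch; the real helper in cs231n/gradient_check.py may differ in details such as the default step size and verbose printing.

import numpy as np

def numerical_gradient_sketch(f, x, h=1e-6):
    # Centered difference: df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old_value = x[ix]
        x[ix] = old_value + h
        fxph = f(x)                   # evaluate f(x + h) at this coordinate
        x[ix] = old_value - h
        fxmh = f(x)                   # evaluate f(x - h)
        x[ix] = old_value             # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad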


In [4]:
num_inputs = 2
input_shape = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_shape)
y = np.random.randint(num_classes, size=num_inputs)

model = init_two_layer_convnet(num_filters=3, filter_size=3, input_shape=input_shape)
loss, grads = two_layer_convnet(X, model, y)
for param_name in sorted(grads):
    f = lambda _: two_layer_convnet(X, model, y)[0]
    param_grad_num = eval_numerical_gradient(f, model[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print '%s max relative error: %e' % (param_name, e)


W1 max relative error: 3.364580e-07
W2 max relative error: 2.040161e-06
b1 max relative error: 1.037150e-08
b2 max relative error: 1.159503e-09

Overfit small data

A nice trick is to train your model with just a few training samples. You should be able to overfit small datasets, which will result in very high training accuracy and comparatively low validation accuracy.


In [5]:
# Use a two-layer ConvNet to overfit 50 training examples.

model = init_two_layer_convnet()
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train[:50], y_train[:50], X_val, y_val, model, two_layer_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=10, num_epochs=10,
          verbose=True)


starting iteration  0
Finished epoch 0 / 10: cost 2.312432, train: 0.060000, val 0.101000, lr 1.000000e-04
Finished epoch 1 / 10: cost 2.256983, train: 0.280000, val 0.124000, lr 9.500000e-05
Finished epoch 2 / 10: cost 2.172748, train: 0.260000, val 0.115000, lr 9.025000e-05
starting iteration  10
Finished epoch 3 / 10: cost 2.613127, train: 0.500000, val 0.140000, lr 8.573750e-05
Finished epoch 4 / 10: cost 1.638125, train: 0.460000, val 0.153000, lr 8.145062e-05
starting iteration  20
Finished epoch 5 / 10: cost 1.418967, train: 0.600000, val 0.194000, lr 7.737809e-05
Finished epoch 6 / 10: cost 1.179016, train: 0.700000, val 0.198000, lr 7.350919e-05
starting iteration  30
Finished epoch 7 / 10: cost 0.796459, train: 0.840000, val 0.194000, lr 6.983373e-05
Finished epoch 8 / 10: cost 0.620655, train: 0.920000, val 0.181000, lr 6.634204e-05
starting iteration  40
Finished epoch 9 / 10: cost 0.201750, train: 0.920000, val 0.195000, lr 6.302494e-05
Finished epoch 10 / 10: cost 0.057230, train: 0.960000, val 0.178000, lr 5.987369e-05
finished optimization. best validation accuracy: 0.198000

Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:


In [6]:
plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.xlabel('iteration')
plt.ylabel('loss')

plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()


Train the net

Once the above works, training the net is the next thing to try. You can set the acc_frequency parameter to change the frequency at which the training and validation set accuracies are tested. If your parameters are set properly, you should see the training and validation accuracy start to improve within a hundred iterations, and you should be able to train a reasonable model with just one epoch.

Using the parameters below you should be able to get around 50% accuracy on the validation set.


In [7]:
model = init_two_layer_convnet(filter_size=7)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, two_layer_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=50, num_epochs=1,
          acc_frequency=50, verbose=True)


starting iteration  0
Finished epoch 0 / 1: cost 2.307439, train: 0.093000, val 0.081000, lr 1.000000e-04
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 1: cost 1.903418, train: 0.359000, val 0.329000, lr 1.000000e-04
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 1: cost 1.608262, train: 0.373000, val 0.382000, lr 1.000000e-04
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 1: cost 2.083396, train: 0.396000, val 0.400000, lr 1.000000e-04
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 1: cost 1.561869, train: 0.378000, val 0.415000, lr 1.000000e-04
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 1: cost 1.513249, train: 0.383000, val 0.377000, lr 1.000000e-04
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 1: cost 1.542809, train: 0.472000, val 0.450000, lr 1.000000e-04
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 1: cost 1.762782, train: 0.467000, val 0.469000, lr 1.000000e-04
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 1: cost 1.781104, train: 0.413000, val 0.401000, lr 1.000000e-04
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 1: cost 1.882903, train: 0.447000, val 0.461000, lr 1.000000e-04
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 1: cost 1.863612, train: 0.463000, val 0.441000, lr 1.000000e-04
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 1: cost 1.276167, train: 0.432000, val 0.404000, lr 1.000000e-04
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 1: cost 1.431040, train: 0.458000, val 0.455000, lr 1.000000e-04
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 1: cost 1.630774, train: 0.462000, val 0.459000, lr 1.000000e-04
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 1: cost 1.634135, train: 0.493000, val 0.507000, lr 1.000000e-04
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 1: cost 1.962658, train: 0.466000, val 0.448000, lr 1.000000e-04
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 1: cost 1.201093, train: 0.505000, val 0.483000, lr 1.000000e-04
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 1: cost 1.443109, train: 0.467000, val 0.488000, lr 1.000000e-04
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 1: cost 2.091703, train: 0.482000, val 0.443000, lr 1.000000e-04
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 1: cost 1.654459, train: 0.512000, val 0.502000, lr 1.000000e-04
starting iteration  960
starting iteration  970
Finished epoch 1 / 1: cost 1.849139, train: 0.413000, val 0.424000, lr 9.500000e-05
finished optimization. best validation accuracy: 0.507000

Visualize weights

We can visualize the convolutional weights from the first layer. If everything worked properly, these will usually be edges and blobs of various colors and orientations.


In [8]:
from cs231n.vis_utils import visualize_grid

grid = visualize_grid(best_model['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))


Out[8]:
<matplotlib.image.AxesImage at 0x110087ad0>

Experiment!

Experiment and try to get the best performance that you can on CIFAR-10 using a ConvNet. Here are some ideas to get you started:

Things you should try:

  • Filter size: Above we used 7x7; this makes pretty pictures, but smaller filters may be more efficient
  • Number of filters: Above we used 32 filters. Do more or fewer do better?
  • Network depth: The network above has two layers of trainable parameters. Can you do better with a deeper network? You can implement alternative architectures in the file cs231n/classifiers/convnet.py. Some good architectures to try include (a forward-pass sketch of the first one appears after this list):
    • [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]
    • [conv-relu-pool]xN - [affine]xM - [softmax or SVM]
    • [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]
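
To make the first architecture above concrete, here is a shape-level sketch of its forward pass with N=1 and M=1, composed from the sandwich and affine layers. The parameter names (W1..W3, b1..b3), the shared conv_param, and the helper names are assumptions about layer_utils.py / layers.py, not a drop-in piece of convnet.py.

from cs231n.layer_utils import conv_relu_pool_forward, conv_relu_forward
from cs231n.layers import affine_forward

def deeper_convnet_forward_sketch(X, model, conv_param, pool_param):
    # [conv-relu-pool] - conv - relu - affine, returning class scores.
    W1, b1 = model['W1'], model['b1']
    W2, b2 = model['W2'], model['b2']
    W3, b3 = model['W3'], model['b3']

    a1, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
    a2, cache2 = conv_relu_forward(a1, W2, b2, conv_param)
    scores, cache3 = affine_forward(a2, W3, b3)   # affine_forward flattens a2

    # The backward pass mirrors this with the corresponding *_backward calls,
    # starting from the gradient of the softmax or SVM loss on the scores.
    return scores, (cache1, cache2, cache3)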

Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple of important things to keep in mind:

  • If the parameters are working well, you should see improvement within a few hundred iterations
  • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all (a minimal random-search sketch follows this list).
  • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
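
A minimal sketch of the coarse random-search stage mentioned above, using the same ClassifierTrainer call as elsewhere in this notebook and assuming the trainer records at least one validation accuracy per run, as it does in the cells above. The subset size, number of trials, and sampling ranges are illustrative, not recommendations.

import numpy as np

results = {}
for _ in xrange(20):
    # Sample learning rate and regularization log-uniformly over broad ranges.
    lr = 10 ** np.random.uniform(-5, -3)
    reg = 10 ** np.random.uniform(-4, 0)

    model = init_two_layer_convnet(filter_size=7)
    trainer = ClassifierTrainer()
    _, _, train_acc_history, val_acc_history = trainer.train(
        X_train[:5000], y_train[:5000], X_val, y_val, model, two_layer_convnet,
        reg=reg, momentum=0.9, learning_rate=lr, batch_size=50, num_epochs=1,
        verbose=False)
    results[(lr, reg)] = val_acc_history[-1]

# Inspect the best combinations, then search more finely around them.
for (lr, reg), val_acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print 'lr %e reg %e -> val accuracy %f' % (lr, reg, val_acc)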

Going above and beyond

If you are feeling adventurous there are many other features you can implement to try to improve your performance. You are not required to implement any of these, but they would be good things to try for extra credit.

  • Alternative update steps: For the assignment we implemented SGD+momentum and RMSprop; you could try alternatives like AdaGrad or AdaDelta (a sketch of AdaGrad appears after this list).
  • Other forms of regularization such as L1 or Dropout
  • Alternative activation functions such as leaky ReLU or maxout
  • Model ensembles
  • Data augmentation
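
For reference, the AdaGrad update mentioned in the first bullet is only a few lines. This is a standalone sketch, not tied to however ClassifierTrainer dispatches its update rules; the function name and interface are made up for illustration.

import numpy as np

def adagrad_update(w, dw, cache, learning_rate=1e-2, eps=1e-8):
    # AdaGrad: scale each parameter's step by the accumulated sum of its
    # squared gradients, so frequently-updated parameters take smaller steps.
    cache += dw ** 2
    w -= learning_rate * dw / (np.sqrt(cache) + eps)
    return w, cache

# Usage: keep one cache array per parameter, initialized to np.zeros_like(w),
# and thread it through every iteration.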

What we expect

At the very least, you should be able to train a ConvNet that gets at least 65% accuracy on the validation set. This is just a lower bound - if you are careful it should be possible to get accuracies much higher than that! Extra credit points will be awarded for particularly high-scoring models or unique approaches.

You should use the space below to experiment and train your network. The final cell in this notebook should contain the training, validation, and test set accuracies for your final trained network. In this notebook you should also write an explanation of what you did, any additional features that you implemented, and any visualizations or graphs that you make in the process of training and evaluating your network.

Have fun and happy training!

Model

Experimentation

After manual tuning to find the rough range for each hyperparameter, I ran cross-validation for a small number of iterations on corn.

I trained models incrementally, using each trained model as the initialization for the next stage; this gave better starting weights at each stage and let me adjust parameters at different points in training. This allowed for much quicker progress.

Format

Finally, I passed the 65% threshold with a model of the form

* [conv-relu-pool]-[conv-relu-pool]-[affine]-[relu]-[affine]-[svm]

I ran both SVM and Softmax models, and with all settings (tuned for either loss) SVM outperformed Softmax in this format.
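
For context, the only difference between the two variants is the final loss layer. The multiclass SVM (hinge) loss on a score matrix is, per example, L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1). Below is a standalone sketch of that computation (loss only, gradient omitted, and independent of whatever loss layer convnet.py actually calls).

import numpy as np

def multiclass_hinge_loss(scores, y, delta=1.0):
    # scores: (N, C) array of class scores; y: (N,) array of correct labels.
    N = scores.shape[0]
    correct_scores = scores[np.arange(N), y][:, np.newaxis]
    margins = np.maximum(0, scores - correct_scores + delta)
    margins[np.arange(N), y] = 0          # the correct class contributes no loss
    return np.sum(margins) / N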


In [8]:
# Some checks to make sure my model is reasonable and
# produces proper learning plots on the first few epochs

# Loss check
model = init_supercool_convnet(weight_scale=5e-2)
X = np.random.randn(100, 3, 32, 32)
y = np.random.randint(10, size=100)
loss, _ = supercool_convnet(X, model, y, reg=0)
# Sanity check: Loss should be about log(10) = 2.3026
print 'Sanity check loss (no regularization): ', loss
# Sanity check: Loss should go up when you add regularization
loss, _ = supercool_convnet(X, model, y, reg=1)
print 'Sanity check loss (with regularization): ', loss

# Gradient check
num_inputs = 2
input_shape = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_shape)
y = np.random.randint(num_classes, size=num_inputs)
model = init_supercool_convnet(num_filters=3, filter_size=3, input_shape=input_shape)
loss, grads = supercool_convnet(X, model, y)
for param_name in sorted(grads):
    f = lambda _: supercool_convnet(X, model, y)[0]
    param_grad_num = eval_numerical_gradient(f, model[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print '%s max relative error: %e' % (param_name, e)

# Make sure we can overfit...
model = init_supercool_convnet(weight_scale=5e-2, bias_scale=0, filter_size=3) # weight_scale=5e-2; tune to 3e-2 to make this work
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train[:50], y_train[:50], X_val, y_val, model, supercool_convnet,
          reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=10, num_epochs=10, # change to 20 epochs
          verbose=True) # batch size 40-100

model = init_supercool_convnet(weight_scale=3e-2, bias_scale=0, filter_size=3)

# with open('best_model.pkl', 'rb') as f:
#     model = cPickle.load(f)

trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
          X_train, y_train, X_val, y_val, model, supercool_convnet,
          reg=0.5, momentum=0.9, learning_rate=5e-5, batch_size=50, num_epochs=1, # change to 20 epochs
          verbose=True, acc_frequency=50) # batch size 40-100

plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.xlabel('iteration')
plt.ylabel('loss')

plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()


Sanity check loss (no regularization):  9.07812640504
Sanity check loss (with regularization):  1338.74079638
W1 max relative error: 1.175510e-06
W2 max relative error: 4.023119e-07
W3 max relative error: 6.078274e-03
W4 max relative error: 8.864741e-03
b1 max relative error: 1.756967e-08
b2 max relative error: 3.383135e-09
b3 max relative error: 4.025342e-08
b4 max relative error: 5.139782e-10
starting iteration  0
Finished epoch 0 / 10: cost 112.900922, train: 0.140000, val 0.112000, lr 1.000000e-04
Finished epoch 1 / 10: cost 155.226098, train: 0.120000, val 0.105000, lr 9.500000e-05
Finished epoch 2 / 10: cost 50.397387, train: 0.180000, val 0.103000, lr 9.025000e-05
starting iteration  10
Finished epoch 3 / 10: cost 10.530897, train: 0.300000, val 0.105000, lr 8.573750e-05
Finished epoch 4 / 10: cost 8.600551, train: 0.260000, val 0.105000, lr 8.145062e-05
starting iteration  20
Finished epoch 5 / 10: cost 7.472381, train: 0.200000, val 0.104000, lr 7.737809e-05
Finished epoch 6 / 10: cost 8.334688, train: 0.240000, val 0.100000, lr 7.350919e-05
starting iteration  30
Finished epoch 7 / 10: cost 7.168265, train: 0.200000, val 0.099000, lr 6.983373e-05
Finished epoch 8 / 10: cost 8.247624, train: 0.200000, val 0.098000, lr 6.634204e-05
starting iteration  40
Finished epoch 9 / 10: cost 8.138461, train: 0.240000, val 0.106000, lr 6.302494e-05
Finished epoch 10 / 10: cost 7.732231, train: 0.260000, val 0.095000, lr 5.987369e-05
finished optimization. best validation accuracy: 0.112000
starting iteration  0
Finished epoch 0 / 1: cost 256.947813, train: 0.100000, val 0.126000, lr 5.000000e-05
starting iteration  10
starting iteration  20
starting iteration  30
starting iteration  40
starting iteration  50
Finished epoch 0 / 1: cost 239.993287, train: 0.279000, val 0.256000, lr 5.000000e-05
starting iteration  60
starting iteration  70
starting iteration  80
starting iteration  90
starting iteration  100
Finished epoch 0 / 1: cost 232.275271, train: 0.311000, val 0.299000, lr 5.000000e-05
starting iteration  110
starting iteration  120
starting iteration  130
starting iteration  140
starting iteration  150
Finished epoch 0 / 1: cost 227.697579, train: 0.347000, val 0.355000, lr 5.000000e-05
starting iteration  160
starting iteration  170
starting iteration  180
starting iteration  190
starting iteration  200
Finished epoch 0 / 1: cost 221.247640, train: 0.373000, val 0.376000, lr 5.000000e-05
starting iteration  210
starting iteration  220
starting iteration  230
starting iteration  240
starting iteration  250
Finished epoch 0 / 1: cost 217.452615, train: 0.401000, val 0.389000, lr 5.000000e-05
starting iteration  260
starting iteration  270
starting iteration  280
starting iteration  290
starting iteration  300
Finished epoch 0 / 1: cost 210.564200, train: 0.399000, val 0.403000, lr 5.000000e-05
starting iteration  310
starting iteration  320
starting iteration  330
starting iteration  340
starting iteration  350
Finished epoch 0 / 1: cost 205.200113, train: 0.420000, val 0.413000, lr 5.000000e-05
starting iteration  360
starting iteration  370
starting iteration  380
starting iteration  390
starting iteration  400
Finished epoch 0 / 1: cost 201.261938, train: 0.438000, val 0.435000, lr 5.000000e-05
starting iteration  410
starting iteration  420
starting iteration  430
starting iteration  440
starting iteration  450
Finished epoch 0 / 1: cost 194.731383, train: 0.461000, val 0.440000, lr 5.000000e-05
starting iteration  460
starting iteration  470
starting iteration  480
starting iteration  490
starting iteration  500
Finished epoch 0 / 1: cost 190.211307, train: 0.458000, val 0.456000, lr 5.000000e-05
starting iteration  510
starting iteration  520
starting iteration  530
starting iteration  540
starting iteration  550
Finished epoch 0 / 1: cost 185.626986, train: 0.465000, val 0.455000, lr 5.000000e-05
starting iteration  560
starting iteration  570
starting iteration  580
starting iteration  590
starting iteration  600
Finished epoch 0 / 1: cost 180.702342, train: 0.441000, val 0.447000, lr 5.000000e-05
starting iteration  610
starting iteration  620
starting iteration  630
starting iteration  640
starting iteration  650
Finished epoch 0 / 1: cost 176.395718, train: 0.492000, val 0.462000, lr 5.000000e-05
starting iteration  660
starting iteration  670
starting iteration  680
starting iteration  690
starting iteration  700
Finished epoch 0 / 1: cost 172.533069, train: 0.474000, val 0.464000, lr 5.000000e-05
starting iteration  710
starting iteration  720
starting iteration  730
starting iteration  740
starting iteration  750
Finished epoch 0 / 1: cost 167.909728, train: 0.485000, val 0.485000, lr 5.000000e-05
starting iteration  760
starting iteration  770
starting iteration  780
starting iteration  790
starting iteration  800
Finished epoch 0 / 1: cost 164.084062, train: 0.497000, val 0.486000, lr 5.000000e-05
starting iteration  810
starting iteration  820
starting iteration  830
starting iteration  840
starting iteration  850
Finished epoch 0 / 1: cost 160.133955, train: 0.454000, val 0.514000, lr 5.000000e-05
starting iteration  860
starting iteration  870
starting iteration  880
starting iteration  890
starting iteration  900
Finished epoch 0 / 1: cost 155.581273, train: 0.504000, val 0.494000, lr 5.000000e-05
starting iteration  910
starting iteration  920
starting iteration  930
starting iteration  940
starting iteration  950
Finished epoch 0 / 1: cost 152.044799, train: 0.497000, val 0.506000, lr 5.000000e-05
starting iteration  960
starting iteration  970
Finished epoch 1 / 1: cost 149.422164, train: 0.482000, val 0.507000, lr 4.750000e-05
finished optimization. best validation accuracy: 0.514000

Learnings

Noting that there was almost no train/val gap (despite reasonably good accuracies), I trained incremental models with decreasing regularization strengths, beginning at 0.04.
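
Concretely, the incremental schedule looked roughly like the following: train for a while, checkpoint the model, then continue from the checkpointed weights with a smaller regularization strength. The stage lengths and the regularization values after 0.04 are placeholders, not the exact ones I used.

import cPickle

reg_schedule = [0.04, 0.01, 0.004]   # decreasing regularization; later values are placeholders
model = init_supercool_convnet(weight_scale=3e-2, bias_scale=0, filter_size=3)

for stage, reg in enumerate(reg_schedule):
    trainer = ClassifierTrainer()
    model, loss_history, train_acc_history, val_acc_history = trainer.train(
        X_train, y_train, X_val, y_val, model, supercool_convnet,
        reg=reg, momentum=0.9, learning_rate=5e-5, batch_size=50, num_epochs=2,
        verbose=True, acc_frequency=50)

    # Checkpoint after each stage so the next stage (or a later session) can resume.
    with open('model_stage_%d.pkl' % stage, 'wb') as f:
        cPickle.dump(model, f)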

Final model

Clearly there is a lot more room for tuning, time permitting. This model doesn't produce an especially clean train/val gap, but it passed the required threshold.

0.684000 accuracy on validation set

0.706000 accuracy on training set

0.667000 accuracy on testing set


In [37]:
import cPickle

with open('best_model_2.pkl', 'rb') as f:
    best_model = cPickle.load(f)

# print X_val.shape
# print X_test.shape

scores_test = supercool_convnet(X_test, best_model)
print 'Test accuracy: ', np.mean(np.argmax(scores_test, axis=1) == y_test)


Test accuracy:  0.667

In [ ]: