We now have a generic solver and a bunch of modularized layers. It's time to put it all together, and train a ConvNet to recognize the classes in CIFAR-10. In this notebook we will walk you through training a simple two-layer ConvNet and then set you free to build the best net that you can to perform well on CIFAR-10.
Open up the file cs231n/classifiers/convnet.py; you will see that the two_layer_convnet function computes the loss and gradients for a two-layer ConvNet. Note that this function uses the "sandwich" layers defined in cs231n/layer_utils.py.
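A "sandwich" layer chains several primitive layers into a single forward/backward pair. The sketch below illustrates the idea with a self-contained affine + ReLU pair; the function names and cache layout are assumptions made for this example and are not copied from cs231n/layer_utils.py.

```python
import numpy as np

def affine_relu_forward(x, w, b):
    """Affine transform followed by ReLU; returns the output and a cache."""
    a = x.reshape(x.shape[0], -1).dot(w) + b   # affine scores, shape (N, M)
    out = np.maximum(0, a)                     # ReLU
    cache = (x, w, a)
    return out, cache

def affine_relu_backward(dout, cache):
    """Backward pass: undo the ReLU first, then the affine layer."""
    x, w, a = cache
    da = dout * (a > 0)                        # gradient flows only where a > 0
    dx = da.dot(w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T.dot(da)
    db = da.sum(axis=0)
    return dx, dw, db
```

The backward function applies the primitive backward passes in the reverse order of the forward pass, which is all a sandwich layer needs to do.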
In [2]:
# As usual, a bit of setup
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifier_trainer import ClassifierTrainer
from cs231n.gradient_check import eval_numerical_gradient
from cs231n.classifiers.convnet import *
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-18, np.abs(x) + np.abs(y))))
In [40]:
from cs231n.data_utils import load_CIFAR10
# If you run into memory issues, modify load_CIFAR10 to load fewer batches
# (for example batches 1, 2 and 3) and lower num_training below so that
# num_training + num_validation does not exceed the number of loaded images.
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    # Transpose so that channels come first
    X_train = X_train.transpose(0, 3, 1, 2).copy()
    X_val = X_val.transpose(0, 3, 1, 2).copy()
    X_test = X_test.transpose(0, 3, 1, 2).copy()
    return X_train, y_train, X_val, y_val, X_test, y_test
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
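The two preprocessing steps (mean-image subtraction and moving channels first) can be checked on a small random array. This is a minimal sketch with made-up data standing in for CIFAR-10; the array names and sizes are illustrative only.

```python
import numpy as np

# Toy stand-in for image data laid out as (N, height, width, channels).
rng = np.random.RandomState(0)
X_toy = rng.uniform(0, 255, size=(8, 32, 32, 3))

# Subtract the per-pixel mean image computed over the first axis.
mean_image = X_toy.mean(axis=0)
X_centered = X_toy - mean_image

# Move channels to axis 1: (N, H, W, C) -> (N, C, H, W).
X_chw = X_centered.transpose(0, 3, 1, 2).copy()
```

After the transpose, element `X_chw[n, c, h, w]` holds what was `X_centered[n, h, w, c]`, which is the layout the convolution layers in this notebook are fed.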
In [4]:
model = init_two_layer_convnet()
X = np.random.randn(100, 3, 32, 32)
y = np.random.randint(10, size=100)
loss, _ = two_layer_convnet(X, model, y, reg=0)
# Sanity check: Loss should be about log(10) = 2.3026
print 'Sanity check loss (no regularization): ', loss
# Sanity check: Loss should go up when you add regularization
loss, _ = two_layer_convnet(X, model, y, reg=1)
print 'Sanity check loss (with regularization): ', loss
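The log(10) figure follows from the softmax loss: with small random weights the class scores are nearly equal, so each of the 10 classes gets a probability close to 1/10 and the loss is about -log(1/10). A minimal sketch, using a stand-alone softmax loss written for illustration (not the course's implementation):

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean cross-entropy loss for integer labels y."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

rng = np.random.RandomState(0)
scores = 1e-3 * rng.randn(100, 10)     # near-zero scores, like an untrained net
y = rng.randint(10, size=100)
loss = softmax_loss(scores, y)         # close to np.log(10)
```

A loss far from log(10) at initialization usually points to a bug in the forward pass or to weights that are initialized too large.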
In [5]:
num_inputs = 2
input_shape = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_shape)
y = np.random.randint(num_classes, size=num_inputs)
model = init_two_layer_convnet(num_filters=3, filter_size=3, input_shape=input_shape)
loss, grads = two_layer_convnet(X, model, y)
for param_name in sorted(grads):
    # Compare the analytic gradient of each parameter with a numerical estimate
    f = lambda _: two_layer_convnet(X, model, y)[0]
    param_grad_num = eval_numerical_gradient(f, model[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print '%s max relative error: %e' % (param_name, e)
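The check above compares analytic gradients against centered finite differences. The sketch below shows the idea on a function whose gradient is known exactly; `numerical_gradient` is a hypothetical stand-in written for this example, not the course's `eval_numerical_gradient`.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Centered-difference gradient of a scalar function f at x (x is restored)."""
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        fp = f(x)                      # f evaluated at x + h along this coordinate
        x[ix] = old - h
        fm = f(x)                      # f evaluated at x - h along this coordinate
        x[ix] = old                    # restore the original value
        grad[ix] = (fp - fm) / (2 * h)
    return grad

def rel_error(x, y):
    return np.max(np.abs(x - y) / np.maximum(1e-18, np.abs(x) + np.abs(y)))

x = np.random.RandomState(0).randn(3, 4)
f = lambda v: 0.5 * np.sum(v ** 2)     # the exact gradient of f is v itself
err = rel_error(numerical_gradient(f, x), x)
```

For a correct analytic gradient the relative error should be tiny; errors that are orders of magnitude larger usually indicate a mistake in the backward pass.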
In [6]:
# Use a two-layer ConvNet to overfit 50 training examples.
model = init_two_layer_convnet()
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train[:50], y_train[:50], X_val, y_val, model, two_layer_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=10, num_epochs=10,
    verbose=True)
Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:
In [7]:
plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()
Once the above works, training the net is the next thing to try. You can set the acc_frequency parameter to change the frequency at which the training and validation set accuracies are tested. If your parameters are set properly, you should see the training and validation accuracy start to improve within a hundred iterations, and you should be able to train a reasonable model with just one epoch.
Using the parameters below you should be able to get around 50% accuracy on the validation set.
In [9]:
model = init_two_layer_convnet(filter_size=7)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train, y_train, X_val, y_val, model, two_layer_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=50, num_epochs=1,
    acc_frequency=50, verbose=True)
In [10]:
from cs231n.vis_utils import visualize_grid
grid = visualize_grid(best_model['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))
Experiment and try to get the best performance that you can on CIFAR-10 using a ConvNet. Here are some ideas to get you started:
cs231n/classifiers/convnet.py. Some good architectures to try include:
For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple of important things to keep in mind:
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are not required to implement any of these; however they would be good things to try for extra credit.
At the very least, you should be able to train a ConvNet that gets at least 65% accuracy on the validation set. This is just a lower bound - if you are careful it should be possible to get accuracies much higher than that! Extra credit points will be awarded for particularly high-scoring models or unique approaches.
You should use the space below to experiment and train your network. The final cell in this notebook should contain the training, validation, and test set accuracies for your final trained network. In this notebook you should also write an explanation of what you did, any additional features that you implemented, and any visualizations or graphs that you make in the process of training and evaluating your network.
Have fun and happy training!
Let's first implement a ConvNet of the form [conv-relu-pool]xN - [affine]xM - [softmax or SVM], in which we can specify N and M. We test it with N=M=1 to make sure that it works; it should produce a result similar to the one above.
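For a [conv-relu-pool]xN - [affine]xM stack, the first affine layer needs to know how many features remain after the N pooling steps. A minimal sketch, assuming each convolution preserves the spatial size (stride 1 with "same" padding) and each pool is 2x2 with stride 2; `flattened_size` is a hypothetical helper, not part of the course code or of my_convnet:

```python
def flattened_size(input_shape, num_filters, num_blocks):
    """Number of features after num_blocks conv-relu-pool blocks."""
    _, height, width = input_shape
    for _ in range(num_blocks):
        if height % 2 or width % 2:
            raise ValueError('spatial size must be even before each 2x2 pool')
        height, width = height // 2, width // 2
    # With no conv blocks the input channels pass straight through.
    channels = num_filters if num_blocks > 0 else input_shape[0]
    return channels * height * width
```

Under those assumptions, an input of shape (3, 32, 32) with 32 filters per conv layer leaves 32 * 16 * 16 = 8192 features for N=1 and 32 * 8 * 8 = 2048 for N=2.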
In [28]:
model = init_my_convnet(filter_size=5)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train, y_train, X_val, y_val, model, my_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.0001, batch_size=50, num_epochs=1,
    acc_frequency=50, verbose=True)
Now let's try it with 2 conv-relu-pool layers followed by 1 affine layer and a softmax loss.
In [35]:
model = init_my_convnet(filter_size=5, Naff=1, Ncrp=2)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train, y_train, X_val, y_val, model, my_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.0005, batch_size=50, num_epochs=1,
    acc_frequency=50, verbose=True)
It seems to work with those parameters. Let us train it for more epochs to see if we can get better accuracy on the validation set.
In [41]:
model = init_my_convnet(filter_size=5, Naff=1, Ncrp=2)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train, y_train, X_val, y_val, model, my_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.0005, batch_size=50, num_epochs=5,
    acc_frequency=50, verbose=True)
Let us fine-tune the model by continuing training from the best model so far with a smaller learning rate.
In [ ]:
base = best_model.copy()
model = base.copy()
trainer = ClassifierTrainer()
best_model, loss_history, train_acc_history, val_acc_history = trainer.train(
    X_train, y_train, X_val, y_val, model, my_convnet,
    reg=0.001, momentum=0.9, learning_rate=0.00001, batch_size=50, num_epochs=3,
    acc_frequency=50, verbose=True)
In [44]:
from cs231n.vis_utils import visualize_grid
grid = visualize_grid(best_model['W0'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))
In [45]:
mask = np.random.choice(len(X_train), 1000)
pred_train = my_convnet(X_train[mask], model).argmax(axis=1)
pred_val = my_convnet(X_val, model).argmax(axis=1)
pred_test = my_convnet(X_test, model).argmax(axis=1)
acc_train = np.mean(y_train[mask]==pred_train)
acc_val = np.mean(y_val==pred_val)
acc_test = np.mean(y_test==pred_test)
print acc_train, acc_val, acc_test
Due to the low computational power of my computer, I didn't get a chance to test dropout and different ReLU variants (though that is no excuse; I should have started earlier, but I believe it would be a lot better if we had IPython notebook installed on ineks). I think dropout in particular would help a lot: the training accuracy is much higher than the validation and test accuracies, which suggests some overfitting, and a stochastic regularizer such as dropout can reduce it.
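For reference, dropout randomly zeroes activations during training so that the network cannot rely on any single unit. Below is a minimal sketch of the common "inverted dropout" variant; it was not run as part of this notebook, and the function names and signatures are illustrative assumptions rather than course code.

```python
import numpy as np

def dropout_forward(x, p_keep=0.5, train=True, rng=np.random):
    """Inverted dropout: keep each unit with probability p_keep at train time."""
    if not train:
        return x, None                      # test time: identity, no rescaling
    # Dividing by p_keep keeps the expected activation unchanged.
    mask = (rng.rand(*x.shape) < p_keep) / p_keep
    return x * mask, mask

def dropout_backward(dout, mask):
    """Gradient passes only through the units that were kept."""
    return dout if mask is None else dout * mask
```

Because the rescaling happens at training time, the test-time forward pass needs no change, which keeps the evaluation code above as it is.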