In [1]:
# Run some setup code
import numpy as np
import matplotlib.pyplot as plt
# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
# Boolean flags controlling debug output and whether images are displayed
debug = True
show_img = True
We will use the class NNet in the file nnet.py to represent instances of our network. The network parameters are stored in the instance variable self.params, where keys are string parameter names and values are numpy arrays.
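For orientation, a two-layer network's self.params typically holds four arrays. The sketch below is illustrative only: the key names (W1, b1, W2, b2) and the small-random initialization are assumptions, not read from nnet.py.

import numpy as np

# Illustrative sketch only -- key names and the small-random-weight
# initialization are assumptions, not necessarily what nnet.py does.
n0, n1, n2 = 3072, 20, 10        # input dim, hidden units, number of classes
params = {
    'W1': 1e-4 * np.random.randn(n0, n1),   # first-layer weights
    'b1': np.zeros(n1),                      # first-layer biases
    'W2': 1e-4 * np.random.randn(n1, n2),    # second-layer weights
    'b2': np.zeros(n2),                      # second-layer biases
}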
In [2]:
import cifar10
# Load the raw CIFAR-10 data
X, y, X_test, y_test = cifar10.load('../cifar-10-batches-py', debug = debug)
m = 49000
m_val = 1000
m_test = 1000
m_dev = 500
X, y, X_test, y_test, X_dev, y_dev, X_val, y_val = cifar10.split_vec(X, y, X_test, y_test, m, m_test, m_val, m_dev, debug = debug, show_img = show_img)
In [34]:
from nnet import NNet
n0 = X_dev.shape[1]
n1 = 20
n2 = 10
# Sanity check: build a small network and run the training check on the dev set
model = NNet(n0, n1, n2)
model.train_check(X_dev, y_dev, lamda = 3.3)
In [27]:
n0 = X_dev.shape[1]
n1 = 20
n2 = 10
alpha, lamda, T, B, rho = 1e-4, 0.5, 1000, 200, 0.95  # learning rate, reg. strength, iterations, batch size, LR decay
hpara = (alpha, lamda, T, B, rho)
# Train the network with the default hyperparameters
model = NNet(n0, n1, n2)
model.train(X, y, X_val, y_val, hpara, debug, show_img)
# Predict on the val. set
print 'val. acc.:', np.mean(model.predict(X_val) == y_val)
With the default parameters we provided above, you should get a validation accuracy of about 0.29. This isn't very good.
One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.
Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.
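A sketch of the first strategy is shown below. It assumes the trained model records loss_history, train_acc_history, and val_acc_history; these attribute names are assumptions, not something nnet.py is known to provide.

# Sketch: plot training statistics. The attribute names used here are
# assumptions about what nnet.py stores during training.
plt.subplot(2, 1, 1)
plt.plot(model.loss_history)
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(model.train_acc_history, label='train')
plt.plot(model.val_acc_history, label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()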
In [33]:
model.visualize_W()
What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which suggests that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.
Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.
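For example, a minimal grid sweep over learning rate and regularization strength could follow the sketch below, reusing the NNet training call and the T, B, rho values defined in the cells above; the candidate values are purely illustrative, not recommendations.

# Illustrative sweep sketch; the candidate values are arbitrary examples.
results = {}
for alpha in [1e-3, 2e-3, 5e-3]:           # learning rate
    for lamda in [1e-2, 3e-2, 1e-1]:       # regularization strength
        hpara = (alpha, lamda, T, B, rho)
        model = NNet(n0, n1, n2)
        model.train(X, y, X_val, y_val, hpara, debug, show_img)
        results[(alpha, lamda)] = np.mean(model.predict(X_val) == y_val)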
Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.
Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected Neural Network. For every 1% above 52% on the Test set we will award you with one extra bonus point. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, or adding dropout, or adding features to the solver, etc.).
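As one example of such a technique, a PCA preprocessing step can be written with plain numpy. This is only a sketch: the number of components kept is arbitrary, and the reduced data would still need to be fed to a network built with the matching input dimension.

# Minimal PCA sketch (illustrative; not part of nnet.py).
X_mean = X.mean(axis=0)
Xc = X - X_mean                                 # center the training data
cov = Xc.T.dot(Xc) / Xc.shape[0]                # covariance matrix
U, S, _ = np.linalg.svd(cov)                    # principal directions (columns of U)
k = 100                                         # components to keep (arbitrary choice)
X_pca = Xc.dot(U[:, :k])                        # project training data
X_val_pca = (X_val - X_mean).dot(U[:, :k])      # same transform for the val. set
X_test_pca = (X_test - X_mean).dot(U[:, :k])    # and for the test set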
In [42]:
best_model = None
best_acc = -1
# TODO: Tune hyperparameters using the validation set. Store your best trained
# model in best_model.
#
# To help debug your network, it may help to use visualizations similar to the
# ones we used above; these visualizations will have significant qualitative
# differences from the ones we saw above for the poorly tuned network.
#
# Tweaking hyperparameters by hand can be fun, but you might find it useful to
# write code to sweep through possible combinations of hyperparameters
# automatically like we did on the previous exercises.
n0 = X_dev.shape[1]
n1 = 200
n2 = 10
alpha, lamda, T, B, rho = 2e-3, 3e-2, 10000, 200, 0.95
for alpha in [2e-3]:
    hpara = (alpha, lamda, T, B, rho)
    print hpara
    model = NNet(n0, n1, n2)
    model.train(X, y, X_val, y_val, hpara, debug, show_img)
    # Predict on the val. set
    val_acc = np.mean(model.predict(X_val) == y_val)
    print 'val. acc.:', val_acc
    print '\n'
    if val_acc > best_acc:
        best_acc = val_acc
        best_model = model
In [43]:
# Visualize the weights of the best model
best_model.visualize_W()
In [45]:
print 'Test accuracy: ', np.mean(best_model.predict(X_test) == y_test)