Implementing a Neural Network

In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.


In [1]:
# A bit of setup

import numpy as np
import matplotlib.pyplot as plt

from cs231n.classifiers.neural_net import TwoLayerNet

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params, a dictionary mapping string parameter names to numpy arrays. Below, we initialize toy data and a toy model that we will use to develop your implementation.


In [2]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
  np.random.seed(0)
  return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
  np.random.seed(1)
  X = 10 * np.random.randn(num_inputs, input_size)
  y = np.array([0, 1, 2, 2, 1])
  return X, y

net = init_toy_model()
X, y = init_toy_data()
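
As a quick check of the parameter layout described above, you can print the shape of each array in net.params. This is only an optional sketch; it assumes the standard layout in which W1 has shape (input_size, hidden_size) and W2 has shape (hidden_size, num_classes).

for param_name in sorted(net.params):
  print param_name, net.params[param_name].shape
# With that layout, the toy model should show W1 (4, 10), b1 (10,), W2 (10, 3), b2 (3,)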

Forward pass: compute scores

Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: It takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.
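
For reference, here is a minimal sketch of the kind of computation this part of loss needs: an affine layer, a ReLU nonlinearity, and a second affine layer that produces the raw class scores. The variable names are illustrative only, and the ReLU architecture is an assumption you can verify against the reference scores below.

W1, b1 = net.params['W1'], net.params['b1']
W2, b2 = net.params['W2'], net.params['b2']

hidden = np.maximum(0, X.dot(W1) + b1)  # first affine layer followed by a ReLU
scores = hidden.dot(W2) + b2            # second affine layer gives the class scores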


In [3]:
scores = net.loss(X)
print 'Your scores:'
print scores
print
print 'correct scores:'
correct_scores = np.asarray([
  [-0.81233741, -1.27654624, -0.70335995],
  [-0.17129677, -1.18803311, -0.47310444],
  [-0.51590475, -1.01354314, -0.8504215 ],
  [-0.15419291, -0.48629638, -0.52901952],
  [-0.00618733, -0.12435261, -0.15226949]])
print correct_scores
print

# The difference should be very small. We get < 1e-7
print 'Difference between your scores and correct scores:'
print np.sum(np.abs(scores - correct_scores))


Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.68027207459e-08

Forward pass: compute loss

In the same function, implement the second part that computes the data and regularization loss.
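
A minimal sketch of the data loss (softmax cross-entropy) plus L2 regularization, reusing the illustrative names from the forward-pass sketch above. The 0.5 factor in the regularization term is an assumed convention; check it against the reference loss value below.

reg = 0.1  # same regularization strength as the check below

# Softmax data loss with max-subtraction for numerical stability
shifted = scores - np.max(scores, axis=1, keepdims=True)
probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
N = X.shape[0]
data_loss = -np.sum(np.log(probs[np.arange(N), y])) / N

# L2 regularization on the weight matrices (not the biases)
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss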


In [4]:
loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133

# should be very small, we get < 1e-12
print 'Difference between your loss and correct loss:'
print np.sum(np.abs(loss - correct_loss))


Difference between your loss and correct loss:
1.79412040779e-13

Backward pass

Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
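
As a shape reference, here is a sketch of the backward pass for the affine, ReLU, affine, softmax pipeline sketched above, using the same illustrative names and the same (assumed) regularization convention as the loss sketch.

# Gradient of the softmax loss with respect to the scores
dscores = probs.copy()
dscores[np.arange(N), y] -= 1
dscores /= N

# Backprop into the second affine layer
grads = {}
grads['W2'] = hidden.T.dot(dscores) + reg * W2
grads['b2'] = np.sum(dscores, axis=0)

# Backprop through the ReLU into the first affine layer
dhidden = dscores.dot(W2.T)
dhidden[hidden <= 0] = 0
grads['W1'] = X.T.dot(dhidden) + reg * W1
grads['b1'] = np.sum(dhidden, axis=0)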


In [5]:
from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.1)

# these should all be less than 1e-8 or so
for param_name in grads:
  f = lambda W: net.loss(X, y, reg=0.1)[0]
  param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
  print '%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name]))


W1 max relative error: 3.561318e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 4.447625e-11
b1 max relative error: 2.738421e-09

Train the network

To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.
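
Below is a sketch of the two missing pieces, applied to the toy net and toy data purely for illustration; inside TwoLayerNet.train and TwoLayerNet.predict the same steps would use self.params and self.loss, and the minibatch size and learning rate here are only example values.

batch_size = 5
learning_rate = 1e-1

# Sample a random minibatch of data and labels
batch_indices = np.random.choice(X.shape[0], batch_size)
X_batch, y_batch = X[batch_indices], y[batch_indices]

# One vanilla SGD step: move every parameter against its gradient
loss, grads = net.loss(X_batch, y_batch, reg=1e-5)
for param_name in net.params:
  net.params[param_name] -= learning_rate * grads[param_name]

# Prediction: the label with the highest score wins
y_pred = np.argmax(net.loss(X_batch), axis=1)
print 'minibatch loss: ', loss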

Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.


In [6]:
net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=1e-5,
            num_iters=100, verbose=False)

print 'Final training loss: ', stats['loss_history'][-1]

# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Final training loss:  0.0171496079387

Load the data

Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it's time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.


In [7]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape


Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)

Train a network

To train our network we will use SGD with momentum. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.
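
Sketched below are the two update rules just described, applied once to the toy net and toy data purely for illustration; mu and the velocity dictionary v are hypothetical names, not necessarily the ones used inside TwoLayerNet.train.

mu = 0.9                   # hypothetical momentum coefficient
learning_rate = 1e-4
learning_rate_decay = 0.95

# One velocity per parameter, initialized to zero
v = {name: np.zeros_like(w) for name, w in net.params.items()}

loss, grads = net.loss(X, y, reg=1e-5)
for param_name in net.params:
  # SGD with momentum: integrate the gradient into a running velocity
  v[param_name] = mu * v[param_name] - learning_rate * grads[param_name]
  net.params[param_name] += v[param_name]

# After each epoch, the exponential schedule shrinks the step size:
learning_rate *= learning_rate_decay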


In [8]:
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.5, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print 'Validation accuracy: ', val_acc


iteration 0 / 1000: loss 2.302954
iteration 100 / 1000: loss 2.302550
iteration 200 / 1000: loss 2.297648
iteration 300 / 1000: loss 2.259602
iteration 400 / 1000: loss 2.204170
iteration 500 / 1000: loss 2.118565
iteration 600 / 1000: loss 2.051535
iteration 700 / 1000: loss 1.988466
iteration 800 / 1000: loss 2.006591
iteration 900 / 1000: loss 1.951473
Validation accuracy:  0.287

Debug the training

With the default parameters we provided above, you should get a validation accuracy of about 0.29. This isn't very good.

One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.


In [9]:
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()



In [10]:
from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
  W1 = net.params['W1']
  W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
  plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
  plt.gca().axis('off')
  plt.show()

show_net_weights(net)


Tune your hyperparameters

What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which suggests that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a large gap between the training and validation accuracy.

Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

Approximate results. You should aim to achieve a classification accuracy greater than 48% on the validation set. Our best network gets over 52% on the validation set.

Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected Neural Network. For every 1% above 52% on the test set we will award you one extra bonus point. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, or adding features to the solver).
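
As a starting point, here is a sketch of the kind of validation-set sweep the TODO block below asks for, using small hypothetical candidate lists to keep the runtime manageable; a full grid over wide ranges would be very slow.

best_val, best_net = -1, None
for hs in [100, 200]:          # hypothetical hidden sizes
  for lr in [5e-4, 1e-3]:      # hypothetical learning rates
    for rs in [0.01, 0.05]:    # hypothetical regularization strengths
      net = TwoLayerNet(input_size, hs, num_classes)
      net.train(X_train, y_train, X_val, y_val,
                num_iters=2000, batch_size=200,
                learning_rate=lr, learning_rate_decay=0.95,
                reg=rs, verbose=False)
      val_acc = (net.predict(X_val) == y_val).mean()
      if val_acc > best_val:
        best_val, best_net = val_acc, net
print 'Best validation accuracy found by the sweep: ', best_val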


In [11]:
best_net = None  # store the best model in this variable

results = {}
best_val = -1

### Candidate hyperparameter ranges to sweep over (a full grid search is very time-consuming)
learning_rates = np.logspace(-7, -1, 7)
regularization_strengths = np.logspace(-3, 4, 8)
hidden_sizes = np.array([50, 100, 150, 200, 300])
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
# Identical to visualization code above
def visualize(stats):
    plt.subplot(2, 1, 1)
    plt.plot(stats['loss_history'])
    plt.title('Loss history')
    plt.xlabel('Iteration')
    plt.ylabel('Loss')

    plt.subplot(2, 1, 2)
    plt.plot(stats['train_acc_history'], label='train')
    plt.plot(stats['val_acc_history'], label='val')
    plt.title('Classification accuracy history')
    plt.xlabel('Epoch')
    plt.ylabel('Classification accuracy')
    plt.show()


net = TwoLayerNet(input_size, 500, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=15000, batch_size=200,
            learning_rate=1e-3, learning_rate_decay=0.91,
            reg=0.028, verbose=True)

best_net = net

# Best accuracy on the validation set
stats['best_val_acc'] = max(stats['val_acc_history'])

visualize(stats)
    
test_acc = np.mean(net.predict(X_test) == y_test)
print 'Validation accuracy: ', stats['best_val_acc'], 'Test accuracy: ', test_acc
#################################################################################
#                               END OF YOUR CODE                                #
#################################################################################


iteration 0 / 15000: loss 2.302861
iteration 100 / 15000: loss 1.838995
iteration 200 / 15000: loss 1.690406
iteration 300 / 15000: loss 1.664884
iteration 400 / 15000: loss 1.515058
iteration 500 / 15000: loss 1.575803
iteration 600 / 15000: loss 1.560809
iteration 700 / 15000: loss 1.550282
iteration 800 / 15000: loss 1.365007
iteration 900 / 15000: loss 1.472263
iteration 1000 / 15000: loss 1.462418
iteration 1100 / 15000: loss 1.362322
iteration 1200 / 15000: loss 1.520131
iteration 1300 / 15000: loss 1.386946
iteration 1400 / 15000: loss 1.249246
iteration 1500 / 15000: loss 1.224484
iteration 1600 / 15000: loss 1.217268
iteration 1700 / 15000: loss 1.411583
iteration 1800 / 15000: loss 1.314169
iteration 1900 / 15000: loss 1.212310
iteration 2000 / 15000: loss 1.263647
iteration 2100 / 15000: loss 1.081395
iteration 2200 / 15000: loss 1.242019
iteration 2300 / 15000: loss 1.280284
iteration 2400 / 15000: loss 1.180105
iteration 2500 / 15000: loss 1.176819
iteration 2600 / 15000: loss 1.286027
iteration 2700 / 15000: loss 1.129642
iteration 2800 / 15000: loss 1.152007
iteration 2900 / 15000: loss 0.877283
iteration 3000 / 15000: loss 1.107274
iteration 3100 / 15000: loss 1.063716
iteration 3200 / 15000: loss 1.090910
iteration 3300 / 15000: loss 1.113020
iteration 3400 / 15000: loss 1.222601
iteration 3500 / 15000: loss 1.000864
iteration 3600 / 15000: loss 1.161176
iteration 3700 / 15000: loss 1.029722
iteration 3800 / 15000: loss 1.024019
iteration 3900 / 15000: loss 1.078409
iteration 4000 / 15000: loss 1.150775
iteration 4100 / 15000: loss 1.025028
iteration 4200 / 15000: loss 1.002042
iteration 4300 / 15000: loss 1.007431
iteration 4400 / 15000: loss 0.963527
iteration 4500 / 15000: loss 1.025350
iteration 4600 / 15000: loss 0.907627
iteration 4700 / 15000: loss 0.940467
iteration 4800 / 15000: loss 1.076877
iteration 4900 / 15000: loss 0.929436
iteration 5000 / 15000: loss 0.944205
iteration 5100 / 15000: loss 1.038381
iteration 5200 / 15000: loss 0.887022
iteration 5300 / 15000: loss 0.851182
iteration 5400 / 15000: loss 1.008594
iteration 5500 / 15000: loss 0.845478
iteration 5600 / 15000: loss 0.890566
iteration 5700 / 15000: loss 0.909917
iteration 5800 / 15000: loss 0.838521
iteration 5900 / 15000: loss 0.902939
iteration 6000 / 15000: loss 0.850292
iteration 6100 / 15000: loss 0.833458
iteration 6200 / 15000: loss 0.915525
iteration 6300 / 15000: loss 0.985770
iteration 6400 / 15000: loss 0.942741
iteration 6500 / 15000: loss 0.913543
iteration 6600 / 15000: loss 0.880701
iteration 6700 / 15000: loss 0.853265
iteration 6800 / 15000: loss 0.966762
iteration 6900 / 15000: loss 0.939237
iteration 7000 / 15000: loss 0.932870
iteration 7100 / 15000: loss 0.872834
iteration 7200 / 15000: loss 0.843612
iteration 7300 / 15000: loss 0.897157
iteration 7400 / 15000: loss 0.934481
iteration 7500 / 15000: loss 0.829609
iteration 7600 / 15000: loss 0.951289
iteration 7700 / 15000: loss 0.915690
iteration 7800 / 15000: loss 0.906896
iteration 7900 / 15000: loss 0.907321
iteration 8000 / 15000: loss 0.940879
iteration 8100 / 15000: loss 0.819042
iteration 8200 / 15000: loss 0.835554
iteration 8300 / 15000: loss 0.759392
iteration 8400 / 15000: loss 0.843715
iteration 8500 / 15000: loss 0.795839
iteration 8600 / 15000: loss 0.860421
iteration 8700 / 15000: loss 0.838070
iteration 8800 / 15000: loss 0.844575
iteration 8900 / 15000: loss 0.850190
iteration 9000 / 15000: loss 0.823846
iteration 9100 / 15000: loss 0.827456
iteration 9200 / 15000: loss 0.825139
iteration 9300 / 15000: loss 0.821669
iteration 9400 / 15000: loss 0.782539
iteration 9500 / 15000: loss 0.853103
iteration 9600 / 15000: loss 0.779079
iteration 9700 / 15000: loss 0.753872
iteration 9800 / 15000: loss 0.912820
iteration 9900 / 15000: loss 0.872414
iteration 10000 / 15000: loss 0.919943
iteration 10100 / 15000: loss 0.867612
iteration 10200 / 15000: loss 0.929366
iteration 10300 / 15000: loss 0.880612
iteration 10400 / 15000: loss 0.807983
iteration 10500 / 15000: loss 0.758022
iteration 10600 / 15000: loss 0.846027
iteration 10700 / 15000: loss 0.785558
iteration 10800 / 15000: loss 0.903450
iteration 10900 / 15000: loss 0.838284
iteration 11000 / 15000: loss 0.824468
iteration 11100 / 15000: loss 0.778718
iteration 11200 / 15000: loss 0.864977
iteration 11300 / 15000: loss 0.891792
iteration 11400 / 15000: loss 0.952166
iteration 11500 / 15000: loss 0.814440
iteration 11600 / 15000: loss 0.871742
iteration 11700 / 15000: loss 0.753738
iteration 11800 / 15000: loss 0.858441
iteration 11900 / 15000: loss 0.765860
iteration 12000 / 15000: loss 0.807904
iteration 12100 / 15000: loss 0.789161
iteration 12200 / 15000: loss 0.745866
iteration 12300 / 15000: loss 0.679046
iteration 12400 / 15000: loss 0.776020
iteration 12500 / 15000: loss 0.794142
iteration 12600 / 15000: loss 0.945326
iteration 12700 / 15000: loss 0.913506
iteration 12800 / 15000: loss 0.859616
iteration 12900 / 15000: loss 0.854929
iteration 13000 / 15000: loss 0.801189
iteration 13100 / 15000: loss 0.941473
iteration 13200 / 15000: loss 0.870938
iteration 13300 / 15000: loss 0.843294
iteration 13400 / 15000: loss 0.910465
iteration 13500 / 15000: loss 0.766685
iteration 13600 / 15000: loss 0.824633
iteration 13700 / 15000: loss 0.730603
iteration 13800 / 15000: loss 0.980873
iteration 13900 / 15000: loss 0.798108
iteration 14000 / 15000: loss 0.734966
iteration 14100 / 15000: loss 0.887481
iteration 14200 / 15000: loss 0.869299
iteration 14300 / 15000: loss 0.862543
iteration 14400 / 15000: loss 0.806841
iteration 14500 / 15000: loss 0.824723
iteration 14600 / 15000: loss 0.967816
iteration 14700 / 15000: loss 0.819208
iteration 14800 / 15000: loss 0.889108
iteration 14900 / 15000: loss 0.854705
Validation accuracy:  0.575 Test accuracy:  0.553

In [12]:
# visualize the weights of the best network
show_net_weights(best_net)



Run on the test set

When you are done experimenting, you should evaluate your final trained network on the test set; you should get above 48%.

We will give you one extra bonus point for every 1% of accuracy above 52%.


In [ ]:
test_acc = (best_net.predict(X_test) == y_test).mean()
print 'Test accuracy: ', test_acc