Implementing a Neural Network

In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.


In [1]:
# A bit of setup

from __future__ import absolute_import, division, print_function

import numpy as np
import matplotlib.pyplot as plt
import seaborn  # optional; older seaborn versions restyle matplotlib plots on import

from cs231n.classifiers.neural_net import TwoLayerNet

%matplotlib inline

# set default size of plots
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(
        np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params where keys are string parameter names and values are numpy arrays. Below, we initialize toy data and a toy model that we will use to develop your implementation.
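
For concreteness, here is a hedged sketch of what that parameter dictionary looks like for the toy dimensions used below; the real initializer lives in cs231n/classifiers/neural_net.py, and the (D, H) / (H, C) shape convention here is an assumption to check against your own code.

import numpy as np

# Hypothetical illustration of the self.params layout (not the graded code).
input_size, hidden_size, num_classes, std = 4, 10, 3, 1e-1
params = {
    'W1': std * np.random.randn(input_size, hidden_size),  # (D, H)
    'b1': np.zeros(hidden_size),                            # (H,)
    'W2': std * np.random.randn(hidden_size, num_classes),  # (H, C)
    'b2': np.zeros(num_classes),                            # (C,)
}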


In [2]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()

Forward pass: compute scores

Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: It takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.
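
To see the whole stage at a glance, here is a minimal sketch of that computation (affine, ReLU, affine), assuming the (D, H) / (H, C) weight shapes sketched earlier; forward_scores is an illustrative name, and the graded code belongs inside TwoLayerNet.loss.

import numpy as np

def forward_scores(X, params):
    """Sketch: affine -> ReLU -> affine, returning raw class scores."""
    W1, b1 = params['W1'], params['b1']     # (D, H), (H,)
    W2, b2 = params['W2'], params['b2']     # (H, C), (C,)
    hidden = np.maximum(0, X.dot(W1) + b1)  # ReLU hidden activations, (N, H)
    return hidden.dot(W2) + b2              # class scores, (N, C)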


In [3]:
scores = net.loss(X)
print('Your scores:')
print(scores)
print('')
print('correct scores:')
correct_scores = np.asarray([
    [-0.81233741, -1.27654624, -0.70335995],
    [-0.17129677, -1.18803311, -0.47310444],
    [-0.51590475, -1.01354314, -0.8504215 ],
    [-0.15419291, -0.48629638, -0.52901952],
    [-0.00618733, -0.12435261, -0.15226949]])
print(correct_scores)
print('')

# The difference should be very small. We get < 1e-7
print('Difference between your scores and correct scores:')
print(np.sum(np.abs(scores - correct_scores)))


Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.68027207459e-08

Forward pass: compute loss

In the same function, implement the second part, which computes the data loss and the regularization loss.
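
As a reminder of the pieces involved, below is a hedged sketch of a numerically stable softmax data loss plus an L2 penalty on the weights. Whether the penalty carries a 0.5 factor is a convention that varies between implementations, so make sure your loss and your gradient agree; softmax_loss_with_reg is an illustrative name.

import numpy as np

def softmax_loss_with_reg(scores, y, W1, W2, reg):
    """Sketch: mean cross-entropy data loss + L2 penalty on the weights."""
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(N), y].mean()        # average over examples
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))  # 0.5 is a convention
    return data_loss + reg_loss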


In [4]:
loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133

# should be very small, we get < 1e-12
print('Difference between your loss and correct loss:')
print(np.sum(np.abs(loss - correct_loss)))


Difference between your loss and correct loss:
1.79856129989e-13

Backward pass

Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
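
For orientation, here is a sketch of the chain-rule steps for this architecture (softmax on top of affine -> ReLU -> affine), assuming the 0.5 * reg regularization convention from the loss sketch above; backward_sketch and its signature are illustrative, not the graded interface.

import numpy as np

def backward_sketch(X, y, hidden, scores, W1, W2, reg):
    """Hypothetical backward pass for affine -> ReLU -> affine -> softmax."""
    N = X.shape[0]
    # Gradient of the averaged softmax loss with respect to the scores.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    dscores = probs
    dscores[np.arange(N), y] -= 1
    dscores /= N
    grads = {}
    # Second affine layer.
    grads['W2'] = hidden.T.dot(dscores) + reg * W2
    grads['b2'] = dscores.sum(axis=0)
    # Backpropagate through the ReLU into the first affine layer.
    dhidden = dscores.dot(W2.T)
    dhidden[hidden <= 0] = 0
    grads['W1'] = X.T.dot(dhidden) + reg * W1
    grads['b1'] = dhidden.sum(axis=0)
    return grads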


In [5]:
from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of
# the backward pass. If your implementation is correct, the
# difference between the numeric and analytic gradients should be
# less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.1)

# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(
        f, net.params[param_name], verbose=False)
    print('{} max relative error: {:e}'.format(
        param_name, rel_error(param_grad_num, grads[param_name])))


b2 max relative error: 4.447625e-11
b1 max relative error: 2.738421e-09
W1 max relative error: 3.561318e-09
W2 max relative error: 3.440708e-09

Train the network

To train the network we will use stochastic gradient descent (SGD), just as for the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. You will also have to implement TwoLayerNet.predict, since the training process periodically performs prediction to keep track of accuracy while the network trains.

Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.
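
If you are unsure where to start, here is a minimal sketch of the moving parts (minibatch sampling, a vanilla SGD step, and prediction by argmax over the scores); the function names are illustrative, not the graded interface.

import numpy as np

def sample_minibatch(X, y, batch_size):
    """Sketch: sample a minibatch with replacement."""
    idx = np.random.choice(X.shape[0], batch_size)
    return X[idx], y[idx]

def sgd_step(params, grads, learning_rate):
    """Sketch: vanilla SGD update applied in place to every parameter."""
    for name in params:
        params[name] -= learning_rate * grads[name]

def predict_sketch(params, X):
    """Sketch: forward pass followed by an argmax over the class scores."""
    hidden = np.maximum(0, X.dot(params['W1']) + params['b1'])
    scores = hidden.dot(params['W2']) + params['b2']
    return np.argmax(scores, axis=1)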


In [6]:
net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=1e-5,
            num_iters=100, verbose=False)

print('Final training loss: ', stats['loss_history'][-1])

# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Final training loss:  0.0171496079387

Load the data

Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it's time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.


In [7]:
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000,
                     num_validation=1000,
                     num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing
    to prepare it for the two-layer neural net classifier. These
    are the same steps as we used for the SVM, but condensed to a
    single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = '../data/cifar10'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)

Train a network

To train our network we will use SGD with momentum. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.
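
For reference, the two ingredients look roughly like this; mu=0.9 is an assumed momentum coefficient, not a value prescribed by the assignment.

def momentum_step(w, dw, v, learning_rate, mu=0.9):
    """Sketch of the classic momentum update (assumed mu=0.9)."""
    v = mu * v - learning_rate * dw  # accumulate gradients into a velocity
    return w + v, v

# The exponential schedule is just a rescaling at the end of each epoch:
#     learning_rate *= learning_rate_decay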


In [8]:
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
                  num_iters=1000, batch_size=200,
                  # defaults: learning_rate=1e-4, learning_rate_decay=0.95, reg=0.5
                  learning_rate=9e-4, learning_rate_decay=0.98,
                  reg=1, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)


iteration 0/1000: loss 2.303339
iteration 100/1000: loss 1.987746
iteration 200/1000: loss 1.781735
iteration 300/1000: loss 1.645711
iteration 400/1000: loss 1.749594
iteration 500/1000: loss 1.591137
iteration 600/1000: loss 1.685168
iteration 700/1000: loss 1.490434
iteration 800/1000: loss 1.507688
iteration 900/1000: loss 1.584159
Validation accuracy:  0.459

Debug the training

With the original default parameters (the commented-out values above: learning_rate=1e-4, learning_rate_decay=0.95, reg=0.5), you should get a validation accuracy of about 0.29 on the validation set. This isn't very good. (The cell above already uses slightly adjusted values, which is why it reaches roughly 0.46.)

One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.


In [9]:
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train', color='blue')
plt.plot(stats['val_acc_history'], label='val', color='green')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()



In [10]:
from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(net)


Tune your hyperparameters

What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which suggests that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.
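
One way to automate that sweep is to sample hyperparameters log-uniformly instead of walking a fixed grid (the grid search in the cell below is the other obvious option); the ranges here are assumptions to adapt.

import numpy as np

def sample_hyperparams(rng):
    """Sketch: log-uniform random search over assumed, adjustable ranges."""
    lr = 10 ** rng.uniform(-3.5, -2.5)       # learning rate
    reg = 10 ** rng.uniform(-2.0, 0.5)       # regularization strength
    hidden_size = int(rng.choice([50, 100, 150, 200]))
    return lr, reg, hidden_size

rng = np.random.RandomState(0)
for _ in range(5):
    print(sample_hyperparams(rng))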

Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected Neural Network. For every 1% above 52% on the test set we will award one extra bonus point. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, or adding features to the solver).


In [11]:
best_net = None # store the best model into this
best_net_stats = None
best_val_acc = 0.0
best_lr = None
best_decay = None
best_reg = None

###################################################################
# TODO: Tune hyperparameters using the validation set. Store your #
# best trained model in best_net.                                 #
#                                                                 #
# To help debug your network, it may help to use visualizations   #
# similar to the ones we used above; these visualizations will    #
# have significant qualitative differences from the ones we saw   #
# above for the poorly tuned network.                             #
#                                                                 #
# Tweaking hyperparameters by hand can be fun, but you might find #
# it useful to write code to sweep through possible combinations  #
# of hyperparameters automatically, as we did in the previous     #
# exercises.                                                      #
###################################################################
import itertools

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
num_iters = 1000
N = X_train.shape[0]
batch_size = 200
std = np.sqrt(2.0 / N)

#learning_rates = [9e-4]
learning_rates = 10 ** np.linspace(np.log10(9e-4) - 0.3,
                                   np.log10(9e-4) + 0.3, 5)

#learning_rate_decays = np.linspace(0.8, 0.98, 5)
learning_rate_decays = 10 ** np.linspace(-0.017728, -0.004364, 4)

#regularizations = [0.003, 0.01, 0.03, 0.1, 0.3, 1, 3]
#regularizations = 10 ** np.linspace(-1.52, 0, 5)
regularizations = 10 ** np.linspace(-1.5, 0.5, 5)

combinations = itertools.product(learning_rates,
                                 learning_rate_decays,
                                 regularizations)

results = {} # dict of validation accuracies for each combination

for lr, decay, reg in combinations:
    net = TwoLayerNet(input_size, hidden_size, num_classes, std)
    stats = net.train(X_train, y_train, X_val, y_val,
                      num_iters=num_iters, batch_size=batch_size,
                      learning_rate=lr, learning_rate_decay=decay,
                      reg=reg, verbose=False)
    train_acc = np.mean(net.predict(X_train) == y_train)
    val_acc = np.mean(net.predict(X_val) == y_val)
    results[(lr, decay, reg)] = val_acc
    print('lr={}, decay={}, reg={}:'.format(lr, decay, reg))
    print('\ttrain_acc={}, val_acc={}'.format(train_acc, val_acc))
    if (val_acc > best_val_acc):
        best_net = net
        best_net_stats = stats
        best_val_acc = val_acc
        best_lr = lr
        best_decay = decay
        best_reg = reg

print('')
print('Best validation accuracy:', best_val_acc)
print('\tlearning rate:', best_lr)
print('\tlearning rate decay:', best_decay)
print('\tregularization strength:', best_reg)
###################################################################
#                         END OF YOUR CODE                        #
###################################################################


lr=0.000451068510265, decay=0.960001695353, reg=0.0316227766017:
	train_acc=0.452693877551, val_acc=0.438
lr=0.000451068510265, decay=0.960001695353, reg=0.1:
	train_acc=0.446816326531, val_acc=0.444
lr=0.000451068510265, decay=0.960001695353, reg=0.316227766017:
	train_acc=0.448693877551, val_acc=0.448
lr=0.000451068510265, decay=0.960001695353, reg=1.0:
	train_acc=0.452346938776, val_acc=0.464
lr=0.000451068510265, decay=0.960001695353, reg=3.16227766017:
	train_acc=0.451408163265, val_acc=0.454
lr=0.000451068510265, decay=0.969899346399, reg=0.0316227766017:
	train_acc=0.448632653061, val_acc=0.451
lr=0.000451068510265, decay=0.969899346399, reg=0.1:
	train_acc=0.447448979592, val_acc=0.44
lr=0.000451068510265, decay=0.969899346399, reg=0.316227766017:
	train_acc=0.45106122449, val_acc=0.452
lr=0.000451068510265, decay=0.969899346399, reg=1.0:
	train_acc=0.453489795918, val_acc=0.436
lr=0.000451068510265, decay=0.969899346399, reg=3.16227766017:
	train_acc=0.445448979592, val_acc=0.445
lr=0.000451068510265, decay=0.979899042573, reg=0.0316227766017:
	train_acc=0.453265306122, val_acc=0.439
lr=0.000451068510265, decay=0.979899042573, reg=0.1:
	train_acc=0.452653061224, val_acc=0.438
lr=0.000451068510265, decay=0.979899042573, reg=0.316227766017:
	train_acc=0.455367346939, val_acc=0.46
lr=0.000451068510265, decay=0.979899042573, reg=1.0:
	train_acc=0.453714285714, val_acc=0.468
lr=0.000451068510265, decay=0.979899042573, reg=3.16227766017:
	train_acc=0.447816326531, val_acc=0.446
lr=0.000451068510265, decay=0.990001835964, reg=0.0316227766017:
	train_acc=0.452204081633, val_acc=0.453
lr=0.000451068510265, decay=0.990001835964, reg=0.1:
	train_acc=0.453795918367, val_acc=0.415
lr=0.000451068510265, decay=0.990001835964, reg=0.316227766017:
	train_acc=0.454142857143, val_acc=0.424
lr=0.000451068510265, decay=0.990001835964, reg=1.0:
	train_acc=0.450408163265, val_acc=0.452
lr=0.000451068510265, decay=0.990001835964, reg=3.16227766017:
	train_acc=0.445673469388, val_acc=0.444
lr=0.000637151205946, decay=0.960001695353, reg=0.0316227766017:
	train_acc=0.466244897959, val_acc=0.464
lr=0.000637151205946, decay=0.960001695353, reg=0.1:
	train_acc=0.467489795918, val_acc=0.47
lr=0.000637151205946, decay=0.960001695353, reg=0.316227766017:
	train_acc=0.467040816327, val_acc=0.47
lr=0.000637151205946, decay=0.960001695353, reg=1.0:
	train_acc=0.472448979592, val_acc=0.457
lr=0.000637151205946, decay=0.960001695353, reg=3.16227766017:
	train_acc=0.452469387755, val_acc=0.449
lr=0.000637151205946, decay=0.969899346399, reg=0.0316227766017:
	train_acc=0.465020408163, val_acc=0.445
lr=0.000637151205946, decay=0.969899346399, reg=0.1:
	train_acc=0.463571428571, val_acc=0.448
lr=0.000637151205946, decay=0.969899346399, reg=0.316227766017:
	train_acc=0.46693877551, val_acc=0.452
lr=0.000637151205946, decay=0.969899346399, reg=1.0:
	train_acc=0.470775510204, val_acc=0.464
lr=0.000637151205946, decay=0.969899346399, reg=3.16227766017:
	train_acc=0.457367346939, val_acc=0.46
lr=0.000637151205946, decay=0.979899042573, reg=0.0316227766017:
	train_acc=0.467734693878, val_acc=0.464
lr=0.000637151205946, decay=0.979899042573, reg=0.1:
	train_acc=0.461571428571, val_acc=0.444
lr=0.000637151205946, decay=0.979899042573, reg=0.316227766017:
	train_acc=0.470979591837, val_acc=0.459
lr=0.000637151205946, decay=0.979899042573, reg=1.0:
	train_acc=0.467510204082, val_acc=0.438
lr=0.000637151205946, decay=0.979899042573, reg=3.16227766017:
	train_acc=0.455448979592, val_acc=0.445
lr=0.000637151205946, decay=0.990001835964, reg=0.0316227766017:
	train_acc=0.464857142857, val_acc=0.457
lr=0.000637151205946, decay=0.990001835964, reg=0.1:
	train_acc=0.46893877551, val_acc=0.448
lr=0.000637151205946, decay=0.990001835964, reg=0.316227766017:
	train_acc=0.470510204082, val_acc=0.455
lr=0.000637151205946, decay=0.990001835964, reg=1.0:
	train_acc=0.470020408163, val_acc=0.464
lr=0.000637151205946, decay=0.990001835964, reg=3.16227766017:
	train_acc=0.458448979592, val_acc=0.445
lr=0.0009, decay=0.960001695353, reg=0.0316227766017:
	train_acc=0.480693877551, val_acc=0.478
lr=0.0009, decay=0.960001695353, reg=0.1:
	train_acc=0.478224489796, val_acc=0.436
lr=0.0009, decay=0.960001695353, reg=0.316227766017:
	train_acc=0.483163265306, val_acc=0.46
lr=0.0009, decay=0.960001695353, reg=1.0:
	train_acc=0.481530612245, val_acc=0.461
lr=0.0009, decay=0.960001695353, reg=3.16227766017:
	train_acc=0.457326530612, val_acc=0.466
lr=0.0009, decay=0.969899346399, reg=0.0316227766017:
	train_acc=0.48393877551, val_acc=0.46
lr=0.0009, decay=0.969899346399, reg=0.1:
	train_acc=0.484612244898, val_acc=0.456
lr=0.0009, decay=0.969899346399, reg=0.316227766017:
	train_acc=0.471367346939, val_acc=0.457
lr=0.0009, decay=0.969899346399, reg=1.0:
	train_acc=0.487408163265, val_acc=0.464
lr=0.0009, decay=0.969899346399, reg=3.16227766017:
	train_acc=0.46106122449, val_acc=0.464
lr=0.0009, decay=0.979899042573, reg=0.0316227766017:
	train_acc=0.48693877551, val_acc=0.478
lr=0.0009, decay=0.979899042573, reg=0.1:
	train_acc=0.472081632653, val_acc=0.469
lr=0.0009, decay=0.979899042573, reg=0.316227766017:
	train_acc=0.48612244898, val_acc=0.465
lr=0.0009, decay=0.979899042573, reg=1.0:
	train_acc=0.479244897959, val_acc=0.473
lr=0.0009, decay=0.979899042573, reg=3.16227766017:
	train_acc=0.46712244898, val_acc=0.48
lr=0.0009, decay=0.990001835964, reg=0.0316227766017:
	train_acc=0.478653061224, val_acc=0.459
lr=0.0009, decay=0.990001835964, reg=0.1:
	train_acc=0.478448979592, val_acc=0.455
lr=0.0009, decay=0.990001835964, reg=0.316227766017:
	train_acc=0.481081632653, val_acc=0.465
lr=0.0009, decay=0.990001835964, reg=1.0:
	train_acc=0.477693877551, val_acc=0.471
lr=0.0009, decay=0.990001835964, reg=3.16227766017:
	train_acc=0.457040816327, val_acc=0.451
lr=0.00127128379016, decay=0.960001695353, reg=0.0316227766017:
	train_acc=0.489244897959, val_acc=0.468
lr=0.00127128379016, decay=0.960001695353, reg=0.1:
	train_acc=0.491816326531, val_acc=0.467
lr=0.00127128379016, decay=0.960001695353, reg=0.316227766017:
	train_acc=0.498530612245, val_acc=0.485
lr=0.00127128379016, decay=0.960001695353, reg=1.0:
	train_acc=0.452306122449, val_acc=0.442
lr=0.00127128379016, decay=0.960001695353, reg=3.16227766017:
	train_acc=0.453020408163, val_acc=0.455
lr=0.00127128379016, decay=0.969899346399, reg=0.0316227766017:
	train_acc=0.457428571429, val_acc=0.448
lr=0.00127128379016, decay=0.969899346399, reg=0.1:
	train_acc=0.494959183673, val_acc=0.47
lr=0.00127128379016, decay=0.969899346399, reg=0.316227766017:
	train_acc=0.483020408163, val_acc=0.472
lr=0.00127128379016, decay=0.969899346399, reg=1.0:
	train_acc=0.480979591837, val_acc=0.474
lr=0.00127128379016, decay=0.969899346399, reg=3.16227766017:
	train_acc=0.455387755102, val_acc=0.448
lr=0.00127128379016, decay=0.979899042573, reg=0.0316227766017:
	train_acc=0.474836734694, val_acc=0.43
lr=0.00127128379016, decay=0.979899042573, reg=0.1:
	train_acc=0.491326530612, val_acc=0.46
lr=0.00127128379016, decay=0.979899042573, reg=0.316227766017:
	train_acc=0.49487755102, val_acc=0.473
lr=0.00127128379016, decay=0.979899042573, reg=1.0:
	train_acc=0.491, val_acc=0.485
lr=0.00127128379016, decay=0.979899042573, reg=3.16227766017:
	train_acc=0.467795918367, val_acc=0.458
lr=0.00127128379016, decay=0.990001835964, reg=0.0316227766017:
	train_acc=0.491551020408, val_acc=0.474
lr=0.00127128379016, decay=0.990001835964, reg=0.1:
	train_acc=0.484816326531, val_acc=0.462
lr=0.00127128379016, decay=0.990001835964, reg=0.316227766017:
	train_acc=0.481408163265, val_acc=0.466
lr=0.00127128379016, decay=0.990001835964, reg=1.0:
	train_acc=0.491020408163, val_acc=0.468
lr=0.00127128379016, decay=0.990001835964, reg=3.16227766017:
	train_acc=0.444367346939, val_acc=0.437
lr=0.00179573608347, decay=0.960001695353, reg=0.0316227766017:
	train_acc=0.472102040816, val_acc=0.462
lr=0.00179573608347, decay=0.960001695353, reg=0.1:
	train_acc=0.474775510204, val_acc=0.482
lr=0.00179573608347, decay=0.960001695353, reg=0.316227766017:
	train_acc=0.496857142857, val_acc=0.488
lr=0.00179573608347, decay=0.960001695353, reg=1.0:
	train_acc=0.495428571429, val_acc=0.482
lr=0.00179573608347, decay=0.960001695353, reg=3.16227766017:
	train_acc=0.438775510204, val_acc=0.448
lr=0.00179573608347, decay=0.969899346399, reg=0.0316227766017:
	train_acc=0.495612244898, val_acc=0.455
lr=0.00179573608347, decay=0.969899346399, reg=0.1:
	train_acc=0.488469387755, val_acc=0.452
lr=0.00179573608347, decay=0.969899346399, reg=0.316227766017:
	train_acc=0.487428571429, val_acc=0.465
lr=0.00179573608347, decay=0.969899346399, reg=1.0:
	train_acc=0.481959183673, val_acc=0.454
lr=0.00179573608347, decay=0.969899346399, reg=3.16227766017:
	train_acc=0.457428571429, val_acc=0.467
lr=0.00179573608347, decay=0.979899042573, reg=0.0316227766017:
	train_acc=0.474714285714, val_acc=0.457
lr=0.00179573608347, decay=0.979899042573, reg=0.1:
	train_acc=0.488510204082, val_acc=0.459
lr=0.00179573608347, decay=0.979899042573, reg=0.316227766017:
	train_acc=0.493428571429, val_acc=0.458
lr=0.00179573608347, decay=0.979899042573, reg=1.0:
	train_acc=0.467714285714, val_acc=0.462
lr=0.00179573608347, decay=0.979899042573, reg=3.16227766017:
	train_acc=0.434142857143, val_acc=0.443
lr=0.00179573608347, decay=0.990001835964, reg=0.0316227766017:
	train_acc=0.482020408163, val_acc=0.476
lr=0.00179573608347, decay=0.990001835964, reg=0.1:
	train_acc=0.484693877551, val_acc=0.45
lr=0.00179573608347, decay=0.990001835964, reg=0.316227766017:
	train_acc=0.478, val_acc=0.444
lr=0.00179573608347, decay=0.990001835964, reg=1.0:
	train_acc=0.472510204082, val_acc=0.456
lr=0.00179573608347, decay=0.990001835964, reg=3.16227766017:
	train_acc=0.446, val_acc=0.448

Best validation accuracy: 0.488
	learning rate: 0.00179573608347
	learning rate decay: 0.960001695353
	regularization strength: 0.316227766017

In [12]:
# Plot the loss function and train / validation accuracies
# for the best net
plt.subplot(2, 1, 1)
plt.plot(best_net_stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(best_net_stats['train_acc_history'], label='train', color='blue')
plt.plot(best_net_stats['val_acc_history'], label='val', color='green')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()



In [13]:
# visualize the weights of the best network
show_net_weights(best_net)


Run on the test set

When you are done experimenting, you should evaluate your final trained network on the test set; you should get above 48%.

We will give you extra bonus point for every 1% of accuracy above 52%.


In [14]:
test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy:', test_acc)


Test accuracy: 0.488

Conclusion

Note that different runs may yield different results.

Best validation accuracy: 0.488
    learning rate:           0.00179573608347
    learning rate decay:     0.960001695353
    regularization strength: 0.316227766017

with

Training accuracy:   0.497
Validation accuracy: 0.488
Test accuracy:       0.488
