Complete and hand in this worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details, see the HW page on the course website.
Having gained some experience with neural networks, let us train a network that predicts the next character given a sequence of characters from a text.
All of your work for this exercise will be done in this notebook.
In [1]:
import random
import numpy as np
from metu.data_utils import load_nextchar_dataset, plain_text_file_to_dataset
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ Returns the relative error between x and y. """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
In [2]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.
from cs231n.classifiers.neural_net_for_regression import TwoLayerNet
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5
def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [2, 1, 4], [2, 1, 4]])
    return X, y
net = init_toy_model()
X, y = init_toy_data()
Open the file cs231n/classifiers/neural_net_for_regression.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the previous exercises: it takes the data and weights, and computes the regression scores, the squared-error loss, and the gradients on the parameters.
To be more specific, you will implement the following loss function:
$$\frac{1}{2}\sum_i (o_i - y_i)^2 + \frac{1}{2}\lambda\sum_j w_j^2,$$
where $i$ runs over the samples in the batch, $o_i$ is the prediction of the network for the $i^{th}$ sample, $y_i$ is the correct character, $j$ runs over all weights of the network, and $\lambda$ is the weight of the regularization term.
The first layer uses ReLU as the activation function. The output layer does not use any activation functions.
Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.
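As a reference point, here is a minimal sketch of what the forward pass and loss computation might look like, assuming the parameter dictionary uses the keys 'W1', 'b1', 'W2', 'b2' (as in the CS231n starter code); treat it as an outline under those assumptions, not as the required implementation:
def loss_sketch(params, X, y=None, reg=0.0):
    # Hypothetical outline of TwoLayerNet.loss; the parameter keys are assumptions.
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    hidden = np.maximum(0, X.dot(W1) + b1)   # first layer with ReLU activation
    scores = hidden.dot(W2) + b2             # output layer, no activation
    if y is None:
        return scores
    diff = scores - y
    # 0.5 * squared error plus 0.5 * lambda * sum of squared weights, as in the formula above
    loss = 0.5 * np.sum(diff ** 2) + 0.5 * reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return loss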
In [3]:
scores = net.loss(X)
print 'Your scores:'
print scores
print
print 'correct scores:'
correct_scores = np.asarray([
    [-0.81233741, -1.27654624, -0.70335995],
    [-0.17129677, -1.18803311, -0.47310444],
    [-0.51590475, -1.01354314, -0.8504215 ],
    [-0.15419291, -0.48629638, -0.52901952],
    [-0.00618733, -0.12435261, -0.15226949]])
print correct_scores
print
# The difference should be very small. We get < 1e-7
print 'Difference between your scores and correct scores:'
print np.sum(np.abs(scores - correct_scores))
In [4]:
loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 66.3406756909
# should be very small, we get < 1e-10
print 'Difference between your loss and correct loss:'
print np.sum(np.abs(loss - correct_loss))
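Before running the numeric check below, it can help to write down the backward pass you expect. Here is a sketch under the same hypothetical naming as the forward-pass sketch above; note that the 0.5 factors in the loss make the gradients come out without extra constants:
def backward_sketch(X, y, W1, W2, hidden, scores, reg):
    # Hypothetical gradients for 0.5*sum((scores - y)^2) + 0.5*reg*(|W1|^2 + |W2|^2).
    dscores = scores - y                      # derivative of the squared-error term
    dW2 = hidden.T.dot(dscores) + reg * W2
    db2 = dscores.sum(axis=0)
    dhidden = dscores.dot(W2.T)
    dhidden[hidden <= 0] = 0                  # ReLU passes gradient only where it was active
    dW1 = X.T.dot(dhidden) + reg * W1
    db1 = dhidden.sum(axis=0)
    return {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}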
In [5]:
from cs231n.gradient_check import eval_numerical_gradient
# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.
loss, grads = net.loss(X, y, reg=0.1)
# these should all be less than 1e-8 or so
for param_name in grads:
    # The lambda ignores its argument W: eval_numerical_gradient perturbs
    # net.params[param_name] in place, so re-evaluating the loss is enough.
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name])
    print '%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name]))
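For reference, centered-difference gradient checking perturbs each parameter entry in place and re-evaluates the loss, which is also why the lambda above can safely ignore its W argument. A minimal sketch of the idea (the actual implementation lives in cs231n/gradient_check.py):
def numeric_gradient_sketch(f, x, h=1e-5):
    # Centered differences: grad[i] ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h).
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)          # loss with x[ix] nudged up
        x[ix] = old - h
        fxmh = f(x)          # loss with x[ix] nudged down
        x[ix] = old          # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad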
In [38]:
# Load the TEXT data
# If your memory turns out to be sufficient, try the following:
#def get_nextchar_data(training_ratio=0.6, val_ratio=0.1):
def get_nextchar_data(training_ratio=0.1, test_ratio=0.06, val_ratio=0.01):
    # Load the nextchar training data
    X, y = load_nextchar_dataset(nextchar_datafile)
    # Subsample the data
    length = len(y)
    num_training = int(length * training_ratio)
    num_val = int(length * val_ratio)
    num_test = min((length - num_training - num_val), int(length * test_ratio))
    mask = range(num_training)  # note: range(num_training-1) would drop the last training sample
    X_train = X[mask]
    y_train = y[mask]
    mask = range(num_training, num_training + num_test)
    X_test = X[mask]
    y_test = y[mask]
    mask = range(num_training + num_test, num_training + num_test + num_val)
    X_val = X[mask]
    y_val = y[mask]
    return X_train, y_train, X_val, y_val, X_test, y_test
nextchar_datafile = 'metu/dataset/nextchar_data.pkl'
input_size = 5 # Size of the input of the network
#plain_text_file_to_dataset("metu/dataset/ince_memed_1.txt", nextchar_datafile, input_size)
plain_text_file_to_dataset("metu/dataset/shakespeare.txt", nextchar_datafile, input_size)
X_train, y_train, X_val, y_val, X_test, y_test = get_nextchar_data()
print "Number of instances in the training set: ", len(X_train)
print "Number of instances in the validation set: ", len(X_val)
print "Number of instances in the testing set: ", len(X_test)
In [39]:
# We have loaded the dataset. That wasn't difficult, was it? :)
# Let's look at a few samples
#
from metu.data_utils import int_list_to_string, int_to_charstr
print "Input - Next char to be predicted"
for i in range(1, 10):
    print int_list_to_string(X_train[i]) + " - " + int_list_to_string(y_train[i])
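(int_list_to_string presumably just maps integer character codes back to characters; a hypothetical one-line equivalent, assuming a plain ordinal encoding:)
def int_list_to_string_sketch(codes):
    # Hypothetical: decode a list of integer character codes into a string.
    return ''.join(chr(int(c)) for c in codes)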
In [69]:
# Now, let's train a neural network
# input_size was already set above when the dataset was created
hidden_size = 5000
num_classes = 1
net = TwoLayerNet(input_size, hidden_size, num_classes)
# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
                  num_iters=4000, batch_size=32*8,
                  learning_rate=6e-7, learning_rate_decay=0.95,
                  reg=5, verbose=True)
# Predict on the validation set
val_err = np.sum(np.square(net.predict(X_val) - y_val), axis=1).mean()
print 'Validation error: ', val_err
I have managed to get a loss below 10,000 and a validation error of about 1100 on the validation set (by playing around with the parameters a little bit). However, this isn't very good.
One strategy for getting insight into what's wrong is to plot the loss function and the errors on the training and validation sets during optimization.
In [70]:
# Plot the loss function and train / validation errors
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
train = plt.plot(stats['train_err_history'], label='train')
val = plt.plot(stats['val_err_history'], label='val')
plt.legend(loc='upper right', shadow=True)
plt.title('Regression error history')
plt.xlabel('Epoch')
plt.ylabel('Regression error')
plt.show()
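Beyond reading the curves, a small grid search over the learning rate and regularization strength is a simple way to push the validation error down. The sketch below reuses the train/predict calls from above; the candidate values are hypothetical starting points, not tuned recommendations:
best_err, best_net = float('inf'), None
for lr in [1e-7, 6e-7, 1e-6]:            # hypothetical candidate learning rates
    for reg_strength in [1, 5, 10]:      # hypothetical candidate regularization weights
        candidate = TwoLayerNet(input_size, hidden_size, num_classes)
        candidate.train(X_train, y_train, X_val, y_val,
                        num_iters=1000, batch_size=32*8,
                        learning_rate=lr, learning_rate_decay=0.95,
                        reg=reg_strength, verbose=False)
        err = np.sum(np.square(candidate.predict(X_val) - y_val), axis=1).mean()
        if err < best_err:
            best_err, best_net = err, candidate
print 'Best validation error: ', best_err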
In [71]:
# Show some sample outputs:
print "Input - predicted char - true char"
for i in range(0, 100):
    print int_list_to_string(X_val[i]) + " - " \
        + int_list_to_string([int(x) for x in net.predict(X_val[i])]) \
        + " - " + int_list_to_string(y_val[i]), net.predict(X_val[i]), y_val[i]
In [72]:
test_err = np.sum(np.square(net.predict(X_test) - y_test), axis=1).mean()
print 'Test error: ', test_err
In [73]:
# Show some sample outputs:
print "Input - predicted char - true char"
for i in range(0, 100):
    print int_list_to_string(X_test[i]) + " - " \
        + int_list_to_string([int(x) for x in net.predict(X_test[i])]) \
        + " - " + int_list_to_string(y_test[i]), net.predict(X_test[i]), y_test[i]
In [ ]: