Exercise 4: Neural Network Learning


In [ ]:
import numpy as np
import scipy.io
import scipy.optimize
import matplotlib.pyplot as plt
%matplotlib inline

In [ ]:
# uncomment for console - useful for debugging
# %qtconsole

In [ ]:
ex4data1 = scipy.io.loadmat("./ex4data1.mat")
X = ex4data1['X']
y = ex3data1['y'][:,0]
m, n = X.shape
m, n

In [ ]:
input_layer_size  = n  # 20x20 Input Images of Digits
hidden_layer_size = 25 # 25 hidden units
num_labels = 10        # 10 labels, from 1 to 10
                       # (note that we have mapped "0" to label 10)
lambda_ = 1

Part 1: Loading and Visualizing Data

We start the exercise by loading and visualizing the dataset. You will be working with a dataset of handwritten digits.


In [ ]:
def display(X, display_rows=5, display_cols=5, figsize=(4,4), random_x=False):
    m = X.shape[0]
    fig, axes = plt.subplots(display_rows, display_cols, figsize=figsize)
    fig.subplots_adjust(wspace=0.1, hspace=0.1)

    import random

    for i, ax in enumerate(axes.flat):
        ax.set_axis_off()
        x = None
        if random_x:
            x = random.randint(0, m-1)
        else:
            x = i
        image = X[x].reshape(20, 20).T
        image = image / np.max(image)
        ax.imshow(image, cmap=plt.cm.Greys_r)

display(X, random_x=True)

In [ ]:
def add_ones_column(array):
    return np.insert(array, 0, 1, axis=1)

Part 2: Loading Parameters

In this part of the exercise, we load some pre-initialized neural network parameters.


In [ ]:
ex4weights = scipy.io.loadmat('./ex4weights.mat')
Theta1 = ex4weights['Theta1']
Theta2 = ex4weights['Theta2']
print(Theta1.shape, Theta2.shape)

Unrolling the parameters into one vector:


In [ ]:
nn_params = np.concatenate((Theta1.flat, Theta2.flat))
nn_params.shape

In [ ]:
def sigmoid(z):
    return 1 / (1+np.exp(-z))

Part 3: Compute Cost (Feedforward)

For the neural network, you should first start by implementing the feedforward part that returns the cost only. You should complete the code in nn_cost_function() to return the cost. After implementing the feedforward pass to compute the cost, you can verify that your implementation is correct by checking that you get the same cost as us for the fixed debugging parameters.

We suggest implementing the feedforward cost without regularization first so that it will be easier for you to debug. Later, in part 4, you will get to implement the regularized cost.


In [ ]:
def nn_cost_function(nn_params, input_layer_size, hidden_layer_size,
                     num_labels, X, y, lambda_):
    #NNCOSTFUNCTION Implements the neural network cost function for a two-layer
    #neural network which performs classification
    #   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size,
    #   num_labels, X, y, lambda) computes the cost and gradient of the neural
    #   network. The parameters for the neural network are "unrolled" into the
    #   vector nn_params and need to be converted back into the weight matrices.
    #
    #   The returned parameter grad should be an "unrolled" vector of the
    #   partial derivatives of the neural network.
    #

    # Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
    # for our 2 layer neural network
    t1_len = (input_layer_size+1)*hidden_layer_size
    Theta1 = nn_params[:t1_len].reshape(hidden_layer_size, input_layer_size+1)
    Theta2 = nn_params[t1_len:].reshape(num_labels, hidden_layer_size+1)
    m = X.shape[0]
    
    # You need to return the following variables correctly 
    J = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: You should complete the code by working through the
    #               following parts.
    #
    # Part 1: Feedforward the neural network and return the cost in the
    #         variable J. After implementing Part 1, you can verify that your
    #         cost function computation is correct by verifying the cost
    #         computed for lambda == 0.
    #
    # Part 2: Implement the backpropagation algorithm to compute the gradients
    #         Theta1_grad and Theta2_grad. You should return the partial derivatives of
    #         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
    #         Theta2_grad, respectively. After implementing Part 2, you can check
    #         that your implementation is correct by running check_NN_gradients
    #
    #         Note: The vector y passed into the function is a vector of labels
    #               containing values from 1..K. You need to map this vector into a 
    #               binary vector of 1's and 0's to be used with the neural network
    #               cost function.
    #
    #         Hint: We recommend implementing backpropagation using a for-loop
    #               over the training examples if you are implementing it for the 
    #               first time.
    #
    # Part 3: Implement regularization with the cost function and gradients.
    #
    #         Hint: You can implement this around the code for
    #               backpropagation. That is, you can compute the gradients for
    #               the regularization separately and then add them to Theta1_grad
    #               and Theta2_grad from Part 2.
    #
    
    
    
    
    
    
    # =========================================================================

    # Unroll gradients
    gradient = np.concatenate((Theta1_grad.flat, Theta2_grad.flat))

    return J, gradient
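
For reference, one possible vectorized implementation of the feedforward pass and the unregularized cost (a sketch; the function name and the intermediate names a1, a2, a3 are mine, and it reuses the sigmoid and add_ones_column helpers defined above):


In [ ]:
def feedforward_cost_sketch(Theta1, Theta2, X, y):
    # Forward pass: input -> hidden -> output, with bias units added.
    m = X.shape[0]
    a1 = add_ones_column(X)                        # (m, 401)
    a2 = add_ones_column(sigmoid(a1 @ Theta1.T))   # (m, 26)
    a3 = sigmoid(a2 @ Theta2.T)                    # (m, 10)

    # Map labels 1..K to one-hot rows (label k -> column k-1).
    Y = np.zeros((m, Theta2.shape[0]))
    Y[np.arange(m), y - 1] = 1

    # Unregularized cross-entropy cost, averaged over the m examples.
    return -np.sum(Y * np.log(a3) + (1 - Y) * np.log(1 - a3)) / m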

The cost at the given parameters should be about 0.287629.


In [ ]:
lambda_ = 0       # No regularization
nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_)

The cost at the given parameters and a regularization factor of 1 should be about 0.38377.

Part 4: Implement Regularization

Once your cost function implementation is correct, you should continue by adding regularization to the cost.
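
The regularization term penalizes all weights except the bias columns of Theta1 and Theta2. One way to compute it, as a sketch (the helper name is mine):


In [ ]:
def regularization_term(Theta1, Theta2, lambda_, m):
    # lambda/(2m) times the sum of squared weights, excluding each bias column.
    return lambda_ / (2*m) * (np.sum(Theta1[:, 1:]**2) + np.sum(Theta2[:, 1:]**2))

# The regularized cost is then the unregularized cost plus this term, e.g.
# J_regularized = J_unregularized + regularization_term(Theta1, Theta2, lambda_, m)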


In [ ]:
lambda_ = 1
nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_)

Part 5: Sigmoid Gradient

Before you start implementing the neural network, you will first implement the gradient for the sigmoid function. You should complete the code in sigmoid_gradient.


In [ ]:
def sigmoid_gradient(z):
    #SIGMOIDGRADIENT returns the gradient of the sigmoid function
    #evaluated at z
    #   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
    #   evaluated at z. This should work regardless if z is a matrix or a
    #   vector. In particular, if z is a vector or matrix, you should return
    #   the gradient for each element.

    g = np.zeros(z.shape)

    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the gradient of the sigmoid function evaluated at
    #               each value of z (z can be a matrix, vector or scalar).


    
    
    
    # =============================================================

    
    return g

In [ ]:
sigmoid_gradient(np.array([1, -0.5, 0, 0.5, 1]))
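
For reference, the sigmoid gradient can be computed element-wise as g'(z) = g(z) (1 - g(z)); a minimal sketch using the sigmoid helper above (the function name is mine):


In [ ]:
def sigmoid_gradient_sketch(z):
    # Element-wise derivative of the sigmoid: g'(z) = g(z) * (1 - g(z)).
    s = sigmoid(np.asarray(z, dtype=float))
    return s * (1 - s)

# At z = 0 the gradient should be exactly 0.25.
sigmoid_gradient_sketch(np.array([1, -0.5, 0, 0.5, 1]))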

Part 6: Initializing Parameters

In this part of the exercise, you will start implementing a two-layer neural network that classifies digits. You will begin by implementing a function to initialize the weights of the neural network.


In [ ]:
def rand_initialize_weight(L_in, L_out):
    #RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
    #incoming connections and L_out outgoing connections
    #   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights 
    #   of a layer with L_in incoming connections and L_out outgoing 
    #   connections. 
    #
    #   Note that W should be set to a matrix of size (L_out, 1 + L_in) as
    #   the first column of W handles the "bias" terms
    #
    
    # You need to return the following variables correctly 
    W = np.zeros((L_out, 1 + L_in))
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Initialize W randomly so that we break the symmetry while
    #               training the neural network.
    #
    # Note: The first column of W corresponds to the parameters for the bias units
    #
    
    # =========================================================================

    return W
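
A common strategy is to draw each weight uniformly from [-epsilon_init, epsilon_init], with epsilon_init around 0.12 for this architecture; a minimal sketch (the function name and default value are mine):


In [ ]:
def rand_initialize_weight_sketch(L_in, L_out, epsilon_init=0.12):
    # Uniform values in [-epsilon_init, epsilon_init]; the first column
    # holds the bias parameters.
    return np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init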

Part 7: Implement Backpropagation

Once your cost matches up with ours, you should proceed to implement the backpropagation algorithm for the neural network. You should add to the code you've written in nn_cost_function to return the partial derivatives of the parameters.
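
As a point of comparison, one vectorized way to compute the (unregularized) gradients is sketched below; the function name and intermediate names are mine, and it reuses the sigmoid and add_ones_column helpers from above:


In [ ]:
def backprop_gradients_sketch(Theta1, Theta2, X, y):
    # Forward pass (same quantities as in the cost sketch above).
    m = X.shape[0]
    a1 = add_ones_column(X)
    z2 = a1 @ Theta1.T
    a2 = add_ones_column(sigmoid(z2))
    a3 = sigmoid(a2 @ Theta2.T)

    # One-hot encode the labels (label k -> column k-1).
    Y = np.zeros_like(a3)
    Y[np.arange(m), y - 1] = 1

    # Backward pass: output-layer error, then hidden-layer error; the bias
    # column of Theta2 is dropped, and the sigmoid gradient of z2 is applied.
    d3 = a3 - Y
    d2 = (d3 @ Theta2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))

    # Accumulated gradients, averaged over the m examples (no regularization yet).
    Theta1_grad = d2.T @ a1 / m
    Theta2_grad = d3.T @ a2 / m
    return Theta1_grad, Theta2_grad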


In [ ]:
def numerical_gradient(f, x, dx=1e-6):
    perturb = np.zeros(x.size)
    result  = np.zeros(x.size)
    for i in range(x.size):
        perturb[i] = dx
        result[i] = (f(x+perturb) - f(x-perturb)) / (2*dx)
        perturb[i] = 0
    return result

def check_NN_gradients(lambda_=0):
    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5

    def debug_matrix(fan_out, fan_in):
        W = np.sin(np.arange(fan_out * (fan_in+1))+1) / 10
        return W.reshape(fan_out, fan_in+1)

    Theta1 = debug_matrix(hidden_layer_size, input_layer_size)
    Theta2 = debug_matrix(num_labels, hidden_layer_size)

    X = debug_matrix(m, input_layer_size - 1)
    y = 1 + ((1 + np.arange(m)) % num_labels)
    
    nn_params = np.concatenate([Theta1.flat, Theta2.flat])

    cost, grad = nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_)
    def just_cost(nn_params):
        cost, grad = nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_)
        return cost
    
    numgrad = numerical_gradient(just_cost, nn_params)
    # Relative difference between the backprop and numerical gradients.
    return np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)

If your backpropagation implementation is correct, then the relative difference will be small (less than 1e-9).


In [ ]:
check_NN_gradients()

In [ ]:
initial_Theta1 = rand_initialize_weight(input_layer_size, hidden_layer_size)
initial_Theta2 = rand_initialize_weight(hidden_layer_size, num_labels)

Part 8: Implement Regularization

Once your backpropagation implementation is correct, you should now continue to implement the regularization with the cost and gradient.
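
The regularized gradient adds lambda/m times each weight, leaving the bias columns untouched; a sketch (the helper name is mine), plus a re-run of the gradient check with a nonzero lambda:


In [ ]:
def add_gradient_regularization(Theta, Theta_grad, lambda_, m):
    # Add lambda/m * Theta to the gradient, skipping the bias column.
    Theta_grad = Theta_grad.copy()
    Theta_grad[:, 1:] += lambda_ / m * Theta[:, 1:]
    return Theta_grad

# With regularization enabled, the relative difference between the analytic
# and numerical gradients should still be small (< 1e-9).
check_NN_gradients(lambda_=3)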


In [ ]:
def cost_fun(nn_params):
    return nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda_)

lambda_ = 3
nn_params = np.concatenate((initial_Theta1.flat, initial_Theta2.flat))
res = scipy.optimize.minimize(cost_fun, nn_params, jac=True, method='L-BFGS-B', 
                              options=dict(maxiter=200, disp=True))

In [ ]:
res

The cost at lambda = 3 should be about 0.57.


In [ ]:
res.fun

Part 9: Training NN

You have now implemented all the code necessary to train a neural network. To train your neural network, we will use scipy.optimize.minimize.

Recall that these advanced optimizers can minimize our cost function efficiently as long as we provide them with the gradient computations.

After you have completed the assignment, change maxiter to a larger value to see how more training helps. You should also try different values of lambda_.


In [ ]:
lambda_ = 1
nn_params = np.concatenate((initial_Theta1.flat, initial_Theta2.flat))
res = scipy.optimize.minimize(cost_fun, nn_params, jac=True, method='L-BFGS-B', 
                              options=dict(maxiter=200, disp=True))
nn_params = res.x

Obtain Theta1 and Theta2 back from nn_params:


In [ ]:
t1_len = (input_layer_size+1)*hidden_layer_size
Theta1 = nn_params[:t1_len].reshape(hidden_layer_size, input_layer_size+1)
Theta2 = nn_params[t1_len:].reshape(num_labels, hidden_layer_size+1)

Part 10: Visualize Weights

You can now "visualize" what the neural network is learning by displaying the hidden units to see what features they are capturing in the data.


In [ ]:
display(Theta1[:,1:], figsize=(6,6))

In [ ]:
def predict(Theta1, Theta2, X):
    #PREDICT Predict the label of an input given a trained neural network
    #   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
    #   trained weights of a neural network (Theta1, Theta2)

    m = X.shape[0]
    num_labels = Theta2.shape[0]
    
    # You need to return the following variables correctly. Remember that 
    # the given data labels go from 1..10, with 10 representing the digit 0!
    p = np.zeros(X.shape[0])

    
    # ====================== YOUR CODE HERE ======================
    
    
    
    
    # ============================================================
    
    
    return p

predictions = predict(Theta1, Theta2, X)
np.mean(predictions == y)
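
For reference, one way to implement the prediction (a sketch; the function name is mine) is to run the forward pass and take the index of the largest output unit, remembering that the labels run from 1 to 10, with 10 standing for the digit 0:


In [ ]:
def predict_sketch(Theta1, Theta2, X):
    # Forward pass, then pick the most probable output unit for each example.
    h1 = sigmoid(add_ones_column(X) @ Theta1.T)
    h2 = sigmoid(add_ones_column(h1) @ Theta2.T)
    # Output units are indexed 0..9 but the labels run 1..10, hence the +1.
    return np.argmax(h2, axis=1) + 1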