Student 1: Fabio Ellena
Student 2: Lorenzo Canale
The aim of this session is to practice with Artificial Neural Networks. Answers and experiments should be produced by groups of one or two students. Each group should fill in and run the appropriate notebook cells.
To generate your final report, use Print to PDF (Ctrl+P). Do not forget to run all your cells before generating the report, and to include the names of all group members. The lab session should be completed by April 7th, 2017.
In this session, you will implement, train and test a Neural Network for the Handwritten Digit Recognition problem [1] with different settings of hyperparameters. You will use the MNIST dataset, which was constructed from scanned document datasets available from the National Institute of Standards and Technology (NIST). Images of digits were taken from a variety of scanned documents, normalized in size and centered.
<img src="Nimages/mnist.png",width="350" height="500" align="center">
This assignment includes partially written programs to help you understand how to build and train your neural net, and then to test your code and get results.
Functions defined inside the Python files mentioned above can be imported using the Python command: from filename import *
You will use the following libraries:
numpy: for creating arrays and methods to manipulate them.
matplotlib: for making plots.
Part 1: Before designing and writing your code, you will first work on a neural network by hand.
Consider the above neural network with two inputs $X=(x_1,x_2)$, one hidden layer and a single output unit $y$.
The initial weights are set to random values. Neurons 6 and 7 represent the biases; bias values are equal to 1.
The training sample is $X = (0.8, 0.2)$, with target output $Y = 0.4$.
Assume that the neurons have a sigmoid activation function $f(x)=\frac{1}{1+e^{-x}}$ and that the learning rate is $\mu = 1$.
<img src="Nimages/NN.png", width="700" height="900">
Question 1.1.1: Compute the new values of the weights $w_{i,j}$ after a forward pass and a backward pass. $w_{i,j}$ is the weight of the connection between neuron $i$ and neuron $j$.
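For reference, the cells below use the following relations: the net input and output of neuron $j$ are $u_j = \sum_i w_{i,j}\,o_i$ and $o_j = f(u_j)$, and the error is $E = \frac{1}{2}(y - Y)^2$. With $f'(u) = f(u)(1 - f(u))$, the deltas are $\delta_5 = (y - Y)\,o_5(1 - o_5)$ for the output neuron and $\delta_j = o_j(1 - o_j)\,w_{j,5}\,\delta_5$ for the hidden neurons $j = 3, 4$; each weight is then updated as $w_{i,j} \leftarrow w_{i,j} - \mu\,\delta_j\,o_i$.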
In [1]:
import utils as UT
import transfer_functions as TF
import NeuralNetwork as NN
import numpy as np
In [2]:
# net inputs of the hidden neurons (inputs weighted by w13, w23, w63 and w14, w24, w64)
u3 = 0.8*0.3 + 0.2*0.8 + 1*0.2
u4 = 0.8*(-0.5) + 0.2*0.2 + 1*(-0.4)
o3 = TF.sigmoid(u3)
o4 = TF.sigmoid(u4)
# bias neurons always output 1
o7 = 1.0
o6 = 1.0
# net input and output of the output neuron (weights w35, w45, w75)
u5 = o3*(-0.6) + o4*0.4 + o7*0.5
o5 = TF.sigmoid(u5)
y = o5
print('feed forward:\n')
print('u3=%f' % u3)
print('u4=%f' % u4)
print('u5=%f' % u5)
print('o3=%f' % o3)
print('o4=%f' % o4)
print('o5=%f' % o5)
print('o6=%f' % o6)
print('o7=%f' % o7)
print('output of NN is %f' % o5)
In [3]:
# delta of the output neuron: dE/du5 = (y - Y) * f'(u5)
dEu5 = (y-0.4)*o5*(1-o5)
# current hidden-to-output weights
w45 = 0.4
w35 = -0.6
w75 = 0.5
# deltas of the hidden neurons (using the weights before the update)
dEu3 = dEu5*w35*o3*(1-o3)
dEu4 = dEu5*w45*o4*(1-o4)
# update hidden-to-output weights: w -= mu * delta * input activation (mu = 1)
w45 -= dEu5*o4
w35 -= dEu5*o3
w75 -= dEu5*o7
# current input-to-hidden weights
w13 = 0.3
w14 = -0.5
w23 = 0.8
w24 = 0.2
w63 = 0.2
w64 = -0.4
# update input-to-hidden weights with inputs x1 = 0.8, x2 = 0.2 and bias 1
w13 -= dEu3*0.8
w14 -= dEu4*0.8
w23 -= dEu3*0.2
w24 -= dEu4*0.2
w63 -= dEu3*1
w64 -= dEu4*1
print('back propagation ')
print('w13=%f' % w13)
print('w14=%f' % w14)
print('w23=%f' % w23)
print('w24=%f' % w24)
print('w63=%f' % w63)
print('w64=%f' % w64)
print('w35=%f' % w35)
print('w45=%f' % w45)
print('w75=%f' % w75)
# forward pass with the updated weights
u3 = 0.8*w13 + 0.2*w23 + 1*w63
u4 = 0.8*w14 + 0.2*w24 + 1*w64
o3 = TF.sigmoid(u3)
o4 = TF.sigmoid(u4)
o7 = 1
u5 = o3*w35 + o4*w45 + o7*w75
o5 = TF.sigmoid(u5)
y = o5
print('output after backpropagation is %f' % y)
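The output should drop from about 0.56 to about 0.54, moving toward the target $Y = 0.4$: a single gradient step reduced the quadratic error, as expected.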
Part 2: Neural Network Implementation
Please read all source files carefully and understand the data structures and all functions. You are to complete the missing code. First you should define the neural network (using the NeuralNetwork class in the NeuralNetwork.py file) and reinitialise the weights. Then you will complete the Feed Forward and the Back-propagation functions.
Question 1.2.1: Define the neural network corresponding to the one in part 1
In [6]:
from NeuralNetwork import *
#create the network
my_first_net = NeuralNetwork(input_layer_size=2, hidden_layer_size=2, output_layer_size=1)
In [7]:
#Data preparation
X=[0.8,0.2]
Y=[0.4]
data=[]
data.append(X)
data.append(Y)
#initialize weights
wi=np.array([[0.3,-0.5],[0.8,0.2],[0.2,-0.4]])
wo=np.array([[-0.6],[0.4],[0.5]])
my_first_net.weights_initialisation(wi,wo)
print(my_first_net.W_input_to_hidden)
print(my_first_net.W_hidden_to_output)
Question 1.2.2: Implement the Feed Forward function (feedForward(X) in the NeuralNetwork.py file)
In [8]:
def feedForward(self, inputs):
    # Compute input activations (append the bias unit, always 1)
    self.a_input = np.append(inputs, [1])
    # Compute hidden activations (append the hidden bias unit)
    self.a_hidden = np.append(self.tf(self.a_input.dot(self.W_input_to_hidden)), [1])
    # Compute output activations
    self.a_out = self.tf(self.a_hidden.dot(self.W_hidden_to_output))
    return self.a_out
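For the toy network of Part 1, a_input has 3 components (the two inputs plus the bias), W_input_to_hidden is the 3×2 matrix wi, a_hidden has 3 components (the two hidden outputs plus the bias), and W_hidden_to_output is the 3×1 matrix wo, so feedForward returns a single output activation.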
Check that your network outputs the expected value (the one you computed in question 1.1.1).
In [9]:
#test my Feed Forward function
Output_activation=my_first_net.feedForward(X)
print("output activation =%.3f" %(Output_activation))
Question 1.2.3: Implement the Back-propagation Algorithm (backPropagate(Y) in the NeuralNetwork.py file)
In [10]:
def backPropagate(self, targets):
    # calculate error terms for the output layer
    self.err_out = self.a_out - targets
    # delta for the output layer, then back-propagated delta for the hidden layer
    delta_out = self.err_out * self.dtf(self.a_out)
    delta_hidden = self.W_hidden_to_output.dot(delta_out) * self.dtf(self.a_hidden)
    # update output weights
    self.W_hidden_to_output -= self.learning_rate * np.outer(self.a_hidden, delta_out)
    # update input weights (the bias unit has no incoming weights, hence delta_hidden[:-1])
    self.W_input_to_hidden -= self.learning_rate * np.outer(self.a_input, delta_hidden[:-1])
    # return the quadratic error
    return np.sum(self.err_out**2) / 2
Check that the gradient values and weight updates are correct (similar to the ones you computed in question 1.1.1).
In [11]:
#test my Back-propagation function
my_first_net.backPropagate(Y)
#Print weights after backpropagation
print('wi_new=', my_first_net.W_input_to_hidden)
print('wo_new=', my_first_net.W_hidden_to_output)
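As an extra, optional check (not part of the assignment): since backPropagate applies $w \leftarrow w - \mu\,\partial E/\partial w$, the analytic gradient can be recovered from the weight change and compared against a finite difference. A minimal sketch, assuming the NeuralNetwork API used above; the loss helper is hypothetical:

eps = 1e-5
net = NeuralNetwork(input_layer_size=2, hidden_layer_size=2, output_layer_size=1)
net.weights_initialisation(wi.copy(), wo.copy())
w_before = net.W_input_to_hidden.copy()
net.feedForward(X)
net.backPropagate(Y)
# update is w -= learning_rate * grad, so grad = (old - new) / learning_rate
analytic = (w_before - net.W_input_to_hidden) / net.learning_rate

def loss(wi_try):
    # quadratic error 0.5*(y - Y)^2 for a given input-to-hidden matrix (hypothetical helper)
    probe = NeuralNetwork(input_layer_size=2, hidden_layer_size=2, output_layer_size=1)
    probe.weights_initialisation(wi_try, wo.copy())
    return 0.5 * np.sum((probe.feedForward(X) - np.array(Y))**2)

wi_plus, wi_minus = wi.copy(), wi.copy()
wi_plus[0, 0] += eps
wi_minus[0, 0] -= eps
numeric = (loss(wi_plus) - loss(wi_minus)) / (2*eps)
print('w13 gradient: analytic %.6f vs numeric %.6f' % (analytic[0, 0], numeric))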
Your feed-forward and back-propagation implementations are working. Great! Let's tackle a real-world problem.
Data Preparation
The MNIST dataset consists of handwritten digit images: 60,000 examples for training and 10,000 for testing. In this lab session, the official training set of 60,000 images is divided into an actual training set of 50,000 examples and a validation set of 10,000 examples; the official 10,000 test images form the test set. All digit images have been size-normalized and centered in a fixed-size image of 28 x 28 pixels. The images are stored in byte form; you will use the NumPy library to read the data files into NumPy arrays that we will use to train the ANN.
The MNIST dataset is available in the Data folder. To get the training, testing and validation data, run the load_data() function.
In [12]:
from utils import *
training_data, validation_data, test_data=load_data()
print("Training data size: %d" % (len(training_data)))
print("Validation data size: %d" % (len(validation_data)))
print("Test data size: %d" % (len(test_data)))
MNIST Dataset Digits Visualisation
In [13]:
import matplotlib.pyplot as plt  # in case utils does not already provide plt

ROW = 2
COLUMN = 4
for i in range(ROW * COLUMN):
    # training_data[i][0] is the i-th image, with size 28x28
    image = training_data[i][0].reshape(28, 28)
    plt.subplot(ROW, COLUMN, i+1)
    plt.imshow(image, cmap='gray')  # cmap='gray' for a black-and-white picture
    plt.axis('off')                 # do not show axis values
plt.tight_layout()                  # automatic padding between subplots
plt.show()
Part 1: Creating the Neural Network
The input layer of the neural network contains neurons encoding the values of the input pixels. The training data for the network consists of many 28 by 28 pixel images of scanned handwritten digits, so the input layer contains 784 = 28×28 neurons. The second layer of the network is a hidden layer; we set the number of neurons in the hidden layer to 30. The output layer contains 10 neurons, one per digit class.
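Note that, with the bias units appended as in feedForward above, W_input_to_hidden has shape $(785, 30)$ and W_hidden_to_output has shape $(31, 10)$, i.e. $785 \times 30 + 31 \times 10 = 23\,860$ trainable weights.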
Question 2.1.1: Create the network described above using the NeuralNetwork class
In [14]:
#create the network
from NeuralNetwork import *
my_mnist_net = NeuralNetwork(784, 30, 10, iterations=30, learning_rate=0.1)
Question 2.1.2: Add the information about the performance of the neural network on the test set at each epoch
In [15]:
test_accuracy=my_mnist_net.predict(test_data)/100
print('Test_Accuracy %-2.2f' % test_accuracy)
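The cell above reports only the final test accuracy (assuming, as the division by 100 suggests, that predict() returns the number of correctly classified examples out of the 10,000 test images). One hedged way to obtain the test accuracy at each epoch is to train one iteration at a time; this sketch assumes train() resumes from the current weights rather than re-initialising them (check NeuralNetwork.py before relying on it):

net_per_epoch = NeuralNetwork(784, 30, 10, iterations=1, learning_rate=0.1)
test_curve = []
for epoch in range(30):
    net_per_epoch.train(training_data, validation_data)      # one iteration per call (assumption)
    test_curve.append(net_per_epoch.predict(test_data) / 100)  # percentage, same convention as above
    print('epoch %d: test accuracy %.2f' % (epoch + 1, test_curve[-1]))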
Question 2.1.3: Train the Neural Network and comment your findings
In [16]:
#train your network
evaluations = my_mnist_net.train(training_data,validation_data)
In [17]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [79]:
#save your model in Models/ using a distinguishing name for your model (architecture, learning rate, etc...)
my_mnist_net.save("Models/model_" + str(784) + "_" + str(30) + "_" + str(10) + "_" + str(0.1) + "_" + "30.model")
Question 2.1.4: Guess the digit. Implement and test a Python function that predicts the class of a digit (the folder Images_test contains some example digit images).
In [33]:
#Your implementation goes here
reload(NN)
from scipy import misc

def predict_image(my_mnist_net, img_path, number):
    img = misc.imread(img_path, mode='L')
    plt.imshow(img, cmap='gray')
    plt.show()
    # MNIST digits are white on a black background: invert the image if needed
    mean = np.mean(img)
    if mean > 255/2:
        img = np.invert(img)
    img = misc.imresize(img, (28, 28))
    plt.imshow(img, cmap='gray')
    plt.show()
    # flatten to a 784-dimensional vector and predict
    img = np.reshape(img, (28*28))
    test = (img, number)
    return my_mnist_net.predict2(test)
print('predicted: %d, real number: %d' % predict_image(my_mnist_net, './Images_test/4.bmp', 4))
print('predicted: %d, real number: %d' % predict_image(my_mnist_net, './Images_test/5.bmp', 5))
print('predicted: %d, real number: %d' % predict_image(my_mnist_net, './Images_test/9.bmp', 9))
Part 2: Change the neural network structure and parameters to optimize performance
Question 2.2.1: Change the learning rate (0.001, 0.1, 1.0, 10). Train the new neural nets with the original specifications (Part 2.1) for 50 iterations. Plot test accuracy vs. iteration for each learning rate on the same graph. Report the maximum test accuracy achieved for each learning rate. Which one achieves the maximum test accuracy?
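The individual runs follow below; as a complement, here is a minimal sketch that overlays the accuracy curves on a single graph, assuming (as the plotting cells suggest) that train() returns the per-epoch error, training accuracy and validation accuracy. To plot test accuracy instead, substitute the per-epoch test evaluation from Question 2.1.2.

import matplotlib.pyplot as plt
rates = [0.001, 0.01, 0.1, 1.0, 10.0]
for lr in rates:
    net = NeuralNetwork(28*28, 30, 10, learning_rate=lr, iterations=50)
    evals = net.train(training_data, validation_data)
    plt.plot(range(1, 51), evals[2], label='lr = %g' % lr)  # validation accuracy per epoch
    print('lr = %g: max accuracy %.2f' % (lr, max(evals[2])))
plt.xlabel('iteration')
plt.ylabel('accuracy')
plt.legend()
plt.show()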
In [16]:
#Your implementation with a learning rate of 0.001 goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.001, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [17]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [25]:
#Your implementation with a learning rate of 0.01 goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.01, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [26]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [18]:
#Your implementation with a learning rate of 0.1 goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.1, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [19]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [20]:
#Your implementation with a learning rate of 1.0 goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=1.0, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [21]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [22]:
#Your implementation with a learning rate of 10.0 goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=10.0, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [23]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
Question 2.2.2: Initialize all weights to 0. Plot the training accuracy curve. Comment on your results.
In [27]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.1, iterations=50)
my_mnist_net.weights_initialisation(np.zeros((28*28 + 1, 30)), np.zeros((31,10)))
evaluations = my_mnist_net.train(training_data, validation_data)
In [28]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
When all weights are 0, every hidden neuron receives the same net input (0) and produces the same output (sigmoid(0) = 0.5). All the deltas within a layer are then equal, so all the weight updates of a single layer are identical: the neurons of that layer learn in exactly the same way and remain interchangeable.
This explains why the error increases during the first iterations before converging, and why the weights must be initialized randomly: random initialization breaks this symmetry.
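In symbols (using the notation of Part 1): with $w_{i,j} = 0$ for all $i, j$, every hidden unit $j$ has $u_j = \sum_i w_{i,j}\,x_i = 0$ and $o_j = f(0) = 0.5$; its delta $\delta_j = f'(u_j)\sum_k w_{j,k}\,\delta_k$ is the same for every $j$, so the update $w_{i,j} \leftarrow w_{i,j} - \mu\,\delta_j\,o_i$ never differentiates the hidden units.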
Question 2.2.3: Try a different transfer function (such as tanh). The file transfer_functions.py provides the Python implementation of the tanh function and its derivative.
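For reference, $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ with derivative $1 - \tanh^2(x)$. Unlike the sigmoid, tanh is zero-centred and its derivative peaks at 1 (versus 0.25 for the sigmoid), so gradients tend to be larger; this is why a smaller learning rate (0.01) is also tried below.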
In [30]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.1, iterations=50,transfer_function=TF.tanh,d_transfer_function=TF.dtanh)
evaluations = my_mnist_net.train(training_data, validation_data)
In [31]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [52]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 30, 10, learning_rate=0.01, iterations=50,transfer_function=TF.tanh,d_transfer_function=TF.dtanh)
evaluations = my_mnist_net.train(training_data, validation_data)
In [53]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
Question 2.2.4: Add more neurons to the hidden layer (try 100, 200, 300). Plot the curve of validation accuracy versus the number of neurons in the hidden layer. (Choose and justify the other hyper-parameters.)
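A hedged sketch for the requested curve, keeping learning_rate = 0.1 and 50 iterations from Part 2.1 and again assuming evaluations[2] holds the per-epoch validation accuracy:

import matplotlib.pyplot as plt
sizes = [30, 100, 200, 300]
best = []
for h in sizes:
    net = NeuralNetwork(28*28, h, 10, learning_rate=0.1, iterations=50)
    evals = net.train(training_data, validation_data)
    best.append(max(evals[2]))  # best validation accuracy for this hidden size
plt.plot(sizes, best, 'o-')
plt.xlabel('number of hidden neurons')
plt.ylabel('best validation accuracy')
plt.show()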
In [32]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 100, 10, learning_rate=0.1, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [33]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [34]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 200, 10, learning_rate=0.1, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [35]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [36]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 300, 10, learning_rate=0.1, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [37]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [50]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 100, 10, learning_rate=1.0, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [51]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [54]:
#Your implementation goes here
my_mnist_net = NeuralNetwork(28*28, 200, 10, learning_rate=1.0, iterations=50)
evaluations = my_mnist_net.train(training_data, validation_data)
In [55]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
Question 2.2.5: Add one additional hidden layer and train your network; discuss your results with different settings.
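Assuming NeuralNetwork2 follows the same conventions as NeuralNetwork, the only new ingredient is one more delta in the backward pass: with hidden activations $a^{(1)}, a^{(2)}$ and output $a^{(3)}$, the deltas are $\delta^{(3)} = (a^{(3)} - Y) \odot f'(a^{(3)})$, $\delta^{(2)} = \left(W^{(2 \to 3)}\,\delta^{(3)}\right) \odot f'(a^{(2)})$ and $\delta^{(1)} = \left(W^{(1 \to 2)}\,\delta^{(2)}\right) \odot f'(a^{(1)})$, each weight matrix being updated with $\Delta W = -\mu\,a\,\delta^{\top}$ exactly as in backPropagate above.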
In [59]:
# Your implementation goes here
import NeuralNetwork2 as NN2
reload (NN2)
NeuralNetwork2 = NN2.NeuralNetwork2
my_mnist_net = NeuralNetwork2(28*28, 30, 30, 10, iterations=50, learning_rate=1.0)
evaluations = my_mnist_net.train(training_data, validation_data)
In [60]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [61]:
# Your implementation goes here
import NeuralNetwork2 as NN2
reload (NN2)
NeuralNetwork2 = NN2.NeuralNetwork2
my_mnist_net = NeuralNetwork2(28*28, 30, 30, 10, iterations=50, learning_rate=0.1)
evaluations = my_mnist_net.train(training_data, validation_data)
In [62]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [63]:
# Your implementation goes here
import NeuralNetwork2 as NN2
reload (NN2)
NeuralNetwork2 = NN2.NeuralNetwork2
my_mnist_net = NeuralNetwork2(28*28, 100, 100, 10, iterations=50, learning_rate=0.1)
evaluations = my_mnist_net.train(training_data, validation_data)
In [64]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")
In [65]:
# Your implementation goes here
import NeuralNetwork2 as NN2
reload (NN2)
NeuralNetwork2 = NN2.NeuralNetwork2
my_mnist_net = NeuralNetwork2(28*28, 100, 100, 10, iterations=50, learning_rate=1.0)
evaluations = my_mnist_net.train(training_data, validation_data)
In [66]:
UT.plot_curve(range(1,my_mnist_net.iterations+1),evaluations[0], "Error")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[1], "Training_Accuracy")
UT.plot_curve(range(1,my_mnist_net.iterations+1), evaluations[2], "Validation_Accuracy")