In this tutorial, you'll implement a shallow neural network to classify handwritten digits from 0 to 9. The dataset is the famous 'MNIST' (http://yann.lecun.com/exdb/mnist/), assembled by Yann LeCun, a well-known figure in the deep learning community who now heads Facebook AI Research and holds a professorship at New York University.
As a first step, I invite you to discover what MNIST is. You might find this notebook useful, but feel free to browse the web.
Once you get the idea, you can download the dataset
In [ ]:
# Download the dataset into this directory (wget may not be available on Windows; see the alternative below)
! wget http://deeplearning.net/data/mnist/mnist.pkl.gz
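If wget is not available (on Windows, for instance), a pure-Python download along these lines should work as well; it only assumes the same URL is still reachable.
In [ ]:
# Pure-Python alternative to wget (works on any OS with an internet connection)
import urllib.request
urllib.request.urlretrieve('http://deeplearning.net/data/mnist/mnist.pkl.gz', 'mnist.pkl.gz')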
In [ ]:
import pickle, gzip
import numpy as np
# Load the dataset (the file was pickled under Python 2, hence the latin1 encoding)
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
def to_one_hot(y, n_classes=10):  # You might want to use this at some point...
    """Turn a vector of integer labels into a matrix with one one-hot row per label."""
    _y = np.zeros((len(y), n_classes))
    _y[np.arange(len(y)), y] = 1
    return _y
X_train, y_train = train_set[0], train_set[1]
X_valid, y_valid = valid_set[0], valid_set[1]
X_test, y_test = test_set[0], test_set[1]
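Before building anything, a quick sanity check of what you just loaded never hurts. The expected values in the comments assume the standard pickled MNIST split (50,000 / 10,000 / 10,000 images, pixels already scaled to [0, 1]).
In [ ]:
# Sanity check: shapes, value ranges, and the one-hot helper
print(X_train.shape, y_train.shape)   # expected: (50000, 784) (50000,)
print(X_valid.shape, X_test.shape)    # expected: (10000, 784) (10000, 784)
print(X_train.min(), X_train.max())   # pixels should already lie in [0, 1]
print(to_one_hot(y_train[:3]))        # one-hot encoding of the first three labels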
Now that you have the data, you can build a shallow neural network (SNN). I expect your SNN to have two layers:
- Layer 1 has 20 neurons with a sigmoid activation
- Layer 2 has 10 neurons with a softmax activation
- Loss is the Negative Log Likelihood (which, for one-hot targets, is the same as the cross-entropy)
You'll need to comment your work so that I can see that you understand what you are doing.
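To fix the notation (the symbols below are just names, use whatever you like in your code): for a single input $x$ with one-hot label $y$,
$$a_1 = \sigma(W_1 x + b_1), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$
$$p = \mathrm{softmax}(W_2 a_1 + b_2), \qquad \mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$
$$\mathcal{L}(p, y) = -\sum_{k=0}^{9} y_k \log p_k = -\log p_{\text{true class}}$$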
In [ ]:
# HELPER
def softmax(Z):
    """Z is a vector, e.g. [1, 2, 3]
    Returns: the vector softmax(Z), e.g. [.09, .24, .67]
    """
    Z = Z - Z.max()  # shift by the max for numerical stability (does not change the result)
    return np.exp(Z) / np.exp(Z).sum(axis=0)
# Define the variables here (initialize the weights with the np.random.normal function):
W1, b1 =
W2, b2 =
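If you are unsure about the shapes, here is a sketch of one possible initialization (small Gaussian weights, zero biases; the 0.01 standard deviation is an arbitrary choice, and you may prefer column-shaped biases if you treat X as a (784, 1) column).
In [ ]:
# One possible initialization -- a sketch, not the required solution.
# W1 maps the 784 input pixels to 20 hidden units, W2 maps those 20 units to 10 classes.
W1 = np.random.normal(loc=0.0, scale=0.01, size=(20, 784))
b1 = np.zeros(20)
W2 = np.random.normal(loc=0.0, scale=0.01, size=(10, 20))
b2 = np.zeros(10)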
In [ ]:
def Pred(X, ??? ):
    """Explanations ...
    Arguments:
        X: an input image, flattened into a vector of shape (784, 1)
    Returns: a vector ???
    """
    pass
def loss(P, Y):
    """Explanations:
    Arguments:
        P: the prediction vector corresponding to an image X^s
        Y: the ground truth of that image
    Returns: a scalar ???
    """
    pass
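A tiny numerical check can save you some debugging time later. The example below assumes `loss` takes a prediction vector and a one-hot ground-truth vector, as described above; uncomment the assert once your implementation is ready.
In [ ]:
# With P = [0.1, 0.7, 0.2] and true class 1, the negative log likelihood is -log(0.7) ≈ 0.357
P_check = np.array([0.1, 0.7, 0.2])
Y_check = np.array([0.0, 1.0, 0.0])
# assert np.isclose(loss(P_check, Y_check), -np.log(0.7))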
In [ ]:
def dW1( ??? ):
    """Explanations ??
    Returns: a matrix (same shape as W1) which is the derivative of the loss with respect to W1
    """
    pass
def db1(L, ???):
    """Explanations ??
    Arguments:
        L is the loss of a sample (a scalar)
    Returns: a vector (same shape as b1) which is the derivative of the loss with respect to b1
    """
    pass
def dW2( ??? ):
    pass
def db2( ??? ):
    pass
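A standard way to verify gradient code is numerical gradient checking: perturb one parameter by a small epsilon and compare the finite-difference estimate with your analytic derivative. The helper below is a generic sketch and is not tied to the signatures above; adapt it to however you end up passing your parameters around.
In [ ]:
# Numerical gradient check (a sketch): approximates d f / d theta_i by
# (f(theta + eps*e_i) - f(theta - eps*e_i)) / (2*eps) for every entry of theta.
def numerical_grad(f, theta, eps=1e-5):
    grad = np.zeros_like(theta)
    it = np.nditer(theta, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = theta[idx]
        theta[idx] = old + eps
        f_plus = f(theta)
        theta[idx] = old - eps
        f_minus = f(theta)
        theta[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad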
In [ ]:
In [ ]:
Build a deeper model trained with SGD (you don't need to use biases here):
- Layer 1 has 10 neurons with a sigmoid activation
- Layer 2 has 10 neurons with a sigmoid activation
- Layer 3 has 10 neurons with a sigmoid activation
- Layer 4 has 10 neurons with a sigmoid activation
- Layer 5 has 10 neurons with a sigmoid activation
- Layer 6 has 10 neurons with a softmax activation
- Loss is Negative Log Likelihood
Is it converging? Why? What's wrong?
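If you want to investigate that question empirically, one option is to log the size of each layer's gradient during training. The names `dWs` and `step` below are assumptions (a list of per-layer weight gradients and the current iteration); adapt them to your own training loop.
In [ ]:
# Hypothetical diagnostic: print the gradient norm of each layer from time to time
def log_gradient_norms(dWs, step):
    norms = [float(np.linalg.norm(dW)) for dW in dWs]
    print("step", step, "gradient norms per layer:", norms)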
In [ ]: