Moving to Shallow Neural Networks

In this tutorial, you'll implement a shallow neural network to classify handwritten digits ranging from 0 to 9. The dataset you'll use is quite famous: it's called 'MNIST' (http://yann.lecun.com/exdb/mnist/). It was put online by Yann LeCun, a French researcher who is very famous in the DL community; he now heads the Facebook AI Research program and is a professor at New York University.

First step

As a first step, I invite you to discover what MNIST is. You might find this notebook useful, but feel free to browse the web.

Once you get the idea, you can download the dataset:


In [ ]:
# Download the dataset in this directory (does this work on Windows?)
! wget http://deeplearning.net/data/mnist/mnist.pkl.gz

In [ ]:
import gzip
import pickle

import numpy as np

# Load the dataset (the pickle was written with Python 2, hence encoding='latin1')
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
f.close()

def to_one_hot(y, n_classes=10): # You might want to use this at some point...
    _y = np.zeros((len(y), n_classes))
    _y[np.arange(len(y)), y] = 1
    return _y

X_train, y_train = train_set[0], train_set[1]
X_valid, y_valid = valid_set[0], valid_set[1]
X_test,  y_test  = test_set[0],  test_set[1]
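
If you want a quick sanity check on what you just loaded, here is a minimal sketch (it assumes matplotlib is installed; it is only used for this quick look):


In [ ]:
import matplotlib.pyplot as plt  # assumed available, only needed for this cell

# Each row is a flattened 28x28 grayscale image, pixel values in [0, 1]
print(X_train.shape, y_train.shape)   # (50000, 784) (50000,)
print(X_valid.shape, X_test.shape)    # (10000, 784) (10000, 784)

plt.imshow(X_train[0].reshape(28, 28), cmap='gray')
plt.title('Label: %d' % y_train[0])
plt.show()

# One-hot encoding turns e.g. label 3 into [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(to_one_hot(y_train[:5]))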

You can now implement a 2-layer NN

Now that you have the data, you can build a shallow neural network (SNN). I expect your SNN to have two layers:

- Layer 1 has 20 neurons with a sigmoid activation
- Layer 2 has 10 neurons with a softmax activation
- Loss is the negative log-likelihood (which is also the cross-entropy)

You'll need to comment your work so that I can see that you understand what you are doing.

1 - Define Parameters


In [ ]:
# HELPER 
def softmax(Z):
    """Z is a vector, e.g. [1, 2, 3]
    Returns: the vector softmax(Z), e.g. [.09, .24, .67]
    """
    Z = Z - Z.max(axis=0)  # shift by the max for numerical stability (does not change the result)
    return np.exp(Z) / np.exp(Z).sum(axis=0)
    

# Define the variables here (initialize the weights with np.random.normal):
W1, b1 =
W2, b2 =
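
As a reference point, here is one possible initialization. This is a sketch only: the 0.1 standard deviation is an arbitrary choice, and you should experiment with the scale.


In [ ]:
np.random.seed(0)  # arbitrary seed, only for reproducibility

n_in, n_hidden, n_out = 784, 20, 10

# Small Gaussian weights, zero biases (one common starting point)
W1 = np.random.normal(0.0, 0.1, size=(n_hidden, n_in))    # (20, 784)
b1 = np.zeros((n_hidden, 1))                              # (20, 1)
W2 = np.random.normal(0.0, 0.1, size=(n_out, n_hidden))   # (10, 20)
b2 = np.zeros((n_out, 1))                                 # (10, 1)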

2 - Define Model


In [ ]:
def Pred(X, ??? ):
    """Explanations ...
    Arguments:
        X: An input image (as a vector)(shape is <784,1>)
    Returns : a vector ???
    """
    pass

def loss(P, Y):
    """Explanations : 
    Arguments:
        P: The prediction vector corresponding to an image (X^s)
        Y: The ground truth of an image
    Returns: a vector ???
    """
    pass
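
For reference, here is one possible way to fill in these signatures. It is a sketch that assumes the column-vector convention of the docstrings (X of shape (784, 1)), the W1, b1, W2, b2 defined in step 1, and a one-hot ground truth Y (see to_one_hot above).


In [ ]:
def sigmoid(Z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-Z))

def Pred(X, W1, b1, W2, b2):
    """Forward pass of the 2-layer network.
    Arguments:
        X: an input image as a column vector, shape (784, 1)
    Returns: the vector of class probabilities, shape (10, 1)
    """
    A1 = sigmoid(W1 @ X + b1)    # hidden layer, shape (20, 1)
    P = softmax(W2 @ A1 + b2)    # output layer, shape (10, 1)
    return P

def loss(P, Y):
    """Negative log-likelihood (cross-entropy) of one sample.
    Arguments:
        P: the prediction vector of an image, shape (10, 1)
        Y: the one-hot ground truth of that image, shape (10, 1)
    Returns: a scalar
    """
    return -float(np.sum(Y * np.log(P + 1e-12)))  # small epsilon avoids log(0)

Note that the backward pass will need the hidden activation A1; in the sketch of the derivatives below it is simply recomputed.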

3 - Define Derivatives


In [ ]:
def dW1( ??? ):
    """Explanations ??
    Returns: the gradient of the loss with respect to W1 (same shape as W1)
    """
    pass 


def db1(L, ???):
    """Explanations ??
    Arguments:
        L is the loss of a sample (a scalar)
    Returns: the gradient of the loss with respect to b1 (same shape as b1)
    """
    pass


def dW2( ??? ):
    pass


def db2( ??? ):
    pass
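
The template splits the derivatives into four functions; the sketch below computes them all in one backward pass instead (assuming the Pred and loss sketched above), since they share intermediate results. You can of course split it back into dW1, db1, dW2 and db2 to match the signatures above.


In [ ]:
def grads(X, Y, W1, b1, W2, b2):
    """One possible backward pass for the 2-layer network.
    Arguments:
        X: input image, shape (784, 1)
        Y: one-hot label, shape (10, 1)
    Returns: (dW1, db1, dW2, db2), the gradients of the NLL loss
    """
    # Forward pass, kept here so the intermediate activations are available
    A1 = sigmoid(W1 @ X + b1)            # (20, 1)
    P = softmax(W2 @ A1 + b2)            # (10, 1)

    # Softmax followed by NLL gives a very simple output error
    dZ2 = P - Y                          # (10, 1)
    dW2 = dZ2 @ A1.T                     # (10, 20)
    db2 = dZ2                            # (10, 1)

    # Backpropagate through the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)   # (20, 1)
    dW1 = dZ1 @ X.T                      # (20, 784)
    db1 = dZ1                            # (20, 1)

    return dW1, db1, dW2, db2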

4 - Train your model

You may use stochastic gradient descent (SGD) to train your model (experiment with several learning rates).


In [ ]:
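
As a starting point, here is a minimal SGD loop sketch, assuming the grads function sketched above. The learning rate and number of epochs are arbitrary values to tune; a pure NumPy, one-sample-at-a-time loop over the 50,000 training images will be slow but works.


In [ ]:
lr = 0.1          # learning rate: try several values
n_epochs = 5      # arbitrary, increase if the loss is still going down

Y_train = to_one_hot(y_train)

for epoch in range(n_epochs):
    total_loss = 0.0
    for i in range(len(X_train)):
        X = X_train[i].reshape(-1, 1)    # (784, 1)
        Y = Y_train[i].reshape(-1, 1)    # (10, 1)

        P = Pred(X, W1, b1, W2, b2)
        total_loss += loss(P, Y)

        dW1, db1, dW2, db2 = grads(X, Y, W1, b1, W2, b2)

        # Stochastic gradient descent: one update per sample
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    print('epoch %d, mean training loss %.4f' % (epoch, total_loss / len(X_train)))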

5 - Test the accuracy of your model on the Test set


In [ ]:
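
A possible accuracy check, a sketch that assumes the Pred above: the predicted class is simply the argmax of the probability vector.


In [ ]:
def accuracy(X_data, y_data, W1, b1, W2, b2):
    """Fraction of samples whose most probable class matches the label."""
    correct = 0
    for i in range(len(X_data)):
        P = Pred(X_data[i].reshape(-1, 1), W1, b1, W2, b2)
        if np.argmax(P) == y_data[i]:
            correct += 1
    return correct / float(len(X_data))

print('Test accuracy: %.3f' % accuracy(X_test, y_test, W1, b1, W2, b2))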


You can now go Deeper

Build a deeper model trained with SGD (you don't need to use biases here):

- Layer 1 has 10 neurons with a sigmoid activation
- Layer 2 has 10 neurons with a sigmoid activation
- Layer 3 has 10 neurons with a sigmoid activation
- Layer 4 has 10 neurons with a sigmoid activation
- Layer 5 has 10 neurons with a sigmoid activation
- Layer 6 has 10 neurons with a softmax activation
- Loss is the negative log-likelihood

Is it converging? Why? What's wrong?


In [ ]:
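
As a sketch only, here is one way to set up the deeper forward pass without biases; the backward pass follows the same layer-by-layer pattern as in the shallow case. Printing the norm of the first-layer gradients during training should help you answer the questions above.


In [ ]:
sizes = [784, 10, 10, 10, 10, 10, 10]    # input dimension followed by the 6 layers
np.random.seed(0)                        # arbitrary seed
Ws = [np.random.normal(0.0, 0.1, size=(sizes[i + 1], sizes[i]))
      for i in range(len(sizes) - 1)]

def deep_pred(X, Ws):
    """Forward pass: sigmoid on every layer except the last, softmax at the end.
    X is a column vector of shape (784, 1); returns a (10, 1) probability vector.
    """
    A = X
    for W in Ws[:-1]:
        A = sigmoid(W @ A)
    return softmax(Ws[-1] @ A)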