Convolutional Neural Networks

Thanks to these notes for much of the material in this hands-on session.


In [1]:
from IPython.display import IFrame

IFrame('http://cs231n.github.io//assets/conv-demo/index.html', 850, 700)


Out[1]:

Review: Conv Layer

The example above serves as a visual guide to the parameters of a convolutional layer, which:

  • Accepts a volume of size $W_1 \times H_1 \times D_1$
  • Requires four hyperparameters:
    • Number of filters $K$,
    • their spatial extent $F$,
    • the stride $S$,
    • the amount of zero padding $P$.
  • Produces a volume of size $W_2 \times H_2 \times D_2$ where:
    • $W_2 = (W_1 - F + 2P)/S + 1$
    • $H_2 = (H_1 - F + 2P)/S + 1$ (i.e. width and height are computed equally by symmetry)
    • $D_2 = K$
  • With parameter sharing, it introduces $F \cdot F \cdot D_1$ weights per filter, for a total of $(F \cdot F \cdot D_1) \cdot K$ weights and $K$ biases.
  • In the output volume, the $d$-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S$, and then offsetting by the $d$-th bias.
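
As a quick sanity check of these formulas, the sketch below plugs in the numbers for the first conv layer of AlexNet (a 227x227x3 input, $K=96$ filters of spatial extent $F=11$, stride $S=4$, no padding). The numbers here are purely illustrative and are not part of the exercise that follows.


In [ ]:
# Output volume size and parameter count for an example conv layer
# (AlexNet's first layer: 227x227x3 input, K=96, F=11, S=4, P=0).
W1, H1, D1 = 227, 227, 3
K, F, S, P = 96, 11, 4, 0

W2 = (W1 - F + 2 * P) // S + 1  # = 55
H2 = (H1 - F + 2 * P) // S + 1  # = 55
D2 = K                          # = 96

weights = (F * F * D1) * K      # = 34,848 shared weights
biases = K                      # = 96 biases
print('output volume: {}x{}x{}'.format(W2, H2, D2))
print('{:,} weights + {} biases'.format(weights, biases))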

Local connectivity basics

Let's get familiar with how a convolutional layer works, and how it reduces the number of required weights compared with a fully connected layer, by working through an example.

Suppose we have a 32x32x3 input image and a conv layer with 10 3x3 filters, stride 1, and padding 1.

Q1 How many neurons are in this layer?

Q2 How many weights would be required for a fully connected layer with this many neurons?

Q3 How many weights are required for each neuron in our convolutional layer? How many weights total for the layer, considering weight sharing?
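
If you want to check your arithmetic, the sketch below plugs these numbers into the formulas from the review section. Note that running it prints the answers to Q1-Q3, so try them by hand first.


In [ ]:
# Answer check for Q1-Q3: 32x32x3 input, 10 3x3 filters, stride 1, pad 1.
W1, H1, D1 = 32, 32, 3
K, F, S, P = 10, 3, 1, 1

W2 = (W1 - F + 2 * P) // S + 1
H2 = (H1 - F + 2 * P) // S + 1

neurons = W2 * H2 * K                  # Q1: one neuron per output position
fc_weights = neurons * (W1 * H1 * D1)  # Q2: every neuron sees every input
per_neuron = F * F * D1                # Q3: local connectivity
shared = per_neuron * K                # Q3: one set of weights per filter

print('Q1: {} neurons'.format(neurons))
print('Q2: {:,} weights for a fully connected layer'.format(fc_weights))
print('Q3: {} weights per neuron; {} shared weights (+ {} biases) total'.format(
    per_neuron, shared, K))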

Exercise: implement forward propagation for a convolutional layer

Let's implement forward propagation for a conv layer. Fill in the code below. A test bed with the expected output is in place so you can see how close your solution is. (A reference implementation appears at the end of this section if you get stuck.)


In [3]:
import numpy as np


def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.
    The input consists of N data points, each with C channels, height H and width
    W. We convolve each input with F different filters, where each filter spans
    all C channels and has height HH and width WW.
    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.
    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    (N, C, H, W) = x.shape
    (F, _, HH, WW) = w.shape
    stride = conv_param['stride']
    pad = conv_param['pad']
    H_prime = 1 + int((H + 2 * pad - HH) / stride)
    W_prime = 1 + int((W + 2 * pad - WW) / stride)
    out = np.zeros((N, F, H_prime, W_prime))

    # your code goes here!
    # hint: you can use the function np.pad for padding    


    # end of your code

    cache = (x, w, b, conv_param)
    return out, cache


#
# Test bed
#

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))


x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)
w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
b = np.linspace(-0.1, 0.2, num=3)

conv_param = {'stride': 2, 'pad': 1}
out, _ = conv_forward_naive(x, w, b, conv_param)
correct_out = np.array([[[[-0.08759809, -0.10987781],
                          [-0.18387192, -0.2109216]],
                         [[0.21027089, 0.21661097],
                          [0.22847626, 0.23004637]],
                         [[0.50813986, 0.54309974],
                          [0.64082444, 0.67101435]]],
                        [[[-0.98053589, -1.03143541],
                          [-1.19128892, -1.24695841]],
                         [[0.69108355, 0.66880383],
                          [0.59480972, 0.56776003]],
                         [[2.36270298, 2.36904306],
                          [2.38090835, 2.38247847]]]])

# Compare your output to the solution. The difference should be within 1e-8
# (and print out 0.0000).
print('Testing conv_forward_naive')
print('difference: {:.4f}'.format(rel_error(out, correct_out)))


Testing conv_forward_naive
difference: 1.0000
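

If you get stuck, here is one possible reference implementation: a naive quadruple loop over output positions that uses np.pad for the zero padding. This is a sketch of one approach, not the only (or fastest) way to write the forward pass, and the name conv_forward_naive_solution is ours, chosen so it doesn't clobber your version. Try the exercise before reading it.


In [ ]:
def conv_forward_naive_solution(x, w, b, conv_param):
    """Same interface as conv_forward_naive above."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param['stride'], conv_param['pad']
    H_prime = 1 + (H + 2 * pad - HH) // stride
    W_prime = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, H_prime, W_prime))

    # Zero-pad only the two spatial dimensions, leaving N and C untouched.
    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                      mode='constant')

    for n in range(N):                    # over data points
        for f in range(F):                # over filters
            for i in range(H_prime):      # over output rows
                for j in range(W_prime):  # over output columns
                    h0, w0 = i * stride, j * stride
                    window = x_padded[n, :, h0:h0 + HH, w0:w0 + WW]
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    return out, (x, w, b, conv_param)


# Re-run the test bed against this version; the difference should print 0.0000.
out, _ = conv_forward_naive_solution(x, w, b, conv_param)
print('difference: {:.4f}'.format(rel_error(out, correct_out)))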