CNN from scratch

In this notebook, we're going to build a convolutional neural network for recognizing handwritten digits from scratch. By "from scratch", I mean without using TensorFlow's almighty neural network functions such as tf.nn.conv2d. This way, you'll be able to open up the black box and understand more clearly how a CNN works. We'll use TensorFlow interactively, so you can check the intermediate results along the way, which will also help your understanding.

Outline

Here are some functions we will implement from scratch in this notebook.

  1. Convolutional layer
  2. ReLU
  3. Max Pooling
  4. Affine layer (Fully connected layer)
  5. Softmax
  6. Cross entropy error

First things first, let's import TensorFlow


In [1]:
# GPUs or CPU
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))


TensorFlow Version: 1.4.1
Default GPU Device: /device:GPU:0

These two lines of code will download and read in the handwritten digits data automatically.


In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/home/arasdar/datasets/MNIST_data/", one_hot=True, reshape=False)


Extracting /home/arasdar/datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /home/arasdar/datasets/MNIST_data/t10k-labels-idx1-ubyte.gz

We're going to look at only 100 examples at a time.


In [3]:
batch_size = 100

Let's look at the first example in a batch. Each image is represented as a 28×28×1 array of numbers.


In [4]:
example_X, example_ys = mnist.train.next_batch(batch_size)
example_X[0].shape


Out[4]:
(28, 28, 1)

We use the convenient InteractiveSession so we can check intermediate results along the way. With it, you can use Tensor.eval() and Operation.run() without having to specify a session explicitly.


In [5]:
session = tf.InteractiveSession()

We start building the computation graph by creating placeholders for the input images (X) and the target output labels (t).


In [6]:
X = tf.placeholder('float', [batch_size, 28, 28, 1])
t = tf.placeholder('float', [batch_size, 10])

Below is an overview of the model we will build. It starts with a convolutional layer, passes the result through a ReLU, a max-pooling layer, an affine layer, another ReLU, a second affine layer, and finally a softmax. Keep this architecture in mind while you're following the notebook.

$$ conv - relu - pool - affine - relu - affine - softmax$$
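For orientation, here is how the shape of one batch flows through these layers (batch size 100); the same shapes appear in the cell outputs later in the notebook:

X:                        (100, 28, 28, 1)
conv (5x5, 30 filters):   (100, 28, 28, 30)
relu:                     (100, 28, 28, 30)
max pool (2x2, stride 2): (100, 14, 14, 30)
affine 1 + relu:          (100, 100)
affine 2:                 (100, 10)
softmax:                  (100, 10)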

Convolutional layer


In [7]:
filter_h, filter_w, filter_c, filter_n = 5, 5, 1, 30

In [8]:
W1 = tf.Variable(tf.random_normal([filter_h, filter_w, filter_c, filter_n], stddev=0.01))
b1 = tf.Variable(tf.zeros([filter_n]))

In [9]:
def convolution(X, W, b, padding, stride):
    n, h, w, c = map(lambda d: d.value, X.get_shape())
    filter_h, filter_w, filter_c, filter_n = [d.value for d in W.get_shape()]

    # output spatial size: (input + 2*padding - filter) // stride + 1
    out_h = (h + 2*padding - filter_h)//stride + 1
    out_w = (w + 2*padding - filter_w)//stride + 1

    # unroll every receptive-field window into a row, and the filters into columns
    X_flat = flatten(X, filter_h, filter_w, filter_c, out_h, out_w, stride, padding)
    W_flat = tf.reshape(W, [filter_h*filter_w*filter_c, filter_n])
    
    z = tf.matmul(X_flat, W_flat) + b     # b: 1 x filter_n, broadcast over all rows
    
    # reshape back to [n, out_h, out_w, filter_n]
    return tf.transpose(tf.reshape(z, [out_h, out_w, n, filter_n]), [2, 0, 1, 3])
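The output spatial size follows the standard convolution arithmetic. With the settings we will use below (28×28 input, 5×5 filter, padding 2, stride 1), the spatial size is preserved:

$$ out\_h = \frac{h + 2 \cdot padding - filter\_h}{stride} + 1 = \frac{28 + 2 \cdot 2 - 5}{1} + 1 = 28 $$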

To compute the convolution efficiently, we use a simple trick called flattening (commonly known as im2col). After flattening, the input is transformed into a 2D matrix in which each row is one receptive-field window, so the whole convolution becomes a single matrix multiplication with the filter (which is also reshaped into a 2D matrix).


In [10]:
def flatten(X, window_h, window_w, window_c, out_h, out_w, stride=1, padding=0):
    
    X_padded = tf.pad(X, [[0,0], [padding, padding], [padding, padding], [0,0]])

    windows = []
    for y in range(out_h):
        for x in range(out_w):
            window = tf.slice(X_padded, [0, y*stride, x*stride, 0], [-1, window_h, window_w, -1])
            windows.append(window)
    stacked = tf.stack(windows) # shape : [out_h, out_w, n, filter_h, filter_w, c]

    return tf.reshape(stacked, [-1, window_c*window_w*window_h])
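As a quick sanity check (a throwaway snippet, not part of the model), you could flatten X with the same window settings the first convolution will use and confirm the resulting 2D shape: 28·28 window positions per image × 100 images = 78,400 rows, and 5·5·1 = 25 columns.

# Sanity check only: im2col view of X for a 5x5 window, padding 2, stride 1
X_flat_check = flatten(X, window_h=5, window_w=5, window_c=1, out_h=28, out_w=28, stride=1, padding=2)
print(X_flat_check.get_shape())  # (78400, 25)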

In [15]:
print(X.shape, X.dtype, W1.shape, W1.dtype, b1.shape, b1.dtype)


(100, 28, 28, 1) <dtype: 'float32'> (5, 5, 1, 30) <dtype: 'float32_ref'> (30,) <dtype: 'float32_ref'>

In [17]:
conv_layer = convolution(X, W1, b1, padding=2, stride=1)
conv_layer


Out[17]:
<tf.Tensor 'transpose_1:0' shape=(100, 28, 28, 30) dtype=float32>

ReLU


In [18]:
def relu(X):
    return tf.maximum(X, tf.zeros_like(X))

In [20]:
conv_activation_layer = relu(conv_layer)
conv_activation_layer


Out[20]:
<tf.Tensor 'Maximum_1:0' shape=(100, 28, 28, 30) dtype=float32>

Max pooling


In [21]:
def max_pool(X, pool_h, pool_w, padding, stride):
    n, h, w, c = [d.value for d in X.get_shape()]
    
    out_h = (h + 2*padding - pool_h)//stride + 1
    out_w = (w + 2*padding - pool_w)//stride + 1

    X_flat = flatten(X, pool_h, pool_w, c, out_h, out_w, stride, padding)

    pool = tf.reduce_max(tf.reshape(X_flat, [out_h, out_w, n, pool_h*pool_w, c]), axis=3)
    return tf.transpose(pool, [2, 0, 1, 3])
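The same output-size formula as in the convolution applies here. With a 2×2 window, no padding, and stride 2, each spatial dimension is halved:

$$ out\_h = \frac{28 + 2 \cdot 0 - 2}{2} + 1 = 14 $$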

In [22]:
pooling_layer = max_pool(conv_activation_layer, pool_h=2, pool_w=2, padding=0, stride=2)
pooling_layer


Out[22]:
<tf.Tensor 'transpose_2:0' shape=(100, 14, 14, 30) dtype=float32>

Affine layer 1


In [23]:
batch_size, pool_output_h, pool_output_w, filter_n = [d.value for d in pooling_layer.get_shape()]

In [24]:
# number of nodes in the hidden layer
hidden_size = 100

In [25]:
W2 = tf.Variable(tf.random_normal([pool_output_h*pool_output_w*filter_n, hidden_size], stddev=0.01))
b2 = tf.Variable(tf.zeros([hidden_size]))

In [26]:
def affine(X, W, b):
    n = X.get_shape()[0].value # number of samples
    X_flat = tf.reshape(X, [n, -1])
    return tf.matmul(X_flat, W) + b

In [27]:
affine_layer1 = affine(pooling_layer, W2, b2)
affine_layer1


Out[27]:
<tf.Tensor 'add_2:0' shape=(100, 100) dtype=float32>

In [28]:
init = tf.global_variables_initializer()
init.run()
affine_layer1.eval({X:example_X, t:example_ys})[0]


Out[28]:
array([ 0.00925174, -0.00247716,  0.00373842, -0.01411869,  0.01808271,
       -0.00904754, -0.00638073, -0.00591116, -0.01707015, -0.0120058 ,
       -0.03524879, -0.00075297, -0.02764303, -0.00427013, -0.00041813,
       -0.014628  , -0.01604748,  0.01305443,  0.00531883, -0.0068157 ,
        0.01700079, -0.00695998,  0.01047445,  0.00686595,  0.00277898,
       -0.01327773,  0.02326751,  0.00105084, -0.00554805,  0.00553582,
        0.00587317, -0.03639751, -0.01127398, -0.01640638, -0.00076795,
       -0.01045864,  0.04147588, -0.01156281,  0.02095137, -0.00906414,
       -0.00811423,  0.00924253, -0.01614905, -0.00712552, -0.01189603,
       -0.00500245,  0.02677673, -0.03063465,  0.01832008, -0.02034824,
        0.00077786,  0.00186712,  0.02998282,  0.02252693, -0.02051033,
        0.00244616, -0.00651262, -0.00623711,  0.01035882, -0.00499087,
        0.01756883, -0.00748296,  0.01257539, -0.02312755,  0.01175804,
       -0.02441126,  0.0081053 , -0.04168748,  0.01337018, -0.01475471,
        0.01009009,  0.00202592,  0.01153922, -0.0029256 , -0.00095263,
        0.01502997,  0.00875417,  0.02194441, -0.0025549 , -0.01146321,
       -0.01528439,  0.03253143,  0.00252969,  0.01415024, -0.01252698,
       -0.04096382, -0.0087318 ,  0.04963036,  0.000627  , -0.01366708,
       -0.00637358, -0.02686524, -0.00803551, -0.0043704 ,  0.00203722,
       -0.02695317, -0.02998732, -0.00332387,  0.01216408,  0.01784112],
      dtype=float32)

The above result is the representation of the first example as a 100-dimensional vector in the hidden layer.


In [29]:
affine_activation_layer1 = relu(affine_layer1)
affine_activation_layer1


Out[29]:
<tf.Tensor 'Maximum_2:0' shape=(100, 100) dtype=float32>

In [30]:
affine_activation_layer1.eval({X:example_X, t:example_ys})[0]


Out[30]:
array([0.00925174, 0.        , 0.00373842, 0.        , 0.01808271,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.01305443, 0.00531883, 0.        ,
       0.01700079, 0.        , 0.01047445, 0.00686595, 0.00277898,
       0.        , 0.02326751, 0.00105084, 0.        , 0.00553582,
       0.00587317, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.04147588, 0.        , 0.02095137, 0.        ,
       0.        , 0.00924253, 0.        , 0.        , 0.        ,
       0.        , 0.02677673, 0.        , 0.01832008, 0.        ,
       0.00077786, 0.00186712, 0.02998282, 0.02252693, 0.        ,
       0.00244616, 0.        , 0.        , 0.01035882, 0.        ,
       0.01756883, 0.        , 0.01257539, 0.        , 0.01175804,
       0.        , 0.0081053 , 0.        , 0.01337018, 0.        ,
       0.01009009, 0.00202592, 0.01153922, 0.        , 0.        ,
       0.01502997, 0.00875417, 0.02194441, 0.        , 0.        ,
       0.        , 0.03253143, 0.00252969, 0.01415024, 0.        ,
       0.        , 0.        , 0.04963036, 0.000627  , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.00203722,
       0.        , 0.        , 0.        , 0.01216408, 0.01784112],
      dtype=float32)

This is after applying ReLU to the above representation. You can see that we set all the negative numbers to 0.

Affine layer 2


In [31]:
output_size = 10

In [33]:
W3 = tf.Variable(tf.random_normal([hidden_size, output_size], stddev=0.01))
b3 = tf.Variable(tf.zeros([output_size]))
W3, b3


Out[33]:
(<tf.Variable 'Variable_6:0' shape=(100, 10) dtype=float32_ref>,
 <tf.Variable 'Variable_7:0' shape=(10,) dtype=float32_ref>)

In [34]:
affine_layer2 = affine(affine_activation_layer1, W3, b3)

In [35]:
# because you have new variables, you need to initialize them.
init = tf.global_variables_initializer()
init.run()

In [36]:
affine_layer2.eval({X:example_X, t:example_ys})[0]


Out[36]:
array([ 0.00182686,  0.00224067,  0.00030621, -0.00095036,  0.00034657,
       -0.00138684,  0.00118214,  0.00020053, -0.00027152,  0.00456122],
      dtype=float32)

Softmax


In [37]:
def softmax(X):
    X_centered = X - tf.reduce_max(X)  # subtract the (global) max to avoid overflow; it cancels out below
    X_exp = tf.exp(X_centered)
    exp_sum = tf.reduce_sum(X_exp, axis=1)
    return tf.transpose(tf.transpose(X_exp) / exp_sum)  # divide each row by its sum (transpose trick for broadcasting)
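Written out, for each row of logits this computes

$$ y_k = \frac{\exp(x_k - m)}{\sum_{j} \exp(x_j - m)} $$

where m is a constant offset. Subtracting m does not change the result (it cancels in the ratio), but it keeps every exponent non-positive so tf.exp cannot overflow. Note that the code uses the maximum over the whole batch, which works just as well as a per-row maximum for this purpose.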

In [39]:
softmax_layer = softmax(affine_layer2)
softmax_layer


Out[39]:
<tf.Tensor 'transpose_6:0' shape=(100, 10) dtype=float32>

In [40]:
softmax_layer.eval({X:example_X, t:example_ys})[0]


Out[40]:
array([0.10010205, 0.10014348, 0.09994995, 0.09982443, 0.09995398,
       0.09978087, 0.10003753, 0.09993938, 0.09989222, 0.10037614],
      dtype=float32)

We get roughly evenly distributed probabilities over the 10 digits. This is expected, because we haven't trained our model yet.

Cross entropy error


In [41]:
def cross_entropy_error(y, t):
    return -tf.reduce_mean(tf.log(tf.reduce_sum(y * t, axis=1)))
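Because t is one-hot, the inner sum simply picks out the probability the network assigns to the correct class of each example, so this is the average negative log-likelihood over the batch:

$$ E = -\frac{1}{N} \sum_{n=1}^{N} \log\Big(\sum_{k=1}^{10} t_{nk}\, y_{nk}\Big) = -\frac{1}{N} \sum_{n=1}^{N} \log y_{n,\,\mathrm{label}(n)} $$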

In [42]:
loss = cross_entropy_error(softmax_layer, t)

In [43]:
loss.eval({X:example_X, t:example_ys})


Out[43]:
2.3026032
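This is exactly what we should expect from an untrained network: each of the 10 classes gets probability of roughly 1/10, so the cross-entropy is about

$$ -\log\frac{1}{10} = \log 10 \approx 2.3026 $$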

In [44]:
learning_rate = 0.1
trainer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

In [45]:
# number of times to iterate over training data
training_epochs = 2

In [46]:
# number of batches
num_batch = int(mnist.train.num_examples/batch_size)
num_batch


Out[46]:
550


In [63]:
# Quick test: train for 500 iterations in a fresh session and watch the loss decrease
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(500):
    X_mb, y_mb = mnist.train.next_batch(batch_size)

    _, loss_val = sess.run([trainer, loss], feed_dict={X: X_mb, t: y_mb})
    avg_cost = loss_val / num_batch  # single-batch loss scaled by 1/num_batch

    # Print the scaled loss every 100 iterations
    if i % 100 == 0:
        print(avg_cost)


0.004186583432284268
0.00411091457713734
0.0010877627676183527
0.0008315690539099954
0.0009691767259077592

In [53]:
# Another quick test: 50 more iterations using the default InteractiveSession
for i in range(50):
    train_X, train_ys = mnist.train.next_batch(batch_size)
    trainer.run(feed_dict={X: train_X, t: train_ys})
    avg_cost = loss.eval(feed_dict={X: train_X, t: train_ys}) / num_batch  # scaled single-batch loss
    print(avg_cost)

0.0002530513026497581
0.0005116122961044312
0.00015374977480281482
0.00024025369774211537
0.0002687355875968933
0.00021916113116524435
0.0004313530434261669
0.00013656723228367893
0.00027604070576754485
0.0002509538422931324
0.00026693642139434813
0.0004813069647008722
0.00024193882942199707
0.00022291856733235447
0.00028267844156785446
0.0002987148545005105
0.00020740299062295392
0.00031692970882762564
0.0002276279709555886
0.00022814975543455644
0.00013362857428464022
0.00020789036696607416
0.00018464168364351445
0.00041824812238866634
0.0003933531045913696
0.0002731123566627502
0.00023209579966285013
0.0002516916394233704
0.00020734738219868053
0.0002037431841546839
0.0002385480837388472
0.0001330316202207045
0.00036589893427762117
0.00029510717500339856
0.00017985174601728267
0.00030295583334836096
0.0002563689784570174
0.00021367035128853538
0.00028936746445569126
0.00030755351890217173
0.00022703165357763116
0.00020914259282025423
0.0001804633167656985
0.00016635522246360779
0.00019020209258252923
0.00011095401915636929
0.00024563537402586504
0.000314336744221774
0.00040946258739991623
0.00014553071423010393

In [38]:
from tqdm import tqdm_notebook

In [39]:
for epoch in range(training_epochs):
    avg_cost = 0
    for _ in tqdm_notebook(range(num_batch)):
        train_X, train_ys = mnist.train.next_batch(batch_size)
        trainer.run(feed_dict={X:train_X, t:train_ys})
        avg_cost += loss.eval(feed_dict={X:train_X, t:train_ys}) / num_batch

    print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost), flush=True)


Epoch: 0001 cost= 0.801890662
Epoch: 0002 cost= 0.109185281

In [41]:
test_x = mnist.test.images[:batch_size]
test_t = mnist.test.labels[:batch_size]

In [46]:
def accuracy(network, t):
    
    t_predict = tf.argmax(network, axis=1)
    t_actual = tf.argmax(t, axis=1)

    return tf.reduce_mean(tf.cast(tf.equal(t_predict, t_actual), tf.float32))

In [47]:
accuracy(softmax_layer, t).eval(feed_dict={X:test_x, t:test_t})


Out[47]:
0.98000002

We got an accuracy of 98%. Awesome!


In [48]:
session.close()
