In this notebook, we're going to build a convolutional neural network for recognizing handwritten digits from scratch. By "from scratch", I mean without using TensorFlow's high-level neural network functions such as tf.nn.conv2d. This way, you'll be able to open the black box and understand more clearly how a CNN works. We'll use TensorFlow interactively, so you can check the intermediate results along the way, which will also help your understanding.
Here are the functions we will implement from scratch in this notebook: convolution, flattening (the im2col trick), ReLU, max pooling, affine (fully connected) layers, softmax, and the cross-entropy loss.
First things first, let's import TensorFlow.
In [1]:
# Import TensorFlow and check whether it will run on a GPU or the CPU
import tensorflow as tf
# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))
# Check for a GPU
print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
These two lines of code download and read in the MNIST handwritten digit data automatically.
In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/home/arasdar/datasets/MNIST_data/", one_hot=True, reshape=False)
We're going to look at only 100 examples at a time.
In [3]:
batch_size = 100
Here is the first example in the data. It's a 28x28x1 array of numbers representing a grayscale picture of a digit.
In [4]:
example_X, example_ys = mnist.train.next_batch(batch_size)
example_X[0].shape
Out[4]:
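To see that it really is an image, you can plot it. This is a quick optional aside, assuming matplotlib is installed; it is not part of the model.
In [ ]:
import matplotlib.pyplot as plt
# Show the first example as a 28x28 grayscale image, with its label decoded from one-hot.
plt.imshow(example_X[0].reshape(28, 28), cmap='gray')
plt.title('Label: {}'.format(example_ys[0].argmax()))
plt.show()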
We use the convenient InteractiveSession to check intermediate results along the way. With it, you can use Tensor.eval() and Operation.run() without having to specify a session explicitly.
In [5]:
session = tf.InteractiveSession()
We start building the computation graph by creating placeholders for the input images (X) and the target output labels (t).
In [6]:
X = tf.placeholder('float', [batch_size, 28, 28, 1])
t = tf.placeholder('float', [batch_size, 10])
Below is an overview of the model we will build. It starts with a convolutional layer, passes the result through ReLU, applies max pooling, then an affine layer, ReLU again, a second affine layer, and finally the softmax function. Keep this architecture in mind while you're following the notebook.
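Given the hyperparameters chosen below (30 filters of 5x5 with padding 2 and stride 1, 2x2 max pooling with stride 2, a 100-unit hidden layer, and 10 output classes), the tensor shapes flow as follows:
X: [100, 28, 28, 1]
convolution: [100, 28, 28, 30]
ReLU: [100, 28, 28, 30]
max pooling: [100, 14, 14, 30]
affine layer 1: [100, 100]
ReLU: [100, 100]
affine layer 2: [100, 10]
softmax: [100, 10]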
In [7]:
filter_h, filter_w, filter_c, filter_n = 5, 5, 1, 30
In [8]:
W1 = tf.Variable(tf.random_normal([filter_h, filter_w, filter_c, filter_n], stddev=0.01))
b1 = tf.Variable(tf.zeros([filter_n]))
In [9]:
def convolution(X, W, b, padding, stride):
    n, h, w, c = map(lambda d: d.value, X.get_shape())
    filter_h, filter_w, filter_c, filter_n = [d.value for d in W.get_shape()]
    # output spatial dimensions
    out_h = (h + 2*padding - filter_h)//stride + 1
    out_w = (w + 2*padding - filter_w)//stride + 1
    # flatten the input windows and the filters so convolution becomes a matrix multiplication
    X_flat = flatten(X, filter_h, filter_w, filter_c, out_h, out_w, stride, padding)
    W_flat = tf.reshape(W, [filter_h*filter_w*filter_c, filter_n])
    z = tf.matmul(X_flat, W_flat) + b  # b: 1 x filter_n
    # reshape back to [n, out_h, out_w, filter_n]
    return tf.transpose(tf.reshape(z, [out_h, out_w, n, filter_n]), [2, 0, 1, 3])
To compute the convolution easily, we use a simple trick called flattening. After flattening, the input data is transformed into a 2D matrix, which allows matrix multiplication with the filter (which is also flattened into 2D).
In [10]:
def flatten(X, window_h, window_w, window_c, out_h, out_w, stride=1, padding=0):
    X_padded = tf.pad(X, [[0,0], [padding, padding], [padding, padding], [0,0]])
    windows = []
    for y in range(out_h):
        for x in range(out_w):
            window = tf.slice(X_padded, [0, y*stride, x*stride, 0], [-1, window_h, window_w, -1])
            windows.append(window)
    stacked = tf.stack(windows)  # shape: [out_h*out_w, n, window_h, window_w, window_c]
    return tf.reshape(stacked, [-1, window_c*window_w*window_h])
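To make the flattening concrete, here is a tiny NumPy sketch (not used by the model): flattening a single 3x3, 1-channel image with a 2x2 window, stride 1 and no padding turns each window into one row of a 4x4 matrix.
In [ ]:
import numpy as np
img = np.arange(9).reshape(1, 3, 3, 1)  # n=1, h=3, w=3, c=1
rows = []
for y in range(2):          # out_h = 2
    for x in range(2):      # out_w = 2
        # Each 2x2 window is flattened into a single row.
        rows.append(img[:, y:y+2, x:x+2, :].reshape(-1))
print(np.stack(rows))       # shape (4, 4): one row per window position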
In [15]:
print(X.shape, X.dtype, W1.shape, W1.dtype, b1.shape, b1.dtype)
In [17]:
conv_layer = convolution(X, W1, b1, padding=2, stride=1)
conv_layer
Out[17]:
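As an optional sanity check (this added cell is the one place we peek at the built-in op, purely for verification), our convolution should match tf.nn.conv2d: with a 5x5 filter and stride 1, padding 2 is equivalent to 'SAME' padding.
In [ ]:
import numpy as np
# Optional verification cell: compare our convolution against TensorFlow's built-in op.
builtin_conv = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + b1
tf.global_variables_initializer().run()  # initialize W1 and b1 in the InteractiveSession
ours = conv_layer.eval({X: example_X})
reference = builtin_conv.eval({X: example_X})
print(np.allclose(ours, reference, atol=1e-5))  # expected: True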
In [18]:
def relu(X):
    return tf.maximum(X, tf.zeros_like(X))
In [20]:
conv_activation_layer = relu(conv_layer)
conv_activation_layer
Out[20]:
In [21]:
def max_pool(X, pool_h, pool_w, padding, stride):
    n, h, w, c = [d.value for d in X.get_shape()]
    out_h = (h + 2*padding - pool_h)//stride + 1
    out_w = (w + 2*padding - pool_w)//stride + 1
    X_flat = flatten(X, pool_h, pool_w, c, out_h, out_w, stride, padding)
    # take the maximum over each pooling window
    pool = tf.reduce_max(tf.reshape(X_flat, [out_h, out_w, n, pool_h*pool_w, c]), axis=3)
    return tf.transpose(pool, [2, 0, 1, 3])
In [22]:
pooling_layer = max_pool(conv_activation_layer, pool_h=2, pool_w=2, padding=0, stride=2)
pooling_layer
Out[22]:
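Similarly, the hand-rolled pooling can be checked against the built-in tf.nn.max_pool (an optional added cell; a 2x2 window with stride 2 and no padding corresponds to 'VALID' padding).
In [ ]:
import numpy as np
builtin_pool = tf.nn.max_pool(conv_activation_layer, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='VALID')
tf.global_variables_initializer().run()
print(np.allclose(pooling_layer.eval({X: example_X}),
                  builtin_pool.eval({X: example_X}), atol=1e-5))  # expected: True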
In [23]:
batch_size, pool_output_h, pool_output_w, filter_n = [d.value for d in pooling_layer.get_shape()]
In [24]:
# number of nodes in the hidden layer
hidden_size = 100
In [25]:
W2 = tf.Variable(tf.random_normal([pool_output_h*pool_output_w*filter_n, hidden_size], stddev=0.01))
b2 = tf.Variable(tf.zeros([hidden_size]))
In [26]:
def affine(X, W, b):
    n = X.get_shape()[0].value  # number of samples
    X_flat = tf.reshape(X, [n, -1])
    return tf.matmul(X_flat, W) + b
In [27]:
affine_layer1 = affine(pooling_layer, W2, b2)
affine_layer1
Out[27]:
In [28]:
init = tf.global_variables_initializer()
init.run()
affine_layer1.eval({X:example_X, t:example_ys})[0]
Out[28]:
The above result shows the representation of the first example as a 100-dimensional vector in the hidden layer.
In [29]:
affine_activation_layer1 = relu(affine_layer1)
affine_activation_layer1
Out[29]:
In [30]:
affine_activation_layer1.eval({X:example_X, t:example_ys})[0]
Out[30]:
This is after applying ReLU to the above representation. You can see that we set all the negative numbers to 0.
In [31]:
output_size = 10
In [33]:
W3 = tf.Variable(tf.random_normal([hidden_size, output_size], stddev=0.01))
b3 = tf.Variable(tf.zeros([output_size]))
W3, b3
Out[33]:
In [34]:
affine_layer2 = affine(affine_activation_layer1, W3, b3)
In [35]:
# Because we created new variables, we need to initialize them again.
init = tf.global_variables_initializer()
init.run()
In [36]:
affine_layer2.eval({X:example_X, t:example_ys})[0]
Out[36]:
In [37]:
def softmax(X):
    X_centered = X - tf.reduce_max(X)  # to avoid overflow
    X_exp = tf.exp(X_centered)
    exp_sum = tf.reduce_sum(X_exp, axis=1)
    return tf.transpose(tf.transpose(X_exp) / exp_sum)
In [39]:
softmax_layer = softmax(affine_layer2)
softmax_layer
Out[39]:
In [40]:
softmax_layer.eval({X:example_X, t:example_ys})[0]
Out[40]:
We got roughly uniform probabilities over the 10 digits. This is as expected, because we haven't trained our model yet.
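As a quick optional check (an added cell, not in the original flow), each row of the softmax output should sum to 1.
In [ ]:
import numpy as np
probs = softmax_layer.eval({X: example_X, t: example_ys})
print(np.allclose(probs.sum(axis=1), 1.0))  # expected: True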
In [41]:
def cross_entropy_error(y, t):
    # t is one-hot, so y * t picks out the predicted probability of the correct class
    return -tf.reduce_mean(tf.log(tf.reduce_sum(y * t, axis=1)))
In [42]:
loss = cross_entropy_error(softmax_layer, t)
In [43]:
loss.eval({X:example_X, t:example_ys})
Out[43]:
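Because the untrained network outputs roughly uniform probabilities, the loss should be close to -log(1/10).
In [ ]:
import numpy as np
print(-np.log(1.0/10))  # ≈ 2.30, roughly what the untrained model's loss should be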
In [44]:
learning_rate = 0.1
trainer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
In [45]:
# number of times to iterate over training data
training_epochs = 2
In [46]:
# number of batches
num_batch = int(mnist.train.num_examples/batch_size)
num_batch
Out[46]:
In [63]:
# Note: this cell trains in its own tf.Session, separate from the InteractiveSession above.
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(500):
    X_mb, y_mb = mnist.train.next_batch(batch_size)
    _, loss_val = sess.run([trainer, loss], feed_dict={X: X_mb, t: y_mb})
    # Print the current batch loss every 100 iterations
    if i % 100 == 0:
        print(loss_val)
In [53]:
# A short additional training run in the default (Interactive) session
for i in range(50):
    train_X, train_ys = mnist.train.next_batch(batch_size)
    trainer.run(feed_dict={X: train_X, t: train_ys})
print(loss.eval(feed_dict={X: train_X, t: train_ys}))  # loss on the last batch
In [38]:
from tqdm import tqdm_notebook
In [39]:
for epoch in range(training_epochs):
    avg_cost = 0
    for _ in tqdm_notebook(range(num_batch)):
        train_X, train_ys = mnist.train.next_batch(batch_size)
        trainer.run(feed_dict={X:train_X, t:train_ys})
        avg_cost += loss.eval(feed_dict={X:train_X, t:train_ys}) / num_batch
    print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost), flush=True)
In [41]:
test_x = mnist.test.images[:batch_size]
test_t = mnist.test.labels[:batch_size]
In [46]:
def accuracy(network, t):
    t_predict = tf.argmax(network, axis=1)
    t_actual = tf.argmax(t, axis=1)
    return tf.reduce_mean(tf.cast(tf.equal(t_predict, t_actual), tf.float32))
In [47]:
accuracy(softmax_layer, t).eval(feed_dict={X:test_x, t:test_t})
Out[47]:
We got an accuracy of 98% on the first 100 test images. Awesome!
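Since the placeholders are fixed to batch_size examples, one way to estimate accuracy on the full test set is to average the batch-wise accuracy over all test images. Here is a sketch (an added cell, to be run before closing the session).
In [ ]:
# Sketch: average batch-wise accuracy over the entire test set.
acc_op = accuracy(softmax_layer, t)
num_test_batches = mnist.test.num_examples // batch_size
total_acc = 0.0
for i in range(num_test_batches):
    batch_x = mnist.test.images[i*batch_size:(i+1)*batch_size]
    batch_t = mnist.test.labels[i*batch_size:(i+1)*batch_size]
    total_acc += acc_op.eval(feed_dict={X: batch_x, t: batch_t})
print('Average test accuracy: {:.4f}'.format(total_acc / num_test_batches))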
In [48]:
session.close()
dreamgonfly@gmail.com