In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open-ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
In [1]:
# Load pickled data
import pickle
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import accuracy_score
np.random.seed(23)
# TODO: fill this in based on where you saved the training and testing data
training_file = 'traffic-signs-data/train.p'
testing_file = 'traffic-signs-data/test.p'
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
In [2]:
### To start off let's do a basic data summary.
# TODO: number of training examples
n_train = X_train.shape[0]
# TODO: number of testing examples
n_test = X_test.shape[0]
# TODO: what's the shape of an image?
image_shape = X_train.shape[1:]  # (height, width, channels) of a single image
# TODO: how many classes are in the dataset
n_classes = np.unique(y_train).shape[0]
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
In [3]:
def plt_img(image):
    """Plot an RGB image next to its grayscale and histogram-equalized versions."""
    fig = plt.figure(figsize=(12, 4))
    a = fig.add_subplot(1, 3, 1)
    a.title.set_text('RGB')
    plt.imshow(image)
    # The images are in RGB channel order, so use the RGB (not BGR) conversion
    image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    a = fig.add_subplot(1, 3, 2)
    a.title.set_text('Grayscale')
    plt.imshow(image, cmap='gray')
    image = cv2.equalizeHist(image)
    a = fig.add_subplot(1, 3, 3)
    a.title.set_text('Equalized')
    plt.imshow(image, cmap='gray')
    plt.show()

random_images = np.random.randint(0, n_train, 5)
for i in random_images:
    print("#### Class {} ####".format(y_train[i]))
    plt_img(X_train[i])
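The suggestion above to plot the count of each sign can be sketched with np.bincount. This is a minimal sketch, not part of the original notebook; class_counts and plot_class_counts are hypothetical helper names, and in the notebook you might call plot_class_counts(y_train, n_classes).

```python
import numpy as np

def class_counts(labels, n_classes):
    """Return the number of examples for each class id 0 .. n_classes-1."""
    return np.bincount(np.asarray(labels), minlength=n_classes)

def plot_class_counts(labels, n_classes):
    """Bar chart of examples per class (hypothetical helper, not in the original notebook)."""
    import matplotlib.pyplot as plt  # imported here so class_counts works without matplotlib
    counts = class_counts(labels, n_classes)
    plt.figure(figsize=(12, 4))
    plt.bar(np.arange(n_classes), counts)
    plt.xlabel("Class id")
    plt.ylabel("Number of examples")
    plt.show()
    return counts

demo_counts = class_counts([0, 2, 2, 5], 6)  # one 0, two 2s, one 5
```

The minlength argument guarantees one bin per class even if a class is missing from the labels, which keeps the bar positions aligned with the class ids.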
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
In [4]:
def preprocess(data):
    """Convert RGB images to grayscale, histogram-equalize, and add a channel axis."""
    imgs = np.empty((data.shape[0], 32, 32, 1), dtype=np.uint8)
    for i, img in enumerate(data):
        img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        img = cv2.equalizeHist(img)
        imgs[i] = np.expand_dims(img, axis=2)
    return imgs

X_train = preprocess(X_train)
X_test = preprocess(X_test)

def center_normaize(data, mean, std):
    """Center and scale images using statistics computed on the training set."""
    data = data.astype('float32')
    data -= mean
    data /= std
    return data

# Use the training-set statistics for both training and test data
mean = np.mean(X_train)
std = np.std(X_train)
X_train = center_normaize(X_train, mean, std)
X_test = center_normaize(X_test, mean, std)
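Two details of this preprocessing are easy to get wrong. OpenCV documents its RGB-to-gray conversion as Y = 0.299 R + 0.587 G + 0.114 B, so passing RGB data to a BGR conversion swaps the red and blue weights. And the test images must be normalized with the training set's mean and standard deviation. The NumPy-only sketch below mirrors both steps on random synthetic images; to_gray and standardize are hypothetical names, and histogram equalization is omitted.

```python
import numpy as np

# Luma weights documented for OpenCV's RGB -> GRAY conversion
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def to_gray(images_rgb):
    """Approximate cv2.COLOR_RGB2GRAY for an array of shape (..., 3) in RGB order."""
    return images_rgb.astype(np.float32) @ LUMA_WEIGHTS

def standardize(data, mean, std):
    """Same idea as center_normaize: subtract the training mean, divide by the training std."""
    return (data.astype(np.float32) - mean) / std

rng = np.random.default_rng(0)
train_rgb = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
test_rgb = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

train_gray = to_gray(train_rgb)
train_mean, train_std = train_gray.mean(), train_gray.std()  # training statistics only
train_norm = standardize(train_gray, train_mean, train_std)
test_norm = standardize(to_gray(test_rgb), train_mean, train_std)  # reuse them for test
```

After standardization the training data has mean 0 and standard deviation 1; the test data is only approximately so, which is expected because it reuses the training statistics.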
Answer:
In [5]:
## One Hot Encode Labels
ohe = LabelBinarizer().fit(y_train)
y_train_ohe = ohe.transform(y_train)
y_test_ohe = ohe.transform(y_test)
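## Sample 20% of the training set (stratified by class) to monitor training accuracy.
## These images are not removed from X_train.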
_, X_valid, _, y_valid_ohe = train_test_split(X_train, y_train_ohe, test_size=0.2, random_state=23, stratify=y_train)
Answer:
The data is already split into training and test sets, which I used as provided. Although the classes are not perfectly balanced, I did not find it necessary to generate additional data because there are sufficient samples of each class to train the model. The only significant change was to one-hot encode the labels, which is required by TensorFlow's softmax_cross_entropy_with_logits loss.
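I also drew a stratified 20% sample of the training set to evaluate training accuracy after each epoch. Because these samples are not removed from the training set, this measures how well the model fits the training data rather than acting as a held-out validation set; comparing it with the test accuracy helps reveal overfitting.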
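Because all 43 classes appear in y_train, LabelBinarizer produces ordinary one-hot rows here, and tf.argmax(y, 1) in the accuracy op recovers the class ids. A NumPy sketch of the same round trip, with one_hot as a hypothetical helper rather than the notebook's code:

```python
import numpy as np

def one_hot(labels, n_classes):
    """One-hot encode integer class ids 0..n_classes-1 (one row per label)."""
    labels = np.asarray(labels)
    return np.eye(n_classes, dtype=np.float32)[labels]

labels_demo = np.array([4, 14, 17, 2, 38, 25])
labels_ohe = one_hot(labels_demo, 43)
decoded = labels_ohe.argmax(axis=1)  # what tf.argmax(y, 1) recovers in the accuracy op
```

One caveat: with only two classes, LabelBinarizer returns a single column instead of two, so this equivalence holds for the multi-class case used in this project.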
In [6]:
# HyperParameters
learning_rate = 0.0005
batch_size = 256
training_epochs = 100
n_classes = 43
dropout_prob = 0.5
layer_width = {
'layer_1': 64,
'layer_2': 128,
'layer_3': 256,
'layer_4': 256,
'dense_1': 400,
'dense_2': 200
}
weights = {
'layer_1': tf.Variable(tf.truncated_normal(
[3, 3, 1, layer_width['layer_1']], stddev=0.001)),
'layer_2': tf.Variable(tf.truncated_normal(
[3, 3, layer_width['layer_1'], layer_width['layer_2']], stddev=0.001)),
'layer_3': tf.Variable(tf.truncated_normal(
[3, 3, layer_width['layer_2'], layer_width['layer_3']], stddev=0.001)),
'layer_4': tf.Variable(tf.truncated_normal(
[3, 3, layer_width['layer_3'], layer_width['layer_4']], stddev=0.001)),
'dense_1': tf.Variable(tf.truncated_normal(
[4*4*256, layer_width['dense_1']], stddev=0.001)),
'dense_2': tf.Variable(tf.truncated_normal(
[layer_width['dense_1'], layer_width['dense_2']], stddev=0.001)),
'out': tf.Variable(tf.truncated_normal(
[layer_width['dense_2'], n_classes], stddev=0.001))
}
biases = {
'layer_1': tf.Variable(tf.zeros(layer_width['layer_1'])),
'layer_2': tf.Variable(tf.zeros(layer_width['layer_2'])),
'layer_3': tf.Variable(tf.zeros(layer_width['layer_3'])),
'layer_4': tf.Variable(tf.zeros(layer_width['layer_4'])),
'dense_1': tf.Variable(tf.zeros(layer_width['dense_1'])),
'dense_2': tf.Variable(tf.zeros(layer_width['dense_2'])),
'out': tf.Variable(tf.zeros(n_classes))
}
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper with bias and ReLU activation ('SAME' padding keeps the spatial size)
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # k x k max pooling with stride k (halves height and width for k=2)
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')

def conv_net(x, weights, biases, dropout):
    # `dropout` is the keep probability, fed through the keep_prob placeholder
    conv1 = conv2d(x, weights['layer_1'], biases['layer_1'])
    conv1 = maxpool2d(conv1)
    conv2 = conv2d(conv1, weights['layer_2'], biases['layer_2'])
    conv2 = maxpool2d(conv2)
    conv3 = conv2d(conv2, weights['layer_3'], biases['layer_3'])
    conv4 = conv2d(conv3, weights['layer_4'], biases['layer_4'])
    conv4 = maxpool2d(conv4)
    # Flatten the (4, 4, 256) feature maps into vectors of length 4096
    fc1 = tf.reshape(conv4, [-1, weights['dense_1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['dense_1']), biases['dense_1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)
    fc2 = tf.add(tf.matmul(fc1, weights['dense_2']), biases['dense_2'])
    fc2 = tf.nn.relu(fc2)
    fc2 = tf.nn.dropout(fc2, dropout)
    out = tf.add(tf.matmul(fc2, weights['out']), biases['out'])
    return out
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer:
The final model is a convolutional neural network similar to the VGG-style architectures in the following paper: https://arxiv.org/abs/1409.1556
The traffic sign dataset is much smaller than the ImageNet benchmark used for VGG, so the number of layers and neurons has been scaled down. The final model contains four 3x3 convolutional layers followed by two fully connected layers; dropout with a keep probability of 0.5 is applied after each fully connected layer.
Below is a summary of the model and the shape of its layers.
Input: (32, 32, 1)
Conv: (32, 32, 64)
Pool: (16, 16, 64)
Conv: (16, 16, 128)
Pool: (8, 8, 128)
Conv: (8, 8, 256)
Conv: (8, 8, 256)
Pool: (4, 4, 256)
Flatten: (4096)
FullyConnected: (400)
Dropout 0.5
FullyConnected: (200)
Dropout 0.5
Output: (43)
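The layer shapes follow from the code: every convolution uses stride 1 with 'SAME' padding, so it keeps the spatial size, and each 2x2 max pool with 'SAME' padding maps a side of length n to ceil(n / 2). The sketch below recomputes the shapes and the trainable-parameter count from layer_width. It is plain Python arithmetic written for illustration, not a TensorFlow call.

```python
import math

layer_width = {'layer_1': 64, 'layer_2': 128, 'layer_3': 256, 'layer_4': 256,
               'dense_1': 400, 'dense_2': 200}
n_classes = 43

# (layer name, followed by a 2x2 max pool?) in the order used by conv_net
conv_layers = [('layer_1', True), ('layer_2', True), ('layer_3', False), ('layer_4', True)]

side, channels, params = 32, 1, 0
for name, pooled in conv_layers:
    out_ch = layer_width[name]
    params += 3 * 3 * channels * out_ch + out_ch  # 3x3 kernel weights + biases
    channels = out_ch                             # stride-1 'SAME' conv: side unchanged
    if pooled:
        side = math.ceil(side / 2)                # 2x2 'SAME' pool with stride 2
    print(f"{name}: ({side}, {side}, {channels})")

flat = side * side * channels
print(f"Flatten: ({flat})")

fan_in = flat
for name in ('dense_1', 'dense_2'):
    params += fan_in * layer_width[name] + layer_width[name]  # weights + biases
    fan_in = layer_width[name]
params += fan_in * n_classes + n_classes                      # output layer
print(f"Trainable parameters: {params:,}")
```

This agrees with the 4*4*256 input size hard-coded for weights['dense_1']. Roughly 61% of the parameters sit in dense_1 (4096 x 400 weights), which is where dropout is most useful.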
In [ ]:
x = tf.placeholder("float", [None, 32, 32, 1])
y = tf.placeholder("float", [None, n_classes])
keep_prob = tf.placeholder(tf.float32)
logits = conv_net(x, weights, biases, keep_prob)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Save model for evaluation
saver = tf.train.Saver()
import os
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        total_batch = int(n_train / batch_size)
        # shuffle data index for each epoch
        rand_idx = np.random.permutation(n_train)
        for i in range(total_batch):
            offset = i * batch_size
            off_end = offset + batch_size
            batch_idx = rand_idx[offset:off_end]
            batch_x = X_train[batch_idx]
            batch_y = y_train_ohe[batch_idx]
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout_prob})
        # Evaluate once per epoch with dropout disabled (keep_prob = 1.0)
        cost_tr, acc_tr = sess.run([cost, accuracy], feed_dict={x: X_valid, y: y_valid_ohe, keep_prob: 1.})
        cost_ts, acc_ts = sess.run([cost, accuracy], feed_dict={x: X_test, y: y_test_ohe, keep_prob: 1.})
        print("#### Epoch {} ####".format(epoch + 1))
        print("Train Cost: {:.5f} | Train Accuracy: {:.5f}".format(cost_tr, acc_tr))
        print("Test Cost: {:.5f} | Test Accuracy: {:.5f}".format(cost_ts, acc_ts))
    os.makedirs("models", exist_ok=True)  # the checkpoint directory must exist
    save_path = saver.save(sess, "models/model.ckpt")
    print("Training Complete! Model saved in file: %s" % save_path)
Answer:
Optimizer: I chose the Adam optimizer over plain gradient descent. Adam keeps exponentially decaying moving averages of the gradients (momentum) and of the squared gradients, and uses them to adapt the step size of each parameter. This lets it keep moving steadily in directions where the gradient is consistent, so it usually finds a local minimum more efficiently than gradient descent with a fixed step size.
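The Adam update rule can be sketched in NumPy on a toy one-dimensional quadratic instead of the network. adam_minimize is a hypothetical helper; beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are assumed here as the commonly used defaults (the notebook only sets the learning rate of tf.train.AdamOptimizer).

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a function given its gradient using the Adam update rule."""
    x = np.asarray(x0, dtype=np.float64)
    m = np.zeros_like(x)  # moving average of gradients (first moment / momentum)
    v = np.zeros_like(x)  # moving average of squared gradients (second moment)
    history = []
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
        history.append(x.copy())
    return x, history

# Toy problem: f(x) = (x - 3)^2 with gradient 2 (x - 3); the minimum is at x = 3
x_final, hist = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

Thanks to the bias correction, the very first step has size close to lr regardless of how large the gradient is, which is why the learning rate acts as an approximate upper bound on each parameter's step.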
Batch Size and Epochs: Various combinations of batch size and epochs led to acceptable results. Larger batch sizes require more epochs to train, but usually give a more accurate estimate of the gradient. Ideally, the batch size should be larger than the 43 class labels in this dataset, so that each batch can contain examples from many classes. I also randomly shuffled the training data at each epoch so that the model does not see the batches in a fixed order, which helps reduce overfitting. A combination of 100 epochs with a batch size of 256 was the final choice for this model.
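The per-epoch shuffling can be written as a small generator. This sketch (shuffled_batches is a hypothetical name) differs slightly from the notebook's loop: it also yields the final partial batch, whereas the int(n_train/batch_size) batches in the notebook drop the remainder.

```python
import numpy as np

def shuffled_batches(n_samples, batch_size, rng):
    """Yield index arrays covering a fresh random permutation of 0..n_samples-1."""
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield perm[start:start + batch_size]

rng = np.random.default_rng(23)
batches = list(shuffled_batches(1000, 256, rng))
sizes = [len(b) for b in batches]
```

In the notebook's loop, X_train[idx] and y_train_ohe[idx] for each idx would form one mini-batch; calling the generator again at the start of each epoch gives a new order.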
Hyperparameters: A learning rate of 0.0005 worked best with the Adam optimizer and this batch size. A higher learning rate led to overfitting, while a smaller learning rate failed to converge. The only other hyperparameter is the dropout keep probability of 0.5, which means that during training each activation of a fully connected layer has a 50% chance of being kept (and is scaled by 2 if kept).
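In TensorFlow 1.x, tf.nn.dropout(x, keep_prob) keeps each element with probability keep_prob and scales the kept elements by 1 / keep_prob, so the expected activation is unchanged and no rescaling is needed at evaluation time, where the notebook feeds keep_prob: 1. A NumPy sketch of that "inverted dropout" behavior (dropout here is a hypothetical stand-in, not the TensorFlow op):

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each element with prob 1 - keep_prob, scale survivors by 1/keep_prob."""
    if keep_prob >= 1.0:
        return x  # evaluation mode: identity
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
acts = np.ones(100_000)
train_out = dropout(acts, 0.5, rng)  # each value is 0.0 or 2.0; the mean stays close to 1.0
```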
Answer:
I approached this problem by modeling the network on the most successful image classifiers, most notably the VGG-style convnets. This problem is similar to MNIST in many ways, so I used existing models as inspiration. I also experimented with various combinations of hyperparameters, normalization, and layer configurations.
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
In [55]:
### Load the images and plot them here.
import os
image_dir = 'extra-images/'
new_images = [image_dir+i for i in os.listdir(image_dir)]
new_images = sorted(new_images, key=lambda x: x.split('.')[0])
def read_img(file_path):
    # OpenCV loads images as BGR; convert to RGB to match the training data, then resize
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return cv2.resize(img, (32, 32), interpolation=cv2.INTER_AREA)

for img in new_images:
    print(img)
    img = read_img(img)
    plt_img(img)

# labels for new images, in the sorted file order above
y_new = np.array([4, 14, 17, 2, 38, 25])
X_new = np.empty((len(new_images), 32, 32, 3), dtype=np.uint8)
for i, img in enumerate(new_images):
    X_new[i] = read_img(img)
X_new = preprocess(X_new)
X_new = center_normaize(X_new, mean, std)
y_new_ohe = ohe.transform(y_new)
Answer:
See the images plotted above. They are similar to the training images, but there is more noise in the background. Also, the original images are not perfectly square, so resizing them to 32x32 distorts them somewhat.
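One common way to avoid that distortion, which the notebook does not use, is to pad each image to a square before resizing so its aspect ratio is preserved. A NumPy sketch, where pad_to_square is a hypothetical helper and the padded result would then go through cv2.resize as in read_img:

```python
import numpy as np

def pad_to_square(img, fill=0):
    """Pad an (H, W, C) image symmetrically with `fill` so that H == W."""
    h, w = img.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    out[top:top + h, left:left + w] = img  # original pixels stay centered
    return out

wide = np.full((40, 60, 3), 255, dtype=np.uint8)
square = pad_to_square(wide)  # shape (60, 60, 3) with 10-pixel bands above and below
```

Padding adds borders that the training images do not have, so whether it improves predictions over simple stretching is an empirical question.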
In [56]:
pr = tf.argmax(logits, 1)
# Top-5 softmax probabilities (and their class ids) for each new image
tk = tf.nn.top_k(tf.nn.softmax(logits), k=5)
with tf.Session() as sess:
    saver.restore(sess, "models/model.ckpt")
    c, acc, top_k, preds = sess.run([cost, accuracy, tk, pr], feed_dict={x: X_new, y: y_new_ohe, keep_prob: 1.})
print("Cost: {:.5f} | New Image Accuracy: {:.5f}".format(c, acc))
Answer:
The model accuracy on the new images is 83%, or 5 out of 6 correct predictions. This is lower than the accuracy on the provided test set, but 6 images is a very small sample, so I expect that a larger set of new images would give results closer to the test accuracy.
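To put numbers on how small a sample six images is, a 95% Wilson score interval for 5 correct out of 6 spans roughly 0.44 to 0.97. The sketch below computes it with the standard Wilson formula and z = 1.96; wilson_interval is a hypothetical helper, not part of the notebook.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

lo, hi = wilson_interval(5, 6)  # 5 of 6 new images classified correctly
```

An interval that wide is compatible with the test-set accuracy, which is consistent with the point that six images cannot pin down the model's real-world accuracy.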
Use the model's softmax probabilities to visualize the certainty of its predictions; tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
In [57]:
for i in range(len(new_images)):
    print(new_images[i])
    print("Predicted Class: {} | True Class: {}".format(preds[i], y_new[i]))
    plt.figure(figsize=(12, 4))
    # top_k is a (values, indices) pair: scores and their class ids
    sns.barplot(x=top_k[1][i], y=top_k[0][i])
    plt.xlabel('Class id')
    plt.show()
Answer:
The only incorrect prediction is sign4.jpg, but the correct class does rank within the top five predictions. It appears the model has not learned the importance of the numbers on this particular sign.
The model was extremely confident in all other predictions, with softmax probabilities approaching 1.0 for the correct class.
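The probabilities and rankings above come from a softmax over the logits followed by a top-k selection. The same computation in NumPy, as a hypothetical sketch that assumes a 1-D vector of logits, uses a numerically stable softmax (subtracting the maximum logit before exponentiating) and a descending sort:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_k_np(probs, k=5):
    """Return the k largest values of a 1-D array and their indices, largest first."""
    idx = np.argsort(probs)[::-1][:k]
    return probs[idx], idx

example_logits = np.array([1.0, 3.0, 0.5, 10.0, 2.0, -1.0])
probs = softmax(example_logits)
values, indices = top_k_np(probs, k=3)  # class 3 dominates with probability above 0.99
```

Because softmax is monotonic, the top-k class ids are identical whether ranked by logits or by probabilities; only the scores change, and a logit gap of 7 is already enough to push the top probability above 0.99.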
Answer:
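New images are loaded, converted to RGB, and resized to 32x32 with the read_img function, then passed through the same preprocess and center_normaize functions used for the training data (with the training mean and standard deviation) before being fed to the restored model.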
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.