This is Project 3 of the Udacity Self-Driving Car Nanodegree.
The purpose of this project is to use deep learning to train a deep neural network that drives a car autonomously in a simulator.
The goals / steps of this project are the following:
* Use the simulator to collect data of good driving behavior
* Build a convolution neural network in Keras that predicts steering angles from images
* Train and validate the model with a training and validation set
* Test that the model successfully drives around track one without leaving the road
Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
My project includes the following files:
* model.py, containing the script to create and train the model
* drive.py, for driving the car in autonomous mode
* model.h5, containing a trained convolution neural network
* this writeup, summarizing the results
Using the Udacity provided simulator and my drive.py file, the car can be driven autonomously around the track by executing
```sh
python drive.py model.h5
```
The model.py file contains the code for training and saving the convolution neural network. The file shows the pipeline I used for training and validating the model, and it contains comments to explain how the code works.
My model is based on the model proposed by NVIDIA in this paper. The NVIDIA model is well documented, not overly complicated, and has proven its effectiveness in self-driving car control.
The NVIDIA model architecture is summarized as follows:
Layer | Description |
---|---|
Input | 66x200x3 RGB image |
Lambda | Normalized, outputs 66x200x3 |
Convolution 5x5x24 | valid padding, subsample(2,2), activation ReLU, outputs 31x98x24 |
Convolution 5x5x36 | valid padding, subsample(2,2), activation ReLU, outputs 14x47x36 |
Convolution 5x5x48 | valid padding, subsample(2,2), activation ReLU, outputs 5x22x48 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 3x20x64 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 1x18x64 |
Flatten | outputs 1152 |
Fully connected | activation ReLU, outputs 100 |
Fully connected | activation ReLU, outputs 50 |
Fully connected | activation ReLU, outputs 10 |
Fully connected | outputs 1 |
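
The output sizes in these tables follow from the standard formula for a valid-padding convolution: output = floor((input - kernel) / stride) + 1. A small sketch below reproduces the numbers; the `conv_output` helper is mine, added only for illustration.

```python
def conv_output(size, kernel, stride):
    """Output size of a valid-padding convolution along one dimension."""
    return (size - kernel) // stride + 1

# First NVIDIA conv layer: 66x200 input, 5x5 kernel, stride 2 -> 31x98
print(conv_output(66, 5, 2), conv_output(200, 5, 2))   # 31 98
# The 3x3 layers use stride 1: 5x22 -> 3x20 -> 1x18
print(conv_output(5, 3, 1), conv_output(22, 3, 1))     # 3 20
```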
My initial model is a modified version of the NVIDIA model, with the following changes:
Layer | Description |
---|---|
Input | 160x320x3 RGB image |
Lambda | Cropping, outputs 75x320x3 |
Lambda | Normalized, outputs 75x320x3 |
Convolution 5x5x24 | valid padding, subsample(2,2), activation ReLU, outputs 36x158x24 |
Convolution 5x5x36 | valid padding, subsample(2,2), activation ReLU, outputs 16x77x36 |
Convolution 5x5x48 | valid padding, subsample(2,2), activation ReLU, outputs 6x37x48 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 4x35x64 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 2x33x64 |
Dropout | keep probability 0.5, outputs 2x33x64 |
Flatten | outputs 4224 |
Fully connected | activation ReLU, outputs 100 |
Fully connected | activation ReLU, outputs 50 |
Fully connected | activation ReLU, outputs 10 |
Fully connected | outputs 1 |
Total params: 559,419
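
As a sanity check, this total can be reproduced by hand from the layer shapes: each convolution contributes kernel volume x filters + biases, and each fully connected layer contributes inputs x outputs + biases. A rough sketch of that arithmetic:

```python
# Convolution layers: kernel_h * kernel_w * in_channels * filters + filters
conv_params = (5*5*3*24 + 24) + (5*5*24*36 + 36) + (5*5*36*48 + 48) \
            + (3*3*48*64 + 64) + (3*3*64*64 + 64)
# Fully connected layers: inputs * outputs + outputs (flattened size is 2*33*64 = 4224)
fc_params = (4224*100 + 100) + (100*50 + 50) + (50*10 + 10) + (10*1 + 1)
print(conv_params + fc_params)  # 559419
```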
Training data was chosen to keep the vehicle driving on the road. I used a combination of center lane driving, recovering from the left and right sides of the road, driving around curves, and counter-clockwise laps on track one and track two.
For details about how I created the training data, see the next section.
The overall strategy for deriving a model architecture was to start from a well-known model that has been successful in the self-driving car field, then tweak its parameters and apply appropriate image pre-processing and augmentation to the training dataset. The model was trained and validated on different data sets to ensure that the model was not overfitting. The model was tested by running it through the simulator and ensuring that the vehicle could stay on the track.
My first step was to use a convolution neural network model similar to the NVIDIA model in this paper. I thought this model might be appropriate because it is well documented, not overly complicated, and has proven its effectiveness in self-driving car control.
Then I focused on image pre-processing, which included cropping away the sky and the hood of the car, normalizing the pixel values, and optionally resizing the images to the NVIDIA input size of 66x200.
In order to gauge how well the model was working, I split my image and steering angle data into a training and validation set. I found that my first model had a low mean squared error on the training set but a high mean squared error on the validation set. This implied that the model was overfitting.
To combat the overfitting, I modified the model to include a dropout layer with keep probability 0.5.
The final step was to run the simulator to see how well the car was driving around track one.
There were a few spots where the vehicle fell off the track. To improve the driving behavior in these cases, I collected more training data on track one and generated more training data using image augmentation methods (see details in the next part).
At the end of the process, the vehicle is able to drive autonomously around the track without leaving the road.
The final model architecture consisted of a convolution neural network with the following layers and layer sizes.
Model 1: without resizing: better performance, slower training time.
Layer | Description |
---|---|
Input | 160x320x3 RGB image |
Lambda | Cropping, outputs 75x320x3 |
Lambda | Normalized, outputs 75x320x3 |
Convolution 5x5x24 | valid padding, subsample(2,2), activation ReLU, outputs 36x158x24 |
Convolution 5x5x36 | valid padding, subsample(2,2), activation ReLU, outputs 16x77x36 |
Convolution 5x5x48 | valid padding, subsample(2,2), activation ReLU, outputs 6x37x48 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 4x35x64 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 2x33x64 |
Dropout | keep probability 0.5, outputs 2x33x64 |
Flatten | outputs 4224 |
Fully connected | activation ReLU, outputs 100 |
Fully connected | activation ReLU, outputs 50 |
Fully connected | activation ReLU, outputs 10 |
Fully connected | outputs 1 |
Model 2: with resizing: faster training time.
Layer | Description |
---|---|
Input | 160x320x3 RGB image |
Lambda | Cropping, outputs 75x320x3 |
Lambda | Resized, outputs 66x200x3 |
Lambda | Normalized, outputs 66x200x3 |
Convolution 5x5x24 | valid padding, subsample(2,2), activation ReLU, outputs 31x98x24 |
Convolution 5x5x36 | valid padding, subsample(2,2), activation ReLU, outputs 14x47x36 |
Convolution 5x5x48 | valid padding, subsample(2,2), activation ReLU, outputs 5x22x48 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 3x20x64 |
Convolution 3x3x64 | valid padding, subsample(1,1), activation ReLU, outputs 1x18x64 |
Dropout | keep probability 0.5, outputs 1x18x64 |
Flatten | outputs 1152 |
Fully connected | activation ReLU, outputs 100 |
Fully connected | activation ReLU, outputs 50 |
Fully connected | activation ReLU, outputs 10 |
Fully connected | outputs 1 |
To capture good driving behavior, I first recorded two laps on track one using center lane driving. Here is an example image of center lane driving on track one:
I then recorded the vehicle recovering from the left and right sides of the road back to the center, so that the vehicle would learn how to return to the center of the lane when it drifts to either side. These images show what a recovery looks like:
To augment the data set, I also randomly flipped images and angles, thinking that this would balance the left-turn bias, since track one consists mostly of left turns. For example, here is an image that has then been flipped:
I also recorded the vehicle driving through the curves only, so that it would learn how to handle them.
I also recorded counter-clockwise driving on track one and track two for better model generalization.
After the collection process, I had 36,202 data points. I then preprocessed this data by cropping the sky and the hood of the car from the images.
I finally randomly shuffled the data set and put 20% of the data into a validation set.
I used this training data for training the model. The validation set helped determine if the model was over or under fitting.
I used a batch generator to create training and validation batches with a batch size of 32, so that training and validation could run efficiently without out-of-memory problems.
I used an Adam optimizer so that manually tuning the learning rate wasn't necessary.
After the first rounds of training, the simulation result was good on straight road sections, but the vehicle sometimes went off the track.
Hence, I continued to improve the image augmentation by using the left and right cameras. The simulator captures images from three cameras mounted on the car (center, left, and right), which can be used to overcome the issue of recovering from being off-center. I randomly selected one image from the three cameras and added a steering correction when the left or right image was selected, as described in the lecture. This kind of image augmentation was applied in the training phase only; the image from the center camera was always selected in the validation phase.
Here are images from the center, left, and right cameras:
I added a ModelCheckpoint() callback to save the model weights after each epoch of training and tested all saved models in the simulator in autonomous mode. The ideal number of epochs was 10, as evidenced by the learning curve plotted at the end of training.
At the end of the process, the vehicle is able to drive autonomously around the track one without leaving the road.
I also tested the same network with an additional Lambda layer that resizes the images to 66 pixels high by 200 pixels wide, like the NVIDIA network. At epoch 10, the vehicle was able to drive autonomously around track one, but it sometimes drifted off the track a little. So I went with the network without image resizing.
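
For reference, here is a minimal sketch of the Model 2 preprocessing stack with the resize Lambda enabled; in the notebook below, this Lambda is left commented out because the final submission uses Model 1 (no resizing). The crop amounts and layer order are taken from the code further down.

```python
from keras.models import Sequential
from keras.layers.core import Lambda
from keras.layers.convolutional import Cropping2D

def resize(image):
    # Import inside the function so the name is available when the saved model is reloaded
    from keras.backend import tf as ktf
    return ktf.image.resize_images(image, (66, 200))

model = Sequential()
model.add(Cropping2D(cropping=((60, 25), (0, 0)), input_shape=(160, 320, 3)))  # 160x320 -> 75x320
model.add(Lambda(resize, output_shape=(66, 200, 3)))                           # 75x320 -> 66x200
model.add(Lambda(lambda x: (x / 127.5) - 1.0))                                 # scale pixels to [-1, 1]
# ... followed by the same convolution and fully connected layers as Model 1
```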
In [1]:
import os
import csv
import cv2
import numpy as np
import sklearn
In [2]:
def flip_image(img, angle):
    """
    Randomly flip the image horizontally and adjust the steering angle.
    """
    if np.random.rand() < 0.5:
        img = cv2.flip(img, 1)
        angle = -angle
    return img, angle
In [3]:
# Read the driving_log.csv and get paths of images as samples
samples = []
with open('../../P3_Data/driving_log.csv') as csvfile:
    reader = csv.reader(csvfile)
    for line in reader:
        samples.append(line)
from sklearn.model_selection import train_test_split
train_samples, validation_samples = train_test_split(samples, test_size=0.2, random_state=0)
In [4]:
def select_image(batch_sample, is_training=False):
    """
    Randomly select an image among the center, left or right images, and adjust the steering angle.
    This way, we can teach the model how to steer if the car drifts off to the left or the right.
    """
    if is_training == True:
        choice = np.random.choice(3)
    else:
        choice = 0
    name = '../../P3_Data/IMG/' + batch_sample[choice].split('/')[-1]
    image = cv2.imread(name)
    steering_center = float(batch_sample[3])
    # create adjusted steering measurements for the side camera images
    correction = 0.2  # this is a parameter to tune
    steering_left = steering_center + correction
    steering_right = steering_center - correction
    if choice == 0:
        return image, steering_center
    elif choice == 1:
        return image, steering_left
    return image, steering_right
def generator(samples, batch_size=32, is_training=False):
    num_samples = len(samples)
    while 1:  # Loop forever so the generator never terminates
        samples = sklearn.utils.shuffle(samples)  # reshuffle the sample order every epoch
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset+batch_size]
            images = []
            angles = []
            for batch_sample in batch_samples:
                image, angle = select_image(batch_sample, is_training=is_training)
                images.append(image)
                angles.append(angle)
            # Get training data
            X_train = np.array(images)
            y_train = np.array(angles)
            # Randomly flip images if in training mode
            if is_training == True:
                X_train_augmented, y_train_augmented = [], []
                for x, y in zip(X_train, y_train):
                    x_augmented, y_augmented = flip_image(x, y)
                    X_train_augmented.append(x_augmented)
                    y_train_augmented.append(y_augmented)
                X_train_augmented = np.array(X_train_augmented)
                y_train_augmented = np.array(y_train_augmented)
                yield sklearn.utils.shuffle(X_train_augmented, y_train_augmented)
            else:
                yield sklearn.utils.shuffle(X_train, y_train)
# compile and train the model using the generator function
train_generator = generator(train_samples, batch_size=32, is_training=True)
validation_generator = generator(validation_samples, batch_size=32, is_training=False)
In [5]:
# Build network architecture
# for a regression network (need only 1 neuron at output)
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, Cropping2D
from keras.optimizers import Adam
import matplotlib.pyplot as plt
from keras.backend import tf as ktf
from keras.callbacks import ModelCheckpoint
row, col, ch = 160, 320, 3 # image format
input_shape = (row,col,ch)
def resize(image):
    from keras.backend import tf as ktf
    resized = ktf.image.resize_images(image, (66, 200))
    return resized
# Create the Sequential model
model = Sequential()
## Set up lambda layers for data preprocessing:
# Set up cropping2D layer: cropping (top, bottom) (left, right) pixels
model.add(Cropping2D(cropping=((60,25), (0,0)), input_shape=input_shape))
# Add Lambda layer for resizing image (image, height, width, data_format)
#model.add(Lambda(resize, input_shape=(75, 320, 3), output_shape=(66, 200, 3)))
# Add Lambda layer for normalization
model.add(Lambda(lambda x: (x / 127.5) - 1.0))
## Build a Multi-layer feedforward neural network with Keras here.
# 1st Layer - Add a convolution layer
model.add(Convolution2D(24, 5, 5, subsample=(2,2), activation='relu'))
# 2nd Layer - Add a convolution layer
model.add(Convolution2D(36, 5, 5, subsample=(2,2), activation='relu'))
# 3rd Layer - Add a convolution layer
model.add(Convolution2D(48, 5, 5, subsample=(2,2), activation='relu'))
# 4th Layer - Add a convolution layer
model.add(Convolution2D(64, 3, 3, activation='relu'))
# 5th Layer - Add a convolution layer
model.add(Convolution2D(64, 3, 3, activation='relu'))
# 6th Layer - Add a dropout layer
model.add(Dropout(0.5))
# 7th Layer - Add a flatten layer
model.add(Flatten())
# 8th Layer - Add a fully connected layer
model.add(Dense(100, activation='relu'))
# 9th Layer - Add a fully connected layer
model.add(Dense(50, activation='relu'))
# 10th Layer - Add a fully connected layer
model.add(Dense(10, activation='relu'))
# 11th Layer - Add a fully connected layer
model.add(Dense(1))
model.summary()
# saves the model weights after every epoch (save_best_only=False, so all checkpoints are kept)
checkpointer = ModelCheckpoint('model-{epoch:02d}.h5',
                               monitor='val_loss',
                               verbose=0,
                               save_best_only=False,
                               mode='auto')
# Compile and train the model
model.compile(optimizer='adam', loss='mse')
# history_object = model.fit(X_train, y_train, validation_split=0.2, shuffle=True, nb_epoch=7, batch_size=128)
history_object = model.fit_generator(train_generator, samples_per_epoch=len(train_samples), \
validation_data=validation_generator, nb_val_samples=len(validation_samples), \
nb_epoch=10, callbacks=[checkpointer], verbose=1)
### print the keys contained in the history object
print(history_object.history.keys())
### plot the training and validation loss for each epoch
plt.plot(history_object.history['loss'])
plt.plot(history_object.history['val_loss'])
plt.title('model mean squared error loss')
plt.ylabel('mean squared error loss')
plt.xlabel('epoch')
plt.legend(['training set', 'validation set'], loc='upper right')
plt.show()
Video of the model without image resizing.
In [9]:
from IPython.display import HTML
HTML("""
<video width="320" height="160" controls>
<source src="{0}">
</video>
""".format('run1.mp4'))
Out[9]:
Video of the model with image resizing.
In [10]:
HTML("""
<video width="320" height="160" controls>
<source src="{0}">
</video>
""".format('run2.mp4'))
Out[10]: