Congratulations on reaching the final project of the Robotics Nanodegree!
Previously, you worked on the Semantic Segmentation lab where you built a deep learning network that locates a particular human target within an image. For this project, you will utilize what you implemented and learned from that lab and extend it to train a deep learning model that will allow a simulated quadcopter to follow around the person that it detects!
Most of the code below is similar to the lab with some minor modifications. You can start with your existing solution, and modify and improve upon it to train the best possible model for this task.
We have provided you with a starting dataset for this project. Download instructions can be found in the README for this project's repo. Alternatively, you can collect additional data of your own to improve your model. Check out the "Collecting Data" section in the Project Lesson in the Classroom for more details!
In [1]:
import os
import glob
import sys
import tensorflow as tf
from scipy import misc
import numpy as np
from tensorflow.contrib.keras.python import keras
from tensorflow.contrib.keras.python.keras import layers, models
from tensorflow import image
from utils import scoring_utils
from utils.separable_conv2d import SeparableConv2DKeras, BilinearUpSampling2D
from utils import data_iterator
from utils import plotting_tools
from utils import model_tools
The Encoder for your FCN will essentially require separable convolution layers, due to their advantages as explained in the classroom. The 1x1 convolution layer in the FCN, however, is a regular convolution. Implementations for both are provided below for your use. Each includes batch normalization with the ReLU activation function applied to the layers.
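One of the advantages usually cited for separable convolutions is the reduced number of weights. The sketch below compares the two layer types for a 3x3 kernel; the helper names and the 32-to-64 channel example are illustrative assumptions, and biases are ignored.

```python
def conv2d_params(in_channels, out_channels, kernel_size=3):
    """Weights in a regular convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv2d_params(in_channels, out_channels, kernel_size=3):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


regular = conv2d_params(32, 64)
separable = separable_conv2d_params(32, 64)
print(regular, separable)  # 18432 2336
```

For this example the separable layer needs roughly an eighth of the weights of the regular one.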
In [2]:
def separable_conv2d_batchnorm(input_layer, filters, strides=1):
    output_layer = SeparableConv2DKeras(filters=filters, kernel_size=3, strides=strides,
                                        padding='same', activation='relu')(input_layer)
    output_layer = layers.BatchNormalization()(output_layer)
    return output_layer

def conv2d_batchnorm(input_layer, filters, kernel_size=3, strides=1):
    output_layer = layers.Conv2D(filters=filters, kernel_size=kernel_size, strides=strides,
                                 padding='same', activation='relu')(input_layer)
    output_layer = layers.BatchNormalization()(output_layer)
    return output_layer
In [3]:
def bilinear_upsample(input_layer):
    output_layer = BilinearUpSampling2D((2,2))(input_layer)
    return output_layer
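In the following cells, you will build an FCN to train a model to detect and locate the hero target within an image. The steps are: create an encoder block, create a decoder block, and then build the FCN from encoder blocks, a 1x1 convolution, and decoder blocks.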
In [4]:
def encoder_block(input_layer, filters, strides):
    output_layer = separable_conv2d_batchnorm(input_layer, filters, strides)
    return output_layer
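The decoder block comprises three parts: a bilinear upsampling layer, a concatenation of the upsampled layer with a larger layer from earlier in the network, and some number of separable convolution layers.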
In [5]:
def decoder_block(small_ip_layer, large_ip_layer, filters):
    # Upsample the small input layer using the bilinear_upsample() function.
    upsampled_layer = bilinear_upsample(small_ip_layer)
    # Concatenate the upsampled and large input layers using layers.concatenate
    layer = layers.concatenate([upsampled_layer, large_ip_layer])
    # Add some number of separable convolution layers
    layer = separable_conv2d_batchnorm(layer, filters, strides=1)
    output_layer = separable_conv2d_batchnorm(layer, filters, strides=1)
    return output_layer
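The concatenation step only works when the upsampled layer and the large input layer have the same height and width. The NumPy sketch below illustrates the shape bookkeeping; it uses pixel repetition as a stand-in for bilinear upsampling, and the array shapes and the helper name `upsample_2x_nearest` are assumptions chosen for illustration.

```python
import numpy as np


def upsample_2x_nearest(x):
    """Double the height and width of an (H, W, C) array by repeating pixels.

    Nearest-neighbour stand-in for bilinear upsampling; only the shapes matter here.
    """
    return x.repeat(2, axis=0).repeat(2, axis=1)


small = np.zeros((10, 10, 512))  # hypothetical small input layer
large = np.zeros((20, 20, 128))  # hypothetical large input layer (skip connection)

upsampled = upsample_2x_nearest(small)                # height and width double
merged = np.concatenate([upsampled, large], axis=-1)  # channel counts add up
print(upsampled.shape, merged.shape)  # (20, 20, 512) (20, 20, 640)
```

The separable convolutions that follow the concatenation then reduce the merged channels to the requested number of filters.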
Now that you have the encoder and decoder blocks ready, go ahead and build your FCN architecture!
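There are three steps: add encoder blocks, add a 1x1 convolution layer, and add the same number of decoder blocks as encoder blocks.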
In [6]:
def fcn_model(inputs, num_classes):
    # Add Encoder Blocks.
    # With each encoder layer, the depth of the model (the number of filters) increases.
    layer0 = encoder_block(inputs, 32, 2)
    layer1 = encoder_block(layer0, 64, 2)
    layer2 = encoder_block(layer1, 128, 2)
    layer3 = encoder_block(layer2, 256, 2)
    # Add 1x1 Convolution layer using conv2d_batchnorm().
    layer4 = conv2d_batchnorm(layer3, 512, kernel_size=1, strides=1)
    # Add the same number of Decoder Blocks as the number of Encoder Blocks
    layer5 = decoder_block(layer4, layer2, 256)
    layer6 = decoder_block(layer5, layer1, 128)
    layer7 = decoder_block(layer6, layer0, 64)
    layer8 = decoder_block(layer7, inputs, 32)
    # The function returns the output layer of your model. layer8 is the final layer obtained from the last decoder_block()
    return layers.Conv2D(num_classes, 1, activation='softmax', padding='same')(layer8)
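To check that the skip connections in an architecture like this line up, it can help to trace the feature-map side length through the network. The sketch below assumes a 160-pixel square input, stride-2 encoder blocks with 'same' padding, and decoder blocks that upsample by a factor of 2; `trace_fcn_sizes` is a hypothetical helper, not part of the project code.

```python
import math


def trace_fcn_sizes(input_size, num_encoders):
    """Return (encoder_sizes, decoder_sizes) of feature-map side lengths."""
    encoder_sizes = []
    size = input_size
    for _ in range(num_encoders):
        size = math.ceil(size / 2)  # stride 2 with 'same' padding halves the size
        encoder_sizes.append(size)
    # The 1x1 convolution in between leaves the spatial size unchanged.
    decoder_sizes = []
    for _ in range(num_encoders):
        size *= 2  # each decoder block upsamples by a factor of 2
        decoder_sizes.append(size)
    return encoder_sizes, decoder_sizes


enc, dec = trace_fcn_sizes(160, 4)
skips = enc[-2::-1] + [160]  # sizes of layer2, layer1, layer0, inputs
print(enc)    # [80, 40, 20, 10]
print(dec)    # [20, 40, 80, 160]
print(skips)  # [20, 40, 80, 160]
```

Each decoder output size matches the size of the layer it is concatenated with, which is why the model above can be built without shape errors.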
The following cells will use the FCN you created and define an output layer based on the size of the processed image and the number of classes recognized. You will define the hyperparameters to compile and train your model.
Please note: for this project, the helper code will resize the copter images to 160x160x3 to speed up training.
In [7]:
image_hw = 160
image_shape = (image_hw, image_hw, 3)
inputs = layers.Input(image_shape)
num_classes = 3
# Call fcn_model()
output_layer = fcn_model(inputs, num_classes)
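The output layer produces, for every pixel, a softmax distribution over the classes. As a rough NumPy illustration of what that means (the random scores below are placeholders, not model output):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(160, 160, 3))  # hypothetical per-pixel scores for 3 classes

# Softmax over the last (class) axis; subtracting the max keeps exp() numerically stable.
shifted = logits - logits.max(axis=-1, keepdims=True)
exp_scores = np.exp(shifted)
probs = exp_scores / exp_scores.sum(axis=-1, keepdims=True)

class_map = probs.argmax(axis=-1)  # most likely class index for every pixel
print(probs.shape, class_map.shape)  # (160, 160, 3) (160, 160)
```

Taking the argmax over the class axis turns the per-pixel probabilities into a segmentation mask with one class index per pixel.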
Define and tune your hyperparameters.
In [8]:
learning_rate = 0.003
batch_size = 32
num_epochs = 20
steps_per_epoch = 100
validation_steps = 10
workers = 4
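When tuning these values, it can be useful to work out how many images the model sees. The sketch below does the arithmetic for the values above; the dataset size of 4000 is a hypothetical number, and setting steps_per_epoch to roughly the dataset size divided by the batch size is a common rule of thumb rather than a requirement of this project.

```python
batch_size = 32
num_epochs = 20
steps_per_epoch = 100
validation_steps = 10

images_per_epoch = batch_size * steps_per_epoch             # training images drawn per epoch
images_total = images_per_epoch * num_epochs                # training images drawn overall
validation_images_per_epoch = batch_size * validation_steps


def steps_for_one_pass(num_images, batch_size):
    """Steps needed to draw about num_images samples per epoch (ceiling division)."""
    return -(-num_images // batch_size)


print(images_per_epoch, images_total, validation_images_per_epoch)  # 3200 64000 320
print(steps_for_one_pass(4000, batch_size))  # 125, for a hypothetical 4000-image dataset
```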
In [9]:
# Define the Keras model and compile it for training
model = models.Model(inputs=inputs, outputs=output_layer)
#model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss='categorical_crossentropy')
model.compile(optimizer=keras.optimizers.Nadam(learning_rate), loss='categorical_crossentropy')

# Data iterators for loading the training and validation data
# (image_shape is assumed to be an argument accepted by the helper iterator)
train_iter = data_iterator.BatchIteratorSimple(batch_size=batch_size,
                                               data_folder=os.path.join('..', 'data', 'train'),
                                               image_shape=image_shape)
val_iter = data_iterator.BatchIteratorSimple(batch_size=batch_size,
                                             data_folder=os.path.join('..', 'data', 'validation'),
                                             image_shape=image_shape)

logger_cb = plotting_tools.LoggerPlotter()
callbacks = [logger_cb]

# Train the model on batches drawn from the training iterator
model.fit_generator(train_iter,
                    steps_per_epoch = steps_per_epoch, # the number of batches per epoch,
                    epochs = num_epochs, # the number of epochs to train for,
                    validation_data = val_iter, # validation iterator
                    validation_steps = validation_steps, # the number of batches to validate on
                    callbacks = callbacks,
                    workers = workers)
In [10]:
# Save your trained model weights
weight_file_name = 'model_weights'
model_tools.save_network(model, weight_file_name)
Now that you have your model trained and saved, you can make predictions on your validation dataset. These predictions can be compared to the mask images, which are the ground truth labels, to evaluate how well your model is doing under different conditions.
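Three different sets of predictions are available from the helper code provided: images taken while on patrol with the target visible (patrol_with_targ), images taken while on patrol without the target (patrol_non_targ), and images taken while following the target (following_images).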
In [11]:
# If you need to load a model that you trained previously, uncomment the lines below.
# weight_file_name = 'model_weights'
# restored_model = model_tools.load_network(weight_file_name)
The following cell will write predictions to files and return paths to the appropriate directories.
The run_num parameter is used to define or group all the data for a particular model run. You can change it for different runs, for example 'run_1', 'run_2', etc.
In [12]:
run_num = 'run_1'
val_with_targ, pred_with_targ = model_tools.write_predictions_grade_set(model,
                                        run_num, 'patrol_with_targ', 'sample_evaluation_data')
val_no_targ, pred_no_targ = model_tools.write_predictions_grade_set(model,
                                        run_num, 'patrol_non_targ', 'sample_evaluation_data')
val_following, pred_following = model_tools.write_predictions_grade_set(model,
                                        run_num, 'following_images', 'sample_evaluation_data')
Now let's look at your predictions and compare them to the ground truth labels and original images. Run each of the following cells to visualize some sample images from the predictions in the validation set.
In [13]:
# images while following the target
im_files = plotting_tools.get_im_file_sample('sample_evaluation_data','following_images', run_num)
for i in range(3):
    im_tuple = plotting_tools.load_images(im_files[i])
In [14]:
# images while at patrol without target
im_files = plotting_tools.get_im_file_sample('sample_evaluation_data','patrol_non_targ', run_num)
for i in range(3):
    im_tuple = plotting_tools.load_images(im_files[i])
In [15]:
# images while at patrol with target
im_files = plotting_tools.get_im_file_sample('sample_evaluation_data','patrol_with_targ', run_num)
for i in range(3):
    im_tuple = plotting_tools.load_images(im_files[i])
In [16]:
# Scores for images taken while the quad is following behind the target.
true_pos1, false_pos1, false_neg1, iou1 = scoring_utils.score_run_iou(val_following, pred_following)
In [17]:
# Scores for images taken while the quad is on patrol and the target is not visible
true_pos2, false_pos2, false_neg2, iou2 = scoring_utils.score_run_iou(val_no_targ, pred_no_targ)
In [18]:
# This score measures how well the neural network can detect the target from far away
true_pos3, false_pos3, false_neg3, iou3 = scoring_utils.score_run_iou(val_with_targ, pred_with_targ)
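The IoU values returned above measure the overlap between predicted and ground truth masks. The sketch below shows the standard intersection-over-union definition for a single pair of binary masks; it is an illustration under that assumption, and the helper in scoring_utils may compute or average the metric differently. The function name `binary_iou` and the tiny 4x4 masks are hypothetical.

```python
import numpy as np


def binary_iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks of the same shape."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 0.0  # neither mask contains the class; treated as 0 in this sketch
    intersection = np.logical_and(pred_mask, true_mask).sum()
    return float(intersection) / float(union)


true_mask = np.zeros((4, 4), dtype=bool)
true_mask[1:3, 1:3] = True  # 4 target pixels
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:4] = True  # 6 predicted pixels, 4 of them on the target

print(binary_iou(pred_mask, true_mask))  # 4 overlapping pixels out of 6 in the union
```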
In [19]:
# Sum all the true positives, etc from the three datasets to get a weight for the score
true_pos = true_pos1 + true_pos2 + true_pos3
false_pos = false_pos1 + false_pos2 + false_pos3
false_neg = false_neg1 + false_neg2 + false_neg3
weight = true_pos/(true_pos+false_neg+false_pos)
In [20]:
# The IoU for the dataset that never includes the hero is excluded from grading
final_IoU = (iou1 + iou3)/2
In [21]:
# And the final grade score is
final_score = final_IoU * weight
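The scoring cells above can be collected into a single function. The sketch below mirrors the same arithmetic; the function name and all numbers passed to it are hypothetical and do not come from any real training run.

```python
def final_grade_score(counts, iou_following, iou_with_target):
    """Combine detection counts and IoUs into (weight, final_iou, final_score).

    counts: iterable of (true_pos, false_pos, false_neg) tuples, one per dataset.
    """
    true_pos = sum(c[0] for c in counts)
    false_pos = sum(c[1] for c in counts)
    false_neg = sum(c[2] for c in counts)
    weight = true_pos / (true_pos + false_neg + false_pos)
    # The IoU of the dataset without the hero is excluded, as in the cells above.
    final_iou = (iou_following + iou_with_target) / 2
    return weight, final_iou, weight * final_iou


# Hypothetical counts for the following, no-target, and with-target datasets.
counts = [(500, 0, 0), (0, 50, 0), (100, 0, 150)]
weight, final_iou, score = final_grade_score(counts, iou_following=0.9, iou_with_target=0.3)
print(weight, final_iou, score)
```

With these made-up numbers the weight is 600/800 = 0.75 and the average IoU is 0.6, giving a final score of about 0.45. False positives and false negatives lower the score even when the IoU on detected targets is high.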
