In [1]:
from fcn8s_tensorflow import FCN8s
from data_generator.batch_generator import BatchGenerator
from helpers.visualization_utils import print_segmentation_onto_image, create_video_from_images
from cityscapesscripts.helpers.labels import TRAINIDS_TO_COLORS_DICT, TRAINIDS_TO_RGBA_DICT
from math import ceil
import time
import matplotlib.pyplot as plt
%matplotlib inline
This notebook walks you through how to work with this FCN-8s implementation. I will use the Cityscapes dataset as the example to train the model on, but the described setup is applicable to arbitrary datasets. Here is an overview of what using this model looks like:
First, you create an instance of the FCN8s model class. The constructor is explained in a subsequent section.
The instantiated FCN8s model has the following main public methods, all of which are used in this notebook: train(), evaluate(), predict(), predict_and_save(), save(), and close().
fcn8s_tensorflow.py provides detailed documentation on the class and all of its public methods, so take a look.
You can find a link to download a fully convolutionalized VGG-16 that was pre-trained on ImageNet classification in the README.
In the subsequent sections I'll go step by step over training, evaluation, prediction, and visualization.
Let's get the preparation out of the way first. The train() and evaluate() methods need a generator that feeds them batches of images and the corresponding ground truth images. Ideally we want two generators: one that serves data from a training dataset and another that serves data from a validation dataset. The Cityscapes dataset already provides a split of the data for us, so I'll just stick with that.
In order to train on the Cityscapes dataset, the only thing you really need to do here is set the appropriate paths to the dataset on your machine. For other datasets you will have to pass some different values to the BatchGenerator constructor; check the documentation for details.
If you need to preprocess your dataset, e.g. to change the image size or to convert the segmentation class labels, I suggest you do that offline. Take a look at how to use BatchGenerator as an offline preprocessor.
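Purely as an illustration, here is a rough sketch of what offline preprocessing with BatchGenerator could look like. Note the assumptions: I'm guessing that the resize and to_disk arguments of generate() (both of which appear with their default values in the cells further below) drive the offline processing, the (height, width) target size is an arbitrary example value, and there may well be additional arguments (such as an output directory) involved. Check the BatchGenerator documentation for the exact semantics before using this.
In [ ]:
# Hypothetical sketch only: resize the training images and ground truth and write
# the processed files to disk instead of feeding them to a model. The semantics of
# `resize` and `to_disk` assumed here are not taken from the documentation.
offline_processor = BatchGenerator(image_dirs=['../../datasets/Cityscapes_small/leftImg8bit/train/'],
                                   image_file_extension='png',
                                   ground_truth_dirs=['../../datasets/Cityscapes_small/gtFine/train/'],
                                   image_name_split_separator='leftImg8bit',
                                   ground_truth_suffix='gtFine_labelIds',
                                   check_existence=True,
                                   num_classes=20)
offline_generator = offline_processor.generate(batch_size=4,
                                               convert_to_one_hot=False,
                                               resize=(512, 1024),  # assumed (height, width) target size
                                               to_disk=True,        # assumed switch for writing results to disk
                                               shuffle=False)
# Iterate over the dataset once so that every image gets processed and written out.
for _ in range(ceil(offline_processor.get_num_files() / 4)):
    next(offline_generator)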
In [2]:
# TODO: Set the paths to the images.
train_images = '../../datasets/Cityscapes_small/leftImg8bit/train/'
val_images = '../../datasets/Cityscapes_small/leftImg8bit/val/'
test_images = '../../datasets/Cityscapes_small/leftImg8bit/test/'
# TODO: Set the paths to the ground truth images.
train_gt = '../../datasets/Cityscapes_small/gtFine/train/'
val_gt = '../../datasets/Cityscapes_small/gtFine/val/'
# Put the paths to the datasets in lists, because that's what `BatchGenerator` requires as input.
train_image_dirs = [train_images]
train_ground_truth_dirs = [train_gt]
val_image_dirs = [val_images]
val_ground_truth_dirs = [val_gt]
num_classes = 20 # TODO: Set the number of segmentation classes.
train_dataset = BatchGenerator(image_dirs=train_image_dirs,
                               image_file_extension='png',
                               ground_truth_dirs=train_ground_truth_dirs,
                               image_name_split_separator='leftImg8bit',
                               ground_truth_suffix='gtFine_labelIds',
                               check_existence=True,
                               num_classes=num_classes)

val_dataset = BatchGenerator(image_dirs=val_image_dirs,
                             image_file_extension='png',
                             ground_truth_dirs=val_ground_truth_dirs,
                             image_name_split_separator='leftImg8bit',
                             ground_truth_suffix='gtFine_labelIds',
                             check_existence=True,
                             num_classes=num_classes)
num_train_images = train_dataset.get_num_files()
num_val_images = val_dataset.get_num_files()
print("Size of training dataset: ", num_train_images, " images")
print("Size of validation dataset: ", num_val_images, " images")
In [3]:
# TODO: Set the batch size. I'll use the same batch size for both generators here.
batch_size = 4
train_generator = train_dataset.generate(batch_size=batch_size,
                                         convert_colors_to_ids=False,
                                         convert_ids_to_ids=False,
                                         convert_to_one_hot=True,
                                         void_class_id=None,
                                         random_crop=False,
                                         crop=False,
                                         resize=False,
                                         brightness=False,
                                         flip=0.5,
                                         translate=False,
                                         scale=False,
                                         gray=False,
                                         to_disk=False,
                                         shuffle=True)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     convert_colors_to_ids=False,
                                     convert_ids_to_ids=False,
                                     convert_to_one_hot=True,
                                     void_class_id=None,
                                     random_crop=False,
                                     crop=False,
                                     resize=False,
                                     brightness=False,
                                     flip=False,
                                     translate=False,
                                     scale=False,
                                     gray=False,
                                     to_disk=False,
                                     shuffle=True)
In [5]:
# Print out some diagnostics to make sure that our batches aren't empty and it doesn't take forever to generate them.
start_time = time.time()
images, gt_images = next(train_generator)
print('Time to generate one batch: {:.3f} seconds'.format(time.time() - start_time))
print('Number of images generated:', len(images))
print('Number of ground truth images generated:', len(gt_images))
In [8]:
# Generate batches from the train_generator where the ground truth does not get converted to one-hot
# so that we can plot it as images.
example_generator = train_dataset.generate(batch_size=batch_size,
                                           convert_to_one_hot=False)
In [64]:
# Generate a batch.
example_images, example_gt_images = next(example_generator)
In [65]:
i = 0 # Select which sample from the batch to display below.
figure, cells = plt.subplots(1, 2, figsize=(16,8))
cells[0].imshow(example_images[i])
cells[1].imshow(example_gt_images[i])
Out[65]:
In [66]:
plt.figure(figsize=(16, 8))
plt.imshow(example_gt_images[i])
Out[66]:
Instantiate an FCN8s. The constructor arguments might seem a bit confusing, but here is how it works. You can do one of three things:
1. Build a new model from the pre-trained VGG-16. In order to do so, you need to pass values for vgg16_dir (the directory that contains the pre-trained, convolutionalized VGG-16) and for num_classes. This is what you will want to do when you are using this model for the first time. You can find the download link to a convolutionalized VGG-16 trained to convergence on ImageNet classification in the README.
2. Load a previously saved FCN-8s model from a SavedModel protocol buffer. In order to do so, you need to pass values only for model_load_dir and tags. This is what you will likely want to do if you want to use or continue to train a previously saved FCN-8s. If you are unfamiliar with the SavedModel API, take a look at TensorFlow's documentation on this topic.
3. Load only the variables of a previously saved FCN-8s into the newly built graph using tf.train.Saver. In order to do so, you need to pass values only for variables_load_dir and vgg16_dir. This is what you will want to do if you made any changes to the graph, but still want to load the saved variables from an earlier version of the graph. Unfortunately you still need to provide a VGG-16 SavedModel, because I have not manually rebuilt the VGG-16 graph in this implementation, so it needs to be loaded from a saved model.
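For reference, here is roughly what the other two options would look like. This is only a sketch derived from the argument descriptions above: the save directory and tags mirror the values used for saving later in this notebook, the variables directory is a placeholder, and you would use one of these calls instead of the from-scratch instantiation in the next cell.
In [ ]:
# Option 2 (sketch): load a complete, previously saved FCN-8s from a SavedModel.
#model = FCN8s(model_load_dir='cityscapes_model',
#              tags=['default'])

# Option 3 (sketch): build the graph from the VGG-16 SavedModel and load previously
# saved variables into it via tf.train.Saver. Depending on the constructor, num_classes
# may be required here as well; see the documentation in fcn8s_tensorflow.py.
#model = FCN8s(variables_load_dir='cityscapes_model_variables',
#              vgg16_dir='../VGG-16_mod2FCN_ImageNet-Classification')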
In [6]:
model = FCN8s(model_load_dir=None,
              tags=None,
              vgg16_dir='../VGG-16_mod2FCN_ImageNet-Classification',
              num_classes=num_classes,
              variables_load_dir=None)
Now just call the train() method to train the model. Refer to the documentation for details on all the arguments. Note that the data is fed via the train_generator and val_generator arguments, where the latter is optional and only needed if you want to evaluate the model on the validation dataset during training (as configured via eval_dataset below).
In [7]:
epochs = 6 # TODO: Set the number of epochs to train for.
# TODO: Define a learning rate schedule function to be passed to the `train()` method.
def learning_rate_schedule(step):
    if step <= 10000: return 0.0001
    elif 10000 < step <= 20000: return 0.00001
    elif 20000 < step <= 40000: return 0.000003
    else: return 0.000001

model.train(train_generator=train_generator,
            epochs=epochs,
            steps_per_epoch=ceil(num_train_images/batch_size),
            learning_rate_schedule=learning_rate_schedule,
            keep_prob=0.5,
            l2_regularization=0.0,
            eval_dataset='val',
            eval_frequency=2,
            val_generator=val_generator,
            val_steps=ceil(num_val_images/batch_size),
            metrics={'loss', 'mean_iou', 'accuracy'},
            save_during_training=True,
            save_dir='cityscapes_model',
            save_best_only=True,
            save_tags=['default'],
            save_name='(batch-size-4)',
            save_frequency=2,
            saver='saved_model',
            monitor='loss',
            record_summaries=True,
            summaries_frequency=10,
            summaries_dir='tensorboard_log/cityscapes',
            summaries_name='configuration_01',
            training_loss_display_averaging=3)
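Side note: since record_summaries=True above, training summaries are written to the summaries_dir set in the call (tensorboard_log/cityscapes), so you can follow the training progress by launching TensorBoard with its --logdir argument pointed at that directory.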
In [8]:
model.save(model_save_dir='cityscapes_model',
           saver='saved_model',
           tags=['default'],
           name='(batch-size-4)',
           include_global_step=True,
           include_last_training_loss=True,
           include_metrics=True,
           force_save=False)
I already set up the train() method above to evaluate the model every few epochs during training, but you can also evaluate the model explicitly, as shown below. There are currently three metrics built in: (1) mean intersection over union, which is probably the most important metric for semantic segmentation models, (2) accuracy, which simply measures the ratio of image pixels that were classified correctly, and (3) loss, which is simply the output of the loss function. You can evaluate the model on any subset of these metrics.
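To make the mean IoU metric concrete, here is a plain NumPy illustration of what it measures. This is not the model's built-in implementation, just the metric's standard definition: for each class, the number of pixels in the intersection of the predicted and true pixel sets divided by the number in their union, averaged over the classes that occur.
In [ ]:
import numpy as np

def mean_iou_illustration(y_pred, y_true, num_classes):
    # Illustration only, not the model's internal metric implementation.
    # `y_pred` and `y_true` are integer label maps of identical shape.
    ious = []
    for c in range(num_classes):
        pred_c = (y_pred == c)
        true_c = (y_true == c)
        union = np.logical_or(pred_c, true_c).sum()
        if union == 0:
            continue  # Class c occurs in neither map; skip it so it doesn't distort the mean.
        intersection = np.logical_and(pred_c, true_c).sum()
        ious.append(intersection / union)
    return np.mean(ious)

# Pixel accuracy, by contrast, is simply the fraction of correctly classified pixels:
# (y_pred == y_true).mean()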
In [9]:
model.evaluate(data_generator=val_generator,
               metrics={'loss', 'mean_iou', 'accuracy'},
               num_batches=ceil(num_val_images/batch_size),
               l2_regularization=0.0,
               dataset='val')
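Finally, let's visualize some predictions. I'll grab a batch from the validation generator and run the model's predict() method on a single image: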
In [10]:
images, labels = next(val_generator)
In [17]:
n = 3 # Select which image of the batch you would like to visualize.
# Make a prediction.
prediction = model.predict([images[n]], argmax=False)
# Print the predicted segmentation onto the image.
segmented_image = print_segmentation_onto_image(images[n], prediction, color_map=TRAINIDS_TO_RGBA_DICT)
plt.figure(figsize=(20,14))
plt.imshow(segmented_image)
Out[17]:
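The predict_and_save() method below runs the model over the images in images_dir and writes the visualized predictions to results_dir; see fcn8s_tensorflow.py for the details of its arguments. I'll use it on one of the Cityscapes demo video sequences so that we can turn the results into a video afterwards.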
In [18]:
model.predict_and_save(results_dir='demo_video_images',
                       images_dir='../../datasets/Cityscapes_small/leftImg8bit/demoVideo/stuttgart_00',
                       color_map=TRAINIDS_TO_RGBA_DICT,
                       resize=False,
                       image_file_extension='png',
                       include_unprocessed_image=True,
                       arrangement='vertical')
Let's make a video from the predictions above:
In [19]:
create_video_from_images(video_output_name='demo_video',
                         image_input_dir='demo_video_images',
                         frame_rate=30.0,
                         image_file_extension='png')
In [10]:
model.close()
In [ ]: