FCN-8s Tutorial


In [1]:
from fcn8s_tensorflow import FCN8s
from data_generator.batch_generator import BatchGenerator
from helpers.visualization_utils import print_segmentation_onto_image, create_video_from_images
from cityscapesscripts.helpers.labels import TRAINIDS_TO_COLORS_DICT, TRAINIDS_TO_RGBA_DICT

from math import ceil
import time
import matplotlib.pyplot as plt
%matplotlib inline

This notebook walks you through how to work with this FCN-8s implementation. I will use the Cityscapes dataset as the training example, but the described setup is applicable to arbitrary datasets. Here is an overview of what using this model looks like:

First, you create an instance of the FCN8s model class. The constructor is explained in a later section.

The instantiated FCN8s model has the following main public methods:

  1. train(): Trains the model.
  2. evaluate(): Evaluates the model.
  3. predict(): Makes predictions.
  4. predict_and_save(): Makes predictions for a sequence of images, prints the predicted segmentations onto them and saves a copy of them to disk.
  5. save(): Saves the model to disk.
  6. close(): Closes the TensorFlow session. Once you have instantiated a model, a session is started and kept open until you manually close it. It is therefore important that you close the session when you're done working with the model.

fcn8s_tensorflow.py provides detailed documentation on the class and all of its public methods, so take a look.

You can find a link to download a fully convolutionalized VGG-16 that was pre-trained on ImageNet classification in the README.
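To give you a rough idea of the overall workflow before we go through it step by step, here is a minimal sketch. All paths are placeholders, the argument values are just examples, and only a subset of the available arguments is shown; the docstrings in fcn8s_tensorflow.py tell you what is actually required.

# A minimal workflow sketch. All paths are placeholders.
model = FCN8s(vgg16_dir='path/to/convolutionalized_VGG-16', num_classes=20)

model.train(train_generator=train_generator, # a batch generator, see section 1
            epochs=10,
            steps_per_epoch=1000,
            learning_rate_schedule=lambda step: 0.0001)

model.save(model_save_dir='path/to/save_dir',
           saver='saved_model',
           tags=['default'],
           name='my_fcn8s')

model.close() # Don't forget to release the session's resources.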

In the subsequent sections I'll go step by step over training, evaluation, prediction, and visualization.

1. Create a batch generator for training and evaluation

Let's get the preparation out of the way first. The train() and evaluate() methods need a generator that feeds them batches of images and corresponding ground truth images. Ideally we want two generators: one that serves data from a training dataset and another that serves data from a validation dataset. The Cityscapes dataset already provides a training/validation split for us, so I'll just stick with that.

In order to train on the Cityscapes dataset, the only thing you really need to do here is set the appropriate paths to the dataset on your machine. For other datasets you will have to pass some different values to the BatchGenerator constructor; check the documentation for details.

If you need to preprocess your dataset, e.g. to change the image size or to convert the segmentation class labels, I suggest you do that offline. Take a look at how to use BatchGenerator as an offline preprocessor.
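For instance, an offline preprocessing pass could look roughly like the sketch below, using a dataset instance like the `train_dataset` created in the cells that follow. The exact semantics of `resize` and `to_disk` are assumptions on my part, so check the BatchGenerator documentation before you rely on them.

# Hypothetical offline preprocessing sketch: iterate over the dataset once,
# resize the images, and write the results to disk instead of feeding a model.
preprocessor = train_dataset.generate(batch_size=batch_size,
                                      convert_to_one_hot=False,
                                      resize=(256, 512), # assumed to be a (height, width) target size
                                      to_disk=True,      # assumed to write the processed images to disk
                                      shuffle=False)

for _ in range(ceil(num_train_images / batch_size)): # one full pass over the dataset
    next(preprocessor)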


In [2]:
# TODO: Set the paths to the images.
train_images = '../../datasets/Cityscapes_small/leftImg8bit/train/'
val_images = '../../datasets/Cityscapes_small/leftImg8bit/val/'
test_images = '../../datasets/Cityscapes_small/leftImg8bit/test/'

# TODO: Set the paths to the ground truth images.
train_gt = '../../datasets/Cityscapes_small/gtFine/train/'
val_gt = '../../datasets/Cityscapes_small/gtFine/val/'

# Put the paths to the datasets in lists, because that's what `BatchGenerator` requires as input.
train_image_dirs = [train_images]
train_ground_truth_dirs = [train_gt]
val_image_dirs = [val_images]
val_ground_truth_dirs = [val_gt]

num_classes = 20 # TODO: Set the number of segmentation classes.

train_dataset = BatchGenerator(image_dirs=train_image_dirs,
                               image_file_extension='png',
                               ground_truth_dirs=train_ground_truth_dirs,
                               image_name_split_separator='leftImg8bit',
                               ground_truth_suffix='gtFine_labelIds',
                               check_existence=True,
                               num_classes=num_classes)

val_dataset = BatchGenerator(image_dirs=val_image_dirs,
                             image_file_extension='png',
                             ground_truth_dirs=val_ground_truth_dirs,
                             image_name_split_separator='leftImg8bit',
                             ground_truth_suffix='gtFine_labelIds',
                             check_existence=True,
                             num_classes=num_classes)

num_train_images = train_dataset.get_num_files()
num_val_images = val_dataset.get_num_files()

print("Size of training dataset: ", num_train_images, " images")
print("Size of validation dataset: ", num_val_images, " images")


Size of training dataset:  2975  images
Size of validation dataset:  500  images

In [3]:
# TODO: Set the batch size. I'll use the same batch size for both generators here.
batch_size = 4

train_generator = train_dataset.generate(batch_size=batch_size,
                                         convert_colors_to_ids=False,
                                         convert_ids_to_ids=False,
                                         convert_to_one_hot=True,
                                         void_class_id=None,
                                         random_crop=False,
                                         crop=False,
                                         resize=False,
                                         brightness=False,
                                         flip=0.5,
                                         translate=False,
                                         scale=False,
                                         gray=False,
                                         to_disk=False,
                                         shuffle=True)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     convert_colors_to_ids=False,
                                     convert_ids_to_ids=False,
                                     convert_to_one_hot=True,
                                     void_class_id=None,
                                     random_crop=False,
                                     crop=False,
                                     resize=False,
                                     brightness=False,
                                     flip=False,
                                     translate=False,
                                     scale=False,
                                     gray=False,
                                     to_disk=False,
                                     shuffle=True)

In [5]:
# Print out some diagnostics to make sure that our batches aren't empty and it doesn't take forever to generate them.
start_time = time.time()
images, gt_images = next(train_generator)
print('Time to generate one batch: {:.3f} seconds'.format(time.time() - start_time))
print('Number of images generated:', len(images))
print('Number of ground truth images generated:', len(gt_images))


Time to generate one batch: 0.046 seconds
Number of images generated: 4
Number of ground truth images generated: 4

1.1 Visualize the dataset

Let's visualize the dataset just to get a better understanding of the ground truth data.


In [8]:
# Generate batches from the train_generator where the ground truth does not get converted to one-hot
# so that we can plot it as images.
example_generator = train_dataset.generate(batch_size=batch_size,
                                           convert_to_one_hot=False)

In [64]:
# Generate a batch.
example_images, example_gt_images = next(example_generator)

In [65]:
i = 0 # Select which sample from the batch to display below.

figure, cells = plt.subplots(1, 2, figsize=(16,8))
cells[0].imshow(example_images[i])
cells[1].imshow(example_gt_images[i])


Out[65]:
<matplotlib.image.AxesImage at 0x7fc6da0819b0>

In [66]:
plt.figure(figsize=(16, 8))
plt.imshow(example_gt_images[i])


Out[66]:
<matplotlib.image.AxesImage at 0x7fc6d9fa9be0>

2. Create the model

Instantiate an FCN8s. The constructor arguments might seem a bit confusing at first, but here is how it works. You can do one of three things (options 2 and 3 are sketched right after this list):

  1. Build the FCN-8s model from scratch, but load a pre-trained VGG-16 model into it. In order to do so, you need to pass values only for vgg16_dir (the directory that contains the pre-trained, convolutionalized VGG-16) and for num_classes. This is what you will want to do when you are using this model for the first time. You can find the download link to a convolutionalized VGG-16 trained to convergence on ImageNet classification in the README.
  2. Load a saved model from a SavedModel protocol buffer. In order to do so, you need to pass values only for model_load_dir and tags. This is what you will likely want to do if you want to use or continue to train a previously saved FCN-8s. If you are unfamiliar with the SavedModel API, take a look at TensorFlow's documentation on this topic.
  3. Build the FCN-8s model from scratch, but load variables into it that were saved using tf.train.Saver. In order to do so, you need to pass values only for variables_load_dir and vgg16_dir. This is what you will want to do if you made any changes to the graph, but still want to load the saved variables from an earlier version of the graph. Unfortunately you still need to provide a VGG-16 SavedModel, because I have not manually rebuilt the VGG-16 graph in this implementation, so it needs to be loaded from a saved model.
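For reference, the constructor calls for options 2 and 3 would look roughly like this (the paths and tags are placeholders; whether any further arguments are needed is documented in fcn8s_tensorflow.py). The next cell shows option 1, which is what I'll use here.

# Option 2: Load a previously saved FCN-8s from a SavedModel protocol buffer.
model = FCN8s(model_load_dir='path/to/saved_fcn8s_model',
              tags=['default'])

# Option 3: Build the graph from scratch, load the convolutionalized VGG-16 into it,
# then load variables that were saved with tf.train.Saver.
model = FCN8s(variables_load_dir='path/to/saved_variables',
              vgg16_dir='path/to/convolutionalized_VGG-16')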

In [6]:
model = FCN8s(model_load_dir=None,
              tags=None,
              vgg16_dir='../VGG-16_mod2FCN_ImageNet-Classification',
              num_classes=num_classes,
              variables_load_dir=None)


TensorFlow Version: 1.3.0
INFO:tensorflow:Restoring parameters from b'../../Google Drive/GitHub/Trained Models/VGG-16_mod2FCN_ImageNet-Classification/variables/variables'

3. Train the model

Now just call the train() method to train the model. Refer to the documentation for details on all the arguments, but here are a few notes:

  1. You'll have to pass some learning rate schedule function, however simple it may be. This function takes an integer (the training step) as input and returns a float (the learning rate). I'll just define a simple step function below.
  2. Pass the generator(s) we instantiated above. Note that there are two arguments that take a generator as input, train_generator and val_generator, where the latter is optional.

In [7]:
epochs = 6 # TODO: Set the number of epochs to train for.

# TODO: Define a learning rate schedule function to be passed to the `train()` method.
def learning_rate_schedule(step):
    if step <= 10000: return 0.0001
    elif 10000 < step <= 20000: return 0.00001
    elif 20000 < step <= 40000: return 0.000003
    else: return 0.000001
    
model.train(train_generator=train_generator,
            epochs=epochs,
            steps_per_epoch=ceil(num_train_images/batch_size),
            learning_rate_schedule=learning_rate_schedule,
            keep_prob=0.5,
            l2_regularization=0.0,
            eval_dataset='val',
            eval_frequency=2,
            val_generator=val_generator,
            val_steps=ceil(num_val_images/batch_size),
            metrics={'loss', 'mean_iou', 'accuracy'},
            save_during_training=True,
            save_dir='cityscapes_model',
            save_best_only=True,
            save_tags=['default'],
            save_name='(batch-size-4)',
            save_frequency=2,
            saver='saved_model',
            monitor='loss',
            record_summaries=True,
            summaries_frequency=10,
            summaries_dir='tensorboard_log/cityscapes',
            summaries_name='configuration_01',
            training_loss_display_averaging=3)


Default GPU Device: /gpu:0
Epoch 1/6: 100%|██████████| 744/744 [07:19<00:00,  1.37it/s, loss=1.1, learning rate=0.0001]  
Epoch 2/6: 100%|██████████| 744/744 [07:32<00:00,  1.75it/s, loss=0.59, learning rate=0.0001] 
Evaluation on validation dataset: 100%|██████████| 125/125 [01:12<00:00,  1.74it/s]
loss: 0.8286  mean_iou: 0.2531  accuracy: 0.8111  
New best loss value, saving model.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'cityscapes_model/saved_model_(batch-size-4)_(globalstep-1488)_(trainloss-0.5900)_(eval_on_val_dataset)_(loss-0.8286)_(mean_iou-0.2531)_(accuracy-0.8111)/saved_model.pb'
Epoch 3/6: 100%|██████████| 744/744 [07:17<00:00,  1.74it/s, loss=0.722, learning rate=0.0001]
Epoch 4/6: 100%|██████████| 744/744 [07:18<00:00,  1.74it/s, loss=0.746, learning rate=0.0001]
Evaluation on validation dataset: 100%|██████████| 125/125 [01:11<00:00,  1.73it/s]
loss: 0.7130  mean_iou: 0.2894  accuracy: 0.8317  
New best loss value, saving model.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'cityscapes_model/saved_model_(batch-size-4)_(globalstep-2976)_(trainloss-0.7464)_(eval_on_val_dataset)_(loss-0.7130)_(mean_iou-0.2894)_(accuracy-0.8317)/saved_model.pb'
Epoch 5/6: 100%|██████████| 744/744 [07:17<00:00,  1.79it/s, loss=0.897, learning rate=0.0001]
Epoch 6/6: 100%|██████████| 744/744 [07:23<00:00,  1.79it/s, loss=0.598, learning rate=0.0001]
Evaluation on validation dataset: 100%|██████████| 125/125 [01:09<00:00,  1.80it/s]
loss: 0.6854  mean_iou: 0.3092  accuracy: 0.8392  
New best loss value, saving model.
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'cityscapes_model/saved_model_(batch-size-4)_(globalstep-4464)_(trainloss-0.5979)_(eval_on_val_dataset)_(loss-0.6854)_(mean_iou-0.3092)_(accuracy-0.8392)/saved_model.pb'

4. Save the model

I already set the train() method above to save the model to disk during training, so the model has already been saved (potentially multiple times) and it's not necessary to save it again manually. The call below is included just for the sake of completeness.


In [8]:
model.save(model_save_dir='cityscapes_model',
           saver='saved_model',
           tags=['default'],
           name='(batch-size-4)',
           include_global_step=True,
           include_last_training_loss=True,
           include_metrics=True,
           force_save=False)


Abort: Nothing to save, no training has been performed since the model was last saved.

5. Evaluate the model

I already set the train() method above to evaluate the model every few epochs during training, but you can evaluate the model explicitly as shown below. There are currently three metrics built in: (1) mean intersection over union, which is probably the most important metric for semantic segmentation models, (2) accuracy, which simply measures the ratio of image pixels that were classified correctly, and (3) loss, which is simply the output of the loss function. You can evaluate the model on any subset of them.
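In case the mean IoU metric is new to you, here is a small standalone illustration of how mean IoU and pixel accuracy are computed from integer class-ID maps. This is only to explain the metrics, not how the model computes them internally.

import numpy as np

def mean_iou_and_accuracy(predictions, labels, num_classes):
    # `predictions` and `labels` are integer class-ID arrays of identical shape.
    # Entry (i, j) of the confusion matrix counts pixels of true class i
    # that were predicted as class j.
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(labels.flatten(), predictions.flatten()):
        confusion[t, p] += 1
    true_positives = np.diag(confusion)
    false_positives = confusion.sum(axis=0) - true_positives
    false_negatives = confusion.sum(axis=1) - true_positives
    union = true_positives + false_positives + false_negatives
    iou = true_positives / np.maximum(union, 1) # IoU of each individual class
    mean_iou = iou[union > 0].mean() # average only over classes that actually occur
    accuracy = true_positives.sum() / confusion.sum() # ratio of correctly classified pixels
    return mean_iou, accuracy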


In [9]:
model.evaluate(data_generator=val_generator,
               metrics={'loss', 'mean_iou', 'accuracy'},
               num_batches=ceil(num_val_images/batch_size),
               l2_regularization=0.0,
               dataset='val')


Running evaluation: 100%|██████████| 125/125 [01:07<00:00,  1.85it/s]
loss: 0.6854  mean_iou: 0.3092  accuracy: 0.8392  

6. Make predictions and visualize them


In [10]:
images, labels = next(val_generator)

In [17]:
n = 3 # Select which image of the batch you would like to visualize.

# Make a prediction.
prediction = model.predict([images[n]], argmax=False)

# Print the predicted segmentation onto the image.
segmented_image = print_segmentation_onto_image(images[n], prediction, color_map=TRAINIDS_TO_RGBA_DICT)

plt.figure(figsize=(20,14))
plt.imshow(segmented_image)


Out[17]:
<matplotlib.image.AxesImage at 0x7f92645ea390>

7. Process a sequence of images, save them to disk, and generate a video from them

In case you find it useful, with the method below you can just let the model run predictions on all images in a given directory, print the predicted segmentations onto them, and save a copy of them to disk.


In [18]:
model.predict_and_save(results_dir='demo_video_images',
                       images_dir='../../datasets/Cityscapes_small/leftImg8bit/demoVideo/stuttgart_00',
                       color_map=TRAINIDS_TO_RGBA_DICT,
                       resize=False,
                       image_file_extension='png',
                       include_unprocessed_image=True,
                       arrangement='vertical')


The segmented images will be saved to "demo_video_images"
Processing images: 100%|██████████| 599/599 [02:23<00:00,  4.16it/s]

Let's make a video from the predictions above:


In [19]:
create_video_from_images(video_output_name='demo_video',
                         image_input_dir='demo_video_images',
                         frame_rate=30.0,
                         image_file_extension='png')


[MoviePy] >>>> Building video demo_video.mp4
[MoviePy] Writing video demo_video.mp4
100%|██████████| 600/600 [00:06<00:00, 84.96it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: demo_video.mp4 

8. Close the session

Remember, the TensorFlow session is kept open and holds on to resources until you manually close it, so don't forget to close it when you're done in order to release those resources.
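If you use this model in a script rather than interactively, you may want to wrap your work in a try/finally block so that the session gets closed even if an exception is raised along the way. Just a sketch, with a placeholder path:

model = FCN8s(vgg16_dir='path/to/convolutionalized_VGG-16', num_classes=num_classes)
try:
    pass # ...train, evaluate, predict...
finally:
    model.close() # Release the session's resources even if an error occurred.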


In [10]:
model.close()


The session has been closed.
