There are a variety of important image analysis deep learning applications that need to go beyond detecting individual objects within an image and instead segment the image into spatial regions of interest. For example, in medical imagery analysis it is often important to separate the pixels corresponding to different types of tissue, blood or abnormal cells so that we can isolate a particular organ. In this self-paced, hands-on lab we will use the TensorFlow machine learning framework to train and evaluate an image segmentation network using a medical imagery dataset.
Lab created by Jonathan Bentz (follow @jnbntz on Twitter).
Before we begin, let's verify WebSockets are working on your system. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the gray cell. If not, please consult the Self-paced Lab Troubleshooting FAQ to debug the issue.
In [1]:
print "The answer should be three: " + str(1+2)
Let's execute the cell below to display information about the GPUs running on the server.
In [2]:
!nvidia-smi
Finally, execute the following cell to show the version of TensorFlow that is used in this lab.
In [3]:
!python -c 'import tensorflow as tf; print(tf.__version__)'
If you have never before taken an IPython Notebook based self-paced lab from NVIDIA, we recommend watching this short YouTube video.
In this lab you will work through a series of exercises performing image segmentation, also called semantic segmentation. Semantic segmentation is the task of placing each pixel into a specific class. In a sense it's a classification problem where you'll classify on a pixel basis rather than an entire image. In this lab the task will be classifying each pixel in a cardiac MRI image based on whether the pixel is a part of the left ventricle (LV) or not.
This lab is not an introduction to deep learning, nor is it intended to be a rigorous mathematical formalism of convolutional neural networks. We'll assume that you have at least a passing understanding of neural networks including concepts like forward and backpropagation, activations, SGD, convolutions, pooling, bias, and the like. It is helpful if you've encountered convolutional neural networks (CNN) already and you understand image recognition tasks. The lab will use Google's TensorFlow machine learning framework so if you have Python and TensorFlow experience it is helpful, but not required. Most of the work we'll do in this lab is not coding per se, but setting up and running training and evaluation tasks using TensorFlow.
The data set you'll be utilizing is a series of cardiac images (specifically MRI short-axis (SAX) scans) that have been expertly labeled. See References [1, 2, 3] for full citation information.
Four representative examples of the data are shown below. Each row of images is an instance of the data. On the left are the MRI images and on the right are the expertly-segmented regions (often called contours). The portions of the images that are part of the LV are denoted in white. Note that the size of the LV varies from image to image, but the LV typically takes up a relatively small region of the entire image.
The data extraction from the raw images and then subsequent preparation of these images for ingestion into TensorFlow will not be showcased in this lab. Suffice it to say that data preparation is a non-trivial aspect of machine learning workflows and is outside the scope of this lab.
For those that are interested in the details, we obtained guidance and partial code from a prior Kaggle competition on how to extract the images properly. At that point we took the images, converted them to TensorFlow records (TFRecords), and stored them to files. TFRecords are a special file format provided by TensorFlow, which allow you to use built-in TensorFlow functions for data management including multi-threaded data reading and sophisticated pre-processing of the data such as randomizing and even augmenting the training data.
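To make the idea concrete, here is a minimal sketch of how a single image/label pair could be serialized into a TFRecords file with the TF 1.x API. The feature names img_raw and label_raw, the output filename, and the zero-filled arrays are assumptions for illustration and are not necessarily the exact format used by this lab's data files.

import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    # Wrap a raw byte string in a TFRecords-compatible Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

image = np.zeros((256, 256), dtype=np.uint8)   # placeholder MRI slice
label = np.zeros((256, 256), dtype=np.uint8)   # placeholder LV mask

writer = tf.python_io.TFRecordWriter('example.tfrecords')  # hypothetical file name
example = tf.train.Example(features=tf.train.Features(feature={
    'img_raw':   _bytes_feature(image.tobytes()),
    'label_raw': _bytes_feature(label.tobytes()),
}))
writer.write(example.SerializeToString())
writer.close()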
The images themselves are originally 256 x 256 grayscale DICOM format, a common image format in medical imaging. The label is a tensor of size 256 x 256 x 2. The reason the last dimension is a 2 is that the pixel is in one of two classes so each pixel label has a vector of size 2 associated with it. The training set is 234 images and the validation set (data NOT used for training but used to test the accuracy of the model) is 26 images.
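As a small illustration of why each label has that extra dimension of size 2, the NumPy snippet below converts a toy binary LV mask into a per-pixel one-hot label of shape 256 x 256 x 2 (the mask contents are invented for the example).

import numpy as np

mask = np.zeros((256, 256), dtype=np.int64)  # toy ground-truth mask: 0 = notLV, 1 = LV
mask[100:150, 100:150] = 1                   # mark a square region as LV

label = np.eye(2, dtype=np.float32)[mask]    # one-hot encode each pixel
print(label.shape)                           # (256, 256, 2)
print(label[0, 0], label[120, 120])          # [1. 0.] for notLV, [0. 1.] for LV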
This lab is part of a series of self-paced labs designed to introduce you to some of the publicly-available deep learning frameworks available today. TensorFlow is a framework developed by Google and used by numerous researchers and product groups within Google.
TensorFlow is an open source software library for machine intelligence. The computations are expressed as data flow graphs which operate on tensors (hence the name). If you can express your computation in this manner you can run your algorithm in the TensorFlow framework.
TensorFlow is portable in the sense that you can run on CPUs and GPUs and utilize workstations, servers, and even deploy models on mobile platforms. At present TensorFlow offers the option of expressing your computation in either Python or C++, with experimental support for Go and Java. A typical usage of TensorFlow would be to perform training and testing in Python and, once you have finalized your model, deploy it with C++.
TensorFlow is designed and built for performance on both CPUs and GPUs. Within a single TensorFlow execution you have lots of flexibility in that you can assign different tasks to CPUs and GPUs explicitly if necessary. When running on GPUs TensorFlow utilizes a number of GPU libraries including cuDNN allowing it to extract the most performance possible from the very newest GPUs available.
One of the intents of this lab is to gain an introductory level of familiarity with TensorFlow. In the course of this short lab we won't be able to discuss all the features and options of TensorFlow but we hope that after completion of this lab you'll feel comfortable with and have a good idea how to move forward using TensorFlow to solve your specific machine learning problems.
For comprehensive documentation on TensorFlow we recommend the TensorFlow website, the whitepaper, and the GitHub site.
TensorFlow can be used in multiple ways depending on your preferences. For designing training tasks, a common approach is to use the TensorFlow Python API. Running a machine learning training task in TensorFlow consists of (at least) two distinct steps.
First you will construct a data flow graph which is the specification and ordering of exactly what computations you'd like to perform. Using the TensorFlow API you construct a neural network layer-by-layer using any of the TensorFlow-provided operations such as convolutions, activations, pooling, etc. This stage of the process doesn't do any actual computation on your data; it merely constructs the graph that you've specified.
When you build the graph you must specify each so-called Variable (in TensorFlow lexicon). Specifying a piece of data as a Variable tells TensorFlow that it will be a parameter to be "learned", i.e., it is a weight that will be updated as the training proceeds.

Once you've defined your neural network as a data flow graph, you will launch a Session. This is the mechanism whereby you specify the input data and training parameters to your previously-constructed graph, and then the computation proceeds.

In general these two steps will be repeated each time you wish to change your graph, i.e., you'll update the graph and launch a new session.
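The toy snippet below is a minimal sketch of this two-step pattern in TF 1.x (the shapes and numbers are arbitrary and unrelated to the lab's code): the first block only constructs the graph, and nothing is computed until the Session runs it.

import tensorflow as tf

# Step 1: construct the data flow graph. No computation happens yet.
x = tf.placeholder(tf.float32, shape=[None, 4], name='x')
W = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1), name='W')  # a learned weight
y = tf.matmul(x, W)

# Step 2: launch a Session, initialize the Variables, and run the graph on data.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
    print(result.shape)  # (1, 2)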
A sample workflow of training and evaluating a model might look like the following:

1. Construct the data flow graph that defines your neural network.
2. Launch a Session and loop over your input data. Customize your batch size, number of epochs, learning rate, etc.

TensorFlow provides a feature-rich tool called TensorBoard that allows you to visualize many aspects of your program. In TensorBoard you can see a visual representation of your computation graph and you can plot different metrics of your computation such as loss, accuracy, and learning rate. Essentially any data that is generated during the execution of TensorFlow can be visually displayed by TensorBoard with the addition of a few extra API calls in your program.
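For instance, the snippet below is a minimal sketch (not taken from the lab's scripts) of the kind of extra calls involved: a tf.summary.scalar op records a metric and a tf.summary.FileWriter writes it, along with the graph, to a log directory. The log directory and the constant loss tensor are made up for the example.

import tensorflow as tf

loss = tf.constant(0.5, name='loss')     # stand-in for a real loss tensor
tf.summary.scalar('loss', loss)          # ask TensorFlow to record this scalar
merged = tf.summary.merge_all()          # bundle all summary ops into one

with tf.Session() as sess:
    # Hypothetical log directory; point TensorBoard at it to see the scalar and graph.
    writer = tf.summary.FileWriter('/tmp/tb_demo', sess.graph)
    summary = sess.run(merged)
    writer.add_summary(summary, global_step=0)
    writer.close()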
For example, consider the following code snippet that creates a neural network with one hidden layer (don't worry about the details of the code at this time).
with tf.name_scope('Hidden1'):
    W_fc = tf.Variable(tf.truncated_normal([256*256, 512],
                                           stddev=0.1, dtype=tf.float32), name='W_fc')
    flatten1_op = tf.reshape(images_re, [-1, 256*256])
    h_fc1 = tf.matmul(flatten1_op, W_fc)

with tf.name_scope('Final'):
    W_fc2 = tf.Variable(tf.truncated_normal([512, 256*256*2],
                                            stddev=0.1, dtype=tf.float32), name='W_fc2')
    h_fc2 = tf.matmul(h_fc1, W_fc2)
    h_fc2_re = tf.reshape(h_fc2, [-1, 256, 256, 2])

return h_fc2_re
TensorBoard will display the neural network like the figure below. If you look closely you'll see that the edges have the tensor dimensions printed, i.e., as you move node-to-node you can follow how the data (as a tensor) and its size change throughout the graph.
The first task we'll consider will be to create, train and evaluate a fully-connected neural network with one hidden layer. The input to the neural network will be the value of each pixel, i.e., a size 256 x 256 (or 65,536) array. The hidden layer will have a size that you can adjust, and the output will be an array of 256 x 256 x 2, i.e., each input pixel can be in either one of two classes so the output value associated with each pixel will be the probability that the pixel is in that particular class. In our case the two classes are LV or not. We'll compute the loss via a TensorFlow function called sparse_softmax_cross_entropy_with_logits, which simply combines the softmax with the cross entropy calculation into one function call.
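A minimal sketch of that loss computation is shown below; the placeholder tensors logits and labels stand in for the network output and ground truth and are not the lab's actual variable names.

import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 256, 256, 2])  # per-pixel scores for 2 classes
labels = tf.placeholder(tf.int64,   [None, 256, 256])     # ground-truth class per pixel

# Softmax and cross entropy combined into a single, numerically stable call.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(cross_entropy)  # average over every pixel in the batch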
In [4]:
!python exercises/simple/runTraining.py --data_dir /data
If everything is working properly you will see some messages printed to the screen. Some of those are informational messages from TensorFlow and can typically be ignored. You'll want to look for the lines that start with "OUTPUT", as those are the lines we inserted in the program specifically to output information at particular points, such as the loss that is computed every 100 steps. The very last line you'll see is:

OUTPUT: Done training for 1 epochs, 231 steps

This means your training job ran one complete epoch through all the training data.
In [5]:
!python exercises/simple/runEval.py --data_dir /data
Again, you're ignoring most of the TensorFlow output and focusing your attention on the lines that begin with "OUTPUT". When we ran this we obtained the following output. You should see something similar.
OUTPUT: 2017-01-26 17:12:28.929741: precision = 0.503
OUTPUT: 26 images evaluated from file /data/val_images.tfrecords
The final output lines show the accuracy of the model, i.e., how well the model predicted whether each pixel is in LV (or not) versus the ground truth. In the case above, 0.503 is 50.3%, so the model predicted the correct class for each pixel roughly half the time. This is not a great result, but it's not awful either considering we only ran a very simple network and only ran training for one epoch.
At this point, if you haven't already, you'll want to launch TensorBoard here. TensorBoard has a lot of impressive visualization features. In the top menu there is a link called "Scalars" which you can click to show some of the information that has been captured. You can click to expand any of the entries and see a plot of that data.
Another menu choice at the top is "Graphs". If you choose that you can view both your training and evaluation data flow graphs. Each node in the graph can be clicked to expand it and you can obtain more detailed information about that node. In the upper left of the page you can choose whether you want to view the training or evaluation graph, via a small dropdown selection.
The code solution to this task is given below.
with tf.name_scope('Hidden1'):
    W_fc = tf.Variable(tf.truncated_normal([256*256, 512],
                                           stddev=0.1, dtype=tf.float32), name='W_fc')
    flatten1_op = tf.reshape(images_re, [-1, 256*256])
    h_fc1 = tf.matmul(flatten1_op, W_fc)

with tf.name_scope('Final'):
    W_fc2 = tf.Variable(tf.truncated_normal([512, 256*256*2],
                                            stddev=0.1, dtype=tf.float32), name='W_fc2')
    h_fc2 = tf.matmul(h_fc1, W_fc2)
    h_fc2_re = tf.reshape(h_fc2, [-1, 256, 256, 2])

return h_fc2_re
You'll notice it's Python syntax with some TensorFlow API calls.

- tf.name_scope() lets you name that particular scope of the program. It's useful both for organizing the code and for giving a name to the node in the TensorBoard graph.
- tf.Variable() indicates a TensorFlow variable that will be trained, i.e., it's a tensor of weights.
- tf.reshape() is a TensorFlow auxiliary function for reshaping tensors so that they'll be the proper shape for upcoming operations.
- tf.matmul() is, as you'd expect, a matrix multiply of two TensorFlow tensors.

We skipped over a number of topics that we'll mention for completeness but won't discuss in detail, such as the input pipeline that reads the TFRecords files and handles batch_size and num_epochs automatically.

Our second task will be to convert our model to a more sophisticated network that includes more layers and layer types than the one above. The previous example focused on each individual pixel but had no way to account for the fact that the regions of interest are likely larger than a single pixel. We'd like to capture small regions of interest as well, and for that we'll utilize convolutional layers, which can capture larger receptive fields.
We'll also add pooling layers which down-sample the data while attempting to retain most of the information. This eliminates some computational complexity.
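As a rough sketch of these two layer types in TF 1.x (the filter count and sizes below are arbitrary choices for illustration, not the exercise's actual values), a convolution followed by max pooling looks like this:

import tensorflow as tf

images = tf.placeholder(tf.float32, [1, 256, 256, 1])        # one grayscale MRI slice
W_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 100], stddev=0.1))

conv = tf.nn.relu(tf.nn.conv2d(images, W_conv,
                               strides=[1, 1, 1, 1], padding='SAME'))
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')   # 256x256 -> 128x128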
Up to this point we've described layers that are commonly associated with image recognition neural networks, where the number of output nodes is equal to the number of classes. Recall that we're doing more than classifying the image; we're classifying each pixel in the image, so our output size will be the number of classes (2) times the number of pixels (256 x 256). Additionally, the spatial locations of the output nodes are important, since each pixel has an associated probability that it's part of the LV (or not).
CNNs are well established as excellent choices for image recognition and classification tasks. Our task in this lab is segmentation, which is closely related: rather than classifying the entire image, we are classifying each pixel. So the question becomes, can we utilize the same type of CNN that has already been shown to do very well on image recognition for our task of segmentation? It turns out that we can make some modifications to CNN models to do exactly this.
We can accomplish this by using a standard image recognition neural network and replacing the fully-connected layers (typically the last few layers) with deconvolution layers (arguably more accurately called transpose convolution layers).

Deconvolution is an upsampling method that brings a smaller image back up to its original size for final pixel classification. There are a few good resources [4, 5, 6] that we'd recommend on this topic. When modifying a CNN to adapt it to segmentation, the resulting neural network is commonly called a fully convolutional network, or FCN.
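A minimal sketch of such a layer in TF 1.x is shown below: tf.nn.conv2d_transpose upsamples a toy 16 x 16 feature map back to the 256 x 256 input resolution with one score per class per pixel. The feature-map depth, kernel size, and stride here are assumptions for illustration, not the exercise's actual values.

import tensorflow as tf

features = tf.placeholder(tf.float32, [1, 16, 16, 64])       # downsampled feature map
W_deconv = tf.Variable(tf.truncated_normal([16, 16, 2, 64],  # [h, w, out_ch, in_ch]
                                            stddev=0.1))

scores = tf.nn.conv2d_transpose(features, W_deconv,
                                output_shape=[1, 256, 256, 2],
                                strides=[1, 16, 16, 1],       # upsample by 16x
                                padding='SAME')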
It can be helpful to visualize how the input data (in our case a tensor of size 256 x 256 x 1) "flows" through the graph, i.e., how the data is transformed via the different operations of convolution, pooling and such. The figure below represents the transformations that our data will undergo in the next task.
The network represented by the figure above is similar to the network that's found in ref [7]. It consists of convolution layers, pooling layers, and a final deconvolution layer, with the input image being transformed as indicated in the image.
This task requires you to finish this neural network and then run the training. To accomplish this, edit the file exercises/cnn/neuralnetwork.py and replace all the instances of FIXME with code. There are comments in the code to help you, and you can use the following network structure to help as well. The names of the layers will make more sense as you examine and complete the code.

If you want to check your work you can view the solution at exercise_solutions/cnn/neuralnetwork.py.
Once you complete the code you can begin running training using the box below and visualizing your results via the TensorBoard browser window you opened in the previous task. If you don't immediately see your results, you may have to wait a short time for TensorBoard to recognize that there are new graphs to visualize. You may also have to refresh your browser.
In [8]:
!python exercises/cnn/runTraining.py --num_epochs 20 --data_dir /data
After training is completed execute the following cell to determine how accurate your model is.
In [9]:
!python exercises/cnn/runEval.py --data_dir /data
In the runTraining.py command you ran two cells above, you can add a few more command line arguments to test different training parameters. If you have time, experiment with the --num_epochs argument and see how that affects your training accuracy.

The full list of command line arguments you can use is given below.
optional arguments:
-h, --help show this help message and exit
--learning_rate LEARNING_RATE
Initial learning rate.
--decay_rate DECAY_RATE
Learning rate decay.
--decay_steps DECAY_STEPS
Steps at each learning rate.
--num_epochs NUM_EPOCHS
Number of epochs to run trainer.
--data_dir DATA_DIR Directory with the training data.
--checkpoint_dir CHECKPOINT_DIR
Directory where to write model checkpoints.
NOTE: If you examine the source code you'll also see an option to change the batch size. For purposes of this lab please leave the batch size set to 1.
What's the best accuracy you can achieve? As an example, with 1 epoch of training we achieved an accuracy of 56.7%:
OUTPUT: 2017-01-27 17:41:52.015709: precision = 0.567
When increasing the number of epochs to 30, we obtained a much higher accuracy:
OUTPUT: 2017-01-27 17:47:59.604529: precision = 0.983
As you can see, increasing the number of training epochs produces a significant increase in accuracy. In fact an accuracy of 98.3% is quite good. Is this accuracy good enough? Are we finished?
As part of the discussion of our accuracy we need to take a step back and consider fully what exactly we are computing when we check accuracy. Our current accuracy metric is simply telling us how many pixels we are computing correctly. So in the case above with 30 epochs, we are correctly predicting the value of a pixel 98.3% of the time. However, notice from the images above that the region of LV is typically quite small compared to the entire image size. This leads to a problem called class imbalance, i.e., one class is much more probable than the other class. In our case, if we simply designed a network to output the class notLV for every output pixel, we'd still have something like 95% accuracy. But that would be a seemingly useless network. What we need is an accuracy metric that gives us some indication of how well our network segments the left ventricle irrespective of the imbalance.
One metric we can use to more accurately determine how well our network is segmenting LV is called the Dice metric or Sorensen-Dice coefficient, among other names. This is a metric to compare the similarity of two samples. In our case we'll use it to compare the two areas of interest, i.e., the area of the expertly-labelled contour and the area of our predicted contour. The formula for computing the Dice metric is:
$$ \frac{2A_{nl}}{A_{n} + A_{l}} $$

where $A_n$ is the area of the contour predicted by our neural network, $A_l$ is the area of the contour from the expertly-segmented label, and $A_{nl}$ is the intersection of the two, i.e., the area of the contour that is predicted correctly by the network. A value of 1.0 means a perfect score.
This metric will more accurately compute how well our network is segmenting the LV because the class imbalance problem is negated. Since we're trying to determine how much area is contained in a particular contour, we can simply count the pixels to give us the area.
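The NumPy sketch below illustrates the formula on two toy binary masks (the mask contents are invented for the example); the lab's own implementation lives in the exercise source, as noted below.

import numpy as np

pred  = np.zeros((256, 256), dtype=bool)   # LV pixels predicted by the network
truth = np.zeros((256, 256), dtype=bool)   # LV pixels in the expert label
pred[100:140, 100:140]  = True             # toy 40 x 40 predicted contour
truth[110:150, 110:150] = True             # toy 40 x 40 ground-truth contour

A_n  = pred.sum()                          # area predicted by the network
A_l  = truth.sum()                         # area of the expert contour
A_nl = np.logical_and(pred, truth).sum()   # area of the intersection
dice = 2.0 * A_nl / (A_n + A_l)            # 1.0 would be a perfect match
print(dice)                                # 0.5625 for these toy masks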
If you're interested in seeing how the Dice metric is added to the accuracy computation, you can view that in the source code file neuralnetwork.py.
Run the training by executing the cell below with 1 epoch and then check your accuracy by running the evaluation (two cells down). Then try running the training with 30 epochs. This is similar to what you may have done with the previous task. Check your accuracy after 30 epochs as well. Visualize the results in TensorBoard.
In [12]:
!python exercises/cnnDice/runTraining.py --data_dir /data --num_epochs 30
In [13]:
!python exercises/cnnDice/runEval.py --data_dir /data
If you run with one epoch you're likely to get a very low accuracy, possibly only a few percent. In a prior run we obtained
OUTPUT: 2017-01-27 18:44:04.103153: Dice metric = 0.034
for 1 epoch. If you try with 30 epochs you might get around 57% accuracy.
OUTPUT: 2017-01-27 18:56:45.501209: Dice metric = 0.568
With a more realistic accuracy metric, you can see that there is some room for improvement in the neural network.
At this point we've created a neural network that we think has the right structure to do a reasonably good job and we've used an accuracy metric that correctly tells us how well our network is learning the segmentation task. But up to this point our evaluation accuracy hasn't been as high as we'd like. The next thing to consider is that we should try to search the parameter space a bit more. Up to now we've changed the number of epochs but that's all we've adjusted. There are a few more parameters we can test that could push our accuracy score higher. These are:

- --learning_rate (the initial learning rate)
- --decay_rate (the learning rate decay)
- --decay_steps (the number of steps to run at each learning rate)
The learning rate is the rate at which the weights are adjusted each time we run backpropagation. If the learning rate is too large, we might adjust the weights by values that are too large and end up oscillating around a correct solution instead of converging. If the learning rate is too small, the adjustments to the weights will be too small and it might take a very long time before we converge to a solution we are happy with. One technique often utilized is a variable, or adjustable, learning rate: at the beginning of training we use a larger learning rate so that we make large adjustments to the weights and hopefully get in the neighborhood of a good solution, and as training continues we successively decrease the learning rate so that we can begin to zero in on a solution. The three parameters listed above let you control the learning rate, how much it changes, and how often it changes (a sketch of such a decay schedule follows the default values below). As a baseline, if you don't select those options, the defaults used (and that you've been using so far in this lab) are:
--learning_rate 0.01
--decay_rate 1.0
--decay_steps 1000
--num_epochs 1
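As a minimal sketch of how such a decaying learning rate can be expressed in TF 1.x (the exact wiring inside the lab's runTraining.py may differ), tf.train.exponential_decay ties the three parameters together:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False, name='global_step')
learning_rate = tf.train.exponential_decay(
    learning_rate=0.01,      # initial rate, cf. --learning_rate
    global_step=global_step,
    decay_steps=1000,        # steps between decays, cf. --decay_steps
    decay_rate=1.0)          # multiplier at each decay, cf. --decay_rate (1.0 = constant)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)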
Play around with these values by running training in the next cell and see if you can come up with better accuracy than seen earlier. We don't recommend running more than 100 epochs due to time constraints of the lab, but in production you would quite likely run a lot more epochs.
Conveniently, if you start a training run and realize the number of epochs is too large you can kill that run and still test the model by running the evaluation (two cells down). TensorFlow has checkpointing abilities that snapshot the model periodically so the most recent snapshot will still be retained after you kill the training run.
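The snippet below is a minimal sketch of that checkpoint/restore idea with tf.train.Saver; the path and the toy variable are made up, and the lab's scripts manage their own checkpoint directory via --checkpoint_dir.

import tensorflow as tf

W = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1), name='W')
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, '/tmp/model', global_step=100)   # snapshot -> /tmp/model-100
    saver.restore(sess, '/tmp/model-100')             # later: restore the snapshot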
In [ ]:
!python exercises/cnnDice/runTraining.py --data_dir /data --num_epochs 1 --learning_rate 0.01 --decay_rate 1.0 --decay_steps 1000
In [ ]:
!python exercises/cnnDice/runEval.py --data_dir /data
One of the solutions we obtained was 86% accuracy. Check A to see what parameters we used for the training.
For illustrative purposes we focused on smaller tasks that could run in the time we have allotted for this lab, but if we were going to run an image segmentation task in a production setting, what more would we do to accomplish this? A few things we'd do are the following.
In this lab you had a chance to explore image segmentation with TensorFlow as the framework of choice. You saw how to convert a standard CNN into an FCN for use as a segmentation network. You also saw how choosing the correct accuracy metric is crucial in training the network. Finally you had a chance to see how performing a parameter search is an integral part of the deep learning workflow to ultimately settle on a network that performs with acceptable accuracy for the task at hand.
If you are interested in learning more, you can use the following resources:
Finally, don't forget to save your work from this lab before time runs out and the instance shuts down!!
File -> Download as -> IPython (.ipynb) at the top of this window

Q: I'm encountering issues executing the cells, or other technical problems?
A: Please see this infrastructure FAQ.
Q: I'm getting unexpected behavior (i.e., incorrect output) when running any of the tasks.
A: It's possible that one or more of the CUDA Runtime API calls are actually returning an error. Are you getting any errors printed to the screen about CUDA Runtime errors?
[1] Sunnybrook cardiac images from earlier competition http://smial.sri.utoronto.ca/LV_Challenge/Data.html
[2] This "Sunnybrook Cardiac MR Database" is made available under the CC0 1.0 Universal license described above, and with more detail here: http://creativecommons.org/publicdomain/zero/1.0/
[3] Attribution: Radau P, Lu Y, Connelly K, Paul G, Dick AJ, Wright GA. "Evaluation Framework for Algorithms Segmenting Short Axis Cardiac MRI." The MIDAS Journal -Cardiac MR Left Ventricle Segmentation Challenge, http://hdl.handle.net/10380/3070
[4] http://fcn.berkeleyvision.org/
[5] Long, Shelhamer, Darrell; "Fully Convolutional Networks for Semantic Segmentation", CVPR 2015.
[6] Zeiler, Krishnan, Taylor, Fergus; "Deconvolutional Networks", CVPR 2010.
[7] https://www.kaggle.com/c/second-annual-data-science-bowl/details/deep-learning-tutorial