There are a variety of important image analysis deep learning applications that need to go beyond detecting individual objects within an image and instead segment the image into spatial regions of interest. For example, in medical imagery analysis it is often important to separate the pixels corresponding to different types of tissue, blood or abnormal cells so that we can isolate a particular organ. In this self-paced, hands-on lab we will use the TensorFlow machine learning framework to train and evaluate an image segmentation network using a medical imagery dataset.
Lab created by Jonathan Bentz (follow @jnbntz on Twitter).
Before we begin, let's verify WebSockets are working on your system. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the gray cell. If not, please consult the Self-paced Lab Troubleshooting FAQ to debug the issue.
In [1]:
print "The answer should be three: " + str(1+2)
Let's execute the cell below to display information about the GPUs running on the server.
In [2]:
!nvidia-smi
Finally, execute the following cell to show the version of TensorFlow that is used in this lab.
In [3]:
!python -c 'import tensorflow as tf; print(tf.__version__)'
If you have never before taken an IPython Notebook based self-paced lab from NVIDIA, we recommend watching this short YouTube video.
In this lab you will work through a series of exercises performing image segmentation, also called semantic segmentation. Semantic segmentation is the task of placing each pixel into a specific class. In a sense it's a classification problem where you'll classify on a pixel basis rather than an entire image. In this lab the task will be classifying each pixel in a cardiac MRI image based on whether the pixel is a part of the left ventricle (LV) or not.
This lab is not an introduction to deep learning, nor is it intended to be a rigorous mathematical formalism of convolutional neural networks. We'll assume that you have at least a passing understanding of neural networks including concepts like forward and backpropagation, activations, SGD, convolutions, pooling, bias, and the like. It is helpful if you've encountered convolutional neural networks (CNN) already and you understand image recognition tasks. The lab will use Google's TensorFlow machine learning framework so if you have Python and TensorFlow experience it is helpful, but not required. Most of the work we'll do in this lab is not coding per se, but setting up and running training and evaluation tasks using TensorFlow.
The data set you'll be utilizing is a series of cardiac images (specifically MRI short-axis (SAX) scans) that have been expertly labeled. See References [1, 2, 3] for full citation information.
Four representative examples of the data are shown below. Each row of images is an instance of the data. On the left are the MRI images and on the right are the expertly-segmented regions (often called contours). The portions of the images that are part of the LV are denoted in white. Note that the size of the LV varies from image to image, but the LV typically takes up a relatively small region of the entire image.
The data extraction from the raw images and then subsequent preparation of these images for ingestion into TensorFlow will not be showcased in this lab. Suffice it to say that data preparation is a non-trivial aspect of machine learning workflows and is outside the scope of this lab.
For those that are interested in the details, we obtained guidance and partial code from a prior Kaggle competition on how to extract the images properly. At that point we took the images, converted them to TensorFlow records (TFRecords), and stored them to files. TFRecords are a special file format provided by TensorFlow, which allow you to use built-in TensorFlow functions for data management including multi-threaded data reading and sophisticated pre-processing of the data such as randomizing and even augmenting the training data.
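To make the idea concrete, here is a minimal sketch of how a single image/label pair could be serialized into a TFRecords file with the TF 1.x API. The feature names img_raw and label_raw, the output filename, and the zero-filled arrays are assumptions for illustration and are not necessarily the exact format used by this lab's data files.

import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    # Wrap a raw byte string in a TFRecords-compatible Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

image = np.zeros((256, 256), dtype=np.uint8)   # placeholder MRI slice
label = np.zeros((256, 256), dtype=np.uint8)   # placeholder LV mask

writer = tf.python_io.TFRecordWriter('example.tfrecords')  # hypothetical file name
example = tf.train.Example(features=tf.train.Features(feature={
    'img_raw':   _bytes_feature(image.tobytes()),
    'label_raw': _bytes_feature(label.tobytes()),
}))
writer.write(example.SerializeToString())
writer.close()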
The images themselves are originally 256 x 256 grayscale DICOM format, a common image format in medical imaging. The label is a tensor of size 256 x 256 x 2. The reason the last dimension is a 2 is that the pixel is in one of two classes so each pixel label has a vector of size 2 associated with it. The training set is 234 images and the validation set (data NOT used for training but used to test the accuracy of the model) is 26 images.
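As a small illustration of why each label has that extra dimension of size 2, the NumPy snippet below converts a toy binary LV mask into a per-pixel one-hot label of shape 256 x 256 x 2 (the mask contents are invented for the example).

import numpy as np

mask = np.zeros((256, 256), dtype=np.int64)  # toy ground-truth mask: 0 = notLV, 1 = LV
mask[100:150, 100:150] = 1                   # mark a square region as LV

label = np.eye(2, dtype=np.float32)[mask]    # one-hot encode each pixel
print(label.shape)                           # (256, 256, 2)
print(label[0, 0], label[120, 120])          # [1. 0.] for notLV, [0. 1.] for LV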
This lab is part of a series of self-paced labs designed to introduce you to some of the publicly-available deep learning frameworks available today. TensorFlow is a framework developed by Google and used by numerous researchers and product groups within Google.
TensorFlow is an open source software library for machine intelligence. The computations are expressed as data flow graphs which operate on tensors (hence the name). If you can express your computation in this manner you can run your algorithm in the TensorFlow framework.
TensorFlow is portable in the sense that you can run on CPUs and GPUs and utilize workstations, servers, and even deploy models on mobile platforms. At present TensorFlow offers the option of expressing your computation in either Python or C++, with experimental support for Go and Java. A typical usage of TensorFlow would be to perform training and testing in Python and, once you have finalized your model, deploy it with C++.
TensorFlow is designed and built for performance on both CPUs and GPUs. Within a single TensorFlow execution you have lots of flexibility in that you can assign different tasks to CPUs and GPUs explicitly if necessary. When running on GPUs TensorFlow utilizes a number of GPU libraries including cuDNN allowing it to extract the most performance possible from the very newest GPUs available.
One of the intents of this lab is to gain an introductory level of familiarity with TensorFlow. In the course of this short lab we won't be able to discuss all the features and options of TensorFlow but we hope that after completion of this lab you'll feel comfortable with and have a good idea how to move forward using TensorFlow to solve your specific machine learning problems.
For comprehensive documentation on TensorFlow we recommend the TensorFlow website, the whitepaper, and the GitHub site.
TensorFlow can be used in multiple ways depending on your preferences. For designing training tasks, a common approach is to use the TensorFlow Python API. Running a machine learning training task in TensorFlow consists of (at least) two distinct steps.
First you will construct a data flow graph which is the specification and ordering of exactly what computations you'd like to perform. Using the TensorFlow API you construct a neural network layer-by-layer using any of the TensorFlow-provided operations such as convolutions, activations, pooling, etc. This stage of the process doesn't do any actual computation on your data; it merely constructs the graph that you've specified.
When you build the graph you must specify each so-called Variable (in TensorFlow lexicon). Specifying a piece of data as a Variable tells TensorFlow that it will be a parameter to be "learned", i.e., it is a weight that will be updated as the training proceeds.

Once you've defined your neural network as a data flow graph, you will launch a Session. This is the mechanism whereby you specify the input data and training parameters to your previously-constructed graph, and then the computation proceeds.

In general these two steps will be repeated each time you wish to change your graph, i.e., you'll update the graph and launch a new session.
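The toy snippet below is a minimal sketch of this two-step pattern in TF 1.x (the shapes and numbers are arbitrary and unrelated to the lab's code): the first block only constructs the graph, and nothing is computed until the Session runs it.

import tensorflow as tf

# Step 1: construct the data flow graph. No computation happens yet.
x = tf.placeholder(tf.float32, shape=[None, 4], name='x')
W = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1), name='W')  # a learned weight
y = tf.matmul(x, W)

# Step 2: launch a Session, initialize the Variables, and run the graph on data.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
    print(result.shape)  # (1, 2)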
A sample workflow of training and evaluating a model might look like the following:

1. Construct the data flow graph that defines your neural network.
2. Launch a Session and loop over your input data. Customize your batch size, number of epochs, learning rate, etc.

TensorFlow provides a feature-rich tool called TensorBoard that allows you to visualize many aspects of your program. In TensorBoard you can see a visual representation of your computation graph and you can plot different metrics of your computation such as loss, accuracy, and learning rate. Essentially any data that is generated during the execution of TensorFlow can be visually displayed by TensorBoard with the addition of a few extra API calls in your program.
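For instance, the snippet below is a minimal sketch (not taken from the lab's scripts) of the kind of extra calls involved: a tf.summary.scalar op records a metric and a tf.summary.FileWriter writes it, along with the graph, to a log directory. The log directory and the constant loss tensor are made up for the example.

import tensorflow as tf

loss = tf.constant(0.5, name='loss')     # stand-in for a real loss tensor
tf.summary.scalar('loss', loss)          # ask TensorFlow to record this scalar
merged = tf.summary.merge_all()          # bundle all summary ops into one

with tf.Session() as sess:
    # Hypothetical log directory; point TensorBoard at it to see the scalar and graph.
    writer = tf.summary.FileWriter('/tmp/tb_demo', sess.graph)
    summary = sess.run(merged)
    writer.add_summary(summary, global_step=0)
    writer.close()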
For example, consider the following code snippet that creates a neural network with one hidden layer (don't worry about the details of the code at this time).
with tf.name_scope('Hidden1'):
    W_fc = tf.Variable(tf.truncated_normal([256*256, 512],
                                           stddev=0.1, dtype=tf.float32), name='W_fc')
    flatten1_op = tf.reshape(images_re, [-1, 256*256])
    h_fc1 = tf.matmul(flatten1_op, W_fc)

with tf.name_scope('Final'):
    W_fc2 = tf.Variable(tf.truncated_normal([512, 256*256*2],
                                            stddev=0.1, dtype=tf.float32), name='W_fc2')
    h_fc2 = tf.matmul(h_fc1, W_fc2)
    h_fc2_re = tf.reshape(h_fc2, [-1, 256, 256, 2])

return h_fc2_re
TensorBoard will display the neural network like the figure below. If you look closely you'll see that the edges have the tensor dimensions printed, i.e., as you move node-to-node you can follow how the data (as a tensor) and its size change throughout the graph.
The first task we'll consider will be to create, train and evaluate a fully-connected neural network with one hidden layer. The input to the neural network will be the value of each pixel, i.e., a size 256 x 256 (or 65,536) array. The hidden layer will have a size that you can adjust, and the output will be an array of 256 x 256 x 2, i.e., each input pixel can be in either one of two classes so the output value associated with each pixel will be the probability that the pixel is in that particular class. In our case the two classes are LV or not. We'll compute the loss via a TensorFlow function called sparse_softmax_cross_entropy_with_logits, which simply combines the softmax with the cross entropy calculation into one function call.
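A minimal sketch of that loss computation is shown below; the placeholder tensors logits and labels stand in for the network output and ground truth and are not the lab's actual variable names.

import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 256, 256, 2])  # per-pixel scores for 2 classes
labels = tf.placeholder(tf.int64,   [None, 256, 256])     # ground-truth class per pixel

# Softmax and cross entropy combined into a single, numerically stable call.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss = tf.reduce_mean(cross_entropy)  # average over every pixel in the batch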
In [4]:
!python exercises/simple/runTraining.py --data_dir /data
If everything is working properly you will see some messages printed to the screen. Some of those are informational messages from TensorFlow and can typically be ignored. You'll want to look for the lines that start with "OUTPUT", as those are the lines we inserted in the program specifically to output information at particular points, such as the loss that is computed every 100 steps. The very last line you'll see is:

OUTPUT: Done training for 1 epochs, 231 steps

This means your training job ran one complete epoch through all the training data.
In [5]:
!python exercises/simple/runEval.py --data_dir /data
Again, you're ignoring most of the TensorFlow output and focusing your attention on the lines that begin with "OUTPUT". When we ran this we obtained the following output. You should see something similar.
OUTPUT: 2017-01-26 17:12:28.929741: precision = 0.503
OUTPUT: 26 images evaluated from file /data/val_images.tfrecords
The final output lines show the accuracy of the model, i.e., how well the model predicted whether each pixel is in LV (or not) versus the ground truth. In the case above, 0.503 is 50.3%, so the model predicted the correct class for each pixel roughly half the time. This is not a great result, but it's not awful either considering we only ran a very simple network and only ran training for one epoch.
At this point, if you haven't already, you'll want to launch TensorBoard here. TensorBoard has a lot of impressive visualization features. In the top menu there is a link called "Scalars" which you can click to show some of the information that has been captured. You can click to expand any of the entries and see a plot of that data.
Another menu choice at the top is "Graphs". If you choose that you can view both your training and evaluation data flow graphs. Each node in the graph can be clicked to expand it and you can obtain more detailed information about that node. In the upper left of the page you can choose whether you want to view the training or evaluation graph, via a small dropdown selection.
The code solution to this task is given below.
with tf.name_scope('Hidden1'):
    W_fc = tf.Variable(tf.truncated_normal([256*256, 512],
                                           stddev=0.1, dtype=tf.float32), name='W_fc')
    flatten1_op = tf.reshape(images_re, [-1, 256*256])
    h_fc1 = tf.matmul(flatten1_op, W_fc)

with tf.name_scope('Final'):
    W_fc2 = tf.Variable(tf.truncated_normal([512, 256*256*2],
                                            stddev=0.1, dtype=tf.float32), name='W_fc2')
    h_fc2 = tf.matmul(h_fc1, W_fc2)
    h_fc2_re = tf.reshape(h_fc2, [-1, 256, 256, 2])

return h_fc2_re
You'll notice it's Python syntax with some TensorFlow API calls.

- tf.name_scope() lets you name that particular scope of the program. It's useful both for organizing the code and for giving a name to the node in the TensorBoard graph.
- tf.Variable() indicates a TensorFlow variable that will be trained, i.e., it's a tensor of weights.
- tf.reshape() is a TensorFlow auxiliary function for reshaping tensors so that they'll be the proper shape for upcoming operations.
- tf.matmul() is, as you'd expect, a matrix multiply of two TensorFlow tensors.

We skipped over a number of topics that we'll mention for completeness but won't discuss in detail, such as the input pipeline that reads the TFRecords files and handles batch_size and num_epochs automatically.

Our second task will be to convert our model to a more sophisticated network that includes more layers and layer types than the one above. The previous example focused on each individual pixel but had no way to account for the fact that the regions of interest are likely larger than a single pixel. We'd like to capture small regions of interest as well, and for that we'll utilize convolutional layers, which can capture larger receptive fields.
We'll also add pooling layers which down-sample the data while attempting to retain most of the information. This eliminates some computational complexity.
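As a rough sketch of these two layer types in TF 1.x (the filter count and sizes below are arbitrary choices for illustration, not the exercise's actual values), a convolution followed by max pooling looks like this:

import tensorflow as tf

images = tf.placeholder(tf.float32, [1, 256, 256, 1])        # one grayscale MRI slice
W_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 100], stddev=0.1))

conv = tf.nn.relu(tf.nn.conv2d(images, W_conv,
                               strides=[1, 1, 1, 1], padding='SAME'))
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')   # 256x256 -> 128x128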
Up to this point we've described layers that are commonly associated with image recognition neural networks, where the number of output nodes is equal to the number of classes. Recall that we're doing more than classifying the image; we're classifying each pixel in the image, so our output size will be the number of classes (2) times the number of pixels (256 x 256). Additionally, the spatial locations of the output nodes are important, since each pixel has an associated probability that it's part of the LV (or not).
CNNs are well established as excellent choices for image recognition and classification tasks. Our task in this lab is segmentation, which is closely related: rather than classifying the entire image, we are classifying each pixel. So the question becomes, can we utilize the same type of CNN that has already been shown to do very well on image recognition for our task of segmentation? It turns out that we can make some modifications to CNN models to do exactly this.
We can accomplish this by using a standard image recognition neural network and replacing the fully-connected layers (typically the last few layers) with deconvolution layers (arguably more accurately called transpose convolution layers).

Deconvolution is an upsampling method that brings a smaller image back up to its original size for final pixel classification. There are a few good resources [4, 5, 6] that we'd recommend on this topic. When modifying a CNN to adapt it to segmentation, the resulting neural network is commonly called a fully convolutional network, or FCN.
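A minimal sketch of such a layer in TF 1.x is shown below: tf.nn.conv2d_transpose upsamples a toy 16 x 16 feature map back to the 256 x 256 input resolution with one score per class per pixel. The feature-map depth, kernel size, and stride here are assumptions for illustration, not the exercise's actual values.

import tensorflow as tf

features = tf.placeholder(tf.float32, [1, 16, 16, 64])       # downsampled feature map
W_deconv = tf.Variable(tf.truncated_normal([16, 16, 2, 64],  # [h, w, out_ch, in_ch]
                                            stddev=0.1))

scores = tf.nn.conv2d_transpose(features, W_deconv,
                                output_shape=[1, 256, 256, 2],
                                strides=[1, 16, 16, 1],       # upsample by 16x
                                padding='SAME')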
It can be helpful to visualize how the input data (in our case a tensor of size 256 x 256 x 1) "flows" through the graph, i.e., how the data is transformed via the different operations of convolution, pooling and such. The figure below represents the transformations that our data will undergo in the next task.
The network represented by the figure above is similar to the network that's found in ref [7]. It consists of convolution layers, pooling layers, and a final deconvolution layer, with the input image being transformed as indicated in the image.
This task requires you to finish this neural network and then run the training. To accomplish this, edit the file exercises/cnn/neuralnetwork.py and replace all the instances of FIXME with code. There are comments in the code to help you, and you can use the following network structure to help as well. The names of the layers will make more sense as you examine and complete the code.

If you want to check your work you can view the solution at exercise_solutions/cnn/neuralnetwork.py.
Once you complete the code you can begin running training using the box below and visualizing your results via the TensorBoard browser window you opened in the previous task. If you don't immediately see your results, you may have to wait a short time for TensorBoard to recognize that there are new graphs to visualize. You may also have to refresh your browser.
In [8]:
!python exercises/cnn/runTraining.py --num_epochs 20 --data_dir /data
After training is completed execute the following cell to determine how accurate your model is.
In [9]:
!python exercises/cnn/runEval.py --data_dir /data
In the runTraining.py command you ran two cells above, you can add a few more command line arguments to test different training parameters. If you have time, experiment with the --num_epochs argument and see how that affects your training accuracy.

The full list of command line arguments you can use is given below.
optional arguments:
-h, --help show this help message and exit
--learning_rate LEARNING_RATE
Initial learning rate.
--decay_rate DECAY_RATE
Learning rate decay.
--decay_steps DECAY_STEPS
Steps at each learning rate.
--num_epochs NUM_EPOCHS
Number of epochs to run trainer.
--data_dir DATA_DIR Directory with the training data.
--checkpoint_dir CHECKPOINT_DIR
Directory where to write model checkpoints.
NOTE: If you examine the source code you'll also see an option to change the batch size. For purposes of this lab please leave the batch size set to 1.
What's the best accuracy you can achieve? As an example, with 1 epoch of training we achieved an accuracy of 56.7%:
OUTPUT: 2017-01-27 17:41:52.015709: precision = 0.567
When increasing the number of epochs to 30, we obtained a much higher accuracy:
OUTPUT: 2017-01-27 17:47:59.604529: precision = 0.983
As you can see, increasing the number of training epochs produces a significant increase in accuracy. In fact an accuracy of 98.3% is quite good. Is this accuracy good enough? Are we finished?
As part of the discussion of our accuracy we need to take a step back and consider fully what exactly we are computing when we check accuracy. Our current accuracy metric is simply telling us how many pixels we are computing correctly. So in the case above with 30 epochs, we are correctly predicting the value of a pixel 98.3% of the time. However, notice from the images above that the region of LV is typically quite small compared to the entire image size. This leads to a problem called class imbalance, i.e., one class is much more probable than the other class. In our case, if we simply designed a network to output the class notLV for every output pixel, we'd still have something like 95% accuracy. But that would be a seemingly useless network. What we need is an accuracy metric that gives us some indication of how well our network segments the left ventricle irrespective of the imbalance.
One metric we can use to more accurately determine how well our network is segmenting LV is called the Dice metric or Sorensen-Dice coefficient, among other names. This is a metric to compare the similarity of two samples. In our case we'll use it to compare the two areas of interest, i.e., the area of the expertly-labelled contour and the area of our predicted contour. The formula for computing the Dice metric is:
$$ \frac{2A_{nl}}{A_{n} + A_{l}} $$

where $A_n$ is the area of the contour predicted by our neural network, $A_l$ is the area of the contour from the expertly-segmented label, and $A_{nl}$ is the intersection of the two, i.e., the area of the contour that is predicted correctly by the network. A value of 1.0 means a perfect score.
This metric will more accurately compute how well our network is segmenting the LV because the class imbalance problem is negated. Since we're trying to determine how much area is contained in a particular contour, we can simply count the pixels to give us the area.
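The NumPy sketch below illustrates the formula on two toy binary masks (the mask contents are invented for the example); the lab's own implementation lives in the exercise source, as noted below.

import numpy as np

pred  = np.zeros((256, 256), dtype=bool)   # LV pixels predicted by the network
truth = np.zeros((256, 256), dtype=bool)   # LV pixels in the expert label
pred[100:140, 100:140]  = True             # toy 40 x 40 predicted contour
truth[110:150, 110:150] = True             # toy 40 x 40 ground-truth contour

A_n  = pred.sum()                          # area predicted by the network
A_l  = truth.sum()                         # area of the expert contour
A_nl = np.logical_and(pred, truth).sum()   # area of the intersection
dice = 2.0 * A_nl / (A_n + A_l)            # 1.0 would be a perfect match
print(dice)                                # 0.5625 for these toy masks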
If you're interested in seeing how the Dice metric is added to the accuracy computation, you can view that in the source code file neuralnetwork.py.
Run the training by executing the cell below with 1 epoch and then check your accuracy by running the evaluation (two cells down). Then try running the training with 30 epochs. This is similar to what you may have done with the previous task. Check your accuracy after 30 epochs as well. Visualize the results in TensorBoard.
In [12]:
!python exercises/cnnDice/runTraining.py --data_dir /data --num_epochs 30
In [13]:
!python exercises/cnnDice/runEval.py --data_dir /data
If you run with one epoch you're likely to get a very low accuracy, possibly only a few percent. In a prior run we obtained
OUTPUT: 2017-01-27 18:44:04.103153: Dice metric = 0.034
for 1 epoch. If you try with 30 epochs you might get around 57% accuracy.
OUTPUT: 2017-01-27 18:56:45.501209: Dice metric = 0.568
With a more realistic accuracy metric, you can see that there is some room for improvement in the neural network.
At this point we've created a neural network that we think has the right structure to do a reasonably good job and we've used an accuracy metric that correctly tells us how well our network is learning the segmentation task. But up to this point our evaluation accuracy hasn't been as high as we'd like. The next thing to consider is that we should try to search the parameter space a bit more. Up to now we've changed the number of epochs but that's all we've adjusted. There are a few more parameters we can test that could push our accuracy score higher. These are:

- --learning_rate (the initial learning rate)
- --decay_rate (the learning rate decay)
- --decay_steps (the number of steps to run at each learning rate)
The learning rate is the rate at which the weights are adjusted each time we run backpropagation. If the learning rate is too large, we might adjust the weights by values that are too large and end up oscillating around a correct solution instead of converging. If the learning rate is too small, the adjustments to the weights will be too small and it might take a very long time before we converge to a solution we are happy with. One technique often utilized is a variable, or adjustable, learning rate: at the beginning of training we use a larger learning rate so that we make large adjustments to the weights and hopefully get in the neighborhood of a good solution, and as training continues we successively decrease the learning rate so that we can begin to zero in on a solution. The three parameters listed above let you control the learning rate, how much it changes, and how often it changes (a sketch of such a decay schedule follows the default values below). As a baseline, if you don't select those options, the defaults used (and that you've been using so far in this lab) are:
--learning_rate 0.01
--decay_rate 1.0
--decay_steps 1000
--num_epochs 1
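As a minimal sketch of how such a decaying learning rate can be expressed in TF 1.x (the exact wiring inside the lab's runTraining.py may differ), tf.train.exponential_decay ties the three parameters together:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False, name='global_step')
learning_rate = tf.train.exponential_decay(
    learning_rate=0.01,      # initial rate, cf. --learning_rate
    global_step=global_step,
    decay_steps=1000,        # steps between decays, cf. --decay_steps
    decay_rate=1.0)          # multiplier at each decay, cf. --decay_rate (1.0 = constant)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)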
Play around with these values by running training in the next cell and see if you can come up with better accuracy than seen earlier. We don't recommend running more than 100 epochs due to time constraints of the lab, but in production you would quite likely run a lot more epochs.
Conveniently, if you start a training run and realize the number of epochs is too large you can kill that run and still test the model by running the evaluation (two cells down). TensorFlow has checkpointing abilities that snapshot the model periodically so the most recent snapshot will still be retained after you kill the training run.
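The snippet below is a minimal sketch of that checkpoint/restore idea with tf.train.Saver; the path and the toy variable are made up, and the lab's scripts manage their own checkpoint directory via --checkpoint_dir.

import tensorflow as tf

W = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1), name='W')
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, '/tmp/model', global_step=100)   # snapshot -> /tmp/model-100
    saver.restore(sess, '/tmp/model-100')             # later: restore the snapshot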
In [ ]:
!python exercises/cnnDice/runTraining.py --data_dir /data --num_epochs 1 --learning_rate 0.01 --decay_rate 1.0 --decay_steps 1000
In [ ]:
!python exercises/cnnDice/runEval.py --data_dir /data
One of the solutions we obtained was 86% accuracy. Check A to see what parameters we used for the training.
For illustrative purposes we focused on smaller tasks that could run in the time we have allotted for this lab, but if we were going to run an image segmentation task in a production setting, what more would we do to accomplish this? A few things we'd do are the following.
In this lab you had a chance to explore image segmentation with TensorFlow as the framework of choice. You saw how to convert a standard CNN into an FCN for use as a segmentation network. You also saw how choosing the correct accuracy metric is crucial in training the network. Finally you had a chance to see how performing a parameter search is an integral part of the deep learning workflow to ultimately settle on a network that performs with acceptable accuracy for the task at hand.
If you are interested in learning more, you can use the following resources:
Finally, don't forget to save your work from this lab before time runs out and the instance shuts down!!
File -> Download as -> IPython (.ipynb) at the top of this window

Q: I'm encountering issues executing the cells, or other technical problems?
A: Please see this infrastructure FAQ.
Q: I'm getting unexpected behavior (i.e., incorrect output) when running any of the tasks.
A: It's possible that one or more of the CUDA Runtime API calls are actually returning an error. Are you getting any errors printed to the screen about CUDA Runtime errors?
[1] Sunnybrook cardiac images from earlier competition http://smial.sri.utoronto.ca/LV_Challenge/Data.html
[2] This "Sunnybrook Cardiac MR Database" is made available under the CC0 1.0 Universal license described above, and with more detail here: http://creativecommons.org/publicdomain/zero/1.0/
[3] Attribution: Radau P, Lu Y, Connelly K, Paul G, Dick AJ, Wright GA. "Evaluation Framework for Algorithms Segmenting Short Axis Cardiac MRI." The MIDAS Journal -Cardiac MR Left Ventricle Segmentation Challenge, http://hdl.handle.net/10380/3070
[4] http://fcn.berkeleyvision.org/
[5] Long, Shelhamer, Darrell; "Fully Convolutional Networks for Semantic Segmentation", CVPR 2015.
[6] Zeiler, Krishnan, Taylor, Fergus; "Deconvolutional Networks", CVPR 2010.
[7] https://www.kaggle.com/c/second-annual-data-science-bowl/details/deep-learning-tutorial