This python package allows for the rapid training and application of deep neural networks (DNNs). It wraps the Tensorflow package, allowing for easy setup of an architecture without needing to know the nitty-gritty details of how Tensorflow operates.
The focus of this package is to train and apply convolutional DNNs (CNN) with images. One-dimensional data could, hypothetically, be treated as a pseudo-image with the dimensions (N x 1 x 1) (spatial, spatial, channels). This should work but is currently untested.
This vignette will present a example of how to use the package.
The primary class in the package is the 'DNNWrapper', which is the only class with which an end-user needs to directly interact.
The package was tested to work with Python 2.7.12 and requires several additional python packages to function properly:
All of these packages are either part of the python distribution or can be installed via pip:
pip install <PACKAGE_NAME>
Additionally, the package 'matplotlib' is required in order to run this vignette (but not for the actual package)
In [8]:
import dnnSwift
# The layout is a list of dictionaries defining each layer of the DNN.
# For this example, the DNN will consist of two blocks of
# "convolution - convolution - maximum pooling", followed by a fully
# connected layer and finally a cross entropy cost function
dnn_layout = [
{
"name": "data",
"type": "input"
},
{
"name": "conv1",
"type": "conv",
"n_kernels": 32,
"size": (3, 3),
"stride": (1, 1),
"padding" : "VALID",
"input": "data"
},
{
"name": "conv2",
"type": "conv",
"n_kernels": 32,
"size": (3, 3),
"stride": (1, 1),
"padding" : "VALID",
"input": "conv1"
},
{
"name": "maxpool1",
"type": "maxpool",
"size": (3, 3),
"stride": (2, 2),
"padding" : "VALID",
"input": "conv2"
},
{
"name": "conv3",
"type": "conv",
"n_kernels": 64,
"size": (3, 3),
"stride": (1, 1),
"padding" : "VALID",
"input": "maxpool1"
},
{
"name": "conv4",
"type": "conv",
"n_kernels": 64,
"size": (3, 3),
"stride": (1, 1),
"padding" : "VALID",
"input": "conv3"
},
{
"name": "maxpool2",
"type": "maxpool",
"size": (3, 3),
"stride": (2, 2),
"padding" : "VALID",
"input": "conv4"
},
{"name": "fc1", "type": "fc", "input": "maxpool2"},
{"name": "CrossEntropy", "type": "cross_entropy", "input": "fc1"}]
# This is a dictionary that assigns numerical values to each label. The
# keys of the dictionary are the actual labels used in the hdf5 file and
# the values should be integery from 0 to N. Internally, labels are turned
# into one-hot vectors, so in this set up, "0" becomes (1, 0, 0), "5"
# becomes (0, 1, 0), and "7" becomes (0, 0, 1)
categories = {"0": 0, "5": 1, "7": 2}
# Initialize the DNN object.
my_dnn = dnnSwift.DNNWrapper(
categories=categories, layout=dnn_layout)
# The training data must be initialized. This means that the hdf5 file is
# scanned and the images split into a training, validation, and test set.
# The indices of the lists are written into the 'outfile' as a pickled
# dictionary.
my_dnn.initialize_training_data(
filename="demo_data/MNISTDemo.h5",
outfile="demo_data/MNISTDemo_split.pkl")
# Train the network. Set 'verbose=True' to see the full output, including
# the remaining time
my_dnn.train_dnn(
num_epochs=5, batch_size=128,
weights_dir="demo_data/weights", verbose=False)
The dnnSwift object writes its output into the 'weights_dir' directory. The output consists of weights and validation data. Validation data is calculated from data that was not trained on, i.e. it can be used for 1-fold cross validation of the training epochs.
In [9]:
import pickle
import os
# For each epoch, a weights file and a validation file are created.
# Counting starts at 0, i.e. *_0.pkl is the output AFTER the first epoch!
all_files = os.listdir("demo_data/weights")
print "All Files: "
for f in all_files:
print " - %s" % f
The validation files are pickled dictionaries containing various metrics.
In [10]:
with open("demo_data/weights/val_0.pkl", "r") as f:
val_dat = pickle.load(f)
keys = val_dat.keys()
print "Validation file type: " + str(type(val_dat))
print "Validation file keys:"
for key in keys:
print " - %s" % key
Lets determine the epoch with the best validation accuracy.
In [11]:
val_accs = []
for i in range(5):
with open("demo_data/weights/val_%s.pkl" % str(i), "r") as f:
val_accs.append(pickle.load(f)["top_1_acc"])
best_epoch = val_accs.index(max(val_accs))
print "Validation accuracies:"
for val_acc in val_accs:
print " - %.4f" % val_acc
print "Best epoch: %s" % str(best_epoch)
We've already seen that "top_1_acc" represents the top-1 accuracy. This is a common measure for determining the accuracy of neural networks. It represents the probability of the network correctly predicting the class of a validation data point. Additionally, the top-N accuracy is also calculated. This is the probability that the label of a validation data point is in the top N most likely classes predicted by the neural network.
This is calculated for $1 <= N <= \# Categories$. Here we have three categories into which we classify images.
In [14]:
with open("demo_data/weights/val_0.pkl", "r") as f:
val = pickle.load(f)["top_n_acc"]
for key in val.keys():
print "Top-%s accuracy: %s" % (key, val[key])
Beyond that, we can also calculate the precision-recall curves from the "precision" and "recall" keys. These entries contain nested dictionaries and are calculated for each label independently. They are calculated for various probability thresholds. That means that instead of naively using the highest probability to determine the class, one can set custom thresholds for each category.
In [15]:
import matplotlib.pyplot as plt
# Precision and Recall are calculated for each category individually
print "Labels: " + str(best_val["precision"].keys())
In addition to the precision and recall, the number of items predicted for each category at each threshold level is also stored. This can be useful for computing averages of precision and recall between categories. Note that this is not the total number of elements for each class in the validation data, but the number of elements predicted by the DNN, correct or not.
Let's look at the precision-recall curves for the digit "5". Precision and Recall are calculated for various probability thresholds.
In [84]:
# We'll use the result of the first epoch to make the curve at least
# somewhat interesting
with open("demo_data/weights/val_0.pkl", "r") as f:
val_dat = pickle.load(f)
precision = val_dat["precision"]["5"]
recall = val_dat["recall"]["5"]
counts = val_dat["counts"]["5"]
# The disadvantage of using a dictionary is that the entries are not
# sorted. Future versions of this package may use a different storage
# method for the validation
p_thresh, p_val = zip(*precision.items())
sort_indices = [p_thresh.index(p) for p in sorted(p_thresh, key=lambda x: float(x))]
p_thresh = [float(p_thresh[i]) for i in sort_indices]
p_val = [p_val[i] for i in sort_indices]
r_thresh, r_val = zip(*recall.items())
sort_indices = [r_thresh.index(r) for r in sorted(r_thresh, key=lambda x: float(x))]
r_thresh = [float(r_thresh[i]) for i in sort_indices]
r_val = [r_val[i] for i in sort_indices]
c_thresh, c_val = zip(*counts.items())
sort_indices = [c_thresh.index(c) for c in sorted(c_thresh, key=lambda x: float(x))]
c_thresh = [float(c_thresh[i]) for i in sort_indices]
c_val = [c_val[i] for i in sort_indices]
# The thresholds for all categories are identical:
assert(p_thresh == r_thresh == c_thresh)
# A value of 'nan' indicates that nothing was predicted, correctly or not,
# with the given probability threshold.
print "Threshold | Precision | Recall | Counts"
print "--------------------------------------------"
for i in range(len(p_thresh)):
print "%8.4f |" % p_thresh[i], \
"%8.4f |" % p_val[i], \
"%8.4f |" % r_val[i], \
"%5d" % c_val[i]
print "--------------------------------------------"
In [1]:
from __future__ import division
import h5py
import numpy as np
import pickle
import matplotlib.pyplot as plt
%matplotlib inline
All training data is handled by the ImageHandler class. It expects training data to be in the HDF5 file format. The structure of this file must be as follows:
- FILE
- DATASET "images"
- SHAPE (Number of Images, Image Channels, Spatial Dimension, Spatial Dimension)
- DATASET "labels"
- SHAPE (Number of Images, 1)
The dataset images effectively contains a 4D numpy array with the aforementioned shape. The dataset labels contains a 2D numpy array of label names, e.g.
array([['Label_A'],
['Label_A'],
['Label_B'],
['Label_B'],
['Label_B'],
['Label_C'],
['Label_C'],
['Label_A']],
dtype='|S7')
The ImageHandler class automatically turns the string labels into one-hot vectors, e.g. (1, 0, 0, 0), with the appropriate dimensions. Note that the dataset names, images and labels can be changed; custom names can be passed to the ImageHandler constructor.
Note that the data file may, of course, have additional data sets (e.g. to store metadata). These are simply ignored.
For this demonstration, I generate a dataset consisting of noisy circles and squares, which we want to classify with a DNN.
In [2]:
# The size of our image segments (both dimensions have the same size)
img_dims = 25
# The number of images in each class
num_images = 10000
max_rad = ((img_dims - 1) // 2) - 1
center = (img_dims - 1) // 2
ind_y, ind_x = np.indices(dimensions=(img_dims, img_dims)) - center
# Circles
rand_rads = np.random.randint(5, max_rad+1, num_images)
rand_intensities = np.random.randint(1, 256, num_images)
noise_field = np.random.randint(-32, 33, (img_dims, img_dims, num_images))
rad_field = np.repeat(np.expand_dims(np.sqrt(ind_y**2 + ind_x**2), 2), num_images, 2) <= rand_rads
img_circles = (rad_field.astype(dtype=np.int16) * rand_intensities) + noise_field
img_circles[img_circles < 0] = 0
img_circles[img_circles > 255] = 255
# Squares
rand_rads = np.random.randint(5, max_rad+1, num_images)
rand_intensities = np.random.randint(1, 256, num_images)
noise_field = np.random.randint(-32, 33, (img_dims, img_dims, num_images))
ind_x_square = np.repeat(np.expand_dims(ind_x, 2), num_images, 2)
ind_y_square = np.repeat(np.expand_dims(ind_y, 2), num_images, 2)
square_field = (np.abs(ind_x_square) <= rand_rads) * (np.abs(ind_y_square) <= rand_rads)
img_squares = (square_field.astype(dtype=np.int16) * rand_intensities) + noise_field
img_squares[img_squares < 0] = 0
img_squares[img_squares > 255] = 255
imgs = np.concatenate((img_circles, img_squares), 2)
imgs = np.transpose(imgs, (2, 0, 1))
imgs = np.expand_dims(imgs, 1)
imgs = imgs.astype(np.uint8)
labels = np.expand_dims(np.repeat(("Circle", "Square"), (num_images, num_images)), 1)
with h5py.File("TrainingData.h5", "w") as h5handle:
h5handle.create_dataset(name="images", data=imgs, chunks=(1, 1, img_dims, img_dims), compression=3)
h5handle.create_dataset(name="labels", data=labels, chunks=(1, 1), compression=3)
The resulting file, TrainingData.h5, has the following structure:
In [3]:
with h5py.File("TrainingData.h5", "r") as h5handle:
print "Entries: ", h5handle.keys()
print h5handle["images"]
print h5handle["labels"]
In [4]:
import DNNSwift
img_handler = DNNSwift.ImageHandler.initialize(
filename="TrainingData.h5",
categories={"Circle": 0, "Square": 1})
The categories argument assigns numerical values to each label. The output of a tensorflow neural network is a vector of numbers, which can be interpreted as probabilities after the proper normalization. The integer value of the categories dictionary indicates the order of labels, i.e. in this case 'Circle' is assigned to the one-hot vector (1,0) and 'Square' to the one-hot vector (0,1). A normalized output of (0.4,0.6) would then be interpreted as a 60% probability for the input image being a square.
ImageHandler randomly splits apart the entire dataset into three groups for training, (cross-)validation, and testing and internally stores the image indices belonging to each group. Note that the order of images is randomized, during this split so that the data order in the hdf5 file is not preserved at runtime.
In [5]:
for list_name in img_handler.get_list_names():
print list_name, ":: Length:", img_handler.get_list_length(list_name)
By default, the split is 80%-10%-10% (Training, Validation, Testing), but this can be customized via the data_split argument of the constructor. It expects a list of values indicating the relative sizes of each image set (in order: training, validation, testing). The numbers are normalized internally so can be arbitrarily large.
Likewise, the custom names for the HDF5 datasets can also be passed to the ImageHandler constructor via the arguments image_key and label_key, if so desired.
In [6]:
img_handler = DNNSwift.ImageHandler.initialize(
filename="TrainingData.h5",
categories={"Circle": 0, "Square": 1},
data_split=[60, 20, 20])
The image split can be saved. If, for whatever reason, the network needs to be revalidated or training is interrupted and continued, the identical training dataset must be used. For this reason, ImageHandler can also be initialized with an index list, preventing the creation of a new one.
In order to ensure that the HDF5 file with the training data is unchanged, the md5sum of this file is calculated and stored in the file containing the image group split.
In [7]:
img_handler.save_lists("IndexLists.pkl")
In [8]:
import pickle
with open("IndexLists.pkl", "r") as f:
index_file = pickle.load(f)
print "Keys:", index_file.keys()
print "Index Lists:"
for key in index_file["index_list"].keys():
print " ", key, ":", index_file["index_list"][key]
print "md5sum:", index_file["h5file_hash"]
img_handler = DNNSwift.ImageHandler.initialize_with_index_list(
filename="TrainingData.h5",
categories={"Circle": 0, "Square": 1},
index_list=index_file)
Images can be extracted directly from the image handler. This returns a custom object with the attributes "imgs" and "labels", which hold the image and label data, respectively. Note that index_high is an inclusive index, as opposed to python's usual indexing logic of excluding the upper limit. Ergo, setting
index_low == index_high
returns a single image.
In [9]:
images = img_handler.get_images(list_name="test", index_low=0, index_high=9)
print "Object type:", images
print "images.imgs:", type(images.imgs), ";", images.imgs.shape
print "images.labels:", type(images.labels), ";", images.labels.shape
plt.imshow(np.concatenate(images.imgs[..., 0], axis=1))
Out[9]:
The architecture of the DNN is defined in a layout file that has the structure of a list of python dictionaries, e.g.:
[{name: "A", type: "conv", ...},
{name: "B", type: "conv", ...},
{name: "C", type: "fc", ...}]
Each dictionary corresponds to a single node, or layer, in the DNN. The type of node is defined via the 'type' key. The currently supported nodes and the required dictionary keys are:
Inception ('inception')
A compound block inspired by Google's inception architecture. A block consists of N independent branches, each with a varying kernel size, that are concatenated to a single feature array at the end. An example of such a block is shown here:
The pooling and convolutions are performed in a way that results in the output feature array having the same spatial dimensions as the input layer. To further reduce computational time, NxN convolutions are split into an Nx1 and 1xN convolution (Separable Convolutions).
An example of a layout file is given below. This layout takes input data with three channels and passes each channel through an independent node chain before concatenating the results:
{"name": "data_ch1", "type": "input", "data_z_index": [0]},
{"name": "data_ch2", "type": "input", "data_z_index": [1]},
{"name": "data_ch3", "type": "input", "data_z_index": [2]},
{"name": "conv1_ch1", "type": "conv", "n_kernels": 64, "size": (3, 3),
"stride": (1, 1), "padding": "VALID", "input": "data_ch1"},
{"name": "conv1_ch2", "type": "conv", "n_kernels": 64, "size": (3, 3),
"stride": (1, 1), "padding": "VALID", "input": "data_ch2"},
{"name": "conv1_ch3", "type": "conv", "n_kernels": 64, "size": (3, 3),
"stride": (1, 1), "padding": "VALID", "input": "data_ch3"},
{"name": "incep1_ch1", "type": "inception", "input": "conv1_ch1",
"n_kernels": 32, "branches": (3, 5, 7)},
{"name": "incep1_ch2", "type": "inception", "input": "conv1_ch2",
"n_kernels": 32, "branches": (3, 5, 7)},
{"name": "incep1_ch3", "type": "inception", "input": "conv1_ch3",
"n_kernels": 32, "branches": (3, 5, 7)},
{"name": "maxpool_after_incep1_ch1", "type": "maxpool", "size": (3, 3),
"stride": (2, 2), "padding": "SAME", "input": "incep1_ch1"},
{"name": "maxpool_after_incep1_ch2", "type": "maxpool", "size": (3, 3),
"stride": (2, 2), "padding": "SAME", "input": "incep1_ch2"},
{"name": "maxpool_after_incep1_ch3", "type": "maxpool", "size": (3, 3),
"stride": (2, 2), "padding": "SAME", "input": "incep1_ch3"},
{"name": "fc_ch1", "type": "fc", "n_kernels": 32,
"input": "maxpool_after_incep1_ch1"},
{"name": "fc_ch2", "type": "fc", "n_kernels": 32,
"input": "maxpool_after_incep1_ch2"},
{"name": "fc_ch3", "type": "fc", "n_kernels": 32,
"input": "maxpool_after_incep1_ch3"},
{"name": "concat_channels", "type": "concat",
"input": ["fc_ch1", "fc_ch2", "fc_ch3"]},
{"name": "output", "type": "fc", "input": "concat_channels"},
{"name": "CrossEntropy", "type": "cross_entropy", "input": "output"}
For the training data, however, a much simpler DNN will suffice to classify the images. The structure of this network is:
In [10]:
dnn_layout = [{"name": "input", "type": "input"},
{"name": "conv1", "type": "conv", "n_kernels": 32, "size": (3, 3),
"stride": (1, 1), "padding" : "VALID", "input": "input"},
{"name": "conv2", "type": "conv", "n_kernels": 32, "size": (3, 3),
"stride": (1, 1), "padding" : "VALID", "input": "conv1"},
{"name": "maxpool1", "type": "maxpool", "size": (3, 3),
"stride": (2, 2), "padding" : "VALID", "input": "conv2"},
{"name": "conv3", "type": "conv", "n_kernels": 64, "size": (3, 3),
"stride": (1, 1), "padding" : "VALID", "input": "maxpool1"},
{"name": "conv4", "type": "conv", "n_kernels": 64, "size": (3, 3),
"stride": (1, 1), "padding" : "VALID", "input": "conv3"},
{"name": "maxpool2", "type": "maxpool", "size": (3, 3),
"stride": (2, 2), "padding" : "VALID", "input": "conv4"},
{"name": "fc1", "type": "fc", "input": "maxpool2"},
{"name": "CrossEntropy", "type": "cross_entropy", "input": "fc1"}]
The network wrapper is initialized via:
In [11]:
trainer = DNNSwift.DNN(
img_dims=img_handler.get_image_dims(),
labels=img_handler.get_image_groups(),
layer_params=dnn_layout,
basedir="dnn_output")
The required parameters are as follows:
Optional parameters may also include:
An image of the network structure can be created via:
In [12]:
trainer.print_structure(filename="DNN_Structure.png")
The filename is the relative path under the objects basedir folder. The output type is defined by the file ending, e.g. *.png outputs an image. The full list of permitted output types can be found at the PyGraphViz homepage.
Running the network training after initialization is as simple as executing the command
In [13]:
trainer.train_network(
batch_size=128,
image_handler=img_handler,
num_epochs=1,
subdir="weights",
logfile="dnn_output/DNN_log.txt")
The required parameters are as follows:
Optional parameters may also include:
The DNN class uses the validation subset of the training data to perform a cross-validation of the state of the network. The calculation and file writing are handled entirely by the Validator class (and its save_results() method). The file contains a python dictionary with the keys:
In [14]:
import pickle
with open("dnn_output/weights/val_0.pkl", "r") as f:
val_dat = pickle.load(f)
print "Keys:", val_dat.keys()
print "Validation Accuracy:", val_dat["top_1_acc"]
The trained parameters ('weights') are output as as pickled python dictionary. For each node with a set of parameters (i.e. convolutions and all derivatives), two keys exist in the dictionary, 'node_name/w' and 'node_name/b' for the multiplicative weights and the biases, respectively. The values of each dictionary entry is a numpy array:
In [15]:
with open("dnn_output/weights/weights_0.pkl", "r") as f:
weights = pickle.load(f)
keys = sorted(weights.keys())
for key in keys:
print key, ":", weights[key].shape
If initializing a DNN with non-random weights, the dictionary must have two keys for each layer that requires parameters with the expected shape: (spatial, spatial, input channels, num_kernels/output_channels)
The DNN, once trained, can also be applied to images. The output is a vector of (unnormalized) probabilities for each category (the indices of the output vectors correspond to the numerical values given to each category as defined above). There are several "extra" dimensions in the numpy array due to tensorflow's internal logic. For consistency, these are included in the output and can be 'squeezed' away for easier handling.
In [16]:
# Initialize the object with the optimal set of weights
trainer = DNNSwift.DNN(
img_dims=img_handler.get_image_dims(),
labels=img_handler.get_image_groups(),
layer_params=dnn_layout,
basedir="dnn_output",
weights=weights)
images = img_handler.get_images("test", 0, 9)
output = trainer.run_network(input_images=images.imgs)
print "Output shape:", output.shape
print "Output:\n", np.squeeze(output)
print "Image Categories:", img_handler.get_image_groups()
Comparing the input labels with the output logits:
In [17]:
print "Network Output:", np.argmax(np.squeeze(output), axis=1)
print "Ground Truth: ", np.argmax(images.labels, axis=1)
Optional parameters for 'run_network()':
The package also supports running a network on input images of a different size than the images trained on. This can be useful for segmentation if the network is trained on small segments (e.g. foreground/background) of full images. To apply the network to a different image size, the DNN object must be instantiated again with a different img_dims argument. For example, let us construct an artificial image from four training images:
In [18]:
images = img_handler.get_images("test", 0, 3)
big_image = np.concatenate(
(np.concatenate((images.imgs[0], images.imgs[1]), axis=0),
np.concatenate((images.imgs[2], images.imgs[3]), axis=0)),
axis=1)
big_image = np.expand_dims(big_image, axis=0)
print "Big Image shape:", big_image.shape
plt.imshow(big_image[0, :, :, 0])
Out[18]:
Then initialize the network and run the output:
In [19]:
# Use the same weights as above
trainer = DNNSwift.DNN(
img_dims=(50, 50, 1),
labels=img_handler.get_image_groups(),
layer_params=dnn_layout,
basedir="dnn_output",
weights=weights)
output = trainer.run_network(input_images=big_image)
output_argmax = np.squeeze(np.argmax(output, axis=3))
plt.imshow(output_argmax)
Out[19]:
The output is no longer a single probability vector but a map of probabilities (shown here are only the predicted classes). This corresponds to a segmentation map of the input image. Note that this example is a terrible segmentation because the network was not trained to recognize backgrounds, i.e. it expects each pixel to be either a circle or a square.
Due to the pooling layers and the padding at the edges of convolutions, the output map is smaller than the input image. I have found that if the number of pooling layers is kept small, simply resizing the output map to the original size gives acceptable results. Future versions of tensorflow will contain unpooling layers that will be integrated into the package.
Note that by passing an explicit set of weights to the DNN object, the fully connected layer logic (of reducing the feature map to a single value) is effectively bypassed, making it possible to flexibly use this package for segmentation and classification simultaneously.