pylearn2 tutorial: Convolutional network

by Ian Goodfellow

Introduction

This IPython notebook will teach you the basics of how convolutional networks work, and show you how to train a convolutional network in pylearn2.

To do this, we will go over several concepts:

Part 1: What pylearn2 is doing for you in this example

  • Review of multilayer perceptrons, and how convolutional networks are similar

  • Convolution and the equivariance property

  • Pooling and the invariance property

  • A note on using convolution in research papers

Part 2: How to use pylearn2 to train a convolutional network

  • pylearn2 Spaces

  • MNIST classification example


Note that this won't explain in detail how the individual classes are implemented. The classes follow fairly consistent naming conventions and have reasonably thorough docstrings, but if you have trouble understanding them, write to me and I might add a part 3 explaining how some of the parts work under the hood.

Please write to pylearn-dev@googlegroups.com if you encounter any problem with this tutorial.

Requirements

Before running this notebook, you must have installed pylearn2. Follow the download and installation instructions if you have not yet done so.

This tutorial also assumes you already know about multilayer perceptrons, and know how to train and evaluate a multilayer perceptron in pylearn2. If not, work through multilayer_perceptron.ipynb before starting this tutorial.

It's also strongly recommended that you run this notebook with THEANO_FLAGS="device=gpu". This is a processing-intensive example, and a GPU will make it run a lot faster if you have one available. Execute the next cell to verify that you are using the GPU.


In [1]:
# Verify which device theano is configured to use.
import theano
print theano.config.device


gpu
Using gpu device 0: GeForce GTX 285

Part 1: What pylearn2 is doing for you in this example

In this part, we won't get into any specifics of pylearn2 yet. We'll just discuss what a convolutional network is. If you already know about convolutional networks, feel free to skip to part 2.

Review of multilayer perceptrons, and how convolutional networks are similar

In multilayer_perceptron.ipynb, we saw how the multilayer perceptron (MLP) is a versatile model that can do many things. In this series of tutorials, we think of it as a classification model that learns to map an input vector $x$ to a probability distribution $p(y\mid x)$ where $y$ is a categorical variable with $k$ possible values. Using a dataset $\mathcal{D}$ of $(x, y)$ pairs, we can train any such probabilistic model by maximizing the log likelihood,

$$ \sum_{x,y \in \mathcal{D} } \log P(y \mid x). $$

The multilayer perceptron defines $P(y \mid x)$ to be the composition of several simpler functions. Each function being composed can be thought of as another "layer" or "stage" of processing.
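
To make the idea of composition concrete, here is a minimal numpy sketch (not pylearn2 code; the layer sizes, random weights, and choice of activations are placeholders) of $P(y \mid x)$ built by composing two simple layer functions:


In [ ]:
import numpy as np

def layer(W, b, activation):
    # Each "layer" is just a simple function of its input.
    return lambda x: activation(np.dot(x, W) + b)

relu = lambda z: np.maximum(z, 0.)
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

rng = np.random.RandomState(0)
h = layer(rng.randn(784, 100), np.zeros(100), relu)   # first stage
y = layer(rng.randn(100, 10), np.zeros(10), softmax)  # output stage

# P(y | x) is the composition of the two stages.
p_y_given_x = lambda x: y(h(x))
print p_y_given_x(rng.randn(784)).sum()   # probabilities sum to 1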

A convolutional network is nothing but a multilayer perceptron where some layers take a very special form, which we will call "convolutional layers". These layers are specially designed for processing inputs where the indices of the elements have some topological significance.

For example, if we represent a grayscale image as an array $I$ with the array indices corresponding to physical locations in the image, then we know that the element $I_{i,j}$ represents something that is spatially close to the element $I_{i+1,j}$. This is in contrast to a vector representation of an image. If $I$ is a vector, then $I_i$ might not be very close at all to $I_{i+1}$, depending on whether the image was converted to vector form in row-major or column-major format and depending on whether $i$ is close to the end of a row or column.
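
A quick numpy illustration of this point (the 3x3 image size is arbitrary): in the row-major vector form, vertically adjacent pixels end up a full row apart.


In [ ]:
import numpy as np

I = np.arange(9).reshape(3, 3)   # a tiny 3x3 "image"
v = I.flatten()                  # row-major vector form

# I[0, 0] and I[1, 0] are spatially adjacent in the image,
# but sit 3 positions apart in the vector:
print I[0, 0], I[1, 0]   # 0 3
print v[0], v[3]         # 0 3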

Other kinds of data with topological structure in the indices include time series data, where some series $S$ can be indexed by a time variable $t$. We know that $S_t$ and $S_{t+1}$ come from points that are close together in time. We can also think of the (row, column, time) indices of video data as providing topological information.

Suppose $T$ is a function that can translate (move) an input in the space defined by its indices by some amount $x$. In other words, $T(S,x)_i = S_{i-x}$. Convolutional layers are an example of a function $f$ designed to have the property $f(T(S,x)) \approx f(S)$ for small $x$.
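
As a minimal sketch of this property, we can use numpy's roll to stand in for $T$ (it wraps around at the boundary, which true translation does not) and a max over the whole signal to stand in for a pooling-style $f$:


In [ ]:
import numpy as np

S = np.array([0., 1., 3., 0., 0.])

# np.roll implements T: np.roll(S, x)[i] == S[i - x] (with wraparound).
print np.roll(S, 1)   # [ 0.  0.  1.  3.  0.]

# A pooling-style f, here a max over the whole signal, satisfies
# f(T(S, x)) == f(S) for any shift x.
f = np.max
print f(S) == f(np.roll(S, 1))   # True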

This means if a neural network can recognize a handwritten digit in one position, it can recognize it when it is slightly shifted to a nearby position. Being able to recognize shifted versions of previously seen inputs greatly improves the generalization performance of convolutional networks.

Convolution and the equivariance property

TODO

Pooling and the invariance property

TODO

A note on using convolution in research papers

TODO

Part 2: How to use pylearn2 to train a convolutional network

Now that we've described the theory of what we're going to do, it's time to do it! This part describes how to use pylearn2 to run the algorithms described above.

As in the MLP tutorial, we will use the convolutional net to do optical character recognition on the MNIST dataset.

pylearn2 Spaces

In many places in pylearn2, we would like to be able to process several different kinds of data. In previous tutorials, we've only talked about data that could be preprocessed into a vector representation, and our algorithms all worked on vector spaces. However, it's often useful to format data in other ways.

The pylearn2 Space object is used to specify the format for data. The VectorSpace class represents the typical vector-formatted data we've used so far; the only thing it needs to encode about the data is its dimensionality, i.e., how many elements the vector has. In this tutorial we will start to explicitly represent images as having 2D structure, so we need to use the Conv2DSpace. The Conv2DSpace object describes how to represent a collection of images as a 4-tensor.

One thing the Conv2DSpace object needs to describe is the shape of the space: how big is the image, in terms of rows and columns of pixels? Also, the image may have multiple channels. In this example, we use a grayscale input image, so the input has only one channel; color images require three channels to store the red, green, and blue pixels at each location. We can also think of the output of each convolution layer as living in a Conv2DSpace, where each kernel outputs a different channel.

Finally, the Conv2DSpace specifies what each axis of the 4-tensor means. The default is for the first axis to index over different examples, the second axis to index over channels, and the last two to index over rows and columns, respectively. This is the format that theano's 2D convolution code uses, but other libraries use other formats, and we often need to convert between them.
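
Here is a minimal sketch of building these two kinds of Space directly in Python. The shapes and batch size are just the values this tutorial uses, and np_format_as appears only to illustrate that spaces know how to reformat batches for one another:


In [ ]:
from pylearn2.space import VectorSpace, Conv2DSpace

# A VectorSpace only needs to know the dimensionality of the data.
vector_space = VectorSpace(dim=28 * 28)

# A Conv2DSpace records the image shape, the number of channels, and
# the meaning of each axis of the 4-tensor. ('b', 'c', 0, 1) is
# theano's conv2d format: batch, channel, rows, columns.
conv_space = Conv2DSpace(shape=[28, 28], num_channels=1,
                         axes=('b', 'c', 0, 1))

# get_origin_batch allocates a correctly-formatted batch of zeros.
batch = conv_space.get_origin_batch(100)
print batch.shape   # (100, 1, 28, 28)

# Spaces know how to reformat batches for one another.
print conv_space.np_format_as(batch, vector_space).shape   # (100, 784)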

MNIST classification example

Setting up a convolutional network in pylearn2 is essentially the same as setting up any other MLP. In the YAML experiment description below, there are really just two things to take note of.

First, rather than using "nvis" to specify the input that the MLP will take, we use a parameter called "input_space". "nvis" is actually shorthand; if you pass an integer n to nvis, it will set input_space to VectorSpace(n). Now that we are using a convolutional network, we need the input to be formatted as a collection of images so that the convolution operator will have a 2D space to work on.
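
For instance, the following two model constructions should be equivalent (a sketch only; the single Softmax layer and the sizes are placeholders):


In [ ]:
from pylearn2.models.mlp import MLP, Softmax
from pylearn2.space import VectorSpace

# Passing nvis=784 ...
model_a = MLP(nvis=784,
              layers=[Softmax(n_classes=10, layer_name='y', irange=.05)])

# ... is shorthand for passing input_space=VectorSpace(784).
model_b = MLP(input_space=VectorSpace(dim=784),
              layers=[Softmax(n_classes=10, layer_name='y', irange=.05)])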

Second, we make a few layers of the network "ConvRectifiedLinear" layers. Putting some convolutional layers (with pooling) in the network makes their outputs approximately invariant to small translations, so the job of the remaining layers is much easier.

We don't need to do anything special to make the Softmax layer on top work with these convolutional layers. The MLP class will tell the Softmax class that its input is now coming from a Conv2DSpace. The Softmax layer will then use the Conv2DSpace's convert method to convert the 2D output from the convolutional layer into a batch of vector-valued examples.

The model and training algorithm are defined in the conv.yaml file. Here we load it and set some of its hyper-parameters.


In [1]:
train = open('conv.yaml', 'r').read()
# Fill in the %(...)s placeholders in the YAML template.
train_params = {'train_stop': 50000,
                'valid_stop': 60000,
                'test_stop': 10000,
                'batch_size': 100,
                'output_channels_h2': 64,
                'output_channels_h3': 64,
                'max_epochs': 500,
                'save_path': '.'}
train = train % (train_params)
print train


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: 100,
        input_space: !obj:pylearn2.space.Conv2DSpace {
            shape: [28, 28],
            num_channels: 1
        },
        layers: [ !obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: 'h2',
                     output_channels: 64,
                     irange: .05,
                     kernel_shape: [5, 5],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     max_kernel_norm: 1.9365
                 }, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: 'h3',
                     output_channels: 64,
                     irange: .05,
                     kernel_shape: [5, 5],
                     pool_shape: [4, 4],
                     pool_stride: [2, 2],
                     max_kernel_norm: 1.9365
                 }, !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     istdev: .05
                 }
                ],
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 100,
        learning_rate: .01,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5
        },
        monitoring_dataset:
            {
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              start: 50000,
                              stop:  60000
                          },
                'test'  : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'test',
                              stop: 10000
                          }
            },
        cost: !obj:pylearn2.costs.cost.SumOfCosts { costs: [
            !obj:pylearn2.costs.cost.MethodCost {
                method: 'cost_from_X'
            }, !obj:pylearn2.costs.mlp.WeightDecay {
                coeffs: [ .00005, .00005, .00005 ]
            }
            ]
        },
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.50,
                    N: 10
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: 500
                },
            ]
        },
    },
    extensions:
        [ !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: 'valid_y_misclass',
             save_path: "./convolutional_network_best.pkl"
        }, !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 10,
            final_momentum: .99
        }
    ]
}



Now, we use pylearn2's yaml_parse.load to construct the Train object, and run its main loop. The same thing could be accomplished by running pylearn2's train.py script on a file containing the YAML string.
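
A sketch of the script-based alternative is below; 'conv_filled.yaml' is just a hypothetical filename for the filled-in template, and you would run this instead of, not in addition to, the next cell:


In [ ]:
# Save the filled-in YAML, then run pylearn2's train.py script on it.
with open('conv_filled.yaml', 'w') as f:
    f.write(train)
!train.py conv_filled.yaml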

Execute the next cell to train the model. This will take several minutes, and possibly as much as a few hours, depending on how fast your computer is.


In [2]:
from pylearn2.config import yaml_parse
train = yaml_parse.load(train)
train.main_loop()



Compiling the theano functions used to run the network will take a long time for this example. This is because the number of theano variables and ops used to specify the computation is relatively large. There is no single theano op for doing max pooling with overlapping pooling windows, so pylearn2 builds a large expression graph using indexing operations to accomplish the max pooling.
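
To illustrate the computation involved, here is a naive numpy sketch of overlapping max pooling with the pool_shape and pool_stride used above; pylearn2 builds an equivalent (much larger) theano expression graph rather than running a Python loop like this:


In [ ]:
import numpy as np

def max_pool_overlap(image, pool_shape, pool_stride):
    # Naive overlapping max pooling over a single 2D feature map.
    pr, pc = pool_shape
    sr, sc = pool_stride
    rows = (image.shape[0] - pr) // sr + 1
    cols = (image.shape[1] - pc) // sc + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = image[i * sr:i * sr + pr,
                              j * sc:j * sc + pc].max()
    return out

# A 28x28 MNIST image convolved with a 5x5 kernel gives a 24x24 map.
feature_map = np.random.rand(24, 24)
print max_pool_overlap(feature_map, (4, 4), (2, 2)).shape   # (11, 11)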

After the model is trained, we can use the print_monitor script to print the last monitoring entry of a saved model. By running it on "convolutional_network_best.pkl", we can see the performance of the model at the point where it did the best on the validation set.


In [ ]:
!print_monitor.py convolutional_network_best.pkl | grep test_y_misclass

The test set error has dropped to 0.74%! This is a big improvement over the standard MLP.

We can also look at the convolution kernels learned by the first layer, to see that the network is looking for shifted versions of small pieces of pen strokes.


In [ ]:
!show_weights.py convolutional_network_best.pkl

Further reading

You can find more information on convolutional networks from the following sources:

LISA lab's Deep Learning Tutorials: Convolutional Neural Networks (LeNet)

This is by no means a complete list.