This notebook contains an excerpt from the book Machine Learning for OpenCV by Michael Beyeler. The code is released under the MIT license, and is available on GitHub.

Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!

Getting Acquainted with Deep Learning

Back when deep learning didn't have a fancy name yet, it was known as artificial neural networks, so you already know a great deal about it! Artificial neural networks were a respected field in their own right, but after the days of Rosenblatt's perceptron, many researchers and machine learning practitioners slowly lost interest in the field, since no one had a good solution for training a neural network with multiple layers.

With the current popularity of deep learning in both industry and academia, we are fortunate enough to have a whole range of open-source deep learning frameworks at our disposal:

  • Google Brain's TensorFlow: This is a machine learning library that describes computations as dataflow graphs. To date, this is one of the most commonly used deep learning libraries; hence, it is also evolving quickly, and you might have to check back often for software updates. TensorFlow provides a whole range of user interfaces, including Python, C++, and Java.
  • Microsoft Research's Cognitive Toolkit (CNTK): This is a deep learning framework that describes neural networks as a series of computational steps via a directed graph.
  • UC Berkeley's Caffe: This is a pure deep learning framework written in C++, with an additional Python interface.
  • University of Montreal's Theano: This is a numerical computation library compiled to run efficiently on CPU and GPU architectures. Theano is more than a machine learning library; it can express any computation using a specialized computer algebra system. Hence, it is best suited for people who wish to write their machine learning algorithms from scratch.
  • Torch: This is a scientific computing framework based on the Lua programming language. Like Theano, Torch is more than a machine learning library, but it is heavily used for deep learning by companies such as Facebook, IBM, and Yandex.

Finally, there is also Keras, which we will be using in the following sections. In contrast to the preceding frameworks, Keras positions itself as an interface rather than an end-to-end deep learning framework. It allows you to specify deep neural nets using an easy-to-understand API, which can then be run on backends such as TensorFlow, CNTK, or Theano.
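Which backend Keras runs on is configured in the ~/.keras/keras.json file; it can also be overridden through the KERAS_BACKEND environment variable before Keras is first imported. Here is a minimal sketch (choosing 'theano' only to match the backend message printed when Keras is imported below; 'tensorflow' or 'cntk' work the same way):


import os

# must be set before the first `import keras`; otherwise the value from
# ~/.keras/keras.json is used
os.environ['KERAS_BACKEND'] = 'theano'

import keras  # prints, for example: "Using Theano backend."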

Getting acquainted with Keras

The core data structure of Keras is a model, which is similar to OpenCV's classifier object, except it focuses on neural networks only. The simplest type of model is the Sequential model, which arranges the different layers of the neural net in a linear stack, just like we did for the MLP in OpenCV:


In [1]:
from keras.models import Sequential
model = Sequential()


Using Theano backend.

Then, different layers can be added to the model one by one. In Keras, layers do not just contain neurons; they also perform a function. Some core layer types include the following (a short sketch follows this list):

  • Dense: This is a densely connected layer. This is exactly what we used when we designed our MLP: a layer of neurons that is connected to every neuron in the previous layer.
  • Activation: This applies an activation function to an output. Keras provides a whole range of activation functions, including OpenCV's identity function (linear), the hyperbolic tangent (tanh), a sigmoidal squashing function (sigmoid), a softmax function (softmax), and many more.
  • Reshape: This reshapes an output to a certain shape.
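
As a minimal sketch (the layer sizes here are arbitrary and only for illustration), these core layers are simply stacked one after another:


from keras.models import Sequential
from keras.layers import Dense, Activation, Reshape

demo = Sequential()
demo.add(Dense(8, input_dim=4))   # fully connected: 4 inputs, 8 outputs
demo.add(Activation('tanh'))      # apply an activation as a separate layer
demo.add(Reshape((2, 4)))         # reshape the 8 outputs into a 2 x 4 grid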

There are other layers that calculate arithmetic or geometric operations on their inputs (see the sketch after this list):

  • Convolutional layers: These layers allow you to specify a kernel with which the input layer is convolved. This allows you to perform operations such as applying a Sobel filter or a Gaussian kernel in 1D, 2D, or even 3D.
  • Pooling layers: These layers perform a max pooling operation on their input, where the output neuron's activity is given by the maximally active input neuron.
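
As a minimal sketch (the number of kernels, kernel size, and input shape are arbitrary; the Conv2D and MaxPooling2D names assume the Keras 2 API, whereas Keras 1 called the convolution layer Convolution2D):


from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

cnn = Sequential()
# sixteen 3x3 kernels convolved over 28x28 single-channel images
cnn.add(Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# 2x2 max pooling halves the spatial resolution
cnn.add(MaxPooling2D(pool_size=(2, 2)))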

Some other layers that are popular in deep learning are as follows (sketched below):

  • Dropout: This layer randomly sets a fraction of input units to zero at each update. This is a way to inject noise into the training process, making it more robust.
  • Embedding: This layer encodes categorical data, similar to some functions from scikit-learn's preprocessing module.
  • GaussianNoise: This layer applies additive zero-centered Gaussian noise. This is another way of injecting noise into the training process, making it more robust.
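
As a minimal sketch of how these regularizing layers slot into a model (the layer sizes, noise level, and dropout rate are arbitrary, and the layer names assume the Keras 2 API):


from keras.models import Sequential
from keras.layers import Dense, Dropout, GaussianNoise

net = Sequential()
net.add(Dense(64, activation='relu', input_dim=20))
net.add(GaussianNoise(0.1))  # add zero-centered Gaussian noise (stddev 0.1)
net.add(Dropout(0.5))        # zero out half of the inputs at each update
net.add(Dense(1, activation='sigmoid'))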

A perceptron similar to the preceding one could thus be implemented using a Dense layer that has two inputs and one output. Staying true to our earlier example, we will initialize the weights to zero and use a linear activation function:


In [2]:
from keras.layers import Dense
model.add(Dense(1, activation='linear', input_dim=2, kernel_initializer='zeros'))

Finally, we want to specify the training method. Keras provides a number of optimizers, including the following (see the sketch after this list):

  • stochastic gradient descent ('sgd'): This is what we have discussed before
  • root mean square propagation ('rmsprop'): This is a method in which the learning rate is adapted for each of the parameters
  • adaptive moment estimation ('adam'): This is an update to root mean square propagation; there are many more to choose from
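
Should the string shortcuts not offer enough control, the same optimizers can be instantiated as objects with explicit hyperparameters and passed to compile in place of the string names. A minimal sketch (the learning rate value is arbitrary; the lr argument name assumes the Keras 2 API):


from keras.optimizers import SGD, RMSprop, Adam

sgd = SGD(lr=0.01)   # stochastic gradient descent with an explicit learning rate
rmsprop = RMSprop()  # per-parameter adaptive learning rates
adam = Adam()        # adaptive moment estimation

# any of these objects can be passed to model.compile via the optimizer argument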

In addition, Keras also provides a number of different loss functions:

  • mean squared error ('mean_squared_error'): This is what was discussed before
  • hinge loss ('hinge'): This is the maximum-margin loss used by classifiers such as SVMs, as discussed in Chapter 6, Detecting Pedestrians with Support Vector Machines; there are many more to choose from

You can see that there is a plethora of parameters to specify and methods to choose from. To stay true to our aforementioned perceptron implementation, we will choose stochastic gradient descent as the optimizer, the mean squared error as the cost function, and accuracy as a scoring function:


In [3]:
model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['accuracy'])

In order to compare the performance of the Keras implementation to our home-brewed version, we will apply the classifier to the same dataset:


In [4]:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.2, random_state=42)

Finally, a Keras model is fit to the data with a very familiar syntax. Here, we can also choose how many iterations to train for (epochs), how many samples to present before we calculate the error gradient (batch_size), whether to shuffle the dataset (shuffle), and whether to output progress updates (verbose):


In [5]:
model.fit(X, y, epochs=400, batch_size=100, shuffle=False, verbose=0)


Out[5]:
<keras.callbacks.History at 0x7f00c46ad4a8>

After the training completes, we can evaluate the classifier as follows:


In [6]:
model.evaluate(X, y)


 32/100 [========>.....................] - ETA: 0s
Out[6]:
[0.040941802412271501, 1.0]

Here, the first reported value is the mean squared error, whereas the second value denotes accuracy. This means that the final mean squared error was 0.04, and we had 100% accuracy. Way better than our own implementation!
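
To double-check the reported accuracy by hand, we can also inspect the raw network outputs with model.predict and threshold them ourselves. A minimal sketch (the 0.5 threshold is an assumption that mirrors how Keras rounds predictions for its binary accuracy metric, and it matches the 0/1 class labels used here):


import numpy as np

y_raw = model.predict(X)                    # continuous outputs, shape (100, 1)
y_pred = (y_raw.ravel() > 0.5).astype(int)  # threshold to obtain class labels
print(np.mean(y_pred == y))                 # should agree with the accuracy reported above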


In [7]:
import numpy as np
np.random.seed(1337)  # for reproducibility

With these tools in hand, we are now ready to approach a real-world dataset!