Back when deep learning didn't have a fancy name yet, it was called artificial neural networks. So you already know a great deal about it! This was a respected field in itself, but after the days of Rosenblatt's perceptron, many researchers and machine learning practitioners slowly began to lose interest in the field since no one had a good solution for training a neural network with multiple layers.
With the current popularity of deep learning in both industry and academia, we are fortunate enough to have a whole range of open-source deep learning frameworks at our disposal.
Finally, there is also Keras, which we will be using in the following sections. In contrast to the preceding frameworks, Keras positions itself as an interface rather than an end-to-end deep learning framework. It allows you to specify deep neural nets using an easy-to-understand API, which can then be run on backends such as TensorFlow, CNTK, or Theano.
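If you are curious which backend your Keras installation is using, you can query it through the backend module. The following is a minimal sketch; the returned name depends entirely on your local setup:
from keras import backend as K
K.backend()   # returns e.g. 'tensorflow', 'theano', or 'cntk', depending on your installation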
The core data structure of Keras is a model, which is similar to OpenCV's classifier object,
except it focuses on neural networks only. The simplest type of model is the Sequential
model, which arranges the different layers of the neural net in a linear stack, just like we did
for the MLP in OpenCV:
In [1]:
from keras.models import Sequential
model = Sequential()
Then different layers can be added to the model one by one. In Keras, layers do not just contain neurons, they also perform a function. Some core layer types include the following:
- Dense: This is a densely connected layer. This is exactly what we used when we designed our MLP: a layer of neurons that is connected to every neuron in the previous layer.
- Activation: This applies an activation function to an output. Keras provides a whole range of activation functions, including OpenCV's identity function (linear), the hyperbolic tangent (tanh), a sigmoidal squashing function (sigmoid), a softmax function (softmax), and many more.
- Reshape: This reshapes an output to a certain shape.
There are other layers that calculate arithmetic or geometric operations on their inputs.
Some other layers that are popular in deep learning are as follows:
- Dropout: This layer randomly sets a fraction of input units to zero at each update. This is a way to inject noise into the training process, making it more robust.
- Embedding: This layer encodes categorical data, similar to some functions from scikit-learn's preprocessing module.
- GaussianNoise: This layer applies additive zero-centered Gaussian noise. This is another way of injecting noise into the training process, making it more robust.
We will see a short sketch that combines several of these layer types right after the perceptron code below.
A perceptron similar to the preceding one could thus be implemented using a Dense layer that has two inputs and one output. Staying true to our earlier example, we will initialize the weights to zero and use a linear activation function:
In [2]:
from keras.layers import Dense
model.add(Dense(1, activation='linear', input_dim=2, kernel_initializer='zeros'))
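To illustrate how the layer types listed above can be combined, here is a minimal, purely illustrative sketch of a slightly deeper network. It is separate from our running perceptron example, and the layer sizes and dropout rate are arbitrary choices:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
sketch = Sequential()
sketch.add(Dense(8, input_dim=2))            # densely connected hidden layer with 8 neurons
sketch.add(Activation('tanh'))               # separate activation layer
sketch.add(Dropout(0.25))                    # randomly silence 25% of the units during training
sketch.add(Dense(1, activation='sigmoid'))   # single output unit for binary classification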
Finally, we want to specify the training method. Keras provides a number of optimizers, including the following:
- Stochastic gradient descent ('sgd'): This is what we have discussed before.
- Root mean square propagation ('RMSprop'): This is a method in which the learning rate is adapted for each of the parameters.
- Adaptive moment estimation ('Adam'): This is an update to root mean square propagation.
These are just a few; many more are available. In addition, Keras also provides a number of different loss functions:
- Mean squared error ('mean_squared_error'): This is what was discussed before.
- Hinge loss ('hinge'): This is a maximum-margin loss often used with SVMs, as discussed in Chapter 6, Detecting Pedestrians with Support Vector Machines.
Again, these are only the most common choices. You can see that there's a plethora of parameters to be specified and methods to choose from. To stay true to our aforementioned perceptron implementation, we will choose stochastic gradient descent as an optimizer, the mean squared error as a cost function, and accuracy as a scoring function:
In [3]:
model.compile(optimizer='sgd',
loss='mean_squared_error',
metrics=['accuracy'])
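If we wanted to experiment with the other optimizers and loss functions mentioned above, swapping them in is a one-line change. The following is an illustrative alternative only; do not actually run it if you want to reproduce the results below, since compiling again would replace the settings we just chose. Note also that the hinge loss expects target labels encoded as -1 and +1 rather than 0 and 1:
model.compile(optimizer='adam',   # adaptive moment estimation instead of plain SGD
              loss='hinge',       # maximum-margin loss instead of mean squared error
              metrics=['accuracy'])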
In order to compare the performance of the Keras implementation to our home-brewed version, we will apply the classifier to the same dataset:
In [4]:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.2, random_state=42)
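Before training, it can be reassuring to verify that the data looks as expected. This quick sanity check is an optional aside that assumes the X and y arrays created above:
X.shape, y.shape   # ((100, 2), (100,)): 100 two-dimensional points with one label each
set(y)             # {0, 1}: the two blob labels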
Finally, a Keras model is fit to the data with a very familiar syntax. Here, we can also choose how many iterations to train for (epochs), how many samples to present before we calculate the error gradient (batch_size), whether to shuffle the dataset (shuffle), and whether to output progress updates (verbose):
In [5]:
model.fit(X, y, epochs=400, batch_size=100, shuffle=False, verbose=0)
Out[5]:
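Incidentally, fit also returns a History object that records the loss and any metrics after every epoch. Had we stored its return value, we could have inspected the training curve, for example as follows (the metric key is 'acc' in older Keras versions and 'accuracy' in newer ones):
history = model.fit(X, y, epochs=400, batch_size=100, shuffle=False, verbose=0)
history.history['loss'][-1]   # mean squared error after the final epoch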
After the training completes, we can evaluate the classifier as follows:
In [6]:
model.evaluate(X, y)
Out[6]:
Here, the first reported value is the mean squared error, whereas the second value denotes accuracy. This means that the final mean squared error was 0.04, and we had 100% accuracy. Way better than our own implementation!
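If you would also like to inspect individual predictions rather than the aggregate score, model.predict returns the raw network outputs, which you can threshold yourself. A small sketch, assuming the trained model and data from above:
y_pred = model.predict(X)                      # raw outputs of the single unit, shape (100, 1)
y_class = (y_pred.ravel() > 0.5).astype(int)   # threshold at 0.5 to recover class labels
(y_class == y).mean()                          # should agree with the accuracy reported by evaluate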
In [7]:
import numpy as np
np.random.seed(1337) # for reproducibility
With these tools in hand, we are now ready to approach a real-world dataset!