Deep Learning 101: Multilayer Perceptrons

In this tutorial we'll go over the basics of fitting neural networks in Python, using keras and tensorflow. You'll need to have both installed to run this tutorial. The key takeaways are to become familiar with the keras API and nomenclature, and to understand how MLPs relate to logistic regression.


In [2]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

from sklearn.datasets import make_circles

from keras.layers import Input, Dense
from keras.models import Sequential

X, y = make_circles(n_samples=5000, factor=.3, noise=.05)
X_train = X[:4000]
y_train = y[:4000]

X_val = X[4000:]
y_val = y[4000:]
num_variables = X.shape[1]


Using TensorFlow backend.

In [3]:
plt.scatter(X[:,0], X[:,1], c=y,cmap=plt.cm.coolwarm)
plt.title("Classification Dataset")
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()


A first model with logistic regression

The first thing we have to do is set up an input tensor. One of the key advantages of using keras is that you only have to specify the shapes of the inputs and outputs; the shapes of the remaining tensors are inferred automatically.

In our case, each input vector $x_i = \langle x_{i1}, x_{i2} \rangle$ has two dimensions and the output is a single binary variable $y_i \in \{0,1\}$. Let's see how we would set this up with keras:


In [4]:
logreg = Sequential()
logreg.add(Dense(output_dim=1, input_dim=num_variables, activation='sigmoid'))

The first line tells keras to initialize a new sequential model. Sequential models take a single input tensor and produce a single output tensor. For the purposes of this tutorial we're going to stick with the sequential model, since it provides all of the functionality we'll need.

The bulk of the action happens on the second line, and it's a little terse, so let's unpack it:

  • logreg.add tells keras that we want to add a new layer to our network.
  • Dense specifies that we want a fully-connected layer, aka a dense layer, and is probably the most opaque aspect of the keras API. In general, what a dense layer does is create several different linear combinations of the output of the previous layer, followed by an element-wise application of an activation function. The number of linear combinations is given by the number of 'hidden units' in the layer, which in this case is set by output_dim (see the sketch just after this list).
  • input_dim=num_variables - Since this is the first layer in our network, we have to tell keras how big the input will be. In this case each input has num_variables = 2 features.
  • activation='sigmoid' specifies which element-wise transformation to apply to the resulting output. Here we've specified a sigmoid transformation, which is given by $\frac{1}{1 + \exp(-t)}$, where $t$ is the result of the linear combination. This function 'squashes' the weighted sum into a number between 0 and 1.
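
If the dense layer still feels abstract, here is a minimal NumPy sketch (all values made up for illustration, not anything keras computes internally) of what it does: multiply a batch of inputs X by a weight matrix W with one column per hidden unit, add a bias, and apply an activation element-wise.

import numpy as np

X = np.array([[0.5, -0.2],
              [1.0,  0.3]])          # 2 observations, 2 input variables
W = np.array([[ 0.4],
              [-0.9]])               # one column per hidden unit (here: just one)
b = np.array([0.1])                  # one bias per hidden unit

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# linear combinations of the inputs, followed by an element-wise activation
output = sigmoid(np.dot(X, W) + b)   # shape (2, 1): one value per observation
print(output)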

Putting that all together, we are telling keras that we'd like a single linear combination of the input variables, followed by a sigmoid transformation, which will represent the probability that an observation belongs to the class $y=1$. In this case, the Dense layer will have 2 weights $\{w_1,w_2\}$ plus an intercept/bias term $w_0$, which is included by default.
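
As a quick sanity check (assuming the logreg model defined above), keras lets you pull out the still randomly initialized parameters with get_weights; for this layer you should see a (2, 1) weight matrix and a length-1 bias vector:

# the single Dense layer holds a weight matrix and a bias vector
weights, bias = logreg.get_weights()
print(weights.shape, bias.shape)   # (2, 1) and (1,)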

Since this is a relatively simple case, we can write out the actual function implied by this model. It looks like this:

$ \frac{1}{1 + \exp(-(w_0 + w_1 x_{i1} + w_2 x_{i2}))} $

As you might be able to see by now, this is equivalent to a logistic regression model, where we model the log-odds as a linear function of the input variables.

Enough talking, let's fit the model! That's easily accomplished with a few more lines:


In [5]:
logreg.compile(optimizer='adam',             # adam is a solid default optimizer
               loss='binary_crossentropy',   # the standard loss for binary classification
               metrics=['accuracy'])
logreg.fit(X_train, y_train, validation_data=(X_val, y_val), verbose=0)
val_score = logreg.evaluate(X_val, y_val, verbose=0)
print("Accuracy on validation data: " + str(val_score[1] * 100) + "%")


Accuracy on validation data: 49.6%
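
If you'd like to double-check the "this is just logistic regression" claim, here is a short sketch (not part of the original notebook run) that fits scikit-learn's LogisticRegression on the same split; since it is restricted to the same linear decision boundary, its validation accuracy should be similarly unimpressive:

from sklearn.linear_model import LogisticRegression

sk_logreg = LogisticRegression()
sk_logreg.fit(X_train, y_train)
# expect an accuracy close to the keras model's, i.e. near chance
print("sklearn validation accuracy: " + str(sk_logreg.score(X_val, y_val) * 100) + "%")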

In [6]:
# create a mesh to plot in
h = 0.02
x_min, x_max = X_val[:, 0].min(), X_val[:, 0].max()
y_min, y_max = X_val[:, 1].min(), X_val[:, 1].max()
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_val[:, 0], X_val[:, 1], c=y_val, cmap=plt.cm.coolwarm)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("Logistic Regression Decision Regions")
plt.show()

A first multilayer perceptron

As the plot shows, logistic regression can only draw a linear decision boundary, which is hopeless for these concentric circles: the validation accuracy sits at roughly chance. To do better, we can insert a hidden layer between the inputs and the output, turning the model into a multilayer perceptron (MLP). Each hidden unit computes its own linear combination of the inputs followed by a ReLU activation, and the output layer then combines those hidden units with a sigmoid, just as before.

In [7]:
num_hidden = 5
mlp = Sequential()
mlp.add(Dense(output_dim=num_hidden, input_dim=num_variables, activation='relu'))
mlp.add(Dense(output_dim=1, activation='sigmoid'))

In [10]:
mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy'])
mlp.fit(X_train, y_train, validation_data=(X_val, y_val), nb_epoch=20, verbose=0)


Out[10]:
<keras.callbacks.History at 0x7f9440155b50>

In [11]:
# create a mesh to plot in
h = 0.02
x_min, x_max = X_val[:, 0].min(), X_val[:, 0].max()
y_min, y_max = X_val[:, 1].min(), X_val[:, 1].max()
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = mlp.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_val[:, 0], X_val[:, 1], c=y_val, cmap=plt.cm.coolwarm)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("MLP with " + str(num_hidden) + " Hidden Units")
plt.show()


We can see that a neural network with 5 hidden units stitches together a handful of line segments into a relatively boxy, piecewise-linear decision boundary in an effort to classify the points.
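
To make the piecewise-linear intuition concrete, here is a tiny NumPy sketch (with illustrative weights, not values taken from the fitted model) showing how a weighted sum of a few ReLU units bends a straight line at a few points:

import numpy as np

def relu(t):
    return np.maximum(0.0, t)

x = np.linspace(-2, 2, 9)

# three hidden ReLU units, each a linear function of x clipped at zero
h1 = relu( 1.0 * x - 0.5)
h2 = relu(-1.0 * x - 0.5)
h3 = relu( 2.0 * x + 1.0)

# the output layer takes a linear combination of the hidden units, producing a
# piecewise-linear function with a kink wherever one of the units switches on
f = 0.7 * h1 + 0.7 * h2 - 0.3 * h3 + 0.1
print(np.round(f, 2))

With only 5 such units the boundary stays fairly boxy, so let's hand the network a lot more capacity by bumping the number of hidden units up to 128: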


In [12]:
num_hidden = 128
mlp = Sequential()
mlp.add(Dense(output_dim=num_hidden, input_dim=num_variables, activation='relu'))
mlp.add(Dense(output_dim=1, activation='sigmoid'))
mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy'])
mlp.fit(X_train, y_train, validation_data=(X_val, y_val), nb_epoch=10, verbose=1)


Train on 4000 samples, validate on 1000 samples
Epoch 1/10
4000/4000 [==============================] - 0s - loss: 0.6049 - acc: 0.6467 - val_loss: 0.5108 - val_acc: 0.9810
Epoch 2/10
4000/4000 [==============================] - 0s - loss: 0.4036 - acc: 0.9995 - val_loss: 0.2923 - val_acc: 1.0000
Epoch 3/10
4000/4000 [==============================] - 0s - loss: 0.2100 - acc: 1.0000 - val_loss: 0.1412 - val_acc: 1.0000
Epoch 4/10
4000/4000 [==============================] - 0s - loss: 0.1048 - acc: 1.0000 - val_loss: 0.0740 - val_acc: 1.0000
Epoch 5/10
4000/4000 [==============================] - 0s - loss: 0.0584 - acc: 1.0000 - val_loss: 0.0439 - val_acc: 1.0000
Epoch 6/10
4000/4000 [==============================] - 0s - loss: 0.0364 - acc: 1.0000 - val_loss: 0.0286 - val_acc: 1.0000
Epoch 7/10
4000/4000 [==============================] - 0s - loss: 0.0247 - acc: 1.0000 - val_loss: 0.0200 - val_acc: 1.0000
Epoch 8/10
4000/4000 [==============================] - 0s - loss: 0.0177 - acc: 1.0000 - val_loss: 0.0147 - val_acc: 1.0000
Epoch 9/10
4000/4000 [==============================] - 0s - loss: 0.0133 - acc: 1.0000 - val_loss: 0.0112 - val_acc: 1.0000
Epoch 10/10
4000/4000 [==============================] - 0s - loss: 0.0103 - acc: 1.0000 - val_loss: 0.0088 - val_acc: 1.0000
Out[12]:
<keras.callbacks.History at 0x7f9468c25f90>

In [13]:
# create a mesh to plot in
h = 0.02
x_min, x_max = X_val[:, 0].min(), X_val[:, 0].max()
y_min, y_max = X_val[:, 1].min(), X_val[:, 1].max()
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = mlp.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_val[:, 0], X_val[:, 1], c=y_val, cmap=plt.cm.coolwarm)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("MLP with " + str(num_hidden) + " Hidden Units")
plt.show()


Not that we need to for this example, but we can easily extend this model with more layers, simply by calling add again with another dense layer.


In [14]:
mlp = Sequential()
mlp.add(Dense(output_dim=num_hidden, input_dim=num_variables, activation='relu'))
mlp.add(Dense(output_dim=num_hidden, activation='relu'))
mlp.add(Dense(output_dim=1, activation='sigmoid'))
mlp.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy'])
mlp.fit(X_train, y_train, validation_data=(X_val, y_val), nb_epoch=10, verbose=1)


Train on 4000 samples, validate on 1000 samples
Epoch 1/10
4000/4000 [==============================] - 0s - loss: 0.3127 - acc: 0.8510 - val_loss: 0.0372 - val_acc: 1.0000
Epoch 2/10
4000/4000 [==============================] - 0s - loss: 0.0112 - acc: 1.0000 - val_loss: 0.0032 - val_acc: 1.0000
Epoch 3/10
4000/4000 [==============================] - 0s - loss: 0.0020 - acc: 1.0000 - val_loss: 0.0012 - val_acc: 1.0000
Epoch 4/10
4000/4000 [==============================] - 0s - loss: 8.6576e-04 - acc: 1.0000 - val_loss: 5.9462e-04 - val_acc: 1.0000
Epoch 5/10
4000/4000 [==============================] - 0s - loss: 4.9211e-04 - acc: 1.0000 - val_loss: 3.6253e-04 - val_acc: 1.0000
Epoch 6/10
4000/4000 [==============================] - 0s - loss: 3.1913e-04 - acc: 1.0000 - val_loss: 2.4520e-04 - val_acc: 1.0000
Epoch 7/10
4000/4000 [==============================] - 0s - loss: 2.2346e-04 - acc: 1.0000 - val_loss: 1.7577e-04 - val_acc: 1.0000
Epoch 8/10
4000/4000 [==============================] - 0s - loss: 1.6531e-04 - acc: 1.0000 - val_loss: 1.3257e-04 - val_acc: 1.0000
Epoch 9/10
4000/4000 [==============================] - 0s - loss: 1.2738e-04 - acc: 1.0000 - val_loss: 1.0297e-04 - val_acc: 1.0000
Epoch 10/10
4000/4000 [==============================] - 0s - loss: 1.0074e-04 - acc: 1.0000 - val_loss: 8.2025e-05 - val_acc: 1.0000
Out[14]:
<keras.callbacks.History at 0x7f9438402790>

In [16]:
h = 0.02
x_min, x_max = X_val[:, 0].min(), X_val[:, 0].max()
y_min, y_max = X_val[:, 1].min(), X_val[:, 1].max()
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = mlp.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_val[:, 0], X_val[:, 1], c=y_val, cmap=plt.cm.coolwarm)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("MLP with 2 Hidden Layers and " + str(num_hidden) + " Units")
plt.show()


That's it for this tutorial. Hopefully it helped you get a handle on how to use the keras API to build some simple neural networks. In the next one, we'll go over how to use this same approach to build convolutional models.

