This notebook looks at the effect of increasing the number of hidden layers and the number of hidden units in each layer in order to model non-linear data.
The code is adapted from Simple end-to-end Tensorflow examples blog post by Jason Baldridge. The ideas here are identical, except the implementation uses Keras instead of Tensorflow.
In [1]:
from __future__ import division, print_function
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.utils import np_utils
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
def read_dataset(filename):
Z = np.loadtxt(filename, delimiter=",")
y = Z[:, 0]
X = Z[:, 1:]
return X, y
def plot_dataset(X, y):
Xred = X[y==0]
Xblue = X[y==1]
plt.scatter(Xred[:, 0], Xred[:, 1], color='r', marker='o')
plt.scatter(Xblue[:, 0], Xblue[:, 1], color='b', marker='o')
plt.xlabel("X[0]")
plt.ylabel("X[1]")
plt.show()
In [3]:
X, y = read_dataset("../data/linear.csv")
X = X[y != 2]
y = y[y != 2].astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)
Our y values need to be in sparse one-hot encoding format, so we convert the labels to this format. We then split the dataset 70% for training and 30% for testing.
In [4]:
Y = np_utils.to_categorical(y, 2)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=0)
Construct a model with an input layer which takes 2 inputs, and a softmax output layer. The softmax activation takes the scores from each output line and converts them to probabilities. There is no non-linear activation in this network. The equation is given by:
Training this model for 50 epochs yields accuracy of 82.8% on the test set.
In [5]:
model = Sequential()
model.add(Dense(2, input_shape=(2,)))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))
Out[5]:
In [6]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))
Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)
In [7]:
X, y = read_dataset("../data/moons.csv")
y = y.astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)
In [8]:
Y = np_utils.to_categorical(y, 2)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=0)
A network with the same configuration as above produces a accuracy of 85.67% on the test set, as opposed to 92.7% on the linear dataset.
Let us add a hidden layer of 50 hidden units and a Rectified Linear Unit (ReLu) activation to induce some non-linearity in the model. This gives us an accuracy of 89.3%.
In [9]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))
Out[9]:
In [10]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))
Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)
Lets add another layer. Layers produce non-linearity. We add another hidden layer with 100 units, also with a ReLu activation unit. This brings up our accuracy to 92%. The separation is still mostly linear, with just the beginnings of non-linearity.
In [11]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(100))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))
Out[11]:
In [12]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))
Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)
This is the saturn dataset. The data is definitely not linearly separable unless one applies a radial function to project onto a sphere and cut horizontally across the sphere. We will not do this, since our objective is to investigate the effect of hidden layers and hidden units.
In [13]:
X, y = read_dataset("../data/saturn.csv")
y = y.astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)
Previous network (producing 90.5% accuracy on test data for the moon data) produces 90.3% accuracy on the Saturn data. You can see the boundary getting non-linear.
In [14]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(100))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))
Out[14]:
In [15]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))
Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)
Lets increase the number of hidden units from 1 to 2. The number of hidden units in each layer is also much larger. We have also added Rectified Linear Unit activations and Dropouts on each layer. Using this, our accuracy goes up to 98.8%. The separation boundary is now definitely non-linear.
In [16]:
model = Sequential()
model.add(Dense(1024, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))
Out[16]:
In [17]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))
Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)
In [ ]: