Effect of Number and Depth of Hidden Units on Nonlinearity

This notebook looks at the effect of increasing the number of hidden layers and the number of hidden units in each layer in order to model non-linear data.

The code is adapted from Simple end-to-end Tensorflow examples blog post by Jason Baldridge. The ideas here are identical, except the implementation uses Keras instead of Tensorflow.

Imports and setup


In [1]:
from __future__ import division, print_function
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.utils import np_utils

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


Using Theano backend.

In [2]:
def read_dataset(filename):
    Z = np.loadtxt(filename, delimiter=",")
    y = Z[:, 0]
    X = Z[:, 1:]
    return X, y

def plot_dataset(X, y):
    Xred = X[y==0]
    Xblue = X[y==1]
    plt.scatter(Xred[:, 0], Xred[:, 1], color='r', marker='o')
    plt.scatter(Xblue[:, 0], Xblue[:, 1], color='b', marker='o')
    plt.xlabel("X[0]")
    plt.ylabel("X[1]")
    plt.show()

Linearly Separable Data

Our first dataset is linearly separable as seen in the scatter plot below.


In [3]:
X, y = read_dataset("../data/linear.csv")
X = X[y != 2]
y = y[y != 2].astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)


(1334, 2) (1334,)

Our y values need to be in sparse one-hot encoding format, so we convert the labels to this format. We then split the dataset 70% for training and 30% for testing.


In [4]:
Y = np_utils.to_categorical(y, 2)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=0)

Construct a model with an input layer which takes 2 inputs, and a softmax output layer. The softmax activation takes the scores from each output line and converts them to probabilities. There is no non-linear activation in this network. The equation is given by:

Training this model for 50 epochs yields accuracy of 82.8% on the test set.


In [5]:
model = Sequential()
model.add(Dense(2, input_shape=(2,)))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))


Train on 933 samples, validate on 401 samples
Epoch 1/50
933/933 [==============================] - 0s - loss: 0.5953 - acc: 0.7085 - val_loss: 0.3681 - val_acc: 0.8579
Epoch 2/50
933/933 [==============================] - 0s - loss: 0.3103 - acc: 0.9110 - val_loss: 0.3085 - val_acc: 0.8928
Epoch 3/50
933/933 [==============================] - 0s - loss: 0.2769 - acc: 0.9153 - val_loss: 0.2925 - val_acc: 0.8903
Epoch 4/50
933/933 [==============================] - 0s - loss: 0.2622 - acc: 0.9175 - val_loss: 0.2817 - val_acc: 0.8953
Epoch 5/50
933/933 [==============================] - 0s - loss: 0.2517 - acc: 0.9218 - val_loss: 0.2727 - val_acc: 0.8978
Epoch 6/50
933/933 [==============================] - 0s - loss: 0.2432 - acc: 0.9218 - val_loss: 0.2651 - val_acc: 0.8978
Epoch 7/50
933/933 [==============================] - 0s - loss: 0.2360 - acc: 0.9228 - val_loss: 0.2588 - val_acc: 0.8978
Epoch 8/50
933/933 [==============================] - 0s - loss: 0.2296 - acc: 0.9250 - val_loss: 0.2539 - val_acc: 0.8978
Epoch 9/50
933/933 [==============================] - 0s - loss: 0.2242 - acc: 0.9239 - val_loss: 0.2488 - val_acc: 0.8978
Epoch 10/50
933/933 [==============================] - 0s - loss: 0.2194 - acc: 0.9250 - val_loss: 0.2442 - val_acc: 0.9002
Epoch 11/50
933/933 [==============================] - 0s - loss: 0.2151 - acc: 0.9271 - val_loss: 0.2409 - val_acc: 0.8978
Epoch 12/50
933/933 [==============================] - 0s - loss: 0.2112 - acc: 0.9282 - val_loss: 0.2368 - val_acc: 0.9027
Epoch 13/50
933/933 [==============================] - 0s - loss: 0.2076 - acc: 0.9282 - val_loss: 0.2349 - val_acc: 0.8978
Epoch 14/50
933/933 [==============================] - 0s - loss: 0.2048 - acc: 0.9282 - val_loss: 0.2310 - val_acc: 0.9052
Epoch 15/50
933/933 [==============================] - 0s - loss: 0.2017 - acc: 0.9293 - val_loss: 0.2273 - val_acc: 0.9102
Epoch 16/50
933/933 [==============================] - 0s - loss: 0.1989 - acc: 0.9335 - val_loss: 0.2253 - val_acc: 0.9102
Epoch 17/50
933/933 [==============================] - 0s - loss: 0.1964 - acc: 0.9325 - val_loss: 0.2224 - val_acc: 0.9127
Epoch 18/50
933/933 [==============================] - 0s - loss: 0.1940 - acc: 0.9346 - val_loss: 0.2207 - val_acc: 0.9102
Epoch 19/50
933/933 [==============================] - 0s - loss: 0.1919 - acc: 0.9346 - val_loss: 0.2181 - val_acc: 0.9152
Epoch 20/50
933/933 [==============================] - 0s - loss: 0.1898 - acc: 0.9357 - val_loss: 0.2160 - val_acc: 0.9152
Epoch 21/50
933/933 [==============================] - 0s - loss: 0.1879 - acc: 0.9357 - val_loss: 0.2137 - val_acc: 0.9152
Epoch 22/50
933/933 [==============================] - 0s - loss: 0.1860 - acc: 0.9378 - val_loss: 0.2118 - val_acc: 0.9127
Epoch 23/50
933/933 [==============================] - 0s - loss: 0.1843 - acc: 0.9389 - val_loss: 0.2106 - val_acc: 0.9152
Epoch 24/50
933/933 [==============================] - 0s - loss: 0.1826 - acc: 0.9389 - val_loss: 0.2090 - val_acc: 0.9152
Epoch 25/50
933/933 [==============================] - 0s - loss: 0.1811 - acc: 0.9389 - val_loss: 0.2076 - val_acc: 0.9152
Epoch 26/50
933/933 [==============================] - 0s - loss: 0.1796 - acc: 0.9389 - val_loss: 0.2056 - val_acc: 0.9152
Epoch 27/50
933/933 [==============================] - 0s - loss: 0.1782 - acc: 0.9389 - val_loss: 0.2041 - val_acc: 0.9177
Epoch 28/50
933/933 [==============================] - 0s - loss: 0.1768 - acc: 0.9400 - val_loss: 0.2031 - val_acc: 0.9152
Epoch 29/50
933/933 [==============================] - 0s - loss: 0.1755 - acc: 0.9389 - val_loss: 0.2023 - val_acc: 0.9152
Epoch 30/50
933/933 [==============================] - 0s - loss: 0.1743 - acc: 0.9400 - val_loss: 0.2009 - val_acc: 0.9152
Epoch 31/50
933/933 [==============================] - 0s - loss: 0.1731 - acc: 0.9411 - val_loss: 0.1994 - val_acc: 0.9177
Epoch 32/50
933/933 [==============================] - 0s - loss: 0.1720 - acc: 0.9421 - val_loss: 0.1999 - val_acc: 0.9177
Epoch 33/50
933/933 [==============================] - 0s - loss: 0.1709 - acc: 0.9411 - val_loss: 0.1978 - val_acc: 0.9177
Epoch 34/50
933/933 [==============================] - 0s - loss: 0.1697 - acc: 0.9421 - val_loss: 0.1956 - val_acc: 0.9227
Epoch 35/50
933/933 [==============================] - 0s - loss: 0.1687 - acc: 0.9443 - val_loss: 0.1947 - val_acc: 0.9227
Epoch 36/50
933/933 [==============================] - 0s - loss: 0.1676 - acc: 0.9443 - val_loss: 0.1939 - val_acc: 0.9227
Epoch 37/50
933/933 [==============================] - 0s - loss: 0.1666 - acc: 0.9453 - val_loss: 0.1927 - val_acc: 0.9227
Epoch 38/50
933/933 [==============================] - 0s - loss: 0.1656 - acc: 0.9464 - val_loss: 0.1917 - val_acc: 0.9227
Epoch 39/50
933/933 [==============================] - 0s - loss: 0.1648 - acc: 0.9464 - val_loss: 0.1906 - val_acc: 0.9227
Epoch 40/50
933/933 [==============================] - 0s - loss: 0.1638 - acc: 0.9464 - val_loss: 0.1904 - val_acc: 0.9227
Epoch 41/50
933/933 [==============================] - 0s - loss: 0.1630 - acc: 0.9486 - val_loss: 0.1890 - val_acc: 0.9202
Epoch 42/50
933/933 [==============================] - 0s - loss: 0.1621 - acc: 0.9486 - val_loss: 0.1879 - val_acc: 0.9202
Epoch 43/50
933/933 [==============================] - 0s - loss: 0.1613 - acc: 0.9475 - val_loss: 0.1873 - val_acc: 0.9227
Epoch 44/50
933/933 [==============================] - 0s - loss: 0.1605 - acc: 0.9486 - val_loss: 0.1862 - val_acc: 0.9227
Epoch 45/50
933/933 [==============================] - 0s - loss: 0.1598 - acc: 0.9486 - val_loss: 0.1854 - val_acc: 0.9227
Epoch 46/50
933/933 [==============================] - 0s - loss: 0.1591 - acc: 0.9486 - val_loss: 0.1846 - val_acc: 0.9227
Epoch 47/50
933/933 [==============================] - 0s - loss: 0.1583 - acc: 0.9475 - val_loss: 0.1840 - val_acc: 0.9252
Epoch 48/50
933/933 [==============================] - 0s - loss: 0.1575 - acc: 0.9486 - val_loss: 0.1832 - val_acc: 0.9252
Epoch 49/50
933/933 [==============================] - 0s - loss: 0.1568 - acc: 0.9486 - val_loss: 0.1824 - val_acc: 0.9252
Epoch 50/50
933/933 [==============================] - 0s - loss: 0.1561 - acc: 0.9486 - val_loss: 0.1818 - val_acc: 0.9252
Out[5]:
<keras.callbacks.History at 0x11307c310>

In [6]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))

Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)


score: 0.182, accuracy: 0.925

Linearly non-separable data #1

The data below is the moons dataset. The two clusters cannot be separated by a straight line.


In [7]:
X, y = read_dataset("../data/moons.csv")
y = y.astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)


(2000, 2) (2000,)

In [8]:
Y = np_utils.to_categorical(y, 2)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=0)

A network with the same configuration as above produces a accuracy of 85.67% on the test set, as opposed to 92.7% on the linear dataset.

Let us add a hidden layer of 50 hidden units and a Rectified Linear Unit (ReLu) activation to induce some non-linearity in the model. This gives us an accuracy of 89.3%.


In [9]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))


Train on 1400 samples, validate on 600 samples
Epoch 1/50
1400/1400 [==============================] - 0s - loss: 0.6720 - acc: 0.5943 - val_loss: 0.6208 - val_acc: 0.7867
Epoch 2/50
1400/1400 [==============================] - 0s - loss: 0.5794 - acc: 0.8186 - val_loss: 0.5476 - val_acc: 0.8050
Epoch 3/50
1400/1400 [==============================] - 0s - loss: 0.5148 - acc: 0.8350 - val_loss: 0.4948 - val_acc: 0.8083
Epoch 4/50
1400/1400 [==============================] - 0s - loss: 0.4668 - acc: 0.8414 - val_loss: 0.4550 - val_acc: 0.8117
Epoch 5/50
1400/1400 [==============================] - 0s - loss: 0.4299 - acc: 0.8443 - val_loss: 0.4242 - val_acc: 0.8117
Epoch 6/50
1400/1400 [==============================] - 0s - loss: 0.4010 - acc: 0.8536 - val_loss: 0.4001 - val_acc: 0.8167
Epoch 7/50
1400/1400 [==============================] - 0s - loss: 0.3780 - acc: 0.8536 - val_loss: 0.3806 - val_acc: 0.8250
Epoch 8/50
1400/1400 [==============================] - 0s - loss: 0.3593 - acc: 0.8564 - val_loss: 0.3648 - val_acc: 0.8300
Epoch 9/50
1400/1400 [==============================] - 0s - loss: 0.3438 - acc: 0.8557 - val_loss: 0.3517 - val_acc: 0.8317
Epoch 10/50
1400/1400 [==============================] - 0s - loss: 0.3309 - acc: 0.8593 - val_loss: 0.3406 - val_acc: 0.8350
Epoch 11/50
1400/1400 [==============================] - 0s - loss: 0.3197 - acc: 0.8629 - val_loss: 0.3307 - val_acc: 0.8400
Epoch 12/50
1400/1400 [==============================] - 0s - loss: 0.3100 - acc: 0.8650 - val_loss: 0.3222 - val_acc: 0.8417
Epoch 13/50
1400/1400 [==============================] - 0s - loss: 0.3015 - acc: 0.8664 - val_loss: 0.3147 - val_acc: 0.8483
Epoch 14/50
1400/1400 [==============================] - 0s - loss: 0.2942 - acc: 0.8714 - val_loss: 0.3084 - val_acc: 0.8517
Epoch 15/50
1400/1400 [==============================] - 0s - loss: 0.2878 - acc: 0.8700 - val_loss: 0.3027 - val_acc: 0.8567
Epoch 16/50
1400/1400 [==============================] - 0s - loss: 0.2822 - acc: 0.8743 - val_loss: 0.2977 - val_acc: 0.8567
Epoch 17/50
1400/1400 [==============================] - 0s - loss: 0.2770 - acc: 0.8743 - val_loss: 0.2932 - val_acc: 0.8567
Epoch 18/50
1400/1400 [==============================] - 0s - loss: 0.2724 - acc: 0.8764 - val_loss: 0.2891 - val_acc: 0.8567
Epoch 19/50
1400/1400 [==============================] - 0s - loss: 0.2681 - acc: 0.8771 - val_loss: 0.2853 - val_acc: 0.8617
Epoch 20/50
1400/1400 [==============================] - 0s - loss: 0.2643 - acc: 0.8807 - val_loss: 0.2817 - val_acc: 0.8650
Epoch 21/50
1400/1400 [==============================] - 0s - loss: 0.2608 - acc: 0.8814 - val_loss: 0.2785 - val_acc: 0.8667
Epoch 22/50
1400/1400 [==============================] - 0s - loss: 0.2575 - acc: 0.8836 - val_loss: 0.2756 - val_acc: 0.8683
Epoch 23/50
1400/1400 [==============================] - 0s - loss: 0.2546 - acc: 0.8829 - val_loss: 0.2727 - val_acc: 0.8700
Epoch 24/50
1400/1400 [==============================] - 0s - loss: 0.2517 - acc: 0.8850 - val_loss: 0.2701 - val_acc: 0.8700
Epoch 25/50
1400/1400 [==============================] - 0s - loss: 0.2491 - acc: 0.8850 - val_loss: 0.2677 - val_acc: 0.8717
Epoch 26/50
1400/1400 [==============================] - 0s - loss: 0.2467 - acc: 0.8864 - val_loss: 0.2654 - val_acc: 0.8717
Epoch 27/50
1400/1400 [==============================] - 0s - loss: 0.2444 - acc: 0.8864 - val_loss: 0.2633 - val_acc: 0.8717
Epoch 28/50
1400/1400 [==============================] - 0s - loss: 0.2423 - acc: 0.8900 - val_loss: 0.2614 - val_acc: 0.8733
Epoch 29/50
1400/1400 [==============================] - 0s - loss: 0.2404 - acc: 0.8900 - val_loss: 0.2594 - val_acc: 0.8733
Epoch 30/50
1400/1400 [==============================] - 0s - loss: 0.2386 - acc: 0.8921 - val_loss: 0.2577 - val_acc: 0.8733
Epoch 31/50
1400/1400 [==============================] - 0s - loss: 0.2368 - acc: 0.8907 - val_loss: 0.2560 - val_acc: 0.8750
Epoch 32/50
1400/1400 [==============================] - 0s - loss: 0.2354 - acc: 0.8936 - val_loss: 0.2545 - val_acc: 0.8767
Epoch 33/50
1400/1400 [==============================] - 0s - loss: 0.2338 - acc: 0.8936 - val_loss: 0.2531 - val_acc: 0.8750
Epoch 34/50
1400/1400 [==============================] - 0s - loss: 0.2324 - acc: 0.8943 - val_loss: 0.2518 - val_acc: 0.8783
Epoch 35/50
1400/1400 [==============================] - 0s - loss: 0.2311 - acc: 0.8943 - val_loss: 0.2505 - val_acc: 0.8783
Epoch 36/50
1400/1400 [==============================] - 0s - loss: 0.2298 - acc: 0.8950 - val_loss: 0.2492 - val_acc: 0.8800
Epoch 37/50
1400/1400 [==============================] - 0s - loss: 0.2287 - acc: 0.8957 - val_loss: 0.2481 - val_acc: 0.8800
Epoch 38/50
1400/1400 [==============================] - 0s - loss: 0.2276 - acc: 0.8964 - val_loss: 0.2473 - val_acc: 0.8800
Epoch 39/50
1400/1400 [==============================] - 0s - loss: 0.2266 - acc: 0.8957 - val_loss: 0.2462 - val_acc: 0.8833
Epoch 40/50
1400/1400 [==============================] - 0s - loss: 0.2255 - acc: 0.8936 - val_loss: 0.2452 - val_acc: 0.8833
Epoch 41/50
1400/1400 [==============================] - 0s - loss: 0.2246 - acc: 0.8950 - val_loss: 0.2442 - val_acc: 0.8833
Epoch 42/50
1400/1400 [==============================] - 0s - loss: 0.2238 - acc: 0.8957 - val_loss: 0.2433 - val_acc: 0.8850
Epoch 43/50
1400/1400 [==============================] - 0s - loss: 0.2230 - acc: 0.8971 - val_loss: 0.2425 - val_acc: 0.8867
Epoch 44/50
1400/1400 [==============================] - 0s - loss: 0.2223 - acc: 0.8986 - val_loss: 0.2418 - val_acc: 0.8867
Epoch 45/50
1400/1400 [==============================] - 0s - loss: 0.2214 - acc: 0.8971 - val_loss: 0.2411 - val_acc: 0.8883
Epoch 46/50
1400/1400 [==============================] - 0s - loss: 0.2206 - acc: 0.8979 - val_loss: 0.2403 - val_acc: 0.8883
Epoch 47/50
1400/1400 [==============================] - 0s - loss: 0.2200 - acc: 0.8971 - val_loss: 0.2397 - val_acc: 0.8900
Epoch 48/50
1400/1400 [==============================] - 0s - loss: 0.2192 - acc: 0.8979 - val_loss: 0.2390 - val_acc: 0.8933
Epoch 49/50
1400/1400 [==============================] - 0s - loss: 0.2186 - acc: 0.8993 - val_loss: 0.2381 - val_acc: 0.8900
Epoch 50/50
1400/1400 [==============================] - 0s - loss: 0.2179 - acc: 0.8993 - val_loss: 0.2378 - val_acc: 0.8917
Out[9]:
<keras.callbacks.History at 0x113056dd0>

In [10]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))

Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)


score: 0.238, accuracy: 0.892

Lets add another layer. Layers produce non-linearity. We add another hidden layer with 100 units, also with a ReLu activation unit. This brings up our accuracy to 92%. The separation is still mostly linear, with just the beginnings of non-linearity.


In [11]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(100))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))


Train on 1400 samples, validate on 600 samples
Epoch 1/50
1400/1400 [==============================] - 0s - loss: 0.6461 - acc: 0.7443 - val_loss: 0.6040 - val_acc: 0.7600
Epoch 2/50
1400/1400 [==============================] - 0s - loss: 0.5660 - acc: 0.7943 - val_loss: 0.5371 - val_acc: 0.7867
Epoch 3/50
1400/1400 [==============================] - 0s - loss: 0.5031 - acc: 0.8086 - val_loss: 0.4838 - val_acc: 0.7917
Epoch 4/50
1400/1400 [==============================] - 0s - loss: 0.4529 - acc: 0.8200 - val_loss: 0.4414 - val_acc: 0.7967
Epoch 5/50
1400/1400 [==============================] - 0s - loss: 0.4137 - acc: 0.8286 - val_loss: 0.4099 - val_acc: 0.8033
Epoch 6/50
1400/1400 [==============================] - 0s - loss: 0.3844 - acc: 0.8321 - val_loss: 0.3867 - val_acc: 0.8067
Epoch 7/50
1400/1400 [==============================] - 0s - loss: 0.3621 - acc: 0.8407 - val_loss: 0.3687 - val_acc: 0.8167
Epoch 8/50
1400/1400 [==============================] - 0s - loss: 0.3447 - acc: 0.8421 - val_loss: 0.3547 - val_acc: 0.8200
Epoch 9/50
1400/1400 [==============================] - 0s - loss: 0.3305 - acc: 0.8464 - val_loss: 0.3426 - val_acc: 0.8283
Epoch 10/50
1400/1400 [==============================] - 0s - loss: 0.3185 - acc: 0.8507 - val_loss: 0.3321 - val_acc: 0.8333
Epoch 11/50
1400/1400 [==============================] - 0s - loss: 0.3079 - acc: 0.8557 - val_loss: 0.3228 - val_acc: 0.8367
Epoch 12/50
1400/1400 [==============================] - 0s - loss: 0.2986 - acc: 0.8621 - val_loss: 0.3143 - val_acc: 0.8433
Epoch 13/50
1400/1400 [==============================] - 0s - loss: 0.2904 - acc: 0.8657 - val_loss: 0.3069 - val_acc: 0.8500
Epoch 14/50
1400/1400 [==============================] - 0s - loss: 0.2829 - acc: 0.8679 - val_loss: 0.2999 - val_acc: 0.8500
Epoch 15/50
1400/1400 [==============================] - 0s - loss: 0.2761 - acc: 0.8736 - val_loss: 0.2936 - val_acc: 0.8567
Epoch 16/50
1400/1400 [==============================] - 0s - loss: 0.2700 - acc: 0.8786 - val_loss: 0.2877 - val_acc: 0.8633
Epoch 17/50
1400/1400 [==============================] - 0s - loss: 0.2644 - acc: 0.8807 - val_loss: 0.2826 - val_acc: 0.8683
Epoch 18/50
1400/1400 [==============================] - 0s - loss: 0.2595 - acc: 0.8793 - val_loss: 0.2777 - val_acc: 0.8683
Epoch 19/50
1400/1400 [==============================] - 0s - loss: 0.2549 - acc: 0.8836 - val_loss: 0.2730 - val_acc: 0.8683
Epoch 20/50
1400/1400 [==============================] - 0s - loss: 0.2506 - acc: 0.8857 - val_loss: 0.2689 - val_acc: 0.8717
Epoch 21/50
1400/1400 [==============================] - 0s - loss: 0.2465 - acc: 0.8886 - val_loss: 0.2650 - val_acc: 0.8717
Epoch 22/50
1400/1400 [==============================] - 0s - loss: 0.2427 - acc: 0.8914 - val_loss: 0.2613 - val_acc: 0.8733
Epoch 23/50
1400/1400 [==============================] - 0s - loss: 0.2392 - acc: 0.8914 - val_loss: 0.2579 - val_acc: 0.8750
Epoch 24/50
1400/1400 [==============================] - 0s - loss: 0.2358 - acc: 0.8907 - val_loss: 0.2545 - val_acc: 0.8767
Epoch 25/50
1400/1400 [==============================] - 0s - loss: 0.2329 - acc: 0.8943 - val_loss: 0.2514 - val_acc: 0.8767
Epoch 26/50
1400/1400 [==============================] - 0s - loss: 0.2297 - acc: 0.8950 - val_loss: 0.2482 - val_acc: 0.8767
Epoch 27/50
1400/1400 [==============================] - 0s - loss: 0.2269 - acc: 0.8986 - val_loss: 0.2453 - val_acc: 0.8767
Epoch 28/50
1400/1400 [==============================] - 0s - loss: 0.2242 - acc: 0.8979 - val_loss: 0.2427 - val_acc: 0.8783
Epoch 29/50
1400/1400 [==============================] - 0s - loss: 0.2215 - acc: 0.9014 - val_loss: 0.2400 - val_acc: 0.8783
Epoch 30/50
1400/1400 [==============================] - 0s - loss: 0.2192 - acc: 0.9021 - val_loss: 0.2384 - val_acc: 0.8833
Epoch 31/50
1400/1400 [==============================] - 0s - loss: 0.2170 - acc: 0.9007 - val_loss: 0.2354 - val_acc: 0.8800
Epoch 32/50
1400/1400 [==============================] - 0s - loss: 0.2148 - acc: 0.9007 - val_loss: 0.2331 - val_acc: 0.8817
Epoch 33/50
1400/1400 [==============================] - 0s - loss: 0.2125 - acc: 0.9014 - val_loss: 0.2314 - val_acc: 0.8867
Epoch 34/50
1400/1400 [==============================] - 0s - loss: 0.2106 - acc: 0.9050 - val_loss: 0.2291 - val_acc: 0.8900
Epoch 35/50
1400/1400 [==============================] - 0s - loss: 0.2085 - acc: 0.9036 - val_loss: 0.2278 - val_acc: 0.8950
Epoch 36/50
1400/1400 [==============================] - 0s - loss: 0.2068 - acc: 0.9064 - val_loss: 0.2250 - val_acc: 0.8950
Epoch 37/50
1400/1400 [==============================] - 0s - loss: 0.2048 - acc: 0.9064 - val_loss: 0.2230 - val_acc: 0.8967
Epoch 38/50
1400/1400 [==============================] - 0s - loss: 0.2027 - acc: 0.9050 - val_loss: 0.2209 - val_acc: 0.8983
Epoch 39/50
1400/1400 [==============================] - 0s - loss: 0.2010 - acc: 0.9043 - val_loss: 0.2193 - val_acc: 0.9033
Epoch 40/50
1400/1400 [==============================] - 0s - loss: 0.1991 - acc: 0.9064 - val_loss: 0.2166 - val_acc: 0.9050
Epoch 41/50
1400/1400 [==============================] - 0s - loss: 0.1973 - acc: 0.9071 - val_loss: 0.2156 - val_acc: 0.9033
Epoch 42/50
1400/1400 [==============================] - 0s - loss: 0.1956 - acc: 0.9107 - val_loss: 0.2127 - val_acc: 0.9033
Epoch 43/50
1400/1400 [==============================] - 0s - loss: 0.1937 - acc: 0.9071 - val_loss: 0.2110 - val_acc: 0.9050
Epoch 44/50
1400/1400 [==============================] - 0s - loss: 0.1919 - acc: 0.9093 - val_loss: 0.2093 - val_acc: 0.9050
Epoch 45/50
1400/1400 [==============================] - 0s - loss: 0.1899 - acc: 0.9136 - val_loss: 0.2066 - val_acc: 0.9067
Epoch 46/50
1400/1400 [==============================] - 0s - loss: 0.1880 - acc: 0.9100 - val_loss: 0.2058 - val_acc: 0.9067
Epoch 47/50
1400/1400 [==============================] - 0s - loss: 0.1860 - acc: 0.9121 - val_loss: 0.2057 - val_acc: 0.9050
Epoch 48/50
1400/1400 [==============================] - 0s - loss: 0.1849 - acc: 0.9136 - val_loss: 0.2035 - val_acc: 0.9067
Epoch 49/50
1400/1400 [==============================] - 0s - loss: 0.1833 - acc: 0.9164 - val_loss: 0.1995 - val_acc: 0.9083
Epoch 50/50
1400/1400 [==============================] - 0s - loss: 0.1812 - acc: 0.9150 - val_loss: 0.1974 - val_acc: 0.9100
Out[11]:
<keras.callbacks.History at 0x114262110>

In [12]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))

Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)


score: 0.197, accuracy: 0.910

Linearly non-separable data #2

This is the saturn dataset. The data is definitely not linearly separable unless one applies a radial function to project onto a sphere and cut horizontally across the sphere. We will not do this, since our objective is to investigate the effect of hidden layers and hidden units.


In [13]:
X, y = read_dataset("../data/saturn.csv")
y = y.astype("int")
print(X.shape, y.shape)
plot_dataset(X, y)


(2000, 2) (2000,)

Previous network (producing 90.5% accuracy on test data for the moon data) produces 90.3% accuracy on the Saturn data. You can see the boundary getting non-linear.


In [14]:
model = Sequential()
model.add(Dense(50, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dense(100))
model.add(Activation("relu"))
model.add(Dense(2))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))


Train on 1400 samples, validate on 600 samples
Epoch 1/50
1400/1400 [==============================] - 0s - loss: 0.6458 - acc: 0.6900 - val_loss: 0.6154 - val_acc: 0.7217
Epoch 2/50
1400/1400 [==============================] - 0s - loss: 0.5856 - acc: 0.7629 - val_loss: 0.5632 - val_acc: 0.7500
Epoch 3/50
1400/1400 [==============================] - 0s - loss: 0.5337 - acc: 0.7879 - val_loss: 0.5164 - val_acc: 0.7733
Epoch 4/50
1400/1400 [==============================] - 0s - loss: 0.4872 - acc: 0.8079 - val_loss: 0.4735 - val_acc: 0.7917
Epoch 5/50
1400/1400 [==============================] - 0s - loss: 0.4453 - acc: 0.8164 - val_loss: 0.4367 - val_acc: 0.8017
Epoch 6/50
1400/1400 [==============================] - 0s - loss: 0.4101 - acc: 0.8307 - val_loss: 0.4062 - val_acc: 0.8033
Epoch 7/50
1400/1400 [==============================] - 0s - loss: 0.3805 - acc: 0.8414 - val_loss: 0.3809 - val_acc: 0.8117
Epoch 8/50
1400/1400 [==============================] - 0s - loss: 0.3560 - acc: 0.8507 - val_loss: 0.3603 - val_acc: 0.8200
Epoch 9/50
1400/1400 [==============================] - 0s - loss: 0.3359 - acc: 0.8557 - val_loss: 0.3433 - val_acc: 0.8300
Epoch 10/50
1400/1400 [==============================] - 0s - loss: 0.3191 - acc: 0.8607 - val_loss: 0.3289 - val_acc: 0.8417
Epoch 11/50
1400/1400 [==============================] - 0s - loss: 0.3050 - acc: 0.8629 - val_loss: 0.3163 - val_acc: 0.8467
Epoch 12/50
1400/1400 [==============================] - 0s - loss: 0.2924 - acc: 0.8671 - val_loss: 0.3049 - val_acc: 0.8517
Epoch 13/50
1400/1400 [==============================] - 0s - loss: 0.2820 - acc: 0.8729 - val_loss: 0.2956 - val_acc: 0.8567
Epoch 14/50
1400/1400 [==============================] - 0s - loss: 0.2730 - acc: 0.8800 - val_loss: 0.2877 - val_acc: 0.8617
Epoch 15/50
1400/1400 [==============================] - 0s - loss: 0.2652 - acc: 0.8814 - val_loss: 0.2804 - val_acc: 0.8683
Epoch 16/50
1400/1400 [==============================] - 0s - loss: 0.2583 - acc: 0.8864 - val_loss: 0.2743 - val_acc: 0.8717
Epoch 17/50
1400/1400 [==============================] - 0s - loss: 0.2523 - acc: 0.8871 - val_loss: 0.2688 - val_acc: 0.8717
Epoch 18/50
1400/1400 [==============================] - 0s - loss: 0.2473 - acc: 0.8893 - val_loss: 0.2641 - val_acc: 0.8750
Epoch 19/50
1400/1400 [==============================] - 0s - loss: 0.2425 - acc: 0.8921 - val_loss: 0.2596 - val_acc: 0.8783
Epoch 20/50
1400/1400 [==============================] - 0s - loss: 0.2385 - acc: 0.8929 - val_loss: 0.2556 - val_acc: 0.8800
Epoch 21/50
1400/1400 [==============================] - 0s - loss: 0.2348 - acc: 0.8936 - val_loss: 0.2521 - val_acc: 0.8833
Epoch 22/50
1400/1400 [==============================] - 0s - loss: 0.2318 - acc: 0.8957 - val_loss: 0.2492 - val_acc: 0.8850
Epoch 23/50
1400/1400 [==============================] - 0s - loss: 0.2290 - acc: 0.8964 - val_loss: 0.2466 - val_acc: 0.8883
Epoch 24/50
1400/1400 [==============================] - 0s - loss: 0.2261 - acc: 0.9007 - val_loss: 0.2442 - val_acc: 0.8900
Epoch 25/50
1400/1400 [==============================] - 0s - loss: 0.2239 - acc: 0.9000 - val_loss: 0.2424 - val_acc: 0.8917
Epoch 26/50
1400/1400 [==============================] - 0s - loss: 0.2213 - acc: 0.9021 - val_loss: 0.2409 - val_acc: 0.8933
Epoch 27/50
1400/1400 [==============================] - 0s - loss: 0.2193 - acc: 0.9029 - val_loss: 0.2377 - val_acc: 0.8933
Epoch 28/50
1400/1400 [==============================] - 0s - loss: 0.2174 - acc: 0.9043 - val_loss: 0.2357 - val_acc: 0.8983
Epoch 29/50
1400/1400 [==============================] - 0s - loss: 0.2157 - acc: 0.9036 - val_loss: 0.2340 - val_acc: 0.8983
Epoch 30/50
1400/1400 [==============================] - 0s - loss: 0.2137 - acc: 0.9014 - val_loss: 0.2315 - val_acc: 0.8983
Epoch 31/50
1400/1400 [==============================] - 0s - loss: 0.2119 - acc: 0.9036 - val_loss: 0.2298 - val_acc: 0.8983
Epoch 32/50
1400/1400 [==============================] - 0s - loss: 0.2104 - acc: 0.9043 - val_loss: 0.2282 - val_acc: 0.9033
Epoch 33/50
1400/1400 [==============================] - 0s - loss: 0.2087 - acc: 0.8993 - val_loss: 0.2279 - val_acc: 0.9017
Epoch 34/50
1400/1400 [==============================] - 0s - loss: 0.2077 - acc: 0.9014 - val_loss: 0.2267 - val_acc: 0.9017
Epoch 35/50
1400/1400 [==============================] - 0s - loss: 0.2062 - acc: 0.9036 - val_loss: 0.2249 - val_acc: 0.9033
Epoch 36/50
1400/1400 [==============================] - 0s - loss: 0.2043 - acc: 0.9036 - val_loss: 0.2233 - val_acc: 0.9033
Epoch 37/50
1400/1400 [==============================] - 0s - loss: 0.2029 - acc: 0.9029 - val_loss: 0.2209 - val_acc: 0.9033
Epoch 38/50
1400/1400 [==============================] - 0s - loss: 0.2013 - acc: 0.9064 - val_loss: 0.2191 - val_acc: 0.9000
Epoch 39/50
1400/1400 [==============================] - 0s - loss: 0.2004 - acc: 0.9021 - val_loss: 0.2186 - val_acc: 0.9033
Epoch 40/50
1400/1400 [==============================] - 0s - loss: 0.1988 - acc: 0.9029 - val_loss: 0.2171 - val_acc: 0.9033
Epoch 41/50
1400/1400 [==============================] - 0s - loss: 0.1973 - acc: 0.9071 - val_loss: 0.2149 - val_acc: 0.9000
Epoch 42/50
1400/1400 [==============================] - 0s - loss: 0.1957 - acc: 0.9021 - val_loss: 0.2136 - val_acc: 0.9000
Epoch 43/50
1400/1400 [==============================] - 0s - loss: 0.1947 - acc: 0.9050 - val_loss: 0.2128 - val_acc: 0.9050
Epoch 44/50
1400/1400 [==============================] - 0s - loss: 0.1935 - acc: 0.9079 - val_loss: 0.2114 - val_acc: 0.9050
Epoch 45/50
1400/1400 [==============================] - 0s - loss: 0.1919 - acc: 0.9079 - val_loss: 0.2112 - val_acc: 0.9050
Epoch 46/50
1400/1400 [==============================] - 0s - loss: 0.1907 - acc: 0.9107 - val_loss: 0.2099 - val_acc: 0.9050
Epoch 47/50
1400/1400 [==============================] - 0s - loss: 0.1894 - acc: 0.9121 - val_loss: 0.2066 - val_acc: 0.9017
Epoch 48/50
1400/1400 [==============================] - 0s - loss: 0.1879 - acc: 0.9079 - val_loss: 0.2058 - val_acc: 0.9050
Epoch 49/50
1400/1400 [==============================] - 0s - loss: 0.1867 - acc: 0.9114 - val_loss: 0.2041 - val_acc: 0.9050
Epoch 50/50
1400/1400 [==============================] - 0s - loss: 0.1854 - acc: 0.9129 - val_loss: 0.2032 - val_acc: 0.9067
Out[14]:
<keras.callbacks.History at 0x114765650>

In [15]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))

Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)


score: 0.203, accuracy: 0.907

Lets increase the number of hidden units from 1 to 2. The number of hidden units in each layer is also much larger. We have also added Rectified Linear Unit activations and Dropouts on each layer. Using this, our accuracy goes up to 98.8%. The separation boundary is now definitely non-linear.


In [16]:
model = Sequential()

model.add(Dense(1024, input_shape=(2,)))
model.add(Activation("relu"))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation("relu"))

model.add(Dense(2))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(Xtrain, Ytrain, batch_size=32, nb_epoch=50, validation_data=(Xtest, Ytest))


Train on 1400 samples, validate on 600 samples
Epoch 1/50
1400/1400 [==============================] - 0s - loss: 0.6546 - acc: 0.8279 - val_loss: 0.6185 - val_acc: 0.8150
Epoch 2/50
1400/1400 [==============================] - 0s - loss: 0.5990 - acc: 0.8386 - val_loss: 0.5622 - val_acc: 0.8100
Epoch 3/50
1400/1400 [==============================] - 0s - loss: 0.5424 - acc: 0.8429 - val_loss: 0.5063 - val_acc: 0.8100
Epoch 4/50
1400/1400 [==============================] - 0s - loss: 0.4833 - acc: 0.8421 - val_loss: 0.4525 - val_acc: 0.8133
Epoch 5/50
1400/1400 [==============================] - 0s - loss: 0.4311 - acc: 0.8450 - val_loss: 0.4068 - val_acc: 0.8167
Epoch 6/50
1400/1400 [==============================] - 0s - loss: 0.3867 - acc: 0.8514 - val_loss: 0.3715 - val_acc: 0.8250
Epoch 7/50
1400/1400 [==============================] - 0s - loss: 0.3519 - acc: 0.8571 - val_loss: 0.3451 - val_acc: 0.8300
Epoch 8/50
1400/1400 [==============================] - 0s - loss: 0.3244 - acc: 0.8629 - val_loss: 0.3231 - val_acc: 0.8417
Epoch 9/50
1400/1400 [==============================] - 0s - loss: 0.3038 - acc: 0.8671 - val_loss: 0.3051 - val_acc: 0.8533
Epoch 10/50
1400/1400 [==============================] - 0s - loss: 0.2890 - acc: 0.8700 - val_loss: 0.2914 - val_acc: 0.8567
Epoch 11/50
1400/1400 [==============================] - 0s - loss: 0.2759 - acc: 0.8800 - val_loss: 0.2787 - val_acc: 0.8667
Epoch 12/50
1400/1400 [==============================] - 0s - loss: 0.2625 - acc: 0.8836 - val_loss: 0.2672 - val_acc: 0.8750
Epoch 13/50
1400/1400 [==============================] - 0s - loss: 0.2521 - acc: 0.8914 - val_loss: 0.2583 - val_acc: 0.8783
Epoch 14/50
1400/1400 [==============================] - 0s - loss: 0.2412 - acc: 0.8907 - val_loss: 0.2499 - val_acc: 0.8767
Epoch 15/50
1400/1400 [==============================] - 0s - loss: 0.2365 - acc: 0.8957 - val_loss: 0.2427 - val_acc: 0.8800
Epoch 16/50
1400/1400 [==============================] - 0s - loss: 0.2267 - acc: 0.8971 - val_loss: 0.2365 - val_acc: 0.8833
Epoch 17/50
1400/1400 [==============================] - 0s - loss: 0.2203 - acc: 0.9000 - val_loss: 0.2305 - val_acc: 0.8850
Epoch 18/50
1400/1400 [==============================] - 0s - loss: 0.2178 - acc: 0.9043 - val_loss: 0.2256 - val_acc: 0.8917
Epoch 19/50
1400/1400 [==============================] - 0s - loss: 0.2093 - acc: 0.9036 - val_loss: 0.2213 - val_acc: 0.9050
Epoch 20/50
1400/1400 [==============================] - 0s - loss: 0.2108 - acc: 0.9036 - val_loss: 0.2163 - val_acc: 0.9033
Epoch 21/50
1400/1400 [==============================] - 0s - loss: 0.2014 - acc: 0.9086 - val_loss: 0.2134 - val_acc: 0.9050
Epoch 22/50
1400/1400 [==============================] - 0s - loss: 0.1979 - acc: 0.9114 - val_loss: 0.2065 - val_acc: 0.9083
Epoch 23/50
1400/1400 [==============================] - 0s - loss: 0.1963 - acc: 0.9064 - val_loss: 0.2022 - val_acc: 0.9100
Epoch 24/50
1400/1400 [==============================] - 0s - loss: 0.1917 - acc: 0.9107 - val_loss: 0.1988 - val_acc: 0.9083
Epoch 25/50
1400/1400 [==============================] - 0s - loss: 0.1831 - acc: 0.9171 - val_loss: 0.1951 - val_acc: 0.9100
Epoch 26/50
1400/1400 [==============================] - 0s - loss: 0.1815 - acc: 0.9171 - val_loss: 0.1897 - val_acc: 0.9117
Epoch 27/50
1400/1400 [==============================] - 0s - loss: 0.1783 - acc: 0.9193 - val_loss: 0.1866 - val_acc: 0.9150
Epoch 28/50
1400/1400 [==============================] - 0s - loss: 0.1762 - acc: 0.9207 - val_loss: 0.1824 - val_acc: 0.9167
Epoch 29/50
1400/1400 [==============================] - 0s - loss: 0.1741 - acc: 0.9221 - val_loss: 0.1776 - val_acc: 0.9200
Epoch 30/50
1400/1400 [==============================] - 0s - loss: 0.1659 - acc: 0.9286 - val_loss: 0.1733 - val_acc: 0.9250
Epoch 31/50
1400/1400 [==============================] - 0s - loss: 0.1660 - acc: 0.9207 - val_loss: 0.1686 - val_acc: 0.9250
Epoch 32/50
1400/1400 [==============================] - 0s - loss: 0.1626 - acc: 0.9293 - val_loss: 0.1643 - val_acc: 0.9283
Epoch 33/50
1400/1400 [==============================] - 0s - loss: 0.1559 - acc: 0.9336 - val_loss: 0.1605 - val_acc: 0.9267
Epoch 34/50
1400/1400 [==============================] - 0s - loss: 0.1541 - acc: 0.9329 - val_loss: 0.1546 - val_acc: 0.9283
Epoch 35/50
1400/1400 [==============================] - 0s - loss: 0.1463 - acc: 0.9379 - val_loss: 0.1502 - val_acc: 0.9350
Epoch 36/50
1400/1400 [==============================] - 0s - loss: 0.1405 - acc: 0.9357 - val_loss: 0.1462 - val_acc: 0.9383
Epoch 37/50
1400/1400 [==============================] - 0s - loss: 0.1380 - acc: 0.9443 - val_loss: 0.1404 - val_acc: 0.9383
Epoch 38/50
1400/1400 [==============================] - 0s - loss: 0.1310 - acc: 0.9479 - val_loss: 0.1352 - val_acc: 0.9400
Epoch 39/50
1400/1400 [==============================] - 0s - loss: 0.1273 - acc: 0.9450 - val_loss: 0.1300 - val_acc: 0.9433
Epoch 40/50
1400/1400 [==============================] - 0s - loss: 0.1256 - acc: 0.9521 - val_loss: 0.1248 - val_acc: 0.9467
Epoch 41/50
1400/1400 [==============================] - 0s - loss: 0.1249 - acc: 0.9493 - val_loss: 0.1215 - val_acc: 0.9517
Epoch 42/50
1400/1400 [==============================] - 0s - loss: 0.1151 - acc: 0.9579 - val_loss: 0.1152 - val_acc: 0.9550
Epoch 43/50
1400/1400 [==============================] - 0s - loss: 0.1117 - acc: 0.9593 - val_loss: 0.1090 - val_acc: 0.9583
Epoch 44/50
1400/1400 [==============================] - 0s - loss: 0.1080 - acc: 0.9629 - val_loss: 0.1038 - val_acc: 0.9617
Epoch 45/50
1400/1400 [==============================] - 0s - loss: 0.1022 - acc: 0.9621 - val_loss: 0.0986 - val_acc: 0.9633
Epoch 46/50
1400/1400 [==============================] - 0s - loss: 0.0961 - acc: 0.9650 - val_loss: 0.0973 - val_acc: 0.9650
Epoch 47/50
1400/1400 [==============================] - 0s - loss: 0.0910 - acc: 0.9657 - val_loss: 0.0897 - val_acc: 0.9683
Epoch 48/50
1400/1400 [==============================] - 0s - loss: 0.0904 - acc: 0.9743 - val_loss: 0.0842 - val_acc: 0.9717
Epoch 49/50
1400/1400 [==============================] - 0s - loss: 0.0845 - acc: 0.9721 - val_loss: 0.0800 - val_acc: 0.9733
Epoch 50/50
1400/1400 [==============================] - 0s - loss: 0.0801 - acc: 0.9736 - val_loss: 0.0749 - val_acc: 0.9783
Out[16]:
<keras.callbacks.History at 0x1162c7b50>

In [17]:
score = model.evaluate(Xtest, Ytest, verbose=0)
print("score: %.3f, accuracy: %.3f" % (score[0], score[1]))

Y_ = model.predict(X)
y_ = np_utils.categorical_probas_to_classes(Y_)
plot_dataset(X, y_)


score: 0.075, accuracy: 0.978

In [ ]: