A "Hello world" in Keras

Keras: Theano-based Deep Learning library

This is just a minimal and very brief example of using Keras to provide a simple working setup that can be modified for more complex tasks.

This model's task is rather trivial - to classify a few almost linearly separable 2D gaussian blobs.

The setup is very similar to the previous Theanets "Hello world".

Install

This tutorial assumes you have the basic stuff prepared.

  • Python (eg. anaconda with Python 3.x)
  • numpy, matplotlib, ipython, sklearn
# get theano (eg. the latest from github)
pip install git+git://github.com/Theano/Theano.git
pip install Keras

In [1]:
%pylab inline

from keras.layers.core import Dense, Activation
from keras.models import Sequential
from keras.utils import np_utils

from sklearn.cross_validation import train_test_split
from sklearn.datasets.samples_generator import make_blobs
from sklearn.metrics import classification_report, confusion_matrix


Populating the interactive namespace from numpy and matplotlib

Generate data

We generate a very simple dataset: three almost linearly separable gaussian blobs in 2D.


In [36]:
n_samples = 10000
n_classes = 3
n_features = 2

# centers - number of classes
# n_features - dimension of the data
X, y_int = make_blobs(n_samples=n_samples, centers=n_classes, n_features=n_features, \
    cluster_std=0.5, random_state=0)

# No need to convert the features and targets to the 32-bit format as in plain theano.

# labels need to be one-hot encoded (binary vector of size N for N classes)
y = np_utils.to_categorical(y_int, n_classes)

In [37]:
# visualize the data for better understanding
def plot_2d_blobs(dataset):
    X, y = dataset
    axis('equal')
    scatter(X[:, 0], X[:, 1], c=y, alpha=0.1, edgecolors='none')

plot_2d_blobs((X, y_int))


Split the data into training and test set

No validation set since we won't tune any hyperparamers today.


In [38]:
# split the data into training, validation and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [39]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape


Out[39]:
((8000, 2), (2000, 2), (8000, 3), (2000, 3))

Create the model

Plain neural network with a single input, hidden and output layer.

The input size matches the number of our features. The output size matches the number of classes (due to one-hot encoding).

We chose 3 neurons in the hidden layer.


In [54]:
# the model is just a sequence of transformations - layer weights, activations, etc.
model = Sequential()
# weights from input to hidden layer - linear transform
model.add(Dense(3, input_dim=n_features))
# basic non-linearity
model.add(Activation("tanh"))
# weights from hidden to output layer
model.add(Dense(n_classes))
# nonlinearity suitable for a classifier
model.add(Activation("softmax"))

In [55]:
# - loss function suitable for multi-class classification
# - plain stochastic gradient descent with mini-batches
model.compile(loss='categorical_crossentropy', optimizer='sgd')

Train the model

We train the model in 5 epochs with mini-batches of size 32 using plain SGD.

The progress is nicely printed on the console. A nice thing over theanets is that the progressbar is overwritten and not just appending each row, this saves visual space and avoids cluttering.


In [57]:
model.fit(X_train, y_train, nb_epoch=5, batch_size=32);


Epoch 1/5
8000/8000 [==============================] - 0s - loss: 0.1550     
Epoch 2/5
8000/8000 [==============================] - 0s - loss: 0.1255     
Epoch 3/5
8000/8000 [==============================] - 0s - loss: 0.1054     
Epoch 4/5
8000/8000 [==============================] - 0s - loss: 0.0908     
Epoch 5/5
8000/8000 [==============================] - 0s - loss: 0.0799     

Evaluate the model

Since we have multi-class classification problem the basic metric is accuracy.

The keras model allows to compute it for us. Otherwise we can grab for sklearn.

Also while using the model for predictions the progress is printed.


In [58]:
def evaluate_accuracy(X, y, label):
    _, accuracy = model.evaluate(X_train, y_train, show_accuracy=True)
    print('training accuracy:', 100 * accuracy, '%')

evaluate_accuracy(X_train, X_train, 'training')
evaluate_accuracy(X_test, X_test, 'test')


8000/8000 [==============================] - 0s     
training accuracy: 99.6 %
8000/8000 [==============================] - 0s     
training accuracy: 99.6 %

In [50]:
y_test_pred = model.predict_classes(X_test)


2000/2000 [==============================] - 0s     

In [59]:
plot_2d_blobs((X_test, y_test_pred))


Conclusion

We were quite successful to separate the data point, even though the problem was intentionally made very simple.

The point is that Keras is a really nice library which can be set up very quickly, the models can be expressed via composition of narrowly focused modules and there are small but surprising practical details.

When I compare the experience to Torch that I've been learning recently, Torch, although being powerful and fast, has a lot to improve compared to Keras. It's just a first impression, but definitely Keras is quite pythonic and it's fun to use. Great job, Keras devs!

For more complete documentation and more advanced examples, please visit http://keras.io/.

Happy machine learning!