Getting started with Keras

This tutorial is inspired by https://keras.io

Sequential model

Keras takes a slightly different approach to initializing and defining layers, called the Sequential model. A Sequential model is a linear stack of the layers that make up the neural network being designed, so to define each and every layer in the network we use the Sequential class. This can be done in two different ways, as shown below.


In [144]:
#
# Import required packages
#
from keras.models import Sequential
from keras.layers import Dense, Activation

In [145]:
from IPython.display import display, Image
import matplotlib.pyplot as plt
%matplotlib inline
import random

Either define the entire neural network inside the constructor of the Sequential class, as below,


In [146]:
#
# The network model can be initialized in the constructor itself using the following syntax
#
model1 = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax')
])

Or add layers to the network one by one, as convenient.


In [147]:
#
# Layers to the network can be added dynamically
#
model2 = Sequential()
model2.add(Dense(32, input_dim=784))
model2.add(Activation('relu'))
model2.add(Dense(10))
model2.add(Activation('softmax'))

The model needs to know what input shape it should expect, i.e. whether the input is a 28x28 (784 pixel) image, a vector of numeric features, or data of some other size.

For this reason, the first layer in a Sequential model (and only the first, because the following layers can infer their shapes automatically from the previous layer) needs to receive information about its input shape; hence the first model.add call above has the extra argument input_dim.

There are several possible ways to do this:

-- pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.

e.g. input_shape=(784, 10) -> each input sample is a 2D tensor of shape (784, 10); the batch size is left unspecified

 input_shape=(784,) -> each input sample is a vector of length 784; input_shape=(None, 784) -> each input sample is a sequence of any positive length whose elements are vectors of length 784

-- pass instead a batch_input_shape argument, where the batch dimension is included. This is useful for specifying a fixed batch size (e.g. with stateful RNNs).

-- some 2D layers, such as Dense, support the specification of their input shape via the argument input_dim, and some 3D temporal layers support the arguments input_dim and input_length.

As such, the following three snippets are strictly equivalent:


In [148]:
model1 = Sequential()
model1.add(Dense(32, input_shape=(784,)))

In [149]:
model2 = Sequential()
model2.add(Dense(32, batch_input_shape=(None, 784)))
# note that batch dimension is "None" here,
# so the model will be able to process batches of any size with each input of length 784.

In [150]:
model3 = Sequential()
model3.add(Dense(32, input_dim=784))

Note that input_dim=784 is the same as input_shape=(784,)
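
All three models above declare the same input shape, which we can quickly verify (a minimal check; model.input_shape reports the shape with the batch dimension as None):


In [ ]:
# each of these prints (None, 784): any batch size, 784 features per sample
print(model1.input_shape)
print(model2.input_shape)
print(model3.input_shape)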

The Merge layer

Multiple Sequential instances can be merged into a single output via a Merge layer. The output is a layer that can be added as the first layer in a new Sequential model. For instance, here's a model with two separate input branches getting merged:


In [151]:
Image("keras_examples/keras_merge.png")


Out[151]: [figure: two input branches merging into a single Sequential model]

In [152]:
from keras.layers import Merge

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))

Such a two-branch model can then be trained via e.g.:


In [ ]:
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets)  # we pass one data array per model input

The Merge layer supports a number of pre-defined modes:

  • sum (default): element-wise sum
  • concat: tensor concatenation. You can specify the concatenation axis via the argument concat_axis.
  • mul: element-wise multiplication
  • ave: tensor average
  • dot: dot product. You can specify which axes to reduce along via the argument dot_axes.
  • cos: cosine proximity between vectors in 2D tensors.
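
For instance, here is a minimal sketch using the sum mode; note that both branches must produce outputs of the same shape (32 units each, reusing left_branch and right_branch from above):


In [ ]:
# element-wise sum of the two 32-unit branch outputs
summed = Merge([left_branch, right_branch], mode='sum')

sum_model = Sequential()
sum_model.add(summed)
sum_model.add(Dense(10, activation='softmax'))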

You can also pass a function as the mode argument, allowing for arbitrary transformations:


In [ ]:
merged = Merge([left_branch, right_branch], mode=lambda x: x[0] - x[1])

Now you know enough to be able to define almost any model with Keras. For complex models that cannot be expressed via Sequential and Merge, you can use the functional API.
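
For reference, here is a minimal sketch of the earlier two-layer model written with the functional API (Keras 1.x style, using Input and Model):


In [ ]:
from keras.layers import Input, Dense
from keras.models import Model

# a placeholder tensor for batches of 784-dimensional inputs
inputs = Input(shape=(784,))
x = Dense(32, activation='relu')(inputs)
predictions = Dense(10, activation='softmax')(x)

# a Model ties the input tensor to the output tensor
functional_model = Model(input=inputs, output=predictions)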

Compilation

Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

  • an optimizer: the optimization algorithm to use, e.g. a flavour of gradient descent. This can be the string identifier of an existing optimizer (such as rmsprop or adagrad) or an instance of the Optimizer class. See: optimizers.
  • a loss function: the error function to be optimized, e.g. the squared error or cross-entropy function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse) or an objective function. See: objectives.
  • a list of metrics, used to evaluate the performance of the network. For any classification problem you will want to set this to metrics=['accuracy']. A metric can be the string identifier of an existing metric or a custom metric function. A custom metric function should return either a single tensor value or a dict metric_name -> metric_value. See: metrics.

In [ ]:
# for a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# for a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# for custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

def false_rates(y_true, y_pred):
    # one possible implementation (illustrative), assuming 0/1 labels and
    # sigmoid outputs rounded to the nearest class
    false_neg = K.mean(y_true * (1. - K.round(y_pred)))
    false_pos = K.mean((1. - y_true) * K.round(y_pred))
    return {
        'false_neg': false_neg,
        'false_pos': false_pos,
    }

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred, false_rates])
    

Training

Keras models are trained on Numpy arrays of input data and labels. To train a model, you will typically use the fit function; see its documentation on https://keras.io for details.

    
    
In [153]:
# for a single-input model with 2 classes (binary):

model = Sequential()
model.add(Dense(1, input_dim=784, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))

# train the model, iterating on the data in batches
# of 32 samples
model.fit(data, labels, nb_epoch=10, batch_size=32)
    
    
    
    
Epoch 1/10
1000/1000 [==============================] - 0s - loss: 0.7246 - acc: 0.5100
Epoch 2/10
1000/1000 [==============================] - 0s - loss: 0.7144 - acc: 0.5220
Epoch 3/10
1000/1000 [==============================] - 0s - loss: 0.7026 - acc: 0.5330
Epoch 4/10
1000/1000 [==============================] - 0s - loss: 0.6950 - acc: 0.5410
Epoch 5/10
1000/1000 [==============================] - 0s - loss: 0.6854 - acc: 0.5560
Epoch 6/10
1000/1000 [==============================] - 0s - loss: 0.6762 - acc: 0.5790
Epoch 7/10
1000/1000 [==============================] - 0s - loss: 0.6720 - acc: 0.5780
Epoch 8/10
1000/1000 [==============================] - 0s - loss: 0.6609 - acc: 0.5960
Epoch 9/10
1000/1000 [==============================] - 0s - loss: 0.6531 - acc: 0.6160
Epoch 10/10
1000/1000 [==============================] - 0s - loss: 0.6497 - acc: 0.6350

Out[153]:
<keras.callbacks.History at 0x7f13dbcf2f90>
    
    
In [ ]:
# for a multi-input model with 10 classes:

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

model = Sequential()
model.add(merged)
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
from keras.utils.np_utils import to_categorical
data_1 = np.random.random((1000, 784))
data_2 = np.random.random((1000, 784))

# these are integers between 0 and 9
labels = np.random.randint(10, size=(1000, 1))
# we convert the labels to a binary matrix of size (1000, 10)
# for use with categorical_crossentropy
labels = to_categorical(labels, 10)

# train the model
# note that we are passing a list of Numpy arrays as training data
# since the model has 2 inputs
model.fit([data_1, data_2], labels, nb_epoch=10, batch_size=32)
    

Example

Following is an example implementation of a multi-layer perceptron on the MNIST dataset.

First, import all the required libraries.

    
    
In [155]:
# %load mnist_mlp.py
'''Trains a simple deep NN on the MNIST dataset.

Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils
    

A simple function to display test samples together with their predicted labels

    
    
In [156]:
def show_prediction_results(X_test, predicted_labels):
    # plot 10 randomly chosen test images with their predicted digit
    for i, j in enumerate(random.sample(range(len(X_test)), 10)):
        plt.subplot(5, 2, i + 1)
        plt.axis("off")
        plt.title("Predicted label is " + str(np.argmax(predicted_labels[j])))
        plt.imshow(X_test[j].reshape(28, 28))
    

Next we generate and structure the dataset for training and testing. We will be using the 28x28 images from the MNIST dataset: 60000 for training and 10000 for testing, with a batch size of 128, to classify the 10 digits appearing in the images. To keep the computation small, 20 epochs are used; these can be increased for more accuracy.

    
    
In [157]:
batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
    
    
    
    
60000 train samples
10000 test samples
    

Now we start building the Sequential model in Keras. We will use a 3-layer MLP to model the dataset.

    
    
In [158]:
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
    
    
    
In [159]:
model.summary()
    
    
    
    
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_27 (Dense)                 (None, 512)           401920      dense_input_14[0][0]             
____________________________________________________________________________________________________
activation_20 (Activation)       (None, 512)           0           dense_27[0][0]                   
____________________________________________________________________________________________________
dropout_11 (Dropout)             (None, 512)           0           activation_20[0][0]              
____________________________________________________________________________________________________
dense_28 (Dense)                 (None, 512)           262656      dropout_11[0][0]                 
____________________________________________________________________________________________________
activation_21 (Activation)       (None, 512)           0           dense_28[0][0]                   
____________________________________________________________________________________________________
dropout_12 (Dropout)             (None, 512)           0           activation_21[0][0]              
____________________________________________________________________________________________________
dense_29 (Dense)                 (None, 10)            5130        dropout_12[0][0]                 
____________________________________________________________________________________________________
activation_22 (Activation)       (None, 10)            0           dense_29[0][0]                   
====================================================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
____________________________________________________________________________________________________
    

Compiling the model configures it with learning parameters such as the loss function, metrics, and optimizer.

    
    
In [160]:
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
    

The fit function fits the previously configured neural network to the training data.

    
    
In [161]:
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=0, validation_data=(X_test, Y_test))
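
The History object returned by fit records the loss and metric values for each epoch; here is a minimal sketch of plotting the learning curves (the 'val_loss' key is available because validation_data was passed to fit):


In [ ]:
# plot training vs. validation loss per epoch
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()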
    
    
    
In [162]:
# Let's save the model to local files so that it can be fetched at a later
# point in time to skip the computations and directly start testing if need be
model.save_weights('mnist_mlp.hdf5')
with open('mnist_mlp.json', 'w') as f:
    f.write(model.to_json())
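
To reload the saved model later, here is a minimal sketch using model_from_json and load_weights, matching the two files written above:


In [ ]:
from keras.models import model_from_json

# rebuild the architecture from the saved JSON, then restore the trained weights
with open('mnist_mlp.json') as f:
    restored_model = model_from_json(f.read())
restored_model.load_weights('mnist_mlp.hdf5')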
    

The predict function predicts labels or values for the provided test data.

    
    
In [163]:
predicted_labels = model.predict(X_test, verbose=0)
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
    
    
    
    
Test score: 0.12476406681
Test accuracy: 0.9837
    
    
    
In [164]:
# Let's visualize some results randomly picked from the test dataset
# together with the labels predicted for them
show_prediction_results(X_test, predicted_labels)