Chapter 6.2 - Understanding recurrent neural networks

Simple RNN

The SimpleRNN layer takes inputs of shape (batch_size, timesteps, input_features), i.e. a batch of sequences of feature vectors.
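
Conceptually, a recurrent layer processes the timesteps of a sequence one by one, combining each input vector with a state carried over from the previous step. A minimal NumPy sketch of this forward pass for a single sequence (the weights here are random stand-ins, not trained parameters, and the dimensions are arbitrary):

import numpy as np

timesteps, input_features, output_features = 100, 32, 64

inputs = np.random.random((timesteps, input_features))   # one input sequence
state_t = np.zeros((output_features,))                   # initial state: all zeros

W = np.random.random((output_features, input_features))  # input-to-output weights
U = np.random.random((output_features, output_features)) # recurrent (state) weights
b = np.random.random((output_features,))                 # bias

successive_outputs = []
for input_t in inputs:
    # Combine the current input with the state (the "memory" of the loop)
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t  # the output becomes the state for the next timestep

final_output_sequence = np.stack(successive_outputs, axis=0)  # (timesteps, output_features)

This is essentially what SimpleRNN computes, vectorized over a whole batch and with learned weights.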


In [2]:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

In [3]:
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080      
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
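
As a quick sanity check, the parameter counts in the summary follow directly from the layer shapes:

Embedding:  10,000 tokens * 32 dimensions = 320,000 parameters
SimpleRNN:  32 * 32 (input kernel) + 32 * 32 (recurrent kernel) + 32 (bias) = 2,080 parameters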

Like all recurrent layers in Keras, SimpleRNN can be run in two different modes: it can return either the full sequence of successive outputs for every timestep (a 3D tensor of shape (batch_size, timesteps, output_features)) or only the last output for each input sequence (a 2D tensor of shape (batch_size, output_features)). The mode is controlled by the return_sequences constructor argument, which defaults to False.


In [5]:
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, None, 32)          2080      
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________

As with convolutional neural networks, it can be beneficial to stack several recurrent layers on top of one another to increase the representational power of the network. In such a setup, every intermediate layer must return its full sequence of outputs (return_sequences=True) so that the next recurrent layer receives 3D input; only the topmost layer may return just its last output.


In [8]:
model = Sequential()
model.add(Embedding(input_dim = 10000, 
                    output_dim = 32))
model.add(SimpleRNN(units = 32, 
                    return_sequences = True))
model.add(SimpleRNN(units = 32, 
                    return_sequences = True))
model.add(SimpleRNN(units = 32, 
                    return_sequences = True))
# The last layer returns only the last output for each sequence
model.add(SimpleRNN(32))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_5 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_11 (SimpleRNN)    (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_12 (SimpleRNN)    (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_13 (SimpleRNN)    (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_14 (SimpleRNN)    (None, 32)                2080      
=================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________

IMDB example


In [9]:
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

In [10]:
# Number of most frequent words to keep as features
max_features = 10000
# Maximum review length in words (longer reviews are truncated)
maxlen = 500
batch_size = 32

In [12]:
print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

print('Pad sequences (samples x time)')
input_train = pad_sequences(input_train, maxlen=maxlen)
input_test = pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)


Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
input_train shape: (25000, 500)
input_test shape: (25000, 500)
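
Note that pad_sequences pads and truncates at the start of each sequence by default (padding='pre', truncating='pre'), so for reviews longer than maxlen it is the last 500 tokens that are kept. A toy illustration with made-up sequences:

pad_sequences([[1, 2, 3], [1, 2, 3, 4, 5, 6]], maxlen=4)
# array([[0, 1, 2, 3],
#        [3, 4, 5, 6]], dtype=int32)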

In [13]:
from keras.layers import Dense

model = Sequential()
model.add(Embedding(input_dim = max_features, 
                    output_dim = 32))
model.add(SimpleRNN(units = 32))
model.add(Dense(units = 1, 
                activation='sigmoid'))

model.compile(optimizer = 'rmsprop', 
              loss = 'binary_crossentropy', 
              metrics = ['acc'])
history = model.fit(x = input_train, 
                    y = y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 47s 2ms/step - loss: 0.6472 - acc: 0.6102 - val_loss: 0.5290 - val_acc: 0.7522
Epoch 2/10
20000/20000 [==============================] - 41s 2ms/step - loss: 0.4153 - acc: 0.8217 - val_loss: 0.3960 - val_acc: 0.8328
Epoch 3/10
20000/20000 [==============================] - 42s 2ms/step - loss: 0.3040 - acc: 0.8766 - val_loss: 0.4501 - val_acc: 0.7864
Epoch 4/10
20000/20000 [==============================] - 43s 2ms/step - loss: 0.2303 - acc: 0.9107 - val_loss: 0.3894 - val_acc: 0.8364
Epoch 5/10
20000/20000 [==============================] - 43s 2ms/step - loss: 0.1805 - acc: 0.9346 - val_loss: 0.4967 - val_acc: 0.8336
Epoch 6/10
20000/20000 [==============================] - 45s 2ms/step - loss: 0.1221 - acc: 0.9576 - val_loss: 0.4190 - val_acc: 0.8396
Epoch 7/10
20000/20000 [==============================] - 41s 2ms/step - loss: 0.0786 - acc: 0.9742 - val_loss: 0.4795 - val_acc: 0.8302
Epoch 8/10
20000/20000 [==============================] - 41s 2ms/step - loss: 0.0538 - acc: 0.9834 - val_loss: 0.5335 - val_acc: 0.8164
Epoch 9/10
20000/20000 [==============================] - 46s 2ms/step - loss: 0.0324 - acc: 0.9903 - val_loss: 0.5256 - val_acc: 0.8404
Epoch 10/10
20000/20000 [==============================] - 42s 2ms/step - loss: 0.0262 - acc: 0.9929 - val_loss: 0.6446 - val_acc: 0.8218
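
The log above reports only training and validation metrics; to measure performance on the held-out test data, one could evaluate the trained model directly. A minimal sketch using the arrays loaded earlier:

test_loss, test_acc = model.evaluate(input_test, y_test, batch_size=128)
print('Test accuracy:', test_acc)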

Visualizing the results


In [14]:
import matplotlib.pyplot as plt

In [15]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


LSTM
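
The LSTM layer extends the simple-RNN recurrence with a carry (the cell state) that can transport information across many timesteps, which is what lets it mitigate the vanishing-gradient problem. Roughly, one timestep of the standard LSTM cell looks like the NumPy sketch below; the weights are random stand-ins, and Keras's actual implementation differs in details such as initialization and the exact gate activations:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

units, input_features = 32, 32

# Random stand-ins for the learned weight matrices and biases
Wi, Wf, Wo, Wk = (np.random.random((units, input_features)) for _ in range(4))
Ui, Uf, Uo, Uk = (np.random.random((units, units)) for _ in range(4))
bi, bf, bo, bk = (np.zeros((units,)) for _ in range(4))

x_t = np.random.random((input_features,))  # input at the current timestep
h_prev = np.zeros((units,))                # previous output (state)
c_prev = np.zeros((units,))                # previous carry (cell state)

i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)   # input gate
f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)   # forget gate
o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)   # output gate
k_t = np.tanh(Wk @ x_t + Uk @ h_prev + bk)   # candidate update
c_t = f_t * c_prev + i_t * k_t               # new carry: keep some old, add some new
h_t = o_t * np.tanh(c_t)                     # new output / state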


In [16]:
from keras.layers import LSTM

model = Sequential()
model.add(Embedding(input_dim = max_features, 
                    output_dim = 32))
model.add(LSTM(units = 32))
model.add(Dense(units = 1, 
                activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop',
              loss = 'binary_crossentropy',
              metrics = ['acc'])
history = model.fit(x = input_train, 
                    y = y_train,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 195s 10ms/step - loss: 0.5080 - acc: 0.7631 - val_loss: 0.3552 - val_acc: 0.8706
Epoch 2/10
20000/20000 [==============================] - 190s 9ms/step - loss: 0.2900 - acc: 0.8862 - val_loss: 0.3028 - val_acc: 0.8766
Epoch 3/10
20000/20000 [==============================] - 183s 9ms/step - loss: 0.2338 - acc: 0.9091 - val_loss: 0.3408 - val_acc: 0.8818
Epoch 4/10
20000/20000 [==============================] - 183s 9ms/step - loss: 0.1981 - acc: 0.9250 - val_loss: 0.4363 - val_acc: 0.8672
Epoch 5/10
20000/20000 [==============================] - 198s 10ms/step - loss: 0.1745 - acc: 0.9362 - val_loss: 0.2977 - val_acc: 0.8868
Epoch 6/10
20000/20000 [==============================] - 200s 10ms/step - loss: 0.1536 - acc: 0.9438 - val_loss: 0.7284 - val_acc: 0.8106
Epoch 7/10
20000/20000 [==============================] - 211s 11ms/step - loss: 0.1431 - acc: 0.9496 - val_loss: 0.3726 - val_acc: 0.8802
Epoch 8/10
20000/20000 [==============================] - 207s 10ms/step - loss: 0.1306 - acc: 0.9526 - val_loss: 0.3197 - val_acc: 0.8816
Epoch 9/10
20000/20000 [==============================] - 198s 10ms/step - loss: 0.1154 - acc: 0.9594 - val_loss: 0.3865 - val_acc: 0.8580
Epoch 10/10
20000/20000 [==============================] - 206s 10ms/step - loss: 0.1094 - acc: 0.9627 - val_loss: 0.3684 - val_acc: 0.8824

Visualizing the results


In [18]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


Predictably, the LSTM handles these long sequences much better than SimpleRNN (validation accuracy peaks around 88% versus around 84%), since its carry mechanism mitigates the vanishing-gradient problem. The price is that it is much slower: each epoch takes roughly four to five times as long to train.

The no-free-lunch theorem in practice.