ConvNet-LSTM Stack

In this notebook, we stack an LSTM on top of a convolutional layer to classify IMDB movie reviews by their sentiment. The convolutional layer detects local patterns in the word vectors and, via max-pooling, shortens the sequence the LSTM has to process, which makes training markedly faster than with an LSTM alone.

Load dependencies


In [1]:
import keras
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, SpatialDropout1D, LSTM, Conv1D, MaxPooling1D
from keras.layers.wrappers import Bidirectional
from keras.callbacks import ModelCheckpoint
import os
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt 
%matplotlib inline


Using TensorFlow backend.

Set hyperparameters


In [2]:
# output directory name:
output_dir = 'model_output/cnnLSTM'

# training:
epochs = 4
batch_size = 128

# vector-space embedding: 
n_dim = 64 
n_unique_words = 10000 
max_review_length = 200 
pad_type = trunc_type = 'pre'
drop_embed = 0.2 

# convolutional layer architecture:
n_conv = 64  
k_conv = 3 
mp_size = 4

# LSTM layer architecture:
n_lstm = 64 
drop_lstm = 0.2

Load data


In [3]:
(x_train, y_train), (x_valid, y_valid) = imdb.load_data(num_words=n_unique_words)

Preprocess data


In [4]:
x_train = pad_sequences(x_train, maxlen=max_review_length, padding=pad_type, truncating=trunc_type, value=0)
x_valid = pad_sequences(x_valid, maxlen=max_review_length, padding=pad_type, truncating=trunc_type, value=0)
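As a sanity check, we can map a padded review back to words. This is an illustrative sketch, not part of the original pipeline; it assumes Keras's standard IMDB index offset of 3, which reserves tokens 0-2 for padding, start, and unknown.

word_index = imdb.get_word_index()
index_word = {i + 3: w for w, i in word_index.items()}  # data tokens are offset by 3
index_word[0], index_word[1], index_word[2] = 'PAD', 'START', 'UNK'
print(' '.join(index_word.get(i, 'UNK') for i in x_train[0]))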

Design neural network architecture


In [5]:
model = Sequential()
model.add(Embedding(n_unique_words, n_dim, input_length=max_review_length)) 
model.add(SpatialDropout1D(drop_embed))
model.add(Conv1D(n_conv, k_conv, activation='relu'))
model.add(MaxPooling1D(mp_size))
model.add(Bidirectional(LSTM(n_lstm, dropout=drop_lstm)))
model.add(Dense(1, activation='sigmoid'))

In [6]:
# the Bidirectional wrapper doubles the LSTM's parameter count (one set per reading direction)
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 200, 64)           640000    
_________________________________________________________________
spatial_dropout1d_1 (Spatial (None, 200, 64)           0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 198, 64)           12352     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 49, 64)            0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128)               66048     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
Total params: 718,529
Trainable params: 718,529
Non-trainable params: 0
_________________________________________________________________
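The parameter counts above can be verified by hand. The sketch below spells out the arithmetic from the hyperparameters set earlier; the assert lines simply confirm the figures in the summary table.

# embedding: one n_dim-dimensional vector per vocabulary entry
assert n_unique_words * n_dim == 640000
# conv layer: n_conv filters, each spanning k_conv timesteps of n_dim channels, plus a bias
assert n_conv * (k_conv * n_dim + 1) == 12352
# biLSTM: 4 gates x (input weights + recurrent weights + bias) per direction, two directions
assert 2 * 4 * (n_lstm * (n_conv + n_lstm) + n_lstm) == 66048
# dense: one weight per biLSTM output (2 * n_lstm) plus a bias
assert 2 * n_lstm + 1 == 129
# output lengths: the valid convolution trims k_conv - 1 timesteps; pooling divides by mp_size
assert max_review_length - k_conv + 1 == 198 and 198 // mp_size == 49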

Configure model


In [7]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [8]:
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
modelcheckpoint = ModelCheckpoint(filepath=output_dir+"/weights.{epoch:02d}.hdf5")
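If you would rather keep a single best set of weights than one file per epoch, ModelCheckpoint also accepts a save_best_only flag; a minimal variant of the call above:

# optional variant: retain only the weights with the lowest validation loss
modelcheckpoint = ModelCheckpoint(filepath=output_dir+"/weights.best.hdf5",
                                  monitor='val_loss', save_best_only=True)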

Train!


In [9]:
# 87.4% validation accuracy in epoch 1:
# slightly higher than the bidirectional LSTM and about the same as the stacked biLSTM,
# yet each epoch takes roughly a third or a sixth as long, respectively
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_valid, y_valid), callbacks=[modelcheckpoint])


Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 63s - loss: 0.4660 - acc: 0.7604 - val_loss: 0.3090 - val_acc: 0.8739
Epoch 2/4
25000/25000 [==============================] - 35s - loss: 0.2406 - acc: 0.9045 - val_loss: 0.3007 - val_acc: 0.8709
Epoch 3/4
25000/25000 [==============================] - 33s - loss: 0.1774 - acc: 0.9350 - val_loss: 0.3929 - val_acc: 0.8492
Epoch 4/4
25000/25000 [==============================] - 33s - loss: 0.1314 - acc: 0.9544 - val_loss: 0.3887 - val_acc: 0.8527
Out[9]:
<keras.callbacks.History at 0x7f983e8e4588>
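fit() returns the History object shown above; had we assigned it to a variable, we could plot the learning curves. A minimal sketch, assuming the old-style 'acc'/'val_acc' history keys of this Keras version (note that calling fit() again would continue training from the current weights rather than reproduce the run above):

history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=1, validation_data=(x_valid, y_valid),
                    callbacks=[modelcheckpoint])
plt.plot(history.history['acc'], label='training')
plt.plot(history.history['val_acc'], label='validation')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend()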

Evaluate


In [10]:
model.load_weights(output_dir+"/weights.00.hdf5") # zero-indexed filename: epoch 1, which had the highest validation accuracy

In [11]:
y_hat = model.predict_proba(x_valid)


25000/25000 [==============================] - 29s    

In [12]:
plt.hist(y_hat)
_ = plt.axvline(x=0.5, color='orange')
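For comparison with the ROC AUC computed below, we can also read off plain accuracy at the 0.5 cutoff marked by the orange line; a quick sketch (y_hat has shape (25000, 1), so we squeeze it first):

import numpy as np
# fraction of validation reviews classified correctly at the 0.5 threshold
print((np.squeeze(y_hat > 0.5).astype(int) == y_valid).mean())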



In [13]:
"{:0.2f}".format(roc_auc_score(y_valid, y_hat)*100.0)


Out[13]:
'94.61'
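To see the curve behind that AUC figure, we could also plot the ROC itself; a sketch using scikit-learn's roc_curve:

from sklearn.metrics import roc_curve
# false-positive vs. true-positive rate across all decision thresholds
fpr, tpr, _ = roc_curve(y_valid, y_hat)
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')  # chance-level diagonal
plt.xlabel('false positive rate'); plt.ylabel('true positive rate')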
