IMDB Sentiment Classification

This script trains a sentiment classifier on the IMDB Movie Review Sentiment dataset: http://keras.io/datasets/#imdb-movie-reviews-sentiment-classification

Neural Network architecture:

  • Zero-pad (or truncate) input documents to maxlen word indices
  • Word embeddings lookup (random initialization)
  • Dropout layer
  • Convolution with max-over-time pooling (custom layer, not provided by Keras; see the sketch below)
  • Dense hidden layer
  • Dropout layer
  • Sigmoid output unit for binary classification (positive / negative)

Best validation accuracy (epoch 3 of 5): 0.8616
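
The ConvolutionalMaxOverTime implementation itself is not included in this notebook. As a rough illustration only, the NumPy sketch below shows the operation such a layer performs, assuming width-1 filters: project every time step through a learned weight matrix, apply a ReLU, then take the maximum over the time axis so each filter reports its strongest response anywhere in the document. The names (conv_max_over_time, W, b) are illustrative, not the layer's actual API, and in the real layer W and b would be learned parameters.

import numpy as np

def conv_max_over_time(X, W, b):
    # X: (batch, maxlen, embedding_dims) -- embedded, padded documents
    # W: (embedding_dims, nb_filter) filter weights; b: (nb_filter,) biases
    h = np.maximum(0.0, np.dot(X, W) + b)  # ReLU, shape (batch, maxlen, nb_filter)
    return h.max(axis=1)                   # max over time -> (batch, nb_filter)

# toy check: 2 documents, 5 time steps, 4-dim embeddings, 3 filters
rng = np.random.RandomState(0)
out = conv_max_over_time(rng.randn(2, 5, 4), 0.1 * rng.randn(4, 3), np.zeros(3))
print(out.shape)  # (2, 3)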


In [1]:
from __future__ import absolute_import
from __future__ import print_function
import numpy as np
from ConvolutionalMaxOverTime import ConvolutionalMaxOverTime
np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.layers.embeddings import Embedding
from keras.datasets import imdb



# set parameters:
max_features = 5000
maxlen = 200
batch_size = 32
embedding_dims = 100
nb_filter = 100
hidden_dims = 250
nb_epoch = 5

print("Loading data...")
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features,
                                                      test_split=0.2)
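# keep only the max_features most frequent words; test_split holds out
# 20% of the reviews as the validation set used below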
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
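# pad_sequences pads (and truncates) at the front by default, so every
# review becomes exactly maxlen indices, e.g. [1, 5, 9] -> [0, ..., 0, 1, 5, 9]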
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
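# output shape: (batch_size, maxlen, embedding_dims)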
model.add(Dropout(0.25))

# convolution + max-over-time pooling (custom layer, imported above)
model.add(ConvolutionalMaxOverTime(nb_filter, activation='relu'))
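# collapses the time axis to shape (batch_size, nb_filter): each filter
# keeps only its strongest response anywhere in the document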

# hidden layer:
model.add(Dense(hidden_dims, activation='relu')) 
model.add(Dropout(0.25))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1, activation='sigmoid'))


model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")
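# Keras 0.x-era API: class_mode="binary" (above) and show_accuracy=True
# (below) report accuracy per epoch; newer Keras uses metrics=['accuracy']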
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=True, show_accuracy=True, validation_data=(X_test, y_test))


Loading data...
20000 train sequences
5000 test sequences
Pad sequences (samples x time)
X_train shape: (20000, 200)
X_test shape: (5000, 200)
Build model...
Train on 20000 samples, validate on 5000 samples
Epoch 1/5
20000/20000 [==============================] - 32s - loss: 0.4275 - acc: 0.7942 - val_loss: 0.3481 - val_acc: 0.8492
Epoch 2/5
20000/20000 [==============================] - 32s - loss: 0.2988 - acc: 0.8742 - val_loss: 0.3312 - val_acc: 0.8566
Epoch 3/5
20000/20000 [==============================] - 36s - loss: 0.2559 - acc: 0.8956 - val_loss: 0.3312 - val_acc: 0.8616
Epoch 4/5
20000/20000 [==============================] - 36s - loss: 0.2247 - acc: 0.9084 - val_loss: 0.3343 - val_acc: 0.8572
Epoch 5/5
20000/20000 [==============================] - 35s - loss: 0.1928 - acc: 0.9249 - val_loss: 0.3731 - val_acc: 0.8588
Out[1]:
<keras.callbacks.History at 0x1385ee50>
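
To go beyond the per-epoch numbers above, the trained model can score the held-out reviews directly. A minimal sketch, assuming the Keras 0.x predict API used by this Sequential model (model.predict returns the sigmoid probabilities):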

In [ ]:
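# sanity-check: score the held-out reviews with the trained model;
# sigmoid outputs above 0.5 are predicted positive
probs = model.predict(X_test, batch_size=batch_size)
preds = (probs[:, 0] > 0.5).astype('int32')
print('Test accuracy: %.4f' % np.mean(preds == np.asarray(y_test)))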


In [ ]: