Word embeddings

Import the modules needed for this notebook (now using Keras 1.0.0).


In [3]:
%pylab inline

import copy

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.datasets import imdb
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
from keras.layers.convolutional import Convolution1D, MaxPooling1D, ZeroPadding1D, AveragePooling1D
from keras.callbacks import EarlyStopping
from keras.layers.normalization import BatchNormalization
from keras.preprocessing import sequence
from keras.layers.embeddings import Embedding


Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['copy']
`%matplotlib` prevents importing * from pylab and numpy

I. Example using word embedding

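We read in the IMDB dataset, keeping only word indices from 25 up to 499 (nb_words=500, skip_top=25). The 25 most common words and everything rarer than the cutoff are replaced by an out-of-vocabulary index, so about 475 distinct indices survive rather than a full 500. With maxlen=100, reviews of 100 tokens or more are filtered out, and the remaining reviews are padded to exactly 100 tokens.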


In [4]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(path="imdb.pkl", nb_words=500,
                                                      skip_top=25, maxlen=100, test_split=0.2)
X_train = sequence.pad_sequences(X_train, maxlen=100)
X_test = sequence.pad_sequences(X_test, maxlen=100)
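
The filtering and padding steps above can be sketched in plain Python. This is only an illustration of how Keras 1.x appears to treat nb_words, skip_top, and pad_sequences with its default 'pre' padding and truncation; the helper names filter_vocab and pad_pre are hypothetical and not part of Keras, and the out-of-vocabulary index of 2 is assumed to be the library default.

```python
def filter_vocab(seq, nb_words=500, skip_top=25, oov_char=2):
    """Replace indices outside [skip_top, nb_words) with the OOV index (assumed to be 2)."""
    return [w if skip_top <= w < nb_words else oov_char for w in seq]


def pad_pre(seq, maxlen=100, value=0):
    """Keep the last maxlen tokens and left-pad with value, like 'pre' padding/truncating."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# A made-up review: very common words (1, 14), kept words (30, 499), rare words (500, 7000)
review = [1, 14, 30, 499, 500, 7000]

filtered = filter_vocab(review)
padded = pad_pre(filtered, maxlen=8)

print(filtered)  # [2, 2, 30, 499, 2, 2]
print(padded)    # [0, 0, 2, 2, 30, 499, 2, 2]
```

Since every index that survives is below 500, an embedding table with 500 rows is large enough to cover the padded sequences.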

In [5]:
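model = Sequential()

# Map each of the 500 word indices to a 25-dimensional vector;
# output shape is (batch, 100, 25)
model.add(Embedding(500, 25, input_length=100))
model.add(Dropout(0.25))

# Flatten the 100 x 25 embeddings into a single 2500-element vector
model.add(Flatten())

# Fully connected hidden layer
model.add(Dense(256))
model.add(Dropout(0.25))
model.add(Activation('relu'))

# Single sigmoid unit for the binary sentiment label
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Keras 1.0 deprecates class_mode; accuracy is requested through metrics
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])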
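
To make the shapes in this model concrete, here is a minimal NumPy sketch of what an embedding lookup followed by flattening amounts to. It is not the Keras implementation: the weights are random placeholders, the batch of 4 sequences is invented, and the parameter counts assume the standard weights-plus-bias layout for the two dense layers.

```python
import numpy as np

vocab_size, embed_dim, seq_len = 500, 25, 100

rng = np.random.RandomState(0)

# The embedding layer is a trainable lookup table with one row per word index
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))

# A made-up batch of 4 padded sequences of word indices
batch = rng.randint(0, vocab_size, size=(4, seq_len))

# The lookup is plain integer indexing: (4, 100) -> (4, 100, 25)
embedded = embedding_matrix[batch]

# Flattening concatenates the 100 word vectors: (4, 100, 25) -> (4, 2500)
flattened = embedded.reshape(len(batch), -1)

# Parameter counts for the architecture above
n_embedding = vocab_size * embed_dim            # 12500
n_hidden = seq_len * embed_dim * 256 + 256      # 640256
n_output = 256 * 1 + 1                          # 257
n_total = n_embedding + n_hidden + n_output     # 653013

print(embedded.shape, flattened.shape, n_total)
```

Most of the parameters sit in the first dense layer, because flattening turns each review into a 2500-element input.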



In [ ]:
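# batch_size and nb_epoch are not defined earlier in the notebook; these values are placeholders
batch_size, nb_epoch = 32, 10
# show_accuracy is deprecated in Keras 1.0; pass metrics=['accuracy'] to compile() instead
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))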
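
For reference, the quantities monitored during training can be sketched in NumPy: the sigmoid output, the binary cross-entropy loss, and accuracy at a 0.5 threshold. The logits and labels below are invented for illustration, and the clipping constant eps is an assumption rather than the exact value Keras uses.

```python
import numpy as np


def sigmoid(z):
    """Squash raw scores into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


# Made-up pre-activation outputs of the final Dense(1) layer and their true labels
logits = np.array([2.0, -1.0, 0.0, 3.0])
labels = np.array([1.0, 0.0, 1.0, 0.0])

probs = sigmoid(logits)
loss = binary_crossentropy(labels, probs)

# A prediction counts as positive when its probability exceeds 0.5
accuracy = float(np.mean((probs > 0.5) == (labels == 1.0)))

print(loss, accuracy)
```

In this toy batch the confident but wrong fourth prediction dominates the loss, which is the behaviour that makes cross-entropy a sharper training signal than accuracy alone.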