In [1]:
%matplotlib inline
from __future__ import absolute_import
from __future__ import print_function
# import local library
import tools
import nnlstm
# import library to build the neural network
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.optimizers import Adam
In [3]:
#%install_ext https://raw.githubusercontent.com/rasbt/watermark/master/watermark.py
%load_ext watermark
# for reproducibility
%watermark -a 'Paul Willot' -mvp numpy,scipy,keras
Let's gather the data from the previous notebook:
In [2]:
X_train, y_train, X_test, y_test, feature_names, max_features, classes_names, vectorizer = tools.load_pickle("data/unpadded_4_BacObjMetCon.pickle")
and pad each vector to a regular size (necessary for sequence processing):
In [3]:
X_train, X_test, y_train, y_test = nnlstm.pad_sequence(X_train, X_test, y_train, y_test, maxlen=100)
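nnlstm.pad_sequence is a local helper; as a rough sketch of what the padding step does, it presumably wraps something like Keras's pad_sequences (the cell below is an assumption for illustration, not the helper's actual code):
In [ ]:
from keras.preprocessing import sequence
import numpy as np

# Truncate or left-pad every token sequence to exactly 100 entries,
# so all samples share the same shape for batched LSTM processing.
X_train = sequence.pad_sequences(X_train, maxlen=100)
X_test = sequence.pad_sequences(X_test, maxlen=100)
# labels are simply turned into arrays
y_train = np.array(y_train)
y_test = np.array(y_test)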
Or directly get a bigger training and testing set:
In [18]:
X_train, y_train, X_test, y_test, feature_names, max_features, classes_names, vectorizer = tools.load_pickle("/Users/meat/Documents/NII/data/training_4_BacObjMetCon.pickle")
Our data look like this:
In [8]:
X_train[0][:100]
Out[8]:
In [9]:
# one-hot vector for the 4 different labels
y_train[0]
Out[9]:
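The one-hot labels were presumably produced with something like Keras's np_utils.to_categorical; a tiny illustration of that encoding (hypothetical, not the notebook's actual preprocessing):
In [ ]:
from keras.utils import np_utils
# Integer class indices for the 4 labels -> one-hot rows of length 4
np_utils.to_categorical([0, 2, 1, 3], 4)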
We use the Keras library, built on Theano.
Here I chose a very simple architecture because of my low-performance system (no graphics card), but feel free to try others. In particular, stacking LSTM layers could improve performance, according to this paper from Karpathy (which I used a lot as a reference).
In [19]:
%%time
# takes approximately 50s to build
dim_out = len(classes_names)
net = Sequential()
net.add(Embedding(max_features, 16))
net.add(LSTM(16, 16))
net.add(Dense(16, dim_out))
net.add(Dropout(0.5))
net.add(Activation('softmax'))
net.compile(loss='categorical_crossentropy', optimizer='adam', class_mode="categorical")
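As mentioned above, stacking LSTM layers is one possible extension. A minimal sketch in the same old-style Keras API used here (untested on my machine, and assuming return_sequences is available in this Keras version):
In [ ]:
# Hypothetical deeper variant: the first LSTM returns its full output
# sequence so that a second LSTM can be stacked on top of it.
deep_net = Sequential()
deep_net.add(Embedding(max_features, 16))
deep_net.add(LSTM(16, 16, return_sequences=True))
deep_net.add(LSTM(16, 16))
deep_net.add(Dense(16, dim_out))
deep_net.add(Dropout(0.5))
deep_net.add(Activation('softmax'))
deep_net.compile(loss='categorical_crossentropy', optimizer='adam', class_mode="categorical")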
Training on a small subset
In [20]:
batch_size = 100
length_train = 15000 # length of the reduced training set (set to -1 to use the full set)
length_test = 5000 # length of the reduced testing set (set to -1 to use the full set)
nb_epoch = 10
patience = 2 # epochs without improvement before early stopping kicks in, if needed
history = nnlstm.train_network(net,
X_train[:length_train],
y_train[:length_train],
X_test[:length_test],
y_test[:length_test],
nb_epoch,
batch_size=batch_size,
path_save="weights",
patience=patience)
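nnlstm.train_network is a local helper; for reference, a plain-Keras training loop with per-epoch checkpointing and early stopping might look roughly like this (a sketch assuming the helper relies on standard Keras callbacks, not the author's actual implementation; the checkpoint file name is illustrative):
In [ ]:
from keras.callbacks import ModelCheckpoint, EarlyStopping

# Save the weights after each epoch and stop once the validation loss
# has not improved for `patience` epochs.
checkpointer = ModelCheckpoint("weights/checkpoint.hdf5", verbose=1)
stopper = EarlyStopping(monitor='val_loss', patience=patience)
fit_history = net.fit(X_train[:length_train], y_train[:length_train],
                      batch_size=batch_size, nb_epoch=nb_epoch,
                      validation_data=(X_test[:length_test], y_test[:length_test]),
                      show_accuracy=True,
                      callbacks=[checkpointer, stopper])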
The weights are saved at each epoch, and you can load 'best' to get the epoch with the highest (accuracy * (loss/10)).
In [21]:
net.load_weights("weights/best.hdf5")
In [22]:
nnlstm.show_history(history)
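nnlstm.show_history is another local helper; plotting the curves by hand would look roughly like this, assuming history exposes a Keras-style .history dict (an assumption about the helper's input, not its actual code):
In [ ]:
import matplotlib.pyplot as plt

# Plot training and validation loss per epoch
# (keys assumed to follow the standard Keras History naming).
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()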
In [23]:
nnlstm.evaluate_network(net, X_test[:length_test], y_test[:length_test], classes_names, length=-1)
It is interesting to note the network's confusion between Background and Objective, as the distinction between these two labels is indeed often quite subtle.
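To inspect that confusion directly, one can compute a confusion matrix from the network's predictions, for example with scikit-learn (a sketch; evaluate_network may already do something similar internally):
In [ ]:
import numpy as np
from sklearn.metrics import confusion_matrix

# Predicted class index for each sample vs. the true index
# recovered from the one-hot labels.
y_pred = net.predict_classes(X_test[:length_test], batch_size=batch_size)
y_true = np.argmax(y_test[:length_test], axis=-1)
print(classes_names)
print(confusion_matrix(y_true, y_pred))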
There are many possible improvements, from simply trying new features to implementing transfer learning...
For another nice and more complete example of LSTM usage, you can take a look at Karpathy's blog post.
Hopefully this notebook was helpful to you in some way. If there are any issues with this repo, feel free to submit an issue or leave me a comment; I'm always glad to get some feedback!
Paul Willot