Wayne Nixalo - 2017-Jun-12 17:27

Code-Along of Lesson 5 JNB.

Lesson 5 NB: https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson5.ipynb

Lecture


In [1]:
import theano


/home/wnixalo/miniconda3/envs/FAI/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX 870M (0000:01:00.0)

In [2]:
%matplotlib inline

import sys, os
sys.path.insert(1, os.path.join('utils'))

import utils; reload(utils)
from utils import *
from __future__ import division, print_function


Using Theano backend.

In [3]:
model_path = 'data/imdb/models/'
%mkdir -p $model_path # -p : make intermediate directories as needed

Setup data

We're going to look at the IMDB dataset, which contains movie reviews from IMDB, along with their sentiment. Keras comes with some helpers for this dataset.
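
For reference, the built-in helper is keras.datasets.imdb.load_data. Here's a minimal sketch of it (we won't use it below, since it pre-processes the ids slightly differently from the raw file we download; note the keyword argument is nb_words in Keras 1.x and num_words in Keras 2):

from keras.datasets import imdb
# reviews come back already mapped to word ids, keeping only the 5000 most frequent words
(x_tr, y_tr), (x_te, y_te) = imdb.load_data(nb_words=5000)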


In [4]:
from keras.datasets import imdb
idx = imdb.get_word_index()


Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.pkl

This is the word list:


In [5]:
idx_arr = sorted(idx, key=idx.get)
idx_arr[:10]


Out[5]:
['the', 'and', 'a', 'of', 'to', 'is', 'br', 'in', 'it', 'i']

...and this is the mapping from id to word:


In [6]:
idx2word = {v: k for k, v in idx.iteritems()}

We download the reviews using code copied from keras.datasets:


In [7]:
# getting the dataset directly because keras's version makes some changes to it
path = get_file('imdb_full.pkl',
                origin='https://s3.amazonaws.com/text-datasets/imdb_full.pkl',
                md5_hash='d091312047c43cf9e4e38fef92437263')
f = open(path, 'rb')
(x_train, labels_train), (x_test, labels_test) = pickle.load(f)


Downloading data from https://s3.amazonaws.com/text-datasets/imdb_full.pkl

In [8]:
# note: cPickle can reportedly be up to ~1000x faster than the pure-Python pickle module
len(x_train)


Out[8]:
25000

Here's the 1st review. As you see, the words have been replaced by ids. The ids can be looked up in idx2word.


In [9]:
', '.join(map(str, x_train[0]))


Out[9]:
'23022, 309, 6, 3, 1069, 209, 9, 2175, 30, 1, 169, 55, 14, 46, 82, 5869, 41, 393, 110, 138, 14, 5359, 58, 4477, 150, 8, 1, 5032, 5948, 482, 69, 5, 261, 12, 23022, 73935, 2003, 6, 73, 2436, 5, 632, 71, 6, 5359, 1, 25279, 5, 2004, 10471, 1, 5941, 1534, 34, 67, 64, 205, 140, 65, 1232, 63526, 21145, 1, 49265, 4, 1, 223, 901, 29, 3024, 69, 4, 1, 5863, 10, 694, 2, 65, 1534, 51, 10, 216, 1, 387, 8, 60, 3, 1472, 3724, 802, 5, 3521, 177, 1, 393, 10, 1238, 14030, 30, 309, 3, 353, 344, 2989, 143, 130, 5, 7804, 28, 4, 126, 5359, 1472, 2375, 5, 23022, 309, 10, 532, 12, 108, 1470, 4, 58, 556, 101, 12, 23022, 309, 6, 227, 4187, 48, 3, 2237, 12, 9, 215'

The first word of the first review is 23022. Let's see what that is.


In [10]:
idx2word[23022]


Out[10]:
'bromwell'

In [11]:
x_train[0]


Out[11]:
[23022,
 309,
 6,
 3,
 1069,
 209,
 9,
 2175,
 30,
 1,
 169,
 55,
 14,
 46,
 82,
 5869,
 41,
 393,
 110,
 138,
 14,
 5359,
 58,
 4477,
 150,
 8,
 1,
 5032,
 5948,
 482,
 69,
 5,
 261,
 12,
 23022,
 73935,
 2003,
 6,
 73,
 2436,
 5,
 632,
 71,
 6,
 5359,
 1,
 25279,
 5,
 2004,
 10471,
 1,
 5941,
 1534,
 34,
 67,
 64,
 205,
 140,
 65,
 1232,
 63526,
 21145,
 1,
 49265,
 4,
 1,
 223,
 901,
 29,
 3024,
 69,
 4,
 1,
 5863,
 10,
 694,
 2,
 65,
 1534,
 51,
 10,
 216,
 1,
 387,
 8,
 60,
 3,
 1472,
 3724,
 802,
 5,
 3521,
 177,
 1,
 393,
 10,
 1238,
 14030,
 30,
 309,
 3,
 353,
 344,
 2989,
 143,
 130,
 5,
 7804,
 28,
 4,
 126,
 5359,
 1472,
 2375,
 5,
 23022,
 309,
 10,
 532,
 12,
 108,
 1470,
 4,
 58,
 556,
 101,
 12,
 23022,
 309,
 6,
 227,
 4187,
 48,
 3,
 2237,
 12,
 9,
 215]

Here's the whole review, mapped from ids to words.


In [12]:
' '.join([idx2word[o] for o in x_train[0]])


Out[12]:
"bromwell high is a cartoon comedy it ran at the same time as some other programs about school life such as teachers my 35 years in the teaching profession lead me to believe that bromwell high's satire is much closer to reality than is teachers the scramble to survive financially the insightful students who can see right through their pathetic teachers' pomp the pettiness of the whole situation all remind me of the schools i knew and their students when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled at high a classic line inspector i'm here to sack one of your teachers student welcome to bromwell high i expect that many adults of my age think that bromwell high is far fetched what a pity that it isn't"

The labels are 1 for positive, 0 for negative.


In [13]:
labels_train[:10]


Out[13]:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
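
The first ten are all positive because the reviews appear to be stored grouped by label. As a quick sanity check of the overall balance (a minimal sketch; the IMDB training split should be half positive, half negative):

np.mean(labels_train)   # ~0.5 for a balanced split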

Reduce vocabulary size by setting rare words to max index.


In [14]:
vocab_size = 5000

trn  = [np.array([i if i < vocab_size-1 else vocab_size-1 for i in s]) for s in x_train]
test = [np.array([i if i < vocab_size-1 else vocab_size-1 for i in s]) for s in x_test]

Look at distribution of lengths of sentences


In [15]:
lens = np.array(map(len, trn))
(lens.max(), lens.min(), lens.mean())


Out[15]:
(2493, 10, 237.71364)

Pad (with zeros) or truncate each sentence to a consistent length.


In [16]:
seq_len = 500

# keras.preprocessing.sequence
trn = sequence.pad_sequences(trn, maxlen=seq_len, value=0)
test = sequence.pad_sequences(test, maxlen=seq_len, value=0)

This results in nice rectangular matrices that can be passed to ML algorithms. Reviews shorter than 500 words are pre-padded with zeros; longer ones are truncated.
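
To see the pre-padding and truncation on a toy example (a minimal sketch; pad_sequences pads and truncates at the front by default):

from keras.preprocessing import sequence
sequence.pad_sequences([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=5, value=0)
# -> array([[0, 0, 1, 2, 3],
#           [5, 6, 7, 8, 9]])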


In [17]:
trn.shape


Out[17]:
(25000, 500)

In [ ]:
trn[0]

Create simple models

Single hidden layer NN

The simplest model that tends to give reasonable results is a single hidden layer net. So let's try that. Note that we can't expect to get any useful results by feeding word ids directly into a neural net - so instead we use an embedding to replace them with a vector of 32 (initially random) floats for each word in the vocab.
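
Conceptually, an embedding layer is just a trainable lookup table: row i of a (vocab_size, 32) matrix holds the 32 floats for word id i. A minimal numpy sketch of the lookup (illustration only, not how Keras implements it):

import numpy as np
emb_matrix = np.random.normal(size=(vocab_size, 32))   # one 32-d vector per word id
embedded = emb_matrix[trn[0]]                          # (500, 32): the first padded review as vectors
# Flatten() then turns this into the 16000 inputs seen by the Dense layer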


In [18]:
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])

In [19]:
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_1 (Embedding)          (None, 500, 32)       160000      embedding_input_1[0][0]          
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 16000)         0           embedding_1[0][0]                
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 100)           1600100     flatten_1[0][0]                  
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 100)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             101         dropout_1[0][0]                  
====================================================================================================
Total params: 1,760,201
Trainable params: 1,760,201
Non-trainable params: 0
____________________________________________________________________________________________________

In [21]:
# model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 17s - loss: 0.4679 - acc: 0.7480 - val_loss: 0.3213 - val_acc: 0.8592
Epoch 2/2
25000/25000 [==============================] - 16s - loss: 0.2015 - acc: 0.9251 - val_loss: 0.3033 - val_acc: 0.8748
Out[21]:
<keras.callbacks.History at 0x128cc6c90>

In [21]:
# redoing on Linux
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 9s - loss: 0.4612 - acc: 0.7551 - val_loss: 0.3033 - val_acc: 0.8702
Epoch 2/2
25000/25000 [==============================] - 9s - loss: 0.2043 - acc: 0.9234 - val_loss: 0.2920 - val_acc: 0.8764
Out[21]:
<keras.callbacks.History at 0x7f52258289d0>

The Stanford paper that this dataset is from cites a state-of-the-art accuracy (without unlabelled data) of 0.883. So we're short of that, but on the right track.

Single Conv layer with Max Pooling

A CNN is likely to work better, since it's designed to take advantage of ordered data. We'll need to use a 1D CNN, since a sequence of words is 1D.
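
To make the 1D convolution concrete: each of the 64 filters is a (5, 32) weight matrix that slides along the 500-step sequence, taking a dot product with every 5-word window of embedding vectors. A minimal numpy sketch of one filter (illustration only):

import numpy as np
embedded = np.random.normal(size=(500, 32))    # stand-in for one embedded review
filt = np.random.normal(size=(5, 32))          # one conv filter: 5 words x 32 embedding dims
acts = np.array([(embedded[i:i+5] * filt).sum() for i in range(500 - 4)])
# border_mode='same' pads so the output keeps length 500; MaxPooling1D() then halves it to 250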


In [22]:
# the embedding layer is always the first step in every NLP model
# --> after that layer, you don't have words anymore, just vectors
conv1 = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len, dropout=0.2),
    Dropout(0.2),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    Dropout(0.2),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])

In [31]:
conv1.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_2 (Embedding)          (None, 500, 32)       160000      embedding_input_2[0][0]          
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 500, 32)       0           embedding_2[0][0]                
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)  (None, 500, 64)       10304       dropout_2[0][0]                  
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 500, 64)       0           convolution1d_1[0][0]            
____________________________________________________________________________________________________
maxpooling1d_1 (MaxPooling1D)    (None, 250, 64)       0           dropout_3[0][0]                  
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 16000)         0           maxpooling1d_1[0][0]             
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 100)           1600100     flatten_2[0][0]                  
____________________________________________________________________________________________________
dropout_4 (Dropout)              (None, 100)           0           dense_3[0][0]                    
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 1)             101         dropout_4[0][0]                  
====================================================================================================
Total params: 1,770,505
Trainable params: 1,770,505
Non-trainable params: 0
____________________________________________________________________________________________________

In [23]:
conv1.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

In [24]:
# conv1.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 207s - loss: 0.5067 - acc: 0.7100 - val_loss: 0.2949 - val_acc: 0.8857
Epoch 2/4
25000/25000 [==============================] - 225s - loss: 0.2904 - acc: 0.8846 - val_loss: 0.2652 - val_acc: 0.8911
Epoch 3/4
25000/25000 [==============================] - 245s - loss: 0.2568 - acc: 0.9006 - val_loss: 0.2599 - val_acc: 0.8903
Epoch 4/4
25000/25000 [==============================] - 216s - loss: 0.2382 - acc: 0.9060 - val_loss: 0.2580 - val_acc: 0.8944
Out[24]:
<keras.callbacks.History at 0x1313ab390>

In [24]:
# redoing on Linux w/ GPU
conv1.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 26s - loss: 0.5557 - acc: 0.6994 - val_loss: 0.4454 - val_acc: 0.7932
Epoch 2/4
25000/25000 [==============================] - 25s - loss: 0.4152 - acc: 0.8196 - val_loss: 0.4297 - val_acc: 0.8046
Epoch 3/4
25000/25000 [==============================] - 25s - loss: 0.3636 - acc: 0.8481 - val_loss: 0.4245 - val_acc: 0.8134
Epoch 4/4
25000/25000 [==============================] - 25s - loss: 0.3153 - acc: 0.8737 - val_loss: 0.3618 - val_acc: 0.8442
Out[24]:
<keras.callbacks.History at 0x7f521bbeeb10>

That's well past the Stanford paper's accuracy - another win for CNNs!

Heh, the above takes a lot longer than 4s on my Mac.


In [25]:
conv1.save_weights(model_path + 'conv1.h5')
# conv1.load_weights(model_path + 'conv1.h5')

Pre-trained Vectors

You may want to look at wordvectors.ipynb before moving on.

In this section, we replicate the previous CNN, but using pre-trained embeddings.


In [26]:
def get_glove_dataset(dataset):
    """Download the requested glove dataset from files.fast.ai
    and return a location that can be passed to load_vectors.
    """
    # see wordvectors.ipynb for info on how these files were 
    # generated from the original glove data.
    md5sums = {'6B.50d' : '8e1557d1228decbda7db6dfd81cd9909',
               '6B.100d': 'c92dbbeacde2b0384a43014885a60b2c',
               '6B.200d': 'af271b46c04b0b2e41a84d8cd806178d',
               '6B.300d': '30290210376887dcc6d0a5a6374d8255'}
    glove_path = os.path.abspath('data/glove.6B/results')
    %mkdir -p $glove_path
    return get_file(dataset, 
                    'https://files.fast.ai/models/glove/' + dataset + '.tgz', 
                    cache_subdir=glove_path,
                    md5_hash=md5sums.get(dataset, None),
                    untar=True)

# not able to download from above, so using code from wordvectors_CodeAlong.ipynb to load
def get_glove(name):
    with open(path + 'glove.' + name + '.txt', 'r') as f: lines = [line.split() for line in f]
    words = [d[0] for d in lines]
    vecs = np.stack([np.array(d[1:], dtype=np.float32) for d in lines])
    wordidx = {o:i for i,o in enumerate(words)}
    save_array(res_path+name+'.dat', vecs)
    pickle.dump(words, open(res_path+name+'_words.pkl','wb'))
    pickle.dump(wordidx, open(res_path+name+'_idx.pkl','wb'))
#   # adding return filename
#     return res_path + name + '.dat'
    
def load_glove(loc):
    return (load_array(loc + '.dat'),
        pickle.load(open(loc + '_words.pkl', 'rb')),
        pickle.load(open(loc + '_idx.pkl', 'rb')))

In [27]:
def load_vectors(loc):
    return (load_array(loc + '.dat'),
        pickle.load(open(loc + '_words.pkl', 'rb')),
        pickle.load(open(loc + '_idx.pkl', 'rb')))
# note: pickle serializes Python objects to a byte stream (cPickle is the faster C implementation)

In [32]:
# this isn't working, so instead..
vecs, words, wordidx = load_vectors(get_glove_dataset('6B.50d'))


Downloading data from https://files.fast.ai/models/glove/6B.50d.tgz
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-32-bdad4353bb30> in <module>()
----> 1 vecs, words, wordidx = load_vectors(get_glove_dataset('6B.50d'))

<ipython-input-25-4e677e8d2c65> in get_glove_dataset(dataset)
     15                     cache_subdir=glove_path,
     16                     md5_hash=md5sums.get(dataset, None),
---> 17                     untar=True)

/Users/WayNoxchi/Miniconda3/envs/FAI/lib/python2.7/site-packages/keras/utils/data_utils.pyc in get_file(fname, origin, untar, md5_hash, cache_subdir)
    113                             functools.partial(dl_progress, progbar=progbar))
    114             except URLError as e:
--> 115                 raise Exception(error_msg.format(origin, e.errno, e.reason))
    116             except HTTPError as e:
    117                 raise Exception(error_msg.format(origin, e.code, e.msg))

Exception: URL fetch failure on https://files.fast.ai/models/glove/6B.50d.tgz: None -- [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)

In [ ]:
# trying to load the glove data I downloaded directly, before:
vecs, words, wordix = load_vectors('data/glove.6B/' + 'glove.' + '6B.50d' + '.txt')
# vecs, words, wordix = load_vectors('data/glove.6B/' + 'glove.' + '6B.50d' + '.tgz')
# not successful. get_file(..) seems to return the filepath with a '.tar' extension? the .tgz path doesn't work.
# ??get_file # keras.utils.data_utils.get_file(..)

In [28]:
# that doesn't work either, but method from wordvectors JNB worked so:
path = 'data/glove.6B/'
# res_path = path + 'results/'
res_path = 'data/imdb/results/'
%mkdir -p $res_path
# this way not working; so will pull vecs,words,wordidx manually:
# vecs, words, wordidx = load_vectors(get_glove('6B.50d'))
get_glove('6B.50d')
vecs, words, wordidx = load_glove(res_path + '6B.50d')

# NOTE: yay it worked..!..
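
A quick sanity check on what we just loaded (a minimal sketch; the 6B.50d vectors should be roughly 400,000 words by 50 components):

vecs.shape                   # ~(400000, 50)
vecs[wordidx['the']][:5]     # first few components of the vector for 'the'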

In [29]:
def create_emb():
    n_fact = vecs.shape[1]
    emb = np.zeros((vocab_size, n_fact))
    
    for i in xrange(1, len(emb)):
        word = idx2word[i]
        if word and re.match(r"^[a-zA-Z0-9\-]*$", word):
            src_idx = wordidx[word]
            emb[i] = vecs[src_idx]
        else:
            # If we can't find the word in glove, randomly initialize
            emb[i] = normal(scale=0.6, size=(n_fact,))
    
    # This is our "rare word" id - we want to randomly initialize
    emb[-1] = normal(scale=0.6, size=(n_fact,))
    emb /= 3
    return emb

In [30]:
emb = create_emb()
# this embedding matrix is now the glove word vectors, indexed according to 
# the imdb dataset.
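
It's worth checking how much of our 5000-word vocabulary actually matched a GloVe vector; the rest keep their random initialization. A minimal sketch, reusing idx2word, wordidx and the regex from create_emb:

import re
n_found = sum(1 for i in xrange(1, vocab_size)
              if re.match(r"^[a-zA-Z0-9\-]*$", idx2word[i]) and idx2word[i] in wordidx)
print('%d of %d vocab words matched a GloVe vector' % (n_found, vocab_size))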

We pass our embedding matrix to the Embedding constructor, and set it to non-trainable.


In [31]:
model = Sequential([
    Embedding(vocab_size, 50, input_length=seq_len, dropout=0.2,
              weights=[emb], trainable=False),
    Dropout(0.25),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    Dropout(0.25),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])
# this is a copy-paste of the previous model, with the addition of the
# pre-trained embeddings as the layer weights.
# We figure those weights are pretty good, so we'll initially set
# trainable to False, and fine-tune later since some words are missing from GloVe.

In [32]:
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

In [60]:
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 222s - loss: 0.5868 - acc: 0.6868 - val_loss: 0.4844 - val_acc: 0.7903
Epoch 2/2
25000/25000 [==============================] - 246s - loss: 0.4984 - acc: 0.7660 - val_loss: 0.4602 - val_acc: 0.7956
Out[60]:
<keras.callbacks.History at 0x12f9a4990>

In [33]:
# running on GPU
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 24s - loss: 0.5922 - acc: 0.6682 - val_loss: 0.4823 - val_acc: 0.7908
Epoch 2/2
25000/25000 [==============================] - 15s - loss: 0.4951 - acc: 0.7636 - val_loss: 0.4586 - val_acc: 0.8117
Out[33]:
<keras.callbacks.History at 0x7f5214d13910>

We've already beaten our previous model! But let's fine-tune the embedding weights - especially since the words we couldn't find in glove just have random embeddings.


In [34]:
model.layers[0].trainable=True

In [63]:
model.optimizer.lr=1e-4
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 237s - loss: 0.4715 - acc: 0.7809 - val_loss: 0.4244 - val_acc: 0.8246
Epoch 2/2
25000/25000 [==============================] - 211s - loss: 0.4546 - acc: 0.7885 - val_loss: 0.4361 - val_acc: 0.8038
Out[63]:
<keras.callbacks.History at 0x13207ac10>

In [35]:
# running on GPU
model.optimizer.lr=1e-4
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 11s - loss: 0.4748 - acc: 0.7783 - val_loss: 0.4309 - val_acc: 0.8166
Epoch 2/2
25000/25000 [==============================] - 11s - loss: 0.4492 - acc: 0.7924 - val_loss: 0.4227 - val_acc: 0.8172
Out[35]:
<keras.callbacks.History at 0x7f51d04cde90>

In [36]:
# the above was supposed to be 3 total epochs but I did 4 by mistake
model.save_weights(model_path+'glove50.h5')

Multi-size CNN

This is an implementation of a multi-size CNN as shown in Ben Bowles' blog post.


In [37]:
from keras.layers import Merge

We use the functional API to create multiple conv layers of different sizes, and then concatenate them.


In [38]:
graph_in = Input((vocab_size, 50))
convs = [ ]
for fsz in xrange(3, 6):
    x = Convolution1D(64, fsz, border_mode='same', activation='relu')(graph_in)
    x = MaxPooling1D()(x)
    x = Flatten()(x)
    convs.append(x)
out = Merge(mode='concat')(convs)
graph = Model(graph_in, out)

In [39]:
emb = create_emb()

We then replace the conv/max-pool layer in our original CNN with the concatenated conv layers.


In [40]:
model = Sequential ([
    Embedding(vocab_size, 50, input_length=seq_len, dropout=0.2, weights=[emb]),
    Dropout(0.2),
    graph,
    Dropout(0.5),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')
    ])

In [41]:
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

In [70]:
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 468s - loss: 0.4986 - acc: 0.7366 - val_loss: 0.2978 - val_acc: 0.8749
Epoch 2/2
25000/25000 [==============================] - 489s - loss: 0.3156 - acc: 0.8706 - val_loss: 0.2785 - val_acc: 0.8826
Out[70]:
<keras.callbacks.History at 0x125a3d9d0>

In [42]:
# on GPU
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 42s - loss: 0.5007 - acc: 0.7389 - val_loss: 0.3242 - val_acc: 0.8682
Epoch 2/2
25000/25000 [==============================] - 56s - loss: 0.3159 - acc: 0.8704 - val_loss: 0.2714 - val_acc: 0.8874
Out[42]:
<keras.callbacks.History at 0x7f5211a36590>

Interestingly, I found that in this case I got best results when I started the embedding layer as being trainable, and then set it to non-trainable after a couple of epochs. I have no idea why! hmmm


In [43]:
model.layers[0].trainable=False

In [44]:
model.optimizer.lr=1e-5

In [74]:
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 610s - loss: 0.2759 - acc: 0.8918 - val_loss: 0.2625 - val_acc: 0.8956
Epoch 2/2
25000/25000 [==============================] - 676s - loss: 0.2613 - acc: 0.8962 - val_loss: 0.2534 - val_acc: 0.8970
Out[74]:
<keras.callbacks.History at 0x138529590>

In [45]:
# on gpu
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)


Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 67s - loss: 0.2844 - acc: 0.8858 - val_loss: 0.2683 - val_acc: 0.8887
Epoch 2/2
25000/25000 [==============================] - 67s - loss: 0.2600 - acc: 0.8954 - val_loss: 0.2836 - val_acc: 0.8800
Out[45]:
<keras.callbacks.History at 0x7f521539c990>

In [46]:
# NB: this saves the earlier single-conv model; to save the multi-size CNN just trained, use model.save_weights(...)
conv1.save_weights(model_path + 'conv1_1.h5')
# conv1.load_weights(model_path + 'conv1.h5')

This more complex architecture has given us another boost in accuracy.

LSTM

We haven't covered this bit yet!


In [48]:
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len, mask_zero=True,
              W_regularizer=l2(1e-6), dropout=0.2),
    LSTM(100, consume_less='gpu'),
    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_6 (Embedding)          (None, 500, 32)       160000      embedding_input_6[0][0]          
____________________________________________________________________________________________________
lstm_2 (LSTM)                    (None, 100)           53200       embedding_6[0][0]                
____________________________________________________________________________________________________
dense_10 (Dense)                 (None, 1)             101         lstm_2[0][0]                     
====================================================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
____________________________________________________________________________________________________

In [49]:
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=5, batch_size=64)
# NOTE: if this took 100s/epoch using TitanX's or Tesla K80s ... use the Linux machine for this


Train on 25000 samples, validate on 25000 samples
Epoch 1/5
25000/25000 [==============================] - 241s - loss: 0.5485 - acc: 0.7101 - val_loss: 0.4063 - val_acc: 0.8214
Epoch 2/5
25000/25000 [==============================] - 241s - loss: 0.3534 - acc: 0.8539 - val_loss: 0.3629 - val_acc: 0.8468
Epoch 3/5
25000/25000 [==============================] - 241s - loss: 0.3167 - acc: 0.8712 - val_loss: 0.2983 - val_acc: 0.8784
Epoch 4/5
25000/25000 [==============================] - 241s - loss: 0.3039 - acc: 0.8771 - val_loss: 0.3154 - val_acc: 0.8748
Epoch 5/5
25000/25000 [==============================] - 241s - loss: 0.2696 - acc: 0.8899 - val_loss: 0.3017 - val_acc: 0.8812
Out[49]:
<keras.callbacks.History at 0x7f5200f59350>

In [50]:
# NB: conv1 here looks like a typo; to save the LSTM just trained, use model.save_weights(model_path + 'LSTM_1.h5')
conv1.save_weights(model_path + 'LSTM_1.h5')

In [ ]:


In [ ]:


In [ ]: