In [1]:

    
from theano.sandbox import cuda









    



WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 1070 (CNMeM is disabled, cuDNN 5110)



In [2]:

    
%matplotlib inline
import utils #; reload(utils)
from utils import *
from __future__ import division, print_function
import pickle
import utils_MDR









    



Using Theano backend.



In [3]:

    
model_path = 'data/imdb/models/'

MDR: and needs by GPU-fan code, too...



In [4]:

    
import utils_MDR
from utils_MDR import *

Setup data

We're going to look at the IMDB dataset, which contains movie reviews from IMDB, along with their sentiment. Keras comes with some helpers for this dataset.



In [5]:

    
from keras.datasets import imdb
idx = imdb.get_word_index()

This is the word list:



In [6]:

    
idx_arr = sorted(idx, key=idx.get)
idx_arr[:10]









    Out[6]:





['the', 'and', 'a', 'of', 'to', 'is', 'br', 'in', 'it', 'i']

...and this is the mapping from id to word



In [7]:

    
## idx2word = {v: k for k, v in idx.iteritems()}  ## Py 2.7
idx2word = {v: k for k, v in idx.items()}         ## Py 3.x

We download the reviews using code copied from keras.datasets:



In [8]:

    
path = get_file('imdb_full.pkl',
                origin='https://s3.amazonaws.com/text-datasets/imdb_full.pkl',
                md5_hash='d091312047c43cf9e4e38fef92437263')
f = open(path, 'rb')
(x_train, labels_train), (x_test, labels_test) = pickle.load(f)



In [9]:

    
len(x_train)









    Out[9]:





25000

Here's the 1st review. As you see, the words have been replaced by ids. The ids can be looked up in idx2word.



In [10]:

    
', '.join(map(str, x_train[0]))









    Out[10]:





'23022, 309, 6, 3, 1069, 209, 9, 2175, 30, 1, 169, 55, 14, 46, 82, 5869, 41, 393, 110, 138, 14, 5359, 58, 4477, 150, 8, 1, 5032, 5948, 482, 69, 5, 261, 12, 23022, 73935, 2003, 6, 73, 2436, 5, 632, 71, 6, 5359, 1, 25279, 5, 2004, 10471, 1, 5941, 1534, 34, 67, 64, 205, 140, 65, 1232, 63526, 21145, 1, 49265, 4, 1, 223, 901, 29, 3024, 69, 4, 1, 5863, 10, 694, 2, 65, 1534, 51, 10, 216, 1, 387, 8, 60, 3, 1472, 3724, 802, 5, 3521, 177, 1, 393, 10, 1238, 14030, 30, 309, 3, 353, 344, 2989, 143, 130, 5, 7804, 28, 4, 126, 5359, 1472, 2375, 5, 23022, 309, 10, 532, 12, 108, 1470, 4, 58, 556, 101, 12, 23022, 309, 6, 227, 4187, 48, 3, 2237, 12, 9, 215'

The first word of the first review is 23022. Let's see what that is.



In [11]:

    
idx2word[23022]









    Out[11]:





'bromwell'

Here's the whole review, mapped from ids to words.



In [12]:

    
' '.join([idx2word[o] for o in x_train[0]])









    Out[12]:





"bromwell high is a cartoon comedy it ran at the same time as some other programs about school life such as teachers my 35 years in the teaching profession lead me to believe that bromwell high's satire is much closer to reality than is teachers the scramble to survive financially the insightful students who can see right through their pathetic teachers' pomp the pettiness of the whole situation all remind me of the schools i knew and their students when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled at high a classic line inspector i'm here to sack one of your teachers student welcome to bromwell high i expect that many adults of my age think that bromwell high is far fetched what a pity that it isn't"

The labels are 1 for positive, 0 for negative.



In [13]:

    
labels_train[:10]









    Out[13]:





[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Reduce vocab size by setting rare words to max index.



In [12]:

    
vocab_size = 5000

trn = [np.array([i if i<vocab_size-1 else vocab_size-1 for i in s]) for s in x_train]
test = [np.array([i if i<vocab_size-1 else vocab_size-1 for i in s]) for s in x_test]

Look at distribution of lengths of sentences.



In [13]:

    
## create an array of 'len'...gths
# lens = np.array(map(len, trn)) ## only works in Py2.x, not 3.x ... 
##    'map in Python 3 return an iterator, while map in Python 2 returns a list'
##    (https://stackoverflow.com/questions/35691489/error-in-python-3-5-cant-add-map-results-together)

# This is a quick fix - not really a proper P3x approach.

lens = np.array(list(map(len, trn)))   ## wrapped a list around it

(lens.max(), lens.min(), lens.mean())









    Out[13]:





(2493, 10, 237.71364)

Expect: (2493, 10, 237.71364)

Pad (with zero) or truncate each sentence to make consistent length.



In [14]:

    
seq_len = 500

trn = sequence.pad_sequences(trn, maxlen=seq_len, value=0)
test = sequence.pad_sequences(test, maxlen=seq_len, value=0)

This results in nice rectangular matrices that can be passed to ML algorithms. Reviews shorter than 500 words are pre-padded with zeros, those greater are truncated.



In [15]:

    
trn.shape









    Out[15]:





(25000, 500)

Expect: (25000, 500)

Create simple models

Single hidden layer NN

The simplest model that tends to give reasonable results is a single hidden layer net. So let's try that. Note that we can't expect to get any useful results by feeding word ids directly into a neural net - so instead we use an embedding to replace them with a vector of 32 (initially random) floats for each word in the vocab.



In [17]:

    
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])



In [18]:

    
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
model.summary()









    



____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_1 (Embedding)          (None, 500, 32)       160000      embedding_input_1[0][0]          
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 16000)         0           embedding_1[0][0]                
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 100)           1600100     flatten_1[0][0]                  
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 100)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             101         dropout_1[0][0]                  
====================================================================================================
Total params: 1,760,201
Trainable params: 1,760,201
Non-trainable params: 0
____________________________________________________________________________________________________



In [19]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 2s - loss: 0.5320 - acc: 0.6807 - val_loss: 0.2968 - val_acc: 0.8726
Epoch 2/2
25000/25000 [==============================] - 2s - loss: 0.2233 - acc: 0.9152 - val_loss: 0.3010 - val_acc: 0.8757

JH's result: Train on 25000 samples, validate on 25000 samples Epoch 1/2 25000/25000 [==============================] - 1s - loss: 0.4651 - acc: 0.7495 - val_loss: 0.2830 - val_acc: 0.8804 Epoch 2/2 25000/25000 [==============================] - 1s - loss: 0.1969 - acc: 0.9265 - val_loss: 0.3195 - val_acc: 0.8694

The stanford paper that this dataset is from cites a state of the art accuracy (without unlabelled data) of 0.883. So we're short of that, but on the right track.

Single conv layer with max pooling

A CNN is likely to work better, since it's designed to take advantage of ordered data. We'll need to use a 1D CNN, since a sequence of words is 1D.



In [20]:

    
conv1 = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len, dropout=0.2),
    Dropout(0.2),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    Dropout(0.2),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])



In [21]:

    
conv1.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])



In [22]:

    
set_gpu_fan_speed(90)
conv1.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 8s - loss: 0.4949 - acc: 0.7320 - val_loss: 0.3057 - val_acc: 0.8719
Epoch 2/4
25000/25000 [==============================] - 8s - loss: 0.2976 - acc: 0.8811 - val_loss: 0.2695 - val_acc: 0.8867
Epoch 3/4
25000/25000 [==============================] - 8s - loss: 0.2577 - acc: 0.8978 - val_loss: 0.2588 - val_acc: 0.8941
Epoch 4/4
25000/25000 [==============================] - 8s - loss: 0.2378 - acc: 0.9068 - val_loss: 0.2556 - val_acc: 0.8941

JH's result: Train on 25000 samples, validate on 25000 samples Epoch 1/4 25000/25000 [==============================] - 4s - loss: 0.4984 - acc: 0.7250 - val_loss: 0.2922 - val_acc: 0.8816 Epoch 2/4 25000/25000 [==============================] - 4s - loss: 0.2971 - acc: 0.8836 - val_loss: 0.2681 - val_acc: 0.8911 Epoch 3/4 25000/25000 [==============================] - 4s - loss: 0.2568 - acc: 0.8983 - val_loss: 0.2551 - val_acc: 0.8947 Epoch 4/4 25000/25000 [==============================] - 4s - loss: 0.2427 - acc: 0.9029 - val_loss: 0.2558 - val_acc: 0.8947

That's well past the Stanford paper's accuracy - another win for CNNs!



In [23]:

    
conv1.save_weights(model_path + 'conv1.h5')



In [32]:

    
conv1.load_weights(model_path + 'conv1.h5')









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-1f275b7c9b27> in <module>()
----> 1 conv1.load_weights(model_path + 'conv1.h5')

NameError: name 'conv1' is not defined

Pre-trained vectors

You may want to look at wordvectors.ipynb before moving on.

In this section, we replicate the previous CNN, but using pre-trained embeddings.



In [104]:

    
def load_vectors(loc):
    return (load_array(loc+'.dat'),
        pickle.load(open(loc+'_words.pkl','rb')),
        pickle.load(open(loc+'_idx.pkl','rb')))



In [119]:

    
#vecs, words, wordidx = load_vectors('data/glove/results/6B.50d')  ## JH's original
vecs, words, wordidx = load_vectors('data/glove/results/6B.100d')   ## MDR's experiment

The glove word ids and imdb word ids use different indexes. So we create a simple function that creates an embedding matrix using the indexes from imdb, and the embeddings from glove (where they exist).



In [120]:

    
def create_emb():
    n_fact = vecs.shape[1]
    emb = np.zeros((vocab_size, n_fact))

    for i in range(1,len(emb)):
        word = idx2word[i]
        if word and re.match(r"^[a-zA-Z0-9\-]*$", word):
            src_idx = wordidx[word]
            emb[i] = vecs[src_idx]
        else:
            # If we can't find the word in glove, randomly initialize
            emb[i] = normal(scale=0.6, size=(n_fact,))

    # This is our "rare word" id - we want to randomly initialize
    emb[-1] = normal(scale=0.6, size=(n_fact,))
    emb/=3
    return emb



In [121]:

    
emb = create_emb()

We pass our embedding matrix to the Embedding constructor, and set it to non-trainable.



In [122]:

    
model = Sequential([
    #Embedding(vocab_size, 50, 
    Embedding(vocab_size, 100, 
              input_length=seq_len, dropout=0.2, weights=[emb], trainable=False),
    Dropout(0.25),   ## JH (0.25)
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    Dropout(0.25),   ## JH (0.25)
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.3),    ## JH (0.7)
    Dense(1, activation='sigmoid')])



In [123]:

    
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

I get better results with the 100d embedding than I do with the 50d embedding, after 4 epochs. - MDR



In [124]:

    
# model.optimizer.lr = 1e-3  ## MDR: added to the 50d for marginally faster training than I was getting
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)
set_gpu_fan_speed(0)
model.save_weights(model_path+'glove100_wt1.h5')  ## care, with the weight count!









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 8s - loss: 0.5658 - acc: 0.6951 - val_loss: 0.4579 - val_acc: 0.7869
Epoch 2/4
25000/25000 [==============================] - 7s - loss: 0.4530 - acc: 0.7906 - val_loss: 0.4927 - val_acc: 0.7446
Epoch 3/4
25000/25000 [==============================] - 7s - loss: 0.4276 - acc: 0.8032 - val_loss: 0.3745 - val_acc: 0.8446
Epoch 4/4
25000/25000 [==============================] - 7s - loss: 0.4013 - acc: 0.8186 - val_loss: 0.3630 - val_acc: 0.8491

MDR's result (50d embedding): Train on 25000 samples, validate on 25000 samples Epoch 1/4 25000/25000 [==============================] - 8s - loss: 0.5619 - acc: 0.7015 - val_loss: 0.4735 - val_acc: 0.7981 Epoch 2/4 25000/25000 [==============================] - 8s - loss: 0.4817 - acc: 0.7703 - val_loss: 0.4583 - val_acc: 0.7811 Epoch 3/4 25000/25000 [==============================] - 8s - loss: 0.4535 - acc: 0.7903 - val_loss: 0.4082 - val_acc: 0.8293 Epoch 4/4 25000/25000 [==============================] - 8s - loss: 0.4331 - acc: 0.7967 - val_loss: 0.4409 - val_acc: 0.8093

JH's result (50d embedding?): Train on 25000 samples, validate on 25000 samples Epoch 1/2 25000/25000 [==============================] - 4s - loss: 0.5217 - acc: 0.7172 - val_loss: 0.2942 - val_acc: 0.8815 Epoch 2/2 25000/25000 [==============================] - 4s - loss: 0.3169 - acc: 0.8719 - val_loss: 0.2662 - val_acc: 0.8978



In [70]:

    
model.load_weights(model_path+'glove50_wt1.h5')



In [129]:

    
model.load_weights(model_path+'glove100_wt1.h5')

MDR: so my initial results were nowhere near as good, but we're not overfitting yet.

MDR: my results are nowhere near JH's! [] Investigate this!

We already have beaten our previous model! But let's fine-tune the embedding weights - especially since the words we couldn't find in glove just have random embeddings.



In [126]:

    
model.layers[0].trainable=True



In [127]:

    
model.optimizer.lr=1e-4



In [128]:

    
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 8s - loss: 0.3819 - acc: 0.8287 - val_loss: 0.3785 - val_acc: 0.8316
Epoch 2/4
25000/25000 [==============================] - 7s - loss: 0.3709 - acc: 0.8315 - val_loss: 0.3816 - val_acc: 0.8361
Epoch 3/4
25000/25000 [==============================] - 7s - loss: 0.3541 - acc: 0.8413 - val_loss: 0.3516 - val_acc: 0.8484
Epoch 4/4
25000/25000 [==============================] - 7s - loss: 0.3400 - acc: 0.8511 - val_loss: 0.3947 - val_acc: 0.8204






    Out[128]:





<keras.callbacks.History at 0x7f36ee1a8d30>

MDR result (50d embedding) Train on 25000 samples, validate on 25000 samples Epoch 1/4 25000/25000 [==============================] - 8s - loss: 0.3990 - acc: 0.8183 - val_loss: 0.3893 - val_acc: 0.8295 Epoch 2/4 25000/25000 [==============================] - 8s - loss: 0.3947 - acc: 0.8213 - val_loss: 0.4191 - val_acc: 0.8040 Epoch 3/4 25000/25000 [==============================] - 8s - loss: 0.3746 - acc: 0.8301 - val_loss: 0.3859 - val_acc: 0.8282 Epoch 4/4 25000/25000 [==============================] - 8s - loss: 0.3680 - acc: 0.8343 - val_loss: 0.3931 - val_acc: 0.8174

"As expected, that's given us a nice little boost. :)" - MDR: actually made it worse! For both 50d and 100d cases!



In [75]:

    
model.save_weights(model_path+'glove50.h5')

Multi-size CNN

This is an implementation of a multi-size CNN as shown in Ben Bowles' excellent blog post.



In [130]:

    
from keras.layers import Merge

We use the functional API to create multiple conv layers of different sizes, and then concatenate them.



In [136]:

    
#graph_in = Input ((vocab_size, 50))
graph_in = Input ((vocab_size, 100))  ## MDR - for 100d embedding
convs = [ ] 
for fsz in range (3, 6): 
    x = Convolution1D(64, fsz, border_mode='same', activation="relu")(graph_in)
    x = MaxPooling1D()(x) 
    x = Flatten()(x) 
    convs.append(x)
out = Merge(mode="concat")(convs) 
graph = Model(graph_in, out)



In [137]:

    
emb = create_emb()

We then replace the conv/max-pool layer in our original CNN with the concatenated conv layers.



In [138]:

    
model = Sequential ([
    #Embedding(vocab_size, 50, 
    Embedding(vocab_size, 100, 
              input_length=seq_len, dropout=0.2, weights=[emb]),
    Dropout (0.2),
    graph,
    Dropout (0.5),
    Dense (100, activation="relu"),
    Dropout (0.7),
    Dense (1, activation='sigmoid')
    ])



In [139]:

    
model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])

MDR: it turns out that there's no improvement, in this expt, for using the 100d embedding over the 50d.



In [140]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 22s - loss: 0.5049 - acc: 0.7261 - val_loss: 0.2961 - val_acc: 0.8833
Epoch 2/2
25000/25000 [==============================] - 22s - loss: 0.3106 - acc: 0.8742 - val_loss: 0.2761 - val_acc: 0.8838

MDR's results (50d): Train on 25000 samples, validate on 25000 samples Epoch 1/2 25000/25000 [==============================] - 18s - loss: 0.4930 - acc: 0.7455 - val_loss: 0.3141 - val_acc: 0.8803 Epoch 2/2 25000/25000 [==============================] - 18s - loss: 0.3153 - acc: 0.8732 - val_loss: 0.2724 - val_acc: 0.8936

JH's results (50d?): Train on 25000 samples, validate on 25000 samples Epoch 1/2 25000/25000 [==============================] - 11s - loss: 0.3997 - acc: 0.8207 - val_loss: 0.3032 - val_acc: 0.8943 Epoch 2/2 25000/25000 [==============================] - 11s - loss: 0.2882 - acc: 0.8832 - val_loss: 0.2646 - val_acc: 0.9029

Interestingly, I found that in this case I got best results when I started the embedding layer as being trainable, and then set it to non-trainable after a couple of epochs. I have no idea why!

MDR: (does it limit overfitting, maybe?) ... anyway, my running of the same code achieved nearly the same results, so much happier.



In [82]:

    
model.save_weights(model_path+'glove50_conv2_wt1.h5')



In [88]:

    
model.load_weights(model_path+'glove50_conv2_wt1.h5')

MDR: I want to test this statement from JH, above, by running another couple of epochs. First let's reduce the LR.



In [89]:

    
model.optimizer.lr = 1e-5



In [90]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=2, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/2
25000/25000 [==============================] - 18s - loss: 0.2815 - acc: 0.8887 - val_loss: 0.2644 - val_acc: 0.8961
Epoch 2/2
25000/25000 [==============================] - 17s - loss: 0.2622 - acc: 0.8949 - val_loss: 0.2681 - val_acc: 0.8893

Okay, so that didn't help. Reload the weights from before.



In [95]:

    
model.load_weights(model_path+'glove50_conv2_wt1.h5')

MDR: following JH's plan, from this point.



In [96]:

    
model.layers[0].trainable=False



In [97]:

    
model.optimizer.lr=1e-5



In [98]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 17s - loss: 0.2802 - acc: 0.8879 - val_loss: 0.2646 - val_acc: 0.8928
Epoch 2/4
25000/25000 [==============================] - 18s - loss: 0.2593 - acc: 0.8984 - val_loss: 0.2579 - val_acc: 0.8976
Epoch 3/4
25000/25000 [==============================] - 18s - loss: 0.2480 - acc: 0.8998 - val_loss: 0.2541 - val_acc: 0.8960
Epoch 4/4
25000/25000 [==============================] - 17s - loss: 0.2350 - acc: 0.9072 - val_loss: 0.2550 - val_acc: 0.8965






    Out[98]:





<keras.callbacks.History at 0x7f370f2f65f8>

This more complex architecture has given us another boost in accuracy.

MDR: although I didn't see a huge advantage, personally.

LSTM

We haven't covered this bit yet!

MDR: so, there's no preloaded embedding, here - it's a fresh, random set?



In [ ]:

    
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len, mask_zero=True,
              W_regularizer=l2(1e-6), dropout=0.2),
    LSTM(100, consume_less='gpu'),
    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

JH's result: ____________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ==================================================================================================== embedding_13 (Embedding) (None, 500, 32) 160064 embedding_input_13[0][0] ____________________________________________________________________________________________________ lstm_13 (LSTM) (None, 100) 53200 embedding_13[0][0] ____________________________________________________________________________________________________ dense_18 (Dense) (None, 1) 101 lstm_13[0][0] ==================================================================================================== Total params: 213365 ____________________________________________________________________________________________________

MDR: hang on! These summary() outputs look quite different, to me! Not least that this is apparently the 13th lstm he's produced (in this session?) - and yet I've fot a higher numbered dense layer than him. Eh?

But then I reach better results in fewer epochs than he does, this time around. Compare the times, and the more stable convergence in my results. Weird. Still, that's my first LSTM!!



In [100]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=5, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/5
25000/25000 [==============================] - 82s - loss: 0.5113 - acc: 0.7419 - val_loss: 0.5116 - val_acc: 0.7670
Epoch 2/5
25000/25000 [==============================] - 83s - loss: 0.3604 - acc: 0.8495 - val_loss: 0.4464 - val_acc: 0.8411
Epoch 3/5
25000/25000 [==============================] - 82s - loss: 0.3048 - acc: 0.8774 - val_loss: 0.3181 - val_acc: 0.8662
Epoch 4/5
25000/25000 [==============================] - 83s - loss: 0.2825 - acc: 0.8858 - val_loss: 0.2985 - val_acc: 0.8759
Epoch 5/5
25000/25000 [==============================] - 83s - loss: 0.2588 - acc: 0.8964 - val_loss: 0.2927 - val_acc: 0.8795

JH's result: Train on 25000 samples, validate on 25000 samples Epoch 1/5 25000/25000 [==============================] - 100s - loss: 0.5007 - acc: 0.7446 - val_loss: 0.3475 - val_acc: 0.8531 Epoch 2/5 25000/25000 [==============================] - 100s - loss: 0.3524 - acc: 0.8507 - val_loss: 0.3602 - val_acc: 0.8453 Epoch 3/5 25000/25000 [==============================] - 99s - loss: 0.3750 - acc: 0.8342 - val_loss: 0.4758 - val_acc: 0.7710 Epoch 4/5 25000/25000 [==============================] - 99s - loss: 0.3238 - acc: 0.8652 - val_loss: 0.3094 - val_acc: 0.8725 Epoch 5/5 25000/25000 [==============================] - 99s - loss: 0.2681 - acc: 0.8920 - val_loss: 0.3018 - val_acc: 0.8776



In [101]:

    
model.save_weights(model_path+'glove50_lstm1_wt1.h5')

MDR: let's see if it's possible to improve on that.



In [102]:

    
model.optimizer.lr = 1e-5
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=5, batch_size=64)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/5
25000/25000 [==============================] - 82s - loss: 0.2439 - acc: 0.9009 - val_loss: 0.3326 - val_acc: 0.8594
Epoch 2/5
25000/25000 [==============================] - 82s - loss: 0.2322 - acc: 0.9080 - val_loss: 0.3005 - val_acc: 0.8785
Epoch 3/5
25000/25000 [==============================] - 84s - loss: 0.2064 - acc: 0.9188 - val_loss: 0.3347 - val_acc: 0.8700
Epoch 4/5
25000/25000 [==============================] - 83s - loss: 0.2020 - acc: 0.9184 - val_loss: 0.3178 - val_acc: 0.8694
Epoch 5/5
25000/25000 [==============================] - 83s - loss: 0.1983 - acc: 0.9221 - val_loss: 0.3009 - val_acc: 0.8744






    Out[102]:





<keras.callbacks.History at 0x7f370fa32048>

MDR: Conclusion: that may be all that's achievable with this dataset, of course. It's sentiment, after all!

MDR's lstm + preloaded embeddings

God knows whether this will work. Let's see if I can create an LSTM layer on top of pretrained embeddings...



In [150]:

    
model2 = Sequential([
    Embedding(vocab_size, 100, input_length = seq_len,
             #mask_zero=True, W_regularizer=l2(1e-6), ## used in lstm above - not needed?
              dropout=0.2, weights=[emb], trainable = False),
    LSTM(100, consume_less = 'gpu'),
    Dense(100, activation = 'sigmoid')
])



In [151]:

    
model2.summary()









    



____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_22 (Embedding)         (None, 500, 100)      500000      embedding_input_22[0][0]         
____________________________________________________________________________________________________
lstm_4 (LSTM)                    (None, 100)           80400       embedding_22[0][0]               
____________________________________________________________________________________________________
dense_40 (Dense)                 (None, 100)           10100       lstm_4[0][0]                     
====================================================================================================
Total params: 590,500
Trainable params: 90,500
Non-trainable params: 500,000
____________________________________________________________________________________________________



In [152]:

    
model2.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])



In [153]:

    
set_gpu_fan_speed(90)
model.fit(trn, labels_train, validation_data=(test, labels_test), nb_epoch=4, batch_size=64)
set_gpu_fan_speed(0)









    



Train on 25000 samples, validate on 25000 samples
Epoch 1/4
25000/25000 [==============================] - 21s - loss: 0.2042 - acc: 0.9190 - val_loss: 0.2563 - val_acc: 0.8962
Epoch 2/4
25000/25000 [==============================] - 21s - loss: 0.2022 - acc: 0.9200 - val_loss: 0.2583 - val_acc: 0.8943
Epoch 3/4
25000/25000 [==============================] - 21s - loss: 0.1895 - acc: 0.9257 - val_loss: 0.2540 - val_acc: 0.8965
Epoch 4/4
25000/25000 [==============================] - 21s - loss: 0.1796 - acc: 0.9276 - val_loss: 0.2564 - val_acc: 0.8958

MDR: OMFG. It needs one epoch to be 90% accurate.



In [ ]: