In [1]:

    
'''Trains two recurrent neural networks based upon a story and a question.
The resulting merged vector is then queried to answer a range of bAbI tasks.
The results are comparable to those for an LSTM model provided in Weston et al.:
"Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks"
http://arxiv.org/abs/1502.05698
Task Number                  | FB LSTM Baseline | Keras QA
---                          | ---              | ---
QA1 - Single Supporting Fact | 50               | 100.0
QA2 - Two Supporting Facts   | 20               | 50.0
QA3 - Three Supporting Facts | 20               | 20.5
QA4 - Two Arg. Relatimport numpy as np
import random
from math import floor
np.random.seed(1337)  # for reproducibility

#from keras.utils.data_utils import get_file
#from keras.datasets.data_utils import get_file
from keras.layers.embeddings import Embedding
from keras.layers.core import Dense, Merge, Dropout, RepeatVector,MaxoutDense,Activation
from keras.layers import recurrent
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
import csvions     | 61               | 62.9
QA5 - Three Arg. Relations   | 70               | 61.9
QA6 - Yes/No Questions       | 48               | 50.7
QA7 - Counting               | 49               | 78.9
QA8 - Lists/Sets             | 45               | 77.2
QA9 - Simple Negation        | 64               | 64.0
QA10 - Indefinite Knowledge  | 44               | 47.7
QA11 - Basic Coreference     | 72               | 74.9
QA12 - Conjunction           | 74               | 76.4
QA13 - Compound Coreference  | 94               | 94.4
QA14 - Time Reasoning        | 27               | 34.8
QA15 - Basic Deduction       | 21               | 32.4
QA16 - Basic Induction       | 23               | 50.6
QA17 - Positional Reasoning  | 51               | 49.1
QA18 - Size Reasoning        | 52               | 90.8
QA19 - Path Finding          | 8                | 9.0
QA20 - Agent's Motivations   | 91               | 90.7
For the resources related to the bAbI project, refer to:
https://research.facebook.com/researchers/1543934539189348
Notes:
- With default word, sentence, and query vector sizes, the GRU model achieves:
  - 100% test accuracy on QA1 in 20 epochs (2 seconds per epoch on CPU)
  - 50% test accuracy on QA2 in 20 epochs (16 seconds per epoch on CPU)
In comparison, the Facebook paper achieves 50% and 20% for the LSTM baseline.
- The task does not traditionally parse the question separately. This likely
improves accuracy and is a good example of merging two RNNs.
- The word vector embeddings are not shared between the story and question RNNs.
- See how the accuracy changes given 10,000 training samples (en-10k) instead
of only 1000. 1000 was used in order to be comparable to the original paper.
- Experiment with GRU, LSTM, and JZS1-3 as they give subtly different results.
- The length and noise (i.e. 'useless' story components) impact the ability for
LSTMs / GRUs to provide the correct answer. Given only the supporting facts,
these RNNs can achieve 100% accuracy on many tasks. Memory networks and neural
networks that use attentional processes can efficiently search through this
noise to find the relevant statements, improving performance substantially.
This becomes especially obvious on QA2 and QA3, both far longer than QA1.
'''
#from __future__ import print_function
#from functools import reduce
#import re
#import tarfile

import numpy as np
import random
from math import floor
np.random.seed(1337)  # for reproducibility

#from keras.utils.data_utils import get_file
#from keras.datasets.data_utils import get_file
from keras.layers.embeddings import Embedding
from keras.layers.core import Dense, Merge, Dropout, RepeatVector,MaxoutDense,Activation
from keras.layers import recurrent
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
import csv









    



Using Theano backend.
/usr/local/lib/python2.7/dist-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")



In [2]:

    
with open ('train.csv') as source:
    spam = csv.DictReader(source)
    trainset = list(spam)



In [3]:

    
def getData(splitper):
    random.shuffle(trainset)
    splitper = int(floor(splitper * len(trainset)) + 1)
    #get The Informations
    traininglst = trainset[splitper:]
    testinglst = trainset[:splitper]
    train_X=[]
    train_Xt=[]
    train_Y=[]
    test_X=[]
    test_Xt =[]
    test_Y=[]
    for element in traininglst:
        train_X.append(element['search_term'].split())
        train_Xt.append(element['product_title'].split())
        train_Y.append(float(element['relevance']))
    for element in testinglst:
        test_X.append(element['search_term'].split())
        test_Xt.append(element['product_title'].split())
        test_Y.append(float(element['relevance']))
    
    
    return (train_X,train_Xt,train_Y,test_X,test_Xt,test_Y)



In [4]:

    
(train_X,train_Xt,train_Y,test_X,test_Xt,test_Y)= getData(0.2)



In [5]:

    
ST_MAX_LENGTH = max(map(len, (x for x in train_X + test_X)))
PN_MAX_LENGTH = max(map(len, (x for x in train_Xt + test_Xt)))



In [6]:

    
print 'All Vocabulary , May take 3-5 minutes'
All_Vocabulary = sorted(reduce(lambda x, y: x | y, (set(vocab) for vocab in train_X+test_X+train_Xt+
                                                    test_Xt)))'









    



All Vocabulary , May take 3-5 minutes



In [7]:

    
vocab_size = len(All_Vocabulary)+1
word_idx = dict((c, i + 1) for i, c in enumerate(All_Vocabulary))



In [8]:

    
def Vectorize():
    global train_X,train_Xt,train_Y,test_X,test_Xt,test_Y
    for i in range (0,len(train_X)):
        train_X[i] = [word_idx[l] for l in train_X[i]]
    train_X= pad_sequences(train_X,ST_MAX_LENGTH)
    for i in range (0,len(train_Xt)):
        train_Xt[i] = [word_idx[l] for l in train_Xt[i]]
    train_Xt =pad_sequences(train_Xt,PN_MAX_LENGTH)
    for i in range (0,len(test_X)):
        test_X[i] = [word_idx[l] for l in test_X[i]]
    test_X= pad_sequences(test_X,ST_MAX_LENGTH)
    for i in range (0,len(test_Xt)):
        test_Xt[i] = [word_idx[l] for l in test_Xt[i]]
    test_Xt = pad_sequences(test_Xt,PN_MAX_LENGTH)
    train_Y = np.array(train_Y)
    test_Y = np.array(test_Y)



In [9]:

    
Vectorize()



In [10]:

    
print('All Vocabulary  = {}'.format(vocab_size))
print('Train X.shape = {},Test X.shape{}'.format(train_X.shape,test_X.shape))
print('Train Xt.shape = {},Test Xt.shape{}'.format(train_Xt.shape,test_Xt.shape))
print('Train_Y.shape = {},Test_Y.shape{}'.format(train_Y.shape,test_Y.shape))
print('ST_MAX_LENGTH, PN_MAX_LENGTH = {}, {}'.format(ST_MAX_LENGTH, PN_MAX_LENGTH))









    



All Vocabulary  = 36954
Train X.shape = (59253, 14),Test X.shape(14814, 14)
Train Xt.shape = (59253, 35),Test Xt.shape(14814, 35)
Train_Y.shape = (59253,),Test_Y.shape(14814,)
ST_MAX_LENGTH, PN_MAX_LENGTH = 14, 35



In [25]:

    
RNN = recurrent.LSTM
EMBED_HIDDEN_SIZE = 50
SENT_HIDDEN_SIZE = 100
QUERY_HIDDEN_SIZE = 100
BATCH_SIZE = 32
EPOCHS = 2



In [12]:

    
train_Y[0:10]









    Out[12]:





array([ 1.67,  3.  ,  1.67,  3.  ,  2.  ,  2.67,  2.67,  3.  ,  3.  ,  3.  ])

You could run one recurrent layer over product title and another one over search term. Then merge the resulting vectors together and use linear layer to predict the relevance. The closest thing I can suggest is the question answering example in Keras: https://github.com/fchollet/keras/blob/master/examples/babi_rnn.py

They have a sentence and question, and they predict answer. You have product title and search term, and trying to predict relevance. They use softmax and cross-entropy loss for prediction. Your case is simpler, because you predict just one number, you can use just linear layer and squared error loss.

Tambet

sentrnn is missing the RNN layer. RNN layer should have return_sequences=False.
qrnn is missing the mask_zero, this is useful when you are padding your sequences with zeros.
In Merge layer I would use mode='concat'.
The last Dense layer should have just 1 output and loss function should be mean_squared_error.

I'm not sure if these fix your error, but they are necessary to create the model the way I intended.

Tambet

This seems to be the main error: "Cast bool to uint8 is not supported". What are your targets, one-hot vectors? Then maybe softmax loss was actually reasonable. And some people say, that softmax tends to converge better than mse. Also check, if converting targets to int-s, instead of bool-s, works better.

Tambet



In [13]:

    
print('Build model...')

sentrnn = Sequential()
sentrnn.add(Embedding(vocab_size, EMBED_HIDDEN_SIZE, input_length=ST_MAX_LENGTH, mask_zero=True))
sentrnn.add(Dropout(0.3))
sentrnn.add(RNN(EMBED_HIDDEN_SIZE, return_sequences=False))

qrnn = Sequential()
qrnn.add(Embedding(vocab_size, EMBED_HIDDEN_SIZE, input_length=PN_MAX_LENGTH,mask_zero=True))
qrnn.add(Dropout(0.3))
qrnn.add(RNN(EMBED_HIDDEN_SIZE, return_sequences=False))









    



Build model...



In [14]:

    
print sentrnn.output_shape
print qrnn.output_shape
m = Merge([sentrnn, qrnn], mode='concat')
print m.output_shape
#m.set_input_shape((None,100))









    



(None, 50)
(None, 50)
(None, 100)



In [22]:

    
model = Sequential()
model.add(m)
model.add(Dense(10))
model.add(Dropout(0.3))
#orignial code#
model.add(Dense(output_dim=1))
#model.compile(optimizer='adam', loss='mean_squared_error')
### New Try
#model.add(Dense(output_dim=1,activati))

model.compile(loss='mse', optimizer='rmsprop')
#model.summary()



In [17]:

    
print train_X[0]
print train_Xt[0]
print train_Y[0]









    



[    0     0     0     0     0     0     0     0     0     0     0 33991
 30847  7239]
[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0 21111 12103
 22868 17984 14471 17315  9569  3283 32660 36864  7929 32660   872]
1.67



In [26]:

    
print('Training')
model.fit([train_X, train_Xt], train_Y, batch_size=BATCH_SIZE, nb_epoch=EPOCHS, validation_split=0.05, show_accuracy=True)
loss, acc = model.evaluate([test_X, test_Xt], test_Y, batch_size=BATCH_SIZE, show_accuracy=True)
print('Test loss / test accuracy = {:.4f} / {:.4f}'.format(loss, acc))









    



Training
Train on 56290 samples, validate on 2963 samples
Epoch 1/2
56290/56290 [==============================] - 201s - loss: 0.3902 - acc: 1.0000 - val_loss: 0.3925 - val_acc: 1.0000
Epoch 2/2
56290/56290 [==============================] - 210s - loss: 0.2863 - acc: 1.0000 - val_loss: 0.2576 - val_acc: 1.0000
14814/14814 [==============================] - 5s     
Test loss / test accuracy = 0.2537 / 1.0000