NMT-Keras tutorial


This notebook describes, step by step, how to build a neural machine translation model with NMT-Keras. The tutorial is organized into the following sections:

  1. Create a Dataset instance, in order to properly manage the data.
  2. Create and train the Neural Translation Model on the training data.
  3. Apply the trained model to new (unseen) data.

All these steps are run automatically by the toolkit. However, following this tutorial is a useful way to learn and understand the full process.
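
For reference, once the toolkit is installed, the full pipeline (dataset building, training and sampling) can also be launched non-interactively. The following is only a sketch: it assumes we are inside the cloned nmt-keras folder, where main.py reads its hyperparameters from config.py (see Section 2).

# Sketch: launch the full pipeline with the default hyperparameters from config.py
!python main.py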

So, let's start by installing the toolkit.


In [1]:
!pip install --upgrade pip
!pip uninstall -y keras  # Avoid crashes with pre-installed packages
!git clone https://github.com/lvapeab/nmt-keras
import os
os.chdir('nmt-keras')
!pip install -e .


Uninstalling Keras-2.3.1:
  Successfully uninstalled Keras-2.3.1
Cloning into 'nmt-keras'...
remote: Enumerating objects: 127, done.
remote: Counting objects: 100% (127/127), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 4667 (delta 82), reused 87 (delta 60), pack-reused 4540
Receiving objects: 100% (4667/4667), 5.70 MiB | 8.93 MiB/s, done.
Resolving deltas: 100% (3152/3152), done.
Obtaining file:///content/nmt-keras
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.3.0)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (0.16.0)
Collecting keras@ https://github.com/MarcBS/keras/archive/master.zip
  Downloading https://github.com/MarcBS/keras/archive/master.zip
     / 110.5MB 2.5MB/s
Requirement already satisfied: keras_applications in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.0.8)
Requirement already satisfied: keras_preprocessing in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.1.0)
Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (2.10.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (3.2.1)
Collecting multimodal-keras-wrapper
  Downloading https://files.pythonhosted.org/packages/5a/d3/4f9297540605e8200c7ef149cee98d2a45dbf3878b9d1e7eda7aabfd56ae/multimodal_keras_wrapper-3.0.10-py3-none-any.whl (126kB)
     |████████████████████████████████| 133kB 8.0MB/s 
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.18.2)
Requirement already satisfied: scikit-image in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (0.16.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (0.22.2.post1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.12.0)
Requirement already satisfied: tables in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (3.4.4)
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.0.3)
Collecting sacrebleu
  Downloading https://files.pythonhosted.org/packages/f5/58/5c6cc352ea6271125325950715cf8b59b77abe5e93cf29f6e60b491a31d9/sacrebleu-1.4.6-py3-none-any.whl (59kB)
     |████████████████████████████████| 61kB 8.2MB/s 
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/99/50/93509f906a40bffd7d175f97fd75ea328ad9bd91f48f59c4bd084c94a25e/sacremoses-0.0.41.tar.gz (883kB)
     |████████████████████████████████| 890kB 20.5MB/s 
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from nmt-keras==0.6) (1.4.1)
Collecting tensorflow<2
  Downloading https://files.pythonhosted.org/packages/9a/d9/fd234c7bf68638423fb8e7f44af7fcfce3bcaf416b51e6d902391e47ec43/tensorflow-1.15.2-cp36-cp36m-manylinux2010_x86_64.whl (110.5MB)
     |████████████████████████████████| 110.5MB 1.1MB/s 
Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from keras@ https://github.com/MarcBS/keras/archive/master.zip->nmt-keras==0.6) (3.13)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->nmt-keras==0.6) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->nmt-keras==0.6) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->nmt-keras==0.6) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->nmt-keras==0.6) (2.4.7)
Requirement already satisfied: sklearn in /usr/local/lib/python3.6/dist-packages (from multimodal-keras-wrapper->nmt-keras==0.6) (0.0)
Requirement already satisfied: cython in /usr/local/lib/python3.6/dist-packages (from multimodal-keras-wrapper->nmt-keras==0.6) (0.29.16)
Requirement already satisfied: toolz in /usr/local/lib/python3.6/dist-packages (from multimodal-keras-wrapper->nmt-keras==0.6) (0.10.0)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->nmt-keras==0.6) (1.1.1)
Requirement already satisfied: pillow>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->nmt-keras==0.6) (7.0.0)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->nmt-keras==0.6) (2.4)
Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image->nmt-keras==0.6) (2.4.1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->nmt-keras==0.6) (0.14.1)
Requirement already satisfied: numexpr>=2.5.2 in /usr/local/lib/python3.6/dist-packages (from tables->nmt-keras==0.6) (2.7.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->nmt-keras==0.6) (2018.9)
Requirement already satisfied: typing in /usr/local/lib/python3.6/dist-packages (from sacrebleu->nmt-keras==0.6) (3.6.6)
Collecting portalocker
  Downloading https://files.pythonhosted.org/packages/53/84/7b3146ec6378d28abc73ab484f09f47dfa008ad6f03f33d90a369f880e25/portalocker-1.7.0-py2.py3-none-any.whl
Collecting mecab-python3
  Downloading https://files.pythonhosted.org/packages/18/49/b55a839a77189042960bf96490640c44816073f917d489acbc5d79fa5cc3/mecab_python3-0.996.5-cp36-cp36m-manylinux2010_x86_64.whl (17.1MB)
     |████████████████████████████████| 17.1MB 205kB/s 
Requirement already satisfied: regex in /usr/local/lib/python3.6/dist-packages (from sacremoses->nmt-keras==0.6) (2019.12.20)
Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->nmt-keras==0.6) (7.1.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from sacremoses->nmt-keras==0.6) (4.38.0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (0.8.1)
Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (3.2.0)
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading https://files.pythonhosted.org/packages/1e/e9/d3d747a97f7188f48aa5eda486907f3b345cd409f0a0850468ba867db246/tensorboard-1.15.0-py3-none-any.whl (3.8MB)
     |████████████████████████████████| 3.8MB 53.7MB/s 
Requirement already satisfied: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (3.10.0)
Collecting tensorflow-estimator==1.15.1
  Downloading https://files.pythonhosted.org/packages/de/62/2ee9cd74c9fa2fa450877847ba560b260f5d0fb70ee0595203082dafcc9d/tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503kB)
     |████████████████████████████████| 512kB 54.0MB/s 
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (1.1.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (0.34.2)
Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (0.9.0)
Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (1.12.1)
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow<2->nmt-keras==0.6) (1.28.1)
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx>=2.0->scikit-image->nmt-keras==0.6) (4.4.2)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.16.0,>=1.15.0->tensorflow<2->nmt-keras==0.6) (3.2.1)
Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.16.0,>=1.15.0->tensorflow<2->nmt-keras==0.6) (46.1.3)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.16.0,>=1.15.0->tensorflow<2->nmt-keras==0.6) (1.0.1)
Building wheels for collected packages: keras, sacremoses, gast
  Building wheel for keras (setup.py) ... done
  Created wheel for keras: filename=Keras-2.3.1-cp36-none-any.whl size=487441 sha256=88ab86f6c833c38b953af16463362ef743be231bc4ed4019cf832e0f36febbe4
  Stored in directory: /tmp/pip-ephem-wheel-cache-fmz7jtic/wheels/82/f8/db/7c0c999dced9850abb60944d255a31dbdf10f76f645454b715
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.41-cp36-none-any.whl size=893334 sha256=33a91fa438ba96f3f40aefdcd425f673aed0f71746d5c87c52b895ad98a44e72
  Stored in directory: /root/.cache/pip/wheels/22/5a/d4/b020a81249de7dc63758a34222feaa668dbe8ebfe9170cc9b1
  Building wheel for gast (setup.py) ... done
  Created wheel for gast: filename=gast-0.2.2-cp36-none-any.whl size=7540 sha256=3fae3882844de912f8f4c5207c8a58630c7f9157f23f87838492f21867212b67
  Stored in directory: /root/.cache/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd
Successfully built keras sacremoses gast
Installing collected packages: keras, sacremoses, portalocker, mecab-python3, sacrebleu, multimodal-keras-wrapper, tensorboard, tensorflow-estimator, gast, tensorflow, nmt-keras
  Found existing installation: tensorboard 2.2.0
    Uninstalling tensorboard-2.2.0:
      Successfully uninstalled tensorboard-2.2.0
  Found existing installation: tensorflow-estimator 2.2.0rc0
    Uninstalling tensorflow-estimator-2.2.0rc0:
      Successfully uninstalled tensorflow-estimator-2.2.0rc0
  Found existing installation: gast 0.3.3
    Uninstalling gast-0.3.3:
      Successfully uninstalled gast-0.3.3
  Found existing installation: tensorflow 2.2.0rc2
    Uninstalling tensorflow-2.2.0rc2:
      Successfully uninstalled tensorflow-2.2.0rc2
  Running setup.py develop for nmt-keras
Successfully installed gast-0.2.2 keras-2.3.1 mecab-python3-0.996.5 multimodal-keras-wrapper-3.0.10 nmt-keras portalocker-1.7.0 sacrebleu-1.4.6 sacremoses-0.0.41 tensorboard-1.15.0 tensorflow-1.15.2 tensorflow-estimator-1.15.1

1. Building a Dataset instance

First, we create a Dataset object (from the Multimodal Keras Wrapper library). This object will act as the interface between our data (text files) and the model:


In [0]:
from keras_wrapper.dataset import Dataset, saveDataset
from data_engine.prepare_data import keep_n_captions
ds = Dataset('tutorial_dataset', 'tutorial', silence=False)

Now that we have the empty dataset, we must indicate its inputs and outputs. In our case, we'll have two different inputs and one single output:

  1. Outputs:
     target_text: sentences in our target language.

  2. Inputs:
     source_text: sentences in the source language.

     state_below: sentences in the target language, but shifted one position to the right (for teacher-forced training of the model).

To set up the outputs, we use the setOutput method with the appropriate parameters. Note that, when building the dataset for the training split, we also build the vocabulary (up to 30000 words).


In [3]:
ds.setOutput('examples/EuTrans/training.en',
             'train',
             type='text',
             id='target_text',
             tokenization='tokenize_none',
             build_vocabulary=True,
             pad_on_batch=True,
             sample_weights=True,
             max_text_len=30,
             max_words=30000,
             min_occ=0)

ds.setOutput('examples/EuTrans/dev.en',
             'val',
             type='text',
             id='target_text',
             pad_on_batch=True,
             tokenization='tokenize_none',
             sample_weights=True,
             max_text_len=30,
             max_words=0)


[14/04/2020 11:18:59] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:18:59] Creating vocabulary for data with data_id 'target_text'.
[14/04/2020 11:18:59] 	 Total: 513 unique words in 9900 sentences with a total of 98304 words.
[14/04/2020 11:18:59] Creating dictionary of 30000 most common words, covering 100.0% of the text.
[14/04/2020 11:18:59] Loaded "train" set outputs of data_type "text" with data_id "target_text" and length 9900.
[14/04/2020 11:18:59] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:18:59] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 100.

Similarly, we introduce the source text data with the setInput method. Again, when building the training split, we must construct the vocabulary.


In [4]:
ds.setInput('examples/EuTrans/training.es',
            'train',
            type='text',
            id='source_text',
            pad_on_batch=True,
            tokenization='tokenize_none',
            build_vocabulary=True,
            fill='end',
            max_text_len=30,
            max_words=30000,
            min_occ=0)
ds.setInput('examples/EuTrans/dev.es',
            'val',
            type='text',
            id='source_text',
            pad_on_batch=True,
            tokenization='tokenize_none',
            fill='end',
            max_text_len=30,
            min_occ=0)


[14/04/2020 11:18:59] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:18:59] Creating vocabulary for data with data_id 'source_text'.
[14/04/2020 11:18:59] 	 Total: 686 unique words in 9900 sentences with a total of 96172 words.
[14/04/2020 11:18:59] Creating dictionary of 30000 most common words, covering 100.0% of the text.
[14/04/2020 11:18:59] Loaded "train" set inputs of data_type "text" with data_id "source_text" and length 9900.
[14/04/2020 11:18:59] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:18:59] Loaded "val" set inputs of data_type "text" with data_id "source_text" and length 100.
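
As a quick sanity check, we can inspect the vocabulary sizes that were just built; the same vocabulary_len attribute is used in Section 2 to size the model. Note that the reported sizes are slightly larger than the unique-word counts in the logs above, since the wrapper also reserves a few special tokens (an inference from the counts, not behaviour documented here).

# Sanity check: vocabulary sizes built for each data id (reused in Section 2)
print('source_text vocabulary size:', ds.vocabulary_len['source_text'])
print('target_text vocabulary size:', ds.vocabulary_len['target_text'])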

...and the same for the 'state_below' data. Note that: 1) the offset flag is set to 1, which means the text will be shifted one position to the right; 2) at sampling time we won't have this input, so we 'hack' the dataset by inserting an artificial input of type 'ghost' for the validation split.


In [5]:
ds.setInput('examples/EuTrans/training.en',
            'train',
            type='text',
            id='state_below',
            required=False,
            tokenization='tokenize_none',
            pad_on_batch=True,
            build_vocabulary='target_text',
            offset=1,
            fill='end',
            max_text_len=30,
            max_words=30000)
ds.setInput(None,
            'val',
            type='ghost',
            id='state_below',
            required=False)


[14/04/2020 11:18:59] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:18:59] 	Reusing vocabulary named "target_text" for data with data_id "state_below".
[14/04/2020 11:18:59] Loaded "train" set inputs of data_type "text" with data_id "state_below" and length 9900.
[14/04/2020 11:18:59] Loaded "val" set inputs of data_type "ghost" with data_id "state_below" and length 100.
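
To make the effect of offset=1 concrete, here is a minimal, self-contained sketch of what 'state_below' looks like for one target sentence during teacher-forced training (the padding symbol '<null>' is an assumption; only the shifting behaviour matters):

# Teacher forcing: the decoder input at step t is the target word from step t-1.
target = ['I', 'would', 'like', 'a', 'room', '.']
state_below = ['<null>'] + target[:-1]  # target shifted one position to the right
for inp, out in zip(state_below, target):
    print('decoder input: %-8s -> expected output: %s' % (inp, out))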

We can also keep the raw source sentences; they will be used later for replacing unknown words.


In [6]:
for split, input_text_filename in zip(['train', 'val'], ['examples/EuTrans/training.es', 'examples/EuTrans/dev.es']):
    ds.setRawInput(input_text_filename,
                   split,
                   type='file-name',
                   id='raw_source_text',
                   overwrite_split=True)


[14/04/2020 11:18:59] Loaded "train" set inputs of type "file-name" with id "raw_source_text".
[14/04/2020 11:18:59] Loaded "val" set inputs of type "file-name" with id "raw_source_text".
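
The raw source text comes into play at decoding time: with pos_unk enabled, each unknown word in a hypothesis is replaced by a word taken from the raw source sentence, based on the attention weights. The following is a toy, self-contained sketch of the idea, not the wrapper's actual implementation (in particular, it assumes the simplest heuristic: copy the most-attended source word).

# Toy unknown-word replacement: copy the most-attended source word into each <unk> slot.
def replace_unks(hypothesis, source_words, alignments):
    """alignments[t] is the source position that received most attention at target step t."""
    return [source_words[alignments[t]] if word == '<unk>' else word
            for t, word in enumerate(hypothesis)]

hyp = ['I', 'want', 'a', '<unk>', '.']
src = ['quiero', 'una', 'habitacion', 'doble', '.']
align = [0, 0, 1, 3, 4]  # toy attention argmax per target position
print(' '.join(replace_unks(hyp, src, align)))  # -> I want a doble .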

We also need to match the references with the inputs. Since we only have one reference per input sample, we set repeat=1.


In [7]:
keep_n_captions(ds, repeat=1, n=1, set_names=['val'])


[14/04/2020 11:18:59] Keeping 1 captions per input on the val set.
[14/04/2020 11:18:59] Samples reduced to 100 in val set.

Finally, we can save our dataset instance for use in other experiments:


In [8]:
saveDataset(ds, 'datasets')


[14/04/2020 11:18:59] <<< creating directory datasets ... >>>
[14/04/2020 11:18:59] <<< Saving Dataset instance to datasets/Dataset_tutorial_dataset.pkl ... >>>
[14/04/2020 11:18:59] <<< Dataset instance saved >>>
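
The saved pickle can be reloaded in any later experiment with the wrapper's loadDataset function, which is exactly what we will do at the beginning of Section 2:

# Reload the stored Dataset instance (same call used in Section 2)
from keras_wrapper.dataset import loadDataset
ds = loadDataset('datasets/Dataset_tutorial_dataset.pkl')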

2. Creating and training a Neural Translation Model

Now, we'll create and train a Neural Machine Translation (NMT) model. Since there is a significant number of hyperparameters, we'll use the default ones, specified in the config.py file. Note that, if we run main.py, almost every hardcoded parameter is set automatically from that config.

We'll create an 'AttentionRNNEncoderDecoder' model (an LSTM encoder-decoder with an attention mechanism). Refer to the model_zoo.py file for other models (e.g. the Transformer).
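
In principle, switching to the Transformer is just a matter of changing the model type before building the model. A minimal sketch (the remaining Transformer-specific hyperparameters, e.g. N_HEADS or FF_SIZE, are taken from config.py and may also need adjusting):

# Sketch: select the Transformer architecture instead of the attentional RNN encoder-decoder.
from config import load_parameters
transformer_params = load_parameters()
transformer_params['MODEL_TYPE'] = 'Transformer'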

So first, let's import the model and the hyperparameters. We'll also load the dataset we stored in the previous section (not strictly necessary, since it is already in memory, but it serves as a demonstration):


In [9]:
from config import load_parameters
from nmt_keras.model_zoo import TranslationModel
from keras_wrapper.cnn_model import loadModel
from keras_wrapper.dataset import loadDataset
from keras_wrapper.extra.callbacks import PrintPerformanceMetricOnEpochEndOrEachNUpdates
params = load_parameters()
dataset = loadDataset('datasets/Dataset_tutorial_dataset.pkl')


Using TensorFlow backend.


[14/04/2020 11:19:02] <<< Loading Dataset instance from datasets/Dataset_tutorial_dataset.pkl ... >>>
[14/04/2020 11:19:02] <<< Dataset instance loaded >>>

Since the number of words in the dataset may not be known beforehand, we must update the params dictionary according to the dataset instance:


In [0]:
params['INPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len['source_text']
params['OUTPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len['target_text']

Now, we create a TranslationModel instance:


In [11]:
params['MODEL_TYPE'] = 'AttentionRNNEncoderDecoder' #  Supported models: 'AttentionRNNEncoderDecoder' and 'Transformer'.
nmt_model = TranslationModel(params,
                             model_type=params['MODEL_TYPE'], 
                             model_name='tutorial_model',
                             vocabularies=dataset.vocabulary,
                             store_path='trained_models/tutorial_model/',
                             verbose=True)


[14/04/2020 11:19:02] <<< Building AttentionRNNEncoderDecoder Translation_Model >>>
[14/04/2020 11:19:02] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:650: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[14/04/2020 11:19:02] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4786: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

[14/04/2020 11:19:02] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:157: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

[14/04/2020 11:19:03] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3561: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
-----------------------------------------------------------------------------------
		TranslationModel instance
-----------------------------------------------------------------------------------
_model_type: AttentionRNNEncoderDecoder
name: tutorial_model
model_path: trained_models/tutorial_model/
verbose: True

Params:
	ACCUMULATE_GRADIENTS: 1
	ADDITIONAL_OUTPUT_MERGE_MODE: Add
	ALIGN_FROM_RAW: True
	ALPHA_FACTOR: 0.6
	AMSGRAD: False
	APPLY_DETOKENIZATION: False
	ATTENTION_DROPOUT_P: 0.0
	ATTENTION_MODE: add
	ATTENTION_SIZE: 32
	BATCH_NORMALIZATION_MODE: 1
	BATCH_SIZE: 50
	BEAM_SEARCH: True
	BEAM_SIZE: 6
	BETA_1: 0.9
	BETA_2: 0.999
	BIDIRECTIONAL_DEEP_ENCODER: True
	BIDIRECTIONAL_ENCODER: True
	BIDIRECTIONAL_MERGE_MODE: concat
	BPE_CODES_PATH: examples/EuTrans//training_codes.joint
	CLASSIFIER_ACTIVATION: softmax
	CLIP_C: 5.0
	CLIP_V: 0.0
	COVERAGE_NORM_FACTOR: 0.2
	COVERAGE_PENALTY: False
	DATASET_NAME: EuTrans
	DATASET_STORE_PATH: datasets/
	DATA_AUGMENTATION: False
	DATA_ROOT_PATH: examples/EuTrans/
	DECODER_HIDDEN_SIZE: 32
	DECODER_RNN_TYPE: ConditionalLSTM
	DEEP_OUTPUT_LAYERS: [('linear', 32)]
	DETOKENIZATION_METHOD: detokenize_none
	DOUBLE_STOCHASTIC_ATTENTION_REG: 0.0
	DROPOUT_P: 0.0
	EARLY_STOP: True
	EMBEDDINGS_FREQ: 1
	ENCODER_HIDDEN_SIZE: 32
	ENCODER_RNN_TYPE: LSTM
	EPOCHS_FOR_SAVE: 1
	EPSILON: 1e-08
	EVAL_EACH: 1
	EVAL_EACH_EPOCHS: True
	EVAL_ON_SETS: ['val']
	EXTRA_NAME: 
	FF_SIZE: 128
	FILL: end
	FORCE_RELOAD_VOCABULARY: False
	GLOSSARY: None
	HEURISTIC: 0
	HOMOGENEOUS_BATCHES: False
	INIT_ATT: glorot_uniform
	INIT_FUNCTION: glorot_uniform
	INIT_LAYERS: ['tanh']
	INNER_INIT: orthogonal
	INPUTS_IDS_DATASET: ['source_text', 'state_below']
	INPUTS_IDS_MODEL: ['source_text', 'state_below']
	INPUTS_TYPES_DATASET: ['text-features', 'text-features']
	INPUT_VOCABULARY_SIZE: 689
	JOINT_BATCHES: 4
	KERAS_METRICS: ['perplexity']
	LABEL_SMOOTHING: 0.0
	LENGTH_NORM_FACTOR: 0.2
	LENGTH_PENALTY: False
	LOG_DIR: tensorboard_logs
	LOSS: categorical_crossentropy
	LR: 0.001
	LR_DECAY: None
	LR_GAMMA: 0.8
	LR_HALF_LIFE: 100
	LR_REDUCER_EXP_BASE: -0.5
	LR_REDUCER_TYPE: exponential
	LR_REDUCE_EACH_EPOCHS: False
	LR_START_REDUCTION_ON_EPOCH: 0
	MAPPING: examples/EuTrans//mapping.es_en.pkl
	MAXLEN_GIVEN_X: True
	MAXLEN_GIVEN_X_FACTOR: 2
	MAX_EPOCH: 500
	MAX_INPUT_TEXT_LEN: 50
	MAX_OUTPUT_TEXT_LEN: 50
	MAX_OUTPUT_TEXT_LEN_TEST: 150
	METRICS: ['sacrebleu']
	MINLEN_GIVEN_X: True
	MINLEN_GIVEN_X_FACTOR: 3
	MIN_DELTA: 0.0
	MIN_LR: 1e-09
	MIN_OCCURRENCES_INPUT_VOCAB: 0
	MIN_OCCURRENCES_OUTPUT_VOCAB: 0
	MODE: training
	MODEL_NAME: EuTrans_esen_AttentionRNNEncoderDecoder_src_emb_32_bidir_True_enc_LSTM_32_dec_ConditionalLSTM_32_deepout_linear_trg_emb_32_Adam_0.001
	MODEL_SIZE: 32
	MODEL_TYPE: AttentionRNNEncoderDecoder
	MOMENTUM: 0.0
	MULTIHEAD_ATTENTION_ACTIVATION: linear
	NESTEROV_MOMENTUM: False
	NOISE_AMOUNT: 0.01
	NORMALIZE_SAMPLING: False
	N_GPUS: 1
	N_HEADS: 8
	N_LAYERS_DECODER: 1
	N_LAYERS_ENCODER: 1
	N_SAMPLES: 5
	OPTIMIZED_SEARCH: True
	OPTIMIZER: Adam
	OUTPUTS_IDS_DATASET: ['target_text']
	OUTPUTS_IDS_MODEL: ['target_text']
	OUTPUTS_TYPES_DATASET: ['text-features']
	OUTPUT_VOCABULARY_SIZE: 516
	PAD_ON_BATCH: True
	PARALLEL_LOADERS: 1
	PATIENCE: 10
	PLOT_EVALUATION: False
	POS_UNK: True
	REBUILD_DATASET: True
	RECURRENT_DROPOUT_P: 0.0
	RECURRENT_INPUT_DROPOUT_P: 0.0
	RECURRENT_WEIGHT_DECAY: 0.0
	REGULARIZATION_FN: L2
	RELOAD: 0
	RELOAD_EPOCH: True
	RHO: 0.9
	SAMPLE_EACH_UPDATES: 300
	SAMPLE_ON_SETS: ['train', 'val']
	SAMPLE_WEIGHTS: True
	SAMPLING: max_likelihood
	SAMPLING_SAVE_MODE: list
	SAVE_EACH_EVALUATION: True
	SCALE_SOURCE_WORD_EMBEDDINGS: False
	SCALE_TARGET_WORD_EMBEDDINGS: False
	SEARCH_PRUNING: False
	SKIP_VECTORS_HIDDEN_SIZE: 32
	SKIP_VECTORS_SHARED_ACTIVATION: tanh
	SOURCE_TEXT_EMBEDDING_SIZE: 32
	SRC_LAN: es
	SRC_PRETRAINED_VECTORS: None
	SRC_PRETRAINED_VECTORS_TRAINABLE: True
	START_EVAL_ON_EPOCH: 1
	START_SAMPLING_ON_EPOCH: 1
	STOP_METRIC: Bleu_4
	STORE_PATH: trained_models/EuTrans_esen_AttentionRNNEncoderDecoder_src_emb_32_bidir_True_enc_LSTM_32_dec_ConditionalLSTM_32_deepout_linear_trg_emb_32_Adam_0.001/
	TARGET_TEXT_EMBEDDING_SIZE: 32
	TASK_NAME: EuTrans
	TEMPERATURE: 1
	TENSORBOARD: True
	TEXT_FILES: {'train': 'training.', 'val': 'dev.', 'test': 'test.'}
	TIE_EMBEDDINGS: False
	TOKENIZATION_METHOD: tokenize_none
	TOKENIZE_HYPOTHESES: True
	TOKENIZE_REFERENCES: True
	TRAINABLE_DECODER: True
	TRAINABLE_ENCODER: True
	TRAIN_ON_TRAINVAL: False
	TRG_LAN: en
	TRG_PRETRAINED_VECTORS: None
	TRG_PRETRAINED_VECTORS_TRAINABLE: True
	USE_BATCH_NORMALIZATION: True
	USE_CUDNN: False
	USE_L1: False
	USE_L2: False
	USE_NOISE: False
	USE_PRELU: False
	USE_TF_OPTIMIZER: True
	VERBOSE: 1
	WARMUP_EXP: -1.5
	WEIGHT_DECAY: 0.0001
	WRITE_VALID_SAMPLES: True
-----------------------------------------------------------------------------------
Model: "tutorial_model_training"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
source_text (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
source_word_embedding (Embeddin (None, None, 32)     22048       source_text[0][0]                
__________________________________________________________________________________________________
src_embedding_batch_normalizati (None, None, 32)     128         source_word_embedding[0][0]      
__________________________________________________________________________________________________
remove_mask_1 (RemoveMask)      (None, None, 32)     0           src_embedding_batch_normalization
__________________________________________________________________________________________________
bidirectional_encoder_LSTM (Bid (None, None, 64)     16640       remove_mask_1[0][0]              
__________________________________________________________________________________________________
annotations_batch_normalization (None, None, 64)     256         bidirectional_encoder_LSTM[0][0] 
__________________________________________________________________________________________________
source_text_mask (GetMask)      (None, None, 32)     0           src_embedding_batch_normalization
__________________________________________________________________________________________________
annotations (ApplyMask)         (None, None, 64)     0           annotations_batch_normalization[0
                                                                 source_text_mask[0][0]           
__________________________________________________________________________________________________
state_below (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
ctx_mean (MaskedMean)           (None, 64)           0           annotations[0][0]                
__________________________________________________________________________________________________
target_word_embedding (Embeddin (None, None, 32)     16512       state_below[0][0]                
__________________________________________________________________________________________________
initial_state (Dense)           (None, 32)           2080        ctx_mean[0][0]                   
__________________________________________________________________________________________________
initial_memory (Dense)          (None, 32)           2080        ctx_mean[0][0]                   
__________________________________________________________________________________________________
state_below_batch_normalization (None, None, 32)     128         target_word_embedding[0][0]      
__________________________________________________________________________________________________
initial_state_batch_normalizati (None, 32)           128         initial_state[0][0]              
__________________________________________________________________________________________________
initial_memory_batch_normalizat (None, 32)           128         initial_memory[0][0]             
__________________________________________________________________________________________________
decoder_AttConditionalLSTMCond  [(None, None, 32), ( 23873       state_below_batch_normalization[0
                                                                 annotations[0][0]                
                                                                 initial_state_batch_normalization
                                                                 initial_memory_batch_normalizatio
__________________________________________________________________________________________________
proj_h0_batch_normalization (Ba (None, None, 32)     128         decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
logit_ctx (TimeDistributed)     (None, None, 32)     2080        decoder_AttConditionalLSTMCond[0]
__________________________________________________________________________________________________
logit_lstm (TimeDistributed)    (None, None, 32)     1056        proj_h0_batch_normalization[0][0]
__________________________________________________________________________________________________
permute_general_1 (PermuteGener (None, None, 32)     0           logit_ctx[0][0]                  
__________________________________________________________________________________________________
logit_emb (TimeDistributed)     (None, None, 32)     1056        state_below_batch_normalization[0
__________________________________________________________________________________________________
out_layer_mlp_batch_normalizati (None, None, 32)     128         logit_lstm[0][0]                 
__________________________________________________________________________________________________
out_layer_ctx_batch_normalizati (None, None, 32)     128         permute_general_1[0][0]          
__________________________________________________________________________________________________
out_layer_emb_batch_normalizati (None, None, 32)     128         logit_emb[0][0]                  
__________________________________________________________________________________________________
additional_input (Add)          (None, None, 32)     0           out_layer_mlp_batch_normalization
                                                                 out_layer_ctx_batch_normalization
                                                                 out_layer_emb_batch_normalization
__________________________________________________________________________________________________
activation_1 (Activation)       (None, None, 32)     0           additional_input[0][0]           
__________________________________________________________________________________________________
linear_0 (TimeDistributed)      (None, None, 32)     1056        activation_1[0][0]               
__________________________________________________________________________________________________
out_layer_linear_0_batch_normal (None, None, 32)     128         linear_0[0][0]                   
__________________________________________________________________________________________________
target_text (TimeDistributed)   (None, None, 516)    17028       out_layer_linear_0_batch_normaliz
==================================================================================================
Total params: 106,917
Trainable params: 106,213
Non-trainable params: 704
__________________________________________________________________________________________________
[14/04/2020 11:19:03] From /content/nmt-keras/nmt_keras/model_zoo.py:201: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

[14/04/2020 11:19:03] Preparing optimizer and compiling. Optimizer configuration: 
	 LR: 0.001
	 LOSS: categorical_crossentropy
	 BETA_1: 0.9
	 BETA_2: 0.999
	 EPSILON: 1e-08
[14/04/2020 11:19:03] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1192: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

Next, we must define the input and output mappings from our Dataset instance to our model:


In [0]:
inputMapping = dict()
for i, id_in in enumerate(params['INPUTS_IDS_DATASET']):
    pos_source = dataset.ids_inputs.index(id_in)
    id_dest = nmt_model.ids_inputs[i]
    inputMapping[id_dest] = pos_source
nmt_model.setInputsMapping(inputMapping)

outputMapping = dict()
for i, id_out in enumerate(params['OUTPUTS_IDS_DATASET']):
    pos_target = dataset.ids_outputs.index(id_out)
    id_dest = nmt_model.ids_outputs[i]
    outputMapping[id_dest] = pos_target
nmt_model.setOutputsMapping(outputMapping)
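
The resulting mappings are plain dictionaries from model input/output names to positions in the dataset's ordered lists of ids, so we can print them to see what was built (the exact indices depend on the order in which we added the data above):

# Inspect the mappings (indices follow the order in which the data ids were added)
print('inputMapping:', inputMapping)    # e.g. {'source_text': 0, 'state_below': 1}
print('outputMapping:', outputMapping)  # e.g. {'target_text': 0}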

We can add some callbacks for controlling the training (e.g. sampling every N updates, early stopping, learning rate annealing...). For instance, let's build a sampling callback: after each epoch, it will compute the BLEU score on the development set using the sacreBLEU package. We need to pass some configuration variables to the callback (in the extra_vars dictionary):


In [0]:
is_transformer = params.get('ATTEND_ON_OUTPUT', 'transformer' in params['MODEL_TYPE'].lower())
search_params = {
    'language': 'en',
    'tokenize_f': eval('dataset.' + 'tokenize_none'),
    'beam_size': 12,
    'optimized_search': True,
    'model_inputs': params['INPUTS_IDS_MODEL'],
    'model_outputs': params['OUTPUTS_IDS_MODEL'],
    'dataset_inputs':  params['INPUTS_IDS_DATASET'],
    'dataset_outputs':  params['OUTPUTS_IDS_DATASET'],
    'n_parallel_loaders': 1,
    'maxlen': 50,
    'normalize_probs': True,
    'pos_unk': True and not is_transformer,  # Pos_unk is unimplemented for transformer models
    'heuristic': 0,
    'state_below_maxlen': -1,
    'attend_on_output': is_transformer,
    'val': {'references': dataset.extra_variables['val']['target_text']}
  }

vocab = dataset.vocabulary['target_text']['idx2words']
callbacks = []
input_text_id = params['INPUTS_IDS_DATASET'][0]

callbacks.append(PrintPerformanceMetricOnEpochEndOrEachNUpdates(nmt_model,
                                                                dataset,
                                                                gt_id='target_text',
                                                                metric_name=['sacrebleu'],
                                                                set_name=['val'],
                                                                batch_size=50,
                                                                each_n_epochs=1,
                                                                extra_vars=search_params,
                                                                reload_epoch=0,
                                                                is_text=True,
                                                                input_text_id=input_text_id,
                                                                index2word_y=vocab,
                                                                sampling_type='max_likelihood',
                                                                beam_search=True,
                                                                save_path=nmt_model.model_path,
                                                                start_eval_on_epoch=0,
                                                                write_samples=True,
                                                                write_type='list',
                                                                verbose=True))

Now we are ready to train. Let's set up some training parameters...


In [0]:
training_params = {'n_epochs': 4,
                   'batch_size': 50,
                   'maxlen': 30,
                   'epochs_for_save': 1,
                   'verbose': 1,
                   'eval_on_sets': [], 
                   'n_parallel_loaders': 1,
                   'extra_callbacks': callbacks,
                   'reload_epoch': 0,
                   'epoch_offset': 0}

And train!


In [15]:
nmt_model.trainNet(dataset, training_params)


[14/04/2020 11:19:03] <<< Training model >>>
[14/04/2020 11:19:03] Training parameters: { 
	batch_size: 50
	class_weights: None
	da_enhance_list: []
	da_patch_type: resize_and_rndcrop
	data_augmentation: False
	each_n_epochs: 1
	epoch_offset: 0
	epochs_for_save: 1
	eval_on_epochs: True
	eval_on_sets: []
	extra_callbacks: [<keras_wrapper.extra.callbacks.EvalPerformance object at 0x7fafbd6275f8>]
	homogeneous_batches: False
	initial_lr: 1.0
	joint_batches: 4
	lr_decay: None
	lr_gamma: 0.1
	lr_half_life: 50000
	lr_reducer_exp_base: 0.5
	lr_reducer_type: linear
	lr_warmup_exp: -1.5
	maxlen: 30
	mean_substraction: False
	metric_check: None
	min_delta: 0.0
	min_lr: 1e-09
	n_epochs: 4
	n_gpus: 1
	n_parallel_loaders: 1
	normalization_type: None
	normalize: False
	num_iterations_val: None
	patience: 0
	patience_check_split: val
	reduce_each_epochs: True
	reload_epoch: 0
	shuffle: True
	start_eval_on_epoch: 0
	start_reduction_on_epoch: 0
	tensorboard: False
	tensorboard_params: {'log_dir': 'tensorboard_logs', 'histogram_freq': 0, 'batch_size': 50, 'write_graph': True, 'write_grads': False, 'write_images': False, 'embeddings_freq': 0, 'embeddings_layer_names': None, 'embeddings_metadata': None, 'update_freq': 'epoch'}
	verbose: 1
	wo_da_patch_type: whole
}
[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3315: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:292: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:299: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:312: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:321: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

[14/04/2020 11:19:07] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:328: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

Epoch 1/4
198/198 [==============================] - 21s 104ms/step - loss: 1.8991 - perplexity: 123.1111
[14/04/2020 11:19:28] <<< Saving model to trained_models/tutorial_model/epoch_1 ... >>>

/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:165: UserWarning: TensorFlow optimizers do not make it possible to access optimizer attributes or optimizer state after instantiation. As a result, we cannot save the optimizer as part of the model save file.You will have to compile your model again after loading it. Prefer using a Keras optimizer instead (see keras.io/optimizers).
  'TensorFlow optimizers do not '
[14/04/2020 11:19:31] <<< Model saved >>>

[14/04/2020 11:19:31] <<< Predicting outputs of val set >>>

 Total cost: 593.549924 	 Average cost: 5.935499
The sampling took: 11.279574 secs (Speed: 0.112796 sec/sample)
[14/04/2020 11:19:42] Prediction output 0: target_text (text)
[14/04/2020 11:19:42] Decoding beam search prediction ...
[14/04/2020 11:19:42] Using heuristic 0
[14/04/2020 11:19:42] Evaluating on metric sacrebleu

[14/04/2020 11:19:42] Computing SacreBleu scores on the val split...
[14/04/2020 11:19:42] Bleu_4: 1.2871960240445284
[14/04/2020 11:19:42] Done evaluating on metric sacrebleu

[14/04/2020 11:19:42] <<< Progress plot saved in trained_models/tutorial_model/epoch_1.jpg >>>
Epoch 2/4
198/198 [==============================] - 19s 98ms/step - loss: 0.6560 - perplexity: 23.3535
[14/04/2020 11:20:01] <<< Saving model to trained_models/tutorial_model/epoch_2 ... >>>

[14/04/2020 11:20:02] <<< Model saved >>>

[14/04/2020 11:20:02] <<< Predicting outputs of val set >>>

 Total cost: 515.741690 	 Average cost: 5.157417
The sampling took: 5.638581 secs (Speed: 0.056386 sec/sample)
[14/04/2020 11:20:07] Prediction output 0: target_text (text)

[14/04/2020 11:20:07] Decoding beam search prediction ...
[14/04/2020 11:20:07] Using heuristic 0
[14/04/2020 11:20:07] Evaluating on metric sacrebleu
[14/04/2020 11:20:07] Computing SacreBleu scores on the val split...
[14/04/2020 11:20:07] Bleu_4: 31.511534690916882
[14/04/2020 11:20:07] Done evaluating on metric sacrebleu

[14/04/2020 11:20:07] <<< Progress plot saved in trained_models/tutorial_model/epoch_2.jpg >>>
Epoch 3/4
198/198 [==============================] - 19s 96ms/step - loss: 0.4077 - perplexity: 12.3151
[14/04/2020 11:20:26] <<< Saving model to trained_models/tutorial_model/epoch_3 ... >>>

[14/04/2020 11:20:27] <<< Model saved >>>

[14/04/2020 11:20:27] <<< Predicting outputs of val set >>>

 Total cost: 353.992723 	 Average cost: 3.539927
The sampling took: 4.853681 secs (Speed: 0.048537 sec/sample)
[14/04/2020 11:20:31] Prediction output 0: target_text (text)
[14/04/2020 11:20:31] Decoding beam search prediction ...
[14/04/2020 11:20:31] Using heuristic 0
[14/04/2020 11:20:31] Evaluating on metric sacrebleu

[14/04/2020 11:20:31] Computing SacreBleu scores on the val split...
[14/04/2020 11:20:31] Bleu_4: 51.77515719698533
[14/04/2020 11:20:31] Done evaluating on metric sacrebleu

[14/04/2020 11:20:31] <<< Progress plot saved in trained_models/tutorial_model/epoch_3.jpg >>>
Epoch 4/4
198/198 [==============================] - 19s 96ms/step - loss: 0.3098 - perplexity: 8.7050
[14/04/2020 11:20:51] <<< Saving model to trained_models/tutorial_model/epoch_4 ... >>>

[14/04/2020 11:20:51] <<< Model saved >>>

[14/04/2020 11:20:51] <<< Predicting outputs of val set >>>

 Total cost: 276.961603 	 Average cost: 2.769616
The sampling took: 4.989137 secs (Speed: 0.049891 sec/sample)
[14/04/2020 11:20:56] Prediction output 0: target_text (text)
[14/04/2020 11:20:56] Decoding beam search prediction ...
[14/04/2020 11:20:56] Using heuristic 0
[14/04/2020 11:20:56] Evaluating on metric sacrebleu
[14/04/2020 11:20:56] Computing SacreBleu scores on the val split...

[14/04/2020 11:20:56] Bleu_4: 67.47131115833595
[14/04/2020 11:20:56] Done evaluating on metric sacrebleu

[14/04/2020 11:20:56] <<< Progress plot saved in trained_models/tutorial_model/epoch_4.jpg >>>
[14/04/2020 11:20:56] <<< Finished training model >>>

3. Decoding with a trained Neural Machine Translation Model

Now, we'll load the model we just trained from disk and apply it to translate new text. In this case, we want to translate the 'test' split of our dataset.

Since we want to translate a new data split ('test'), we must add it to the dataset instance, just as we did before (in Section 1). If we also had the references for the test split and wanted to evaluate on them, we could add them to the dataset as well. Note that this is not mandatory: we could simply predict without evaluating.


In [16]:
dataset.setInput('examples/EuTrans/test.es',
                 'test',
                 type='text',
                 id='source_text',
                 pad_on_batch=True,
                 tokenization='tokenize_none',
                 fill='end',
                 max_text_len=30,
                 min_occ=0)

dataset.setInput(None,
                 'test',
                 type='ghost',
                 id='state_below',
                 required=False)

dataset.setRawInput('examples/EuTrans/test.es',
                    'test',
                    type='file-name',
                    id='raw_source_text',
                    overwrite_split=True)


[14/04/2020 11:20:56] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:20:56] Loaded "test" set inputs of data_type "text" with data_id "source_text" and length 2996.
[14/04/2020 11:20:56] Loaded "test" set inputs of data_type "ghost" with data_id "state_below" and length 2996.
[14/04/2020 11:20:56] Loaded "test" set inputs of type "file-name" with id "raw_source_text".

Now, let's load the translation model. Suppose we want to load the model saved at the end of epoch 4:


In [17]:
params['INPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len[params['INPUTS_IDS_DATASET'][0]]
params['OUTPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len[params['OUTPUTS_IDS_DATASET'][0]]

# Load model
nmt_model = loadModel('trained_models/tutorial_model', 4)


[14/04/2020 11:20:56] <<< Loading model from trained_models/tutorial_model/epoch_4_Model_Wrapper.pkl ... >>>
[14/04/2020 11:20:56] <<< Loading model from trained_models/tutorial_model/epoch_4.h5 ... >>>
[14/04/2020 11:20:58] <<< Loading optimized model... >>>
[14/04/2020 11:21:02] <<< Optimized model loaded. >>>
[14/04/2020 11:21:02] <<< Model loaded in 5.9873 seconds. >>>

Once we have loaded the model, we just have to invoke the sampling method (in this case, the beam search algorithm) on the 'test' split:


In [18]:
is_transformer = params.get('ATTEND_ON_OUTPUT', 'transformer' in params['MODEL_TYPE'].lower())

params_prediction = {
    'language': 'en',
    'tokenize_f': eval('dataset.' + 'tokenize_none'),
    'beam_size': 12,
    'optimized_search': True,
    'model_inputs': params['INPUTS_IDS_MODEL'],
    'model_outputs': params['OUTPUTS_IDS_MODEL'],
    'dataset_inputs':  params['INPUTS_IDS_DATASET'],
    'dataset_outputs':  params['OUTPUTS_IDS_DATASET'],
    'n_parallel_loaders': 1,
    'maxlen': 50,
    'normalize_probs': True,
    'pos_unk': True and not is_transformer,
    'heuristic': 0,
    'state_below_maxlen': -1,
    'predict_on_sets': ['test'],
    'verbose': 0,
    'attend_on_output': is_transformer
  }
predictions = nmt_model.predictBeamSearchNet(dataset, params_prediction)['test']


[14/04/2020 11:21:02] <<< Predicting outputs of test set >>>

 Total cost: 11209.612887 	 Average cost: 3.741526
The sampling took: 176.808622 secs (Speed: 0.059015 sec/sample)

Up to now, the variable 'predictions' holds the word indices of the hypotheses, so we must decode them into words. To do this, we'll use the vocabulary stored in the dataset object:


In [19]:
from keras_wrapper.utils import decode_predictions_beam_search
vocab = dataset.vocabulary['target_text']['idx2words']
samples = predictions['samples'] # Get word indices from the samples.

predictions = decode_predictions_beam_search(samples,  
                                             vocab,
                                             verbose=params['VERBOSE'])


[14/04/2020 11:23:59] Decoding beam search prediction ...

Finally, we store the hypotheses:


In [20]:
filepath = 'test.pred'
from keras_wrapper.extra.read_write import list2file
list2file(filepath, predictions)
!head -n 4 test.pred


I would like to book a room until tomorrow , please .
please wake us up tomorrow at a quarter past one .
I am leaving today in the afternoon .
would you mind taxi for me , please ?

If we have the references for this split, we can also evaluate the performance of our system on it. First, we must add them to the dataset object:


In [21]:
dataset.setOutput('examples/EuTrans/test.en',
             'test',
             type='text',
             id='target_text',
             pad_on_batch=True,
             tokenization='tokenize_none',
             sample_weights=True,
             max_text_len=30,
             max_words=0)
keep_n_captions(dataset, repeat=1, n=1, set_names=['test'])


[14/04/2020 11:24:01] 	Applying tokenization function: "tokenize_none".
[14/04/2020 11:24:01] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 2996.
[14/04/2020 11:24:01] Keeping 1 captions per input on the test set.
[14/04/2020 11:24:01] Samples reduced to 2996 in test set.

Next, we call the evaluation system (the sacreBLEU package):


In [22]:
from keras_wrapper.extra.evaluation import select
metric = 'sacrebleu'
# Apply sampling
extra_vars = dict()
extra_vars['tokenize_f'] = eval('dataset.' + 'tokenize_none')
extra_vars['language'] = params['TRG_LAN']
extra_vars['test'] = dict()
extra_vars['test']['references'] = dataset.extra_variables['test']['target_text']
metrics = select[metric](pred_list=predictions,
                         verbose=1,
                         extra_vars=extra_vars,
                         split='test')


[14/04/2020 11:24:02] Computing SacreBleu scores on the test split...
[14/04/2020 11:24:02] Bleu_4: 60.97881069640092
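
The value returned by the metric function is a plain dictionary of scores, so it can be logged or stored programmatically (the exact key names, e.g. 'Bleu_4', are an assumption based on the log above):

# Inspect the returned scores (key names assumed from the log above)
print(metrics)  # e.g. {'Bleu_4': 60.97...}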

And that's all!