RNN

Overview

When humans think, they build on their understanding of previous time steps rather than starting from scratch. Traditional neural networks can't do this, and it seems like a major shortcoming. For example, imagine you want to do sentiment analysis of some texts: a traditional network that cannot carry context from earlier words will struggle to make sense of short phrases and sentences.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop: the network becomes a chain of identical cells, one per time step, each handing its hidden state to the next.
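
To make the recurrence concrete, here is a minimal sketch (plain numpy, with made-up sizes, not part of our module) of a single recurrent cell stepping through a sequence; the hidden state h is what persists from one time step to the next:

import numpy as np

input_size, hidden_size, seq_len = 4, 8, 5   # toy sizes, illustrative only

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # initial hidden state

for x_t in xs:
    # the same weights are reused at every step; h carries the past along
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (8,) -- the final hidden state summarizes the whole sequence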

As demonstrated in the book, recurrent neural networks may be connected in many different ways: with sequences in the input, in the output, or, in the most general case, in both.
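
For instance, in Keras the return_sequences flag of SimpleRNN switches between a sequence-to-one and a sequence-to-sequence connection (a sketch with made-up sizes):

from keras.models import Sequential
from keras.layers import SimpleRNN

# sequence in, single vector out: only the final hidden state is returned
many_to_one = Sequential([SimpleRNN(16, input_shape=(10, 4))])
print(many_to_one.output_shape)   # (None, 16)

# sequence in, sequence out: one hidden state per time step
many_to_many = Sequential([SimpleRNN(16, input_shape=(10, 4), return_sequences=True)])
print(many_to_many.output_shape)  # (None, 10, 16)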

Implementation

In our case, we implemented the RNN with modules offered by the keras package. To use keras and our module, you must have both tensorflow and keras installed as prerequisites. keras offers a well-defined, high-level neural networks API that allows for easy and fast prototyping. It supports many different types of networks, such as convolutional and recurrent neural networks, as well as user-defined ones. To get started with keras, please read its tutorial.

To view our implementation of a simple RNN, run the following code:


In [1]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import os, sys
sys.path = [os.path.abspath("../../")] + sys.path
from deep_learning4e import *
from notebook4e import *


Using TensorFlow backend.

In [2]:
psource(SimpleRNNLearner)


def SimpleRNNLearner(train_data, val_data, epochs=2):
    """
    RNN example for text sentiment analysis.
    :param train_data: a tuple of (training data, targets)
            Training data: ndarray of training examples; each example is a sequence of word indices (integers)
            Targets: ndarray of targets, one per example; each target is encoded as an integer
    :param val_data: a tuple of (validation data, targets)
    :param epochs: number of epochs
    :return: a keras model
    """

    total_inputs = 5000
    input_length = 500

    # init data
    X_train, y_train = train_data
    X_val, y_val = val_data

    # init the sequential network (embedding layer, rnn layer, dense layer)
    model = Sequential()
    model.add(Embedding(total_inputs, 32, input_length=input_length))
    model.add(SimpleRNN(units=128))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # train the model
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=epochs, batch_size=128, verbose=2)

    return model

train_data and val_data are needed when creating a simple RNN learner. Both arguments are tuples holding the examples and their targets. Please note that we build the network by adding layers to a Sequential() model, which means data are passed through the layers one after another. The SimpleRNN layer is the key layer: it plays the recurrent role. The Embedding and Dense layers before and after it map the inputs and outputs into and out of the form the RNN works on. The optimizer used in this case is Adam.
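
To see what each layer does to the data, one can rebuild the same stack and print its summary (a sketch; 5000 and 500 are the total_inputs and input_length constants above):

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

demo = Sequential()
demo.add(Embedding(5000, 32, input_length=500))  # (batch, 500) -> (batch, 500, 32)
demo.add(SimpleRNN(units=128))                   # (batch, 500, 32) -> (batch, 128)
demo.add(Dense(1, activation='sigmoid'))         # (batch, 128) -> (batch, 1)
demo.summary()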

Example

Here is an example of how we train the RNN made with keras. In this case, we used the IMDB dataset, which can be viewed here in detail. In short, the dataset consists of movie reviews in text form and their sentiment labels (positive/negative). After loading the dataset, we use keras_dataset_loader to split it into training, validation, and test datasets.


In [3]:
from keras.datasets import imdb
data = imdb.load_data(num_words=5000)
train, val, test = keras_dataset_loader(data)

Then we build and train the RNN model for 10 epochs:


In [4]:
model = SimpleRNNLearner(train, val, epochs=10)



Train on 24990 samples, validate on 25000 samples
Epoch 1/10
 - 59s - loss: 0.6540 - accuracy: 0.5959 - val_loss: 0.6234 - val_accuracy: 0.6488
Epoch 2/10
 - 61s - loss: 0.5977 - accuracy: 0.6766 - val_loss: 0.6202 - val_accuracy: 0.6326
Epoch 3/10
 - 61s - loss: 0.5269 - accuracy: 0.7356 - val_loss: 0.4803 - val_accuracy: 0.7789
Epoch 4/10
 - 61s - loss: 0.4159 - accuracy: 0.8130 - val_loss: 0.5640 - val_accuracy: 0.7046
Epoch 5/10
 - 61s - loss: 0.3931 - accuracy: 0.8294 - val_loss: 0.4707 - val_accuracy: 0.8090
Epoch 6/10
 - 61s - loss: 0.3357 - accuracy: 0.8637 - val_loss: 0.4177 - val_accuracy: 0.8122
Epoch 7/10
 - 61s - loss: 0.3552 - accuracy: 0.8594 - val_loss: 0.4652 - val_accuracy: 0.7889
Epoch 8/10
 - 61s - loss: 0.3286 - accuracy: 0.8686 - val_loss: 0.4708 - val_accuracy: 0.7785
Epoch 9/10
 - 61s - loss: 0.3428 - accuracy: 0.8635 - val_loss: 0.4332 - val_accuracy: 0.8137
Epoch 10/10
 - 61s - loss: 0.3650 - accuracy: 0.8471 - val_loss: 0.4673 - val_accuracy: 0.7914

The training accuracy climbs to around 85% and the validation accuracy to around 80%, which is promising. Now let's try the model on some random examples from the test set:
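
A minimal sketch of such a check, assuming the test split returned by keras_dataset_loader has the same (examples, targets) layout as train and val:

import numpy as np

X_test, y_test = test  # assumption: a (data, targets) tuple, like train and val
idx = np.random.choice(len(X_test), size=5, replace=False)
for pred, label in zip(model.predict(X_test[idx]), y_test[idx]):
    print('predicted:', 'positive' if pred[0] > 0.5 else 'negative',
          '| actual:', 'positive' if label == 1 else 'negative')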

Autoencoder

Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. An autoencoder works by compressing the input into a latent-space representation and then reconstructing the input from that representation.

Autoencoders are learned automatically from data examples. This means it is easy to train specialized instances of the algorithm that perform well on a specific type of input, and that no new engineering is required, only the appropriate training data.

Autoencoders have different architectures for different kinds of data. Here we only provide a simple example of a vanilla autoencoder, which means there is only one hidden layer in the network:

You can view the source code by running:


In [5]:
psource(AutoencoderLearner)


def AutoencoderLearner(inputs, encoding_size, epochs=200):
    """
    Simple example of an autoencoder learning to reproduce its own input.
    :param inputs: a batch of input data in np.ndarray type
    :param encoding_size: int, the size of encoding layer
    :param epochs: number of epochs
    :return: a keras model
    """

    # init data
    input_size = len(inputs[0])

    # init model
    model = Sequential()
    model.add(Dense(encoding_size, input_dim=input_size, activation='relu', kernel_initializer='random_uniform',
                    bias_initializer='ones'))
    model.add(Dense(input_size, activation='relu', kernel_initializer='random_uniform', bias_initializer='ones'))

    # update model with sgd
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])

    # train the model
    model.fit(inputs, inputs, epochs=epochs, batch_size=10, verbose=2)

    return model

The source shows that the network consists of two Dense layers: the first compresses the input down to encoding_size, and the second expands it back to the original input size.
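
As a quick usage sketch with made-up random data (the sizes here are arbitrary), the learner is asked to reproduce its own input, and the reconstruction can be compared with the original:

import numpy as np

inputs = np.random.rand(100, 20)  # 100 fabricated examples of size 20, values in [0, 1)
model = AutoencoderLearner(inputs, encoding_size=8, epochs=50)

# after training, the network should roughly reproduce what it was fed
reconstructed = model.predict(inputs[:1])
print(inputs[0][:5])
print(reconstructed[0][:5])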