An RNN model to generate sequences

RNN models can generate long sequences based on past data. This can be used to predict stock markets, temperatures, traffic or sales data from past patterns, and the same approach can be adapted to generate text. The quality of the prediction depends on the training data, the network architecture, the hyperparameters, how far ahead in time you are predicting, and so on. Most importantly, it depends on whether your training data contains examples of the behaviour you are trying to predict.

Things to do:
  1. [Choose a waveform](#assignment1) then execute the entire notebook. See results at the bottom: not great...
  2. Implement the RNN model and try again ([Assignment #2](#assignment2))
  3. Check that your state is passed around correctly:
    1. Did you use `dynamic_rnn(initial_state=Hin)` [in your model](#assignment3A) ?
    2. [During inference](#assignment3B): check the state (hint: it's OK)
    3. [In the training loop](#assignment3C) and also [when batching your data](#assignment3C): check the state (hint: 2 bugs)
  4. Make the predictions more robust ([Assignment #4](#assignment4)).
Play with these options until you get a good fit for at least 128 predicted samples. You can then try a [different waveform](#assignment1).

In [ ]:
import math
import numpy as np
from matplotlib import pyplot as plt
import utils_prettystyle
import utils_batching
import utils_display
import tensorflow as tf
print("Tensorflow version: " + tf.__version__)

Generate fake dataset

**Assignment #1**: Choose a waveform. Three possible choices on the next line: 0, 1 or 2

In [ ]:
WAVEFORM_SELECT = 0 # select 0, 1 or 2

def create_time_series(datalen):
    # good waveforms
    frequencies = [(0.2, 0.15), (0.35, 0.3), (0.6, 0.55)]
    freq1, freq2 = frequencies[WAVEFORM_SELECT]
    noise = np.random.random(datalen) * 0.1  # uniform noise in [0, 0.1), shared by both sinusoids
    x1 = np.sin(np.arange(0,datalen) * freq1)  + noise
    x2 = np.sin(np.arange(0,datalen) * freq2)  + noise
    x = x1 + x2
    return x.astype(np.float32)

DATA_SEQ_LEN = 1024*128
data = create_time_series(DATA_SEQ_LEN)
plt.plot(data[:512])
plt.show()

Hyperparameters


In [ ]:
NB_EPOCHS = 5

RNN_CELLSIZE = 80   # size of the RNN cells
N_LAYERS = 1        # number of stacked RNN cells
SEQLEN = 32         # unrolled sequence length
BATCHSIZE = 32      # mini-batch size

Visualize training sequences

This is what the neural network will see during training.


In [ ]:
# The function dumb_minibatch_sequencer splits the data into batches of sequences sequentially.
for features, labels, epoch in utils_batching.dumb_minibatch_sequencer(data, BATCHSIZE, SEQLEN, nb_epochs=1):
    break
print("Features shape: " + str(features.shape))
print("Labels shape: " + str(labels.shape))
print("Excerpt from first batch:")

utils_display.picture_this_7(features)
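
To see exactly what the sequencer returns, you can print a few values. The check on the last line assumes that each label sequence is its feature sequence shifted forward by one sample (i.e. the network learns to predict the next value); it should print True if that is how `dumb_minibatch_sequencer` builds its targets.

```
print("first feature sequence, first 8 samples:", features[0, :8])
print("first label sequence,   first 8 samples:", labels[0, :8])
# if labels are next-sample targets, each label sequence is the feature sequence shifted by one
print(np.allclose(features[:, 1:], labels[:, :-1]))
```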

The model definition

When called, this function instantiates the TensorFlow graph for our model.

**Assignment #2**: implement a single-layer RNN using a GRU cell: `tf.nn.rnn_cell.GRUCell(RNN_CELLSIZE)`

**Assignment #3.A**: check that state is passed around correctly: `dynamic_rnn(initial_state=Hin)`

**Assignment #4**: Make the predictions more robust. You can try the following:
    1. Use a stacked RNN cell with 2 layers in the model:
      ```
      cells = [tf.nn.rnn_cell.GRUCell(RNN_CELLSIZE) for _ in range(N_LAYERS)]
      cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
      ```
      Do not forget to set N_LAYERS=2 in [hyperparameters](#hyperparameters) so that the input state gets initialized correctly.
    2. Learning rate decay: locate the line where the AdamOptimizer is instantiated and notice that this model uses an exponentially decaying learning rate (it starts around 0.011 and decays towards 0.001, with the decaying part halving every 400 steps). This lets you train for longer. Set NB_EPOCHS=10 in [hyperparameters](#hyperparameters).
Play with these options until you get a good fit for at least 128 predicted samples. You can then try a [different waveform](#assignment1). Dropout (a regularisation technique) could further improve this model, but it is more interesting to try it out on real data: go to the next exercise for that.

X shape [BATCHSIZE, SEQLEN, 1]
Y shape [BATCHSIZE, SEQLEN, 1]
H shape [BATCHSIZE, RNN_CELLSIZE*N_LAYERS]
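
For reference, the two TODOs in the cell below could be filled in with a minimal single-layer version like the sketch here (one possible solution for Assignments #2 and #3.A; with N_LAYERS=1 the returned state `H` has shape [BATCHSIZE, RNN_CELLSIZE]):

```
cell = tf.nn.rnn_cell.GRUCell(RNN_CELLSIZE)
# unroll the cell over the SEQLEN time steps of X, starting from the state fed in through Hin
Yn, H = tf.nn.dynamic_rnn(cell, X, initial_state=Hin)  # Yn [BATCHSIZE, SEQLEN, RNN_CELLSIZE]
```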

In [ ]:
def model_fn(features, Hin, labels, step):
    # input shapes during training (for inference, we will use BATCHSIZE=1 and SEQLEN=1):
    # features [BATCHSIZE, SEQLEN, 1]
    # labels [BATCHSIZE, SEQLEN, 1]
    # Hin [BATCHSIZE, RNN_CELLSIZE*N_LAYERS]
    X = features
    batchsize = tf.shape(X)[0] # allow for variable batch size
    seqlen = tf.shape(X)[1] # allow for variable sequence length
    
    # --- dummy model: please implement a real RNN model ---
    Yn = X * tf.ones([RNN_CELLSIZE], name="dummy") # Yn shape [BATCHSIZE, SEQLEN, RNN_CELLSIZE]
    H = Hin
    # TODO: create a tf.nn.rnn_cell.GRUCell
    # TODO: unroll the cell using tf.nn.dynamic_rnn
    # --- end of dummy model ---
    
    # This is the regression layer. It is already implemented.
    # Yn [BATCHSIZE, SEQLEN, RNN_CELLSIZE]
    Yn = tf.reshape(Yn, [batchsize*seqlen, RNN_CELLSIZE])
    Yr = tf.layers.dense(Yn, 1) # Yr [BATCHSIZE*SEQLEN, 1] predicting vectors of 1 element
    Yr = tf.reshape(Yr, [batchsize, seqlen, 1]) # Yr [BATCHSIZE, SEQLEN, 1]
    
    # Yr[BATCHSIZE, SEQLEN, 1]
    Yout = Yr[:,-1,:] # Last output in sequence Yout [BATCHSIZE, 1]
    
    loss = tf.losses.mean_squared_error(labels, Yr) # shapes: labels [BATCHSIZE, SEQLEN, 1], Yr [BATCHSIZE, SEQLEN, 1]
    lr = 0.001 + tf.train.exponential_decay(0.01, step, 400, 0.5) # 0.001+0.01*0.5^(step/400)
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    train_op = optimizer.minimize(loss)
    
    return Yout, H, loss, train_op # output shapes Yout[BATCHSIZE, 1], H[BATCHSIZE, RNN_CELLSIZE*N_LAYERS]

Instantiate the model


In [ ]:
tf.reset_default_graph() # restart model graph from scratch

# placeholder for inputs
Hin = tf.placeholder(tf.float32, [None, RNN_CELLSIZE * N_LAYERS])
features = tf.placeholder(tf.float32, [None, None, 1]) # [BATCHSIZE, SEQLEN, 1]
labels = tf.placeholder(tf.float32, [None, None, 1]) # [BATCHSIZE, SEQLEN, 1]
step = tf.placeholder(tf.int32)

# instantiate the model
Yout, H, loss, train_op = model_fn(features, Hin, labels, step)

Inference

This is a generative model: to generate a sequence, the trained RNN cell is run in a loop, feeding each output back in as the next input.

**Assignment #3.B**: Check that the RNN state is passed around correctly (hint: it's OK)

In [ ]:
def prediction_run(prime_data, run_length):
    H_ = np.zeros([1, RNN_CELLSIZE * N_LAYERS]) # zero state initially
    Yout_ = np.zeros([1, 1])
    data_len = prime_data.shape[0]

    # prime the state from data
    if data_len > 0:
        Yin = np.array(prime_data)
        Yin = np.reshape(Yin, [1, data_len, 1]) # reshape as one sequence of length data_len
        feed = {Hin: H_, features: Yin}
        Yout_, H_ = sess.run([Yout, H], feed_dict=feed)
    
    # run prediction
    # To generate a sequence, run the trained cell in a loop: the output and output state
    # of one iteration become the input and input state of the next.
    results = []
    for i in range(run_length):
        Yout_ = np.reshape(Yout_, [1, 1, 1]) # batch of a single sequence of a single vector with one element
        feed = {Hin: H_, features: Yout_}
        Yout_, H_ = sess.run([Yout, H], feed_dict=feed)
        results.append(Yout_[0,0])
        
    return np.array(results)

Initialize TensorFlow session

This resets all neuron weights and biases to initial random values


In [ ]:
# first input state
Hzero = np.zeros([BATCHSIZE, RNN_CELLSIZE * N_LAYERS])
# variable initialization
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run([init])

The training loop

You can re-execute this cell to continue training

**Assignment #3.C**: find and fix the RNN state bugs. **Hint**: there are 2 bugs. One is in the core of the training loop: find the state and pass it around correctly. The other is in the way the data was sliced into batches of sequences; special care is needed when batching sequences for an RNN. [See this slide](https://docs.google.com/presentation/d/18MiZndRCOxB7g-TcCl2EZOElS5udVaCuxnGznLnmOlE/pub?slide=id.g139650d17f_0_584) to understand the situation. You can fix it by using `rnn_minibatch_sequencer` instead of `dumb_minibatch_sequencer`.
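
If you want to check your fix against a reference, one possible corrected loop is sketched below. It assumes `utils_batching.rnn_minibatch_sequencer` is a drop-in replacement for `dumb_minibatch_sequencer` (same arguments) that keeps each batch row continuing the same slice of data, so that passing the output state `H_` back in as `Hin` is meaningful across batches.

```
H_ = Hzero  # zero state only at the very start of training
for i, (next_features, next_labels, epoch) in enumerate(
        utils_batching.rnn_minibatch_sequencer(data, BATCHSIZE, SEQLEN, nb_epochs=NB_EPOCHS)):
    next_features = np.expand_dims(next_features, axis=2)  # [BATCHSIZE, SEQLEN, 1]
    next_labels = np.expand_dims(next_labels, axis=2)
    # feed the state produced by the previous batch and keep the new one for the next iteration
    feed = {Hin: H_, features: next_features, labels: next_labels}
    Yout_, H_, loss_, _ = sess.run([Yout, H, loss, train_op], feed_dict=feed)
```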

In [ ]:
H_ = Hzero
losses = []
indices = []
for i, (next_features, next_labels, epoch) in enumerate(utils_batching.dumb_minibatch_sequencer(data, BATCHSIZE, SEQLEN, nb_epochs=NB_EPOCHS)):
    next_features = np.expand_dims(next_features, axis=2) # model wants 3D inputs [BATCHSIZE, SEQLEN, 1] 
    next_labels = np.expand_dims(next_labels, axis=2)

    feed = {Hin: Hzero, features: next_features, labels: next_labels}
    Yout_, _, loss_, _ = sess.run([Yout, H, loss, train_op], feed_dict=feed)
    # print progress
    if i%100 == 0:
        print("epoch " + str(epoch) + ", batch " + str(i) + ", loss=" + str(np.mean(loss_)))
    if i%10 == 0:
        losses.append(np.mean(loss_))
        indices.append(i)

In [ ]:
plt.ylim(ymax=np.amax(losses[1:])) # ignore first value for scaling
plt.plot(indices, losses)
plt.show()

In [ ]:
PRIMELEN=256
RUNLEN=512
OFFSET=20
RMSELEN=128

prime_data = data[OFFSET:OFFSET+PRIMELEN]
results = prediction_run(prime_data, RUNLEN)

utils_display.picture_this_8(data, prime_data, results, OFFSET, PRIMELEN, RUNLEN, RMSELEN)
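
`picture_this_8` visualises the fit. If you also want a single number, the RMSE over the first RMSELEN generated samples can be computed directly, assuming the ground truth for the generated part is the slice of `data` that immediately follows the priming window:

```
ground_truth = data[OFFSET+PRIMELEN : OFFSET+PRIMELEN+RMSELEN]
rmse = math.sqrt(np.mean((results[:RMSELEN] - ground_truth) ** 2))
print("RMSE on {} predicted samples: {:.4f}".format(RMSELEN, rmse))
```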

Copyright 2018 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.