Recurrent Neural Networks for Beginners (in TensorFlow)

This IPython notebook is a walkthrough for beginners on how to implement a simple recurrent neural network using Python and TensorFlow. The code in this notebook is based on min-char-rnn by Andrej Karpathy and the TensorFlow port by Vinh Khuc; credit goes to them for the original work. I have renamed variables and cleaned up the code for the purpose of instruction and clarity.

Licensed under BSD


In [ ]:
import tensorflow as tf
import numpy as np
import random

Reading and processing input text

We start by importing a text document that we will use to train the recurrent neural network. This can be any body of text, as long as it is plain text. I recommend Project Gutenberg as a source of public domain texts to train on. After importing the text, we get a list of all the unique characters in the document. For an English document, this will be the alphabet, along with punctuation and line breaks. We also need to map each character to an index and back again; the two dictionaries built in the last two lines do exactly that.


In [ ]:
text = open('text.txt', 'r').read() # should be simple plain text file
uniqueChars = list(set(text))
text_size, vocab_size = len(text), len(uniqueChars)
print('data has %d characters, %d unique.' % (text_size, vocab_size))
char_to_ix = { ch:i for i,ch in enumerate(uniqueChars) }
ix_to_char = { i:ch for i,ch in enumerate(uniqueChars) }
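
As a quick sanity check, the two dictionaries are inverses of each other. The exact indices depend on your text file, so the values printed here are only illustrative.


In [ ]:
# Map a character to its index and back again (the indices depend on text.txt).
example_char = uniqueChars[0]
example_ix = char_to_ix[example_char]
print('%r -> %d -> %r' % (example_char, example_ix, ix_to_char[example_ix]))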

Hyperparameters

hidden_size sets how many units are in the hidden layer of our recurrent network. seq_length is the number of characters the network reads, and learns to predict, on each pass through the training loop.


In [ ]:
hidden_size = 100
seq_length = 25

This function converts character indices into their one-hot representations, which is how characters will be fed into the network.


In [ ]:
def one_hot(v):
    return np.eye(vocab_size)[v]
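
For example, passing a short list of character indices to one_hot produces a matrix with one row per character, where each row contains a single 1 in the column for that character. This quick check assumes text.txt was already loaded above.


In [ ]:
# Each index becomes a row of length vocab_size containing a single 1.
example = one_hot([char_to_ix[ch] for ch in text[:3]])
print(example.shape)        # (3, vocab_size)
print(example.sum(axis=1))  # each row sums to 1.0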

Defining the network

The first thing we do is create placeholder variables for the x inputs, expected y outputs, and starting value of the hidden layer.


In [ ]:
x = tf.placeholder(tf.float32, [None,vocab_size])
y_in = tf.placeholder(tf.float32, [None,vocab_size])
hStart = tf.placeholder(tf.float32,[1,hidden_size])

Next we define the architecture of the network itself. To do this, we begin by initializing the hidden state (hState) to the starting hidden state (hStart). We then create an empty list (y_outAll), which will collect each of the y values produced by the network. The for-loop provides the recurrent part of the recurrent neural network: it repeats once for each of the seq_length characters we defined above. Within the loop we initialize our weight and bias variables (the call to scope.reuse_variables() ensures the same weights are shared across every time step). Then we compute the new hidden state from both the x input and the previous hidden state. It is this additional matrix multiplication of the previous hidden state that allows each new pass through the network to use information from the past. The tanh gives the hidden state its non-linearity. Finally, we multiply the hidden state by the top-layer weights and add the bias to get an output y, which is collected in y_outAll.


In [ ]:
initializer = tf.random_normal_initializer(stddev=0.1)
with tf.variable_scope("RNN") as scope:
    hState = hStart
    y_outAll = []
    for t,xIn in enumerate(tf.split(0,seq_length,x)):
        if t > 0: scope.reuse_variables()
        Wxh = tf.get_variable("Wxh", [vocab_size,hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [hidden_size,hidden_size], initializer=initializer)
        Why = tf.get_variable("Why",[hidden_size,vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [hidden_size],initializer=initializer)
        by = tf.get_variable("by",[vocab_size], initializer=initializer)
        
        hState = tf.tanh(tf.matmul(xIn,Wxh) + tf.matmul(hState,Whh) + bh)
        y_out = tf.matmul(hState,Why) + by
        y_outAll.append(y_out)
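
To make the recurrence concrete, here is the same single-step computation written in plain NumPy. This is an illustrative sketch with freshly initialized weights, not part of the TensorFlow graph we just built.


In [ ]:
# One step of the recurrence, for illustration only:
#   h_t = tanh(x_t . Wxh + h_prev . Whh + bh)
#   y_t = h_t . Why + by
np_Wxh = np.random.randn(vocab_size, hidden_size) * 0.1
np_Whh = np.random.randn(hidden_size, hidden_size) * 0.1
np_Why = np.random.randn(hidden_size, vocab_size) * 0.1
np_bh = np.zeros(hidden_size)
np_by = np.zeros(vocab_size)

x_t = one_hot([char_to_ix[text[0]]])  # shape (1, vocab_size)
h_prev = np.zeros((1, hidden_size))   # previous hidden state
h_t = np.tanh(np.dot(x_t, np_Wxh) + np.dot(h_prev, np_Whh) + np_bh)
y_t = np.dot(h_t, np_Why) + np_by     # unnormalized scores, shape (1, vocab_size)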

Once all of the input xs have been run through the network, we save the last hidden state so it can be fed back in on the next pass. We then concatenate all the y values we collected into a single output variable. Using this variable and the true y values (y_in), we compute the cross-entropy loss.


In [ ]:
hLast = hState
output_softmax = tf.nn.softmax(y_outAll[-1])
outputs = tf.concat(0,y_outAll)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(outputs,y_in))
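
Cross-entropy loss is just the negative log of the probability the softmax assigns to the true character, averaged over the sequence. Here is the same computation for a single step in NumPy, using a made-up three-character vocabulary and made-up scores.


In [ ]:
# Cross-entropy for one step: -log(softmax(scores)[true_index])
scores = np.array([2.0, 0.5, -1.0])  # made-up logits for a 3-character vocabulary
true_index = 0
probs = np.exp(scores) / np.sum(np.exp(scores))
print(-np.log(probs[true_index]))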

Now that we have a loss, we want to minimize it in order to train our network. We use Adam, which handles the back-propagation and gradient descent for us in an optimized way. With a recurrent neural network, however, we can't simply apply the gradients as normally computed: left unchecked, the gradients in an RNN can grow exponentially (the exploding gradient problem), and the network will fail to converge. To deal with this, we clip them to the range -5 to 5. Lastly, we tell Adam to apply the clipped gradients.


In [ ]:
minimizer = tf.train.AdamOptimizer()
grads_and_vars = minimizer.compute_gradients(loss)

grad_clipping = tf.constant(5.0, name="grad_clipping")
clipped_grads_and_vars = []
for grad, var in grads_and_vars:
    clipped_grad = tf.clip_by_value(grad, -grad_clipping, grad_clipping)
    clipped_grads_and_vars.append((clipped_grad, var))

updates = minimizer.apply_gradients(clipped_grads_and_vars)
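
Clipping simply caps each gradient element at the chosen bound. The same idea in NumPy, with made-up gradient values:


In [ ]:
# Element-wise clipping: values outside [-5, 5] are replaced by the nearest bound.
fake_grad = np.array([-12.0, 0.3, 7.5])
print(np.clip(fake_grad, -5.0, 5.0))  # -12.0 becomes -5.0, 7.5 becomes 5.0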

Sampling from the network

We are going to train the network below, but first we'd like to define a function that gives us an intuitive sense of how well the network is doing. The code below generates a string of text from the network, so we can see whether it is producing something that resembles the input text.


In [ ]:
def sampleNetwork():
    sample_length = 200
    start_ix      = random.randint(0, len(text) - seq_length)
    sample_seq_ix = [char_to_ix[ch] for ch in text[start_ix:start_ix + seq_length]]
    ixes          = []
    sample_prev_state_val = np.copy(hStart_val)

    for t in range(sample_length):
        sample_input_vals = one_hot(sample_seq_ix)
        sample_output_softmax_val, sample_prev_state_val = \
        sess.run([output_softmax, hLast], feed_dict={x: sample_input_vals, hStart: sample_prev_state_val})

        ix = np.random.choice(range(vocab_size), p=sample_output_softmax_val.ravel())
        ixes.append(ix)
        sample_seq_ix = sample_seq_ix[1:] + [ix]

    txt = ''.join(ix_to_char[ix] for ix in ixes)
    print('----\n %s \n----\n' % (txt,))

Running the recurrent neural network

Now that we have defined the network, it is time to train it! To do this we start by creating a TensorFlow session and then running the initializer for the variables we defined above.


In [ ]:
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

Next we set the starting position in the text and the iteration counter to zero. We also initialize the hidden state to zeros and choose how many iterations to train for. For large documents, the more iterations the better.


In [ ]:
positionInText = 0
numberOfIterations = 0
totalIterations = 1000
hStart_val = np.zeros([1,hidden_size])

Within the main training loop, we first check whether we have reached the end of the document or are just beginning. If so, we reset the hidden state and start at the beginning of the document again. Then we draw the input and target characters from the document. The targets are the inputs shifted one character forward: for the word "cat", if the first input character is "c", the first target character is "a". We then feed these through the network, along with the starting hidden state. The network returns its final hidden state, which we save to use for the next training iteration, and a loss value, which we can display periodically to make sure the model is improving. Every 500 iterations we print the loss and sample some output from the network to see how well it is doing.


In [ ]:
while numberOfIterations < totalIterations:
    if positionInText+seq_length+1 >= len(text) or numberOfIterations == 0: 
        hStart_val = np.zeros([1,hidden_size])
        positionInText = 0
        
    inputs = one_hot([char_to_ix[ch] for ch in text[positionInText:positionInText+seq_length]])
    targets = one_hot([char_to_ix[ch] for ch in text[positionInText+1:positionInText+seq_length+1]])
    
    hStart_val, loss_val, _ = sess.run([hLast,loss,updates],feed_dict={x:inputs, y_in:targets,hStart:hStart_val})
    
    if numberOfIterations % 500 == 0:
        print('iter: %d, p: %d, loss: %f' % (numberOfIterations, positionInText, loss_val))
        sampleNetwork()
    
    positionInText += seq_length
    numberOfIterations +=1
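
To see the one-character offset between inputs and targets concretely, here is the same slicing applied to a short made-up string rather than the training text.


In [ ]:
# Inputs and targets are the same text, offset by one character.
demo_text = 'cat sat on the mat'
demo_seq_length = 5
demo_inputs = demo_text[0:demo_seq_length]       # 'cat s'
demo_targets = demo_text[1:demo_seq_length + 1]  # 'at sa'
print('%s -> %s' % (demo_inputs, demo_targets))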

Conclusion

And that's it! We now have a working, simple recurrent neural network. Of course, this is just the beginning. You may notice that even with a lot of iterations, the network still isn't producing ideal sample outputs. This is because in practice most people use a more complex kind of recurrent network called an LSTM, which works much better. Hopefully, though, this walkthrough gives you an intuition for how recurrent networks work at a basic level. There is a lot that can be done with recurrent networks beyond text generation, and a network like this is just the beginning.