A minimal Char RNN using TensorFlow

This Jupyter Notebook implements a character-level RNN and is inspired by the Minimal character-level Vanilla RNN model written by Andrej Karpathy

Decoding is based on this code from Sherjil Ozair

I made some modifications to the original code to accommodate Jupyter: for instance, the original code is split across several files and is designed to run with parameters passed from a shell command line. I also added comments and some code to test parts of it line by line.

I've also removed the ability to use LSTM or GRU cells, as well as the embeddings. The results are less impressive than the original code's, but closer to Karpathy's Minimal character-level Vanilla RNN model

Let's dive in :)

Imports

Imports needed for TensorFlow


In [1]:
import numpy as np
import tensorflow as tf

Imports needed for Jupyter


In [2]:
%matplotlib notebook
import matplotlib
import matplotlib.pyplot as plt

Imports needed for utilities

to load the text and transform it into vectors


In [3]:
from __future__ import print_function  # must come before the other imports

import codecs
import os
import collections
from six.moves import cPickle
from six import text_type
import time

Args, to define all parameters

The original code uses argparse to manage the arguments.

Here we define a class for them instead. Feel free to edit it to try different settings.


In [30]:
class Args():
    def __init__(self):
        '''data directory containing input.txt'''
        self.data_dir = 'data_rnn/tinyshakespeare'
        '''directory to store checkpointed models'''
        self.save_dir = 'save_vec'
        
        '''size of RNN hidden state'''
        self.rnn_size = 128
        '''minibatch size'''
        self.batch_size = 1 #was 40
        '''RNN sequence length'''
        self.seq_length = 50
        '''number of epochs'''
        self.num_epochs = 1 # was 5
        '''save frequency'''
        self.save_every = 500 # was 500
        '''Print frequency'''
        self.print_every = 100 # was 100
        '''clip gradients at this value'''
        self.grad_clip = 5.
        '''learning rate'''
        self.learning_rate = 0.002 # was ?
        '''decay rate for rmsprop'''
        self.decay_rate = 0.98 # was 0.97?
        """continue training from saved model at this path. Path must contain files saved by previous training process: 
                            'config.pkl'        : configuration;
                            'chars_vocab.pkl'   : vocabulary definitions;
                            'checkpoint'        : paths to model file(s) (created by tf).
                                                  Note: this file contains absolute paths, be careful when moving files around;
                            'model.ckpt-*'      : file(s) with model definition (created by tf)
                        """
        self.init_from = 'save_vec'
        #self.init_from = None
        
        
        '''number of characters to sample'''
        self.n = 500
        '''prime text'''
        self.prime = u' '
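
For example, you can instantiate the class and override a field before training (a small sketch; the attribute names come from the Args class above):

args = Args()
args.rnn_size = 256      # try a bigger hidden state
args.num_epochs = 2      # train a little longer
args.init_from = None    # start from scratch instead of resuming from a checkpoint
print(args.rnn_size, args.num_epochs)
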

Load the data

Transforming the original dataset into vectors that a NN can use is always necessary.

This class needs to be replaced if you want to deal with other kinds of data.

This class caches the preprocessed data:

  • Check whether the data has already been processed
    • if yes, load it using NumPy (not TensorFlow)
    • if not
      • process the data
      • save it using NumPy

In [31]:
class TextLoader():
    def __init__(self, data_dir, batch_size, seq_length, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

    def create_batches(self):
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

        # When the data (tensor) is too small, let's give them a better error message
        if self.num_batches==0:
            assert False, "Not enough data. Make seq_length and batch_size small."

        self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
        xdata = self.tensor
        ydata = np.copy(self.tensor)
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)
     
    def vectorize(self, x):
        vectorized = np.zeros((len(x), len(x[0]), self.vocab_size))
        for i in range(0, len(x)):
            for j in range(0, len(x[0])):
                vectorized[i][j][x[i][j]] = 1
        return vectorized
    
    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        x_vectorized = self.vectorize(x)
        y_vectorized = self.vectorize(y)
        return x_vectorized, y_vectorized

    def reset_batch_pointer(self):
        self.pointer = 0

Let's see how preprocessing works:


In [32]:
## First we open the file
args = Args()
input_file = os.path.join(args.data_dir, "input.txt")
f =  codecs.open(input_file, "r", 'utf-8')
data = f.read()
print (data[0:300])


First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us

Then we have:

counter = collections.Counter(data)
count_pairs = sorted(counter.items(), key=lambda x: -x[1])
chars, _ = zip(*count_pairs)
vocab_size = len(chars)
vocab = dict(zip(chars, range(len(chars))))

Which does the same thing as this:

chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
vocab = { ch:i for i,ch in enumerate(chars) }

Let's see the details here:


In [33]:
counter = collections.Counter(data)
print ('histogram of char from the input data file:', counter)


histogram of char from the input data file: Counter({u' ': 169892, u'e': 94611, u't': 67009, u'o': 65798, u'a': 55507, u'h': 51310, u's': 49696, u'r': 48889, u'n': 48529, u'i': 45537, u'\n': 40000, u'l': 33339, u'd': 31358, u'u': 26584, u'm': 22243, u'y': 20448, u',': 19846, u'w': 17585, u'f': 15770, u'c': 15623, u'g': 13356, u'I': 11832, u'b': 11321, u'p': 10808, u':': 10316, u'.': 7885, u'A': 7819, u'v': 7793, u'k': 7088, u'T': 7015, u"'": 6187, u'E': 6041, u'O': 5481, u'N': 5079, u'R': 4869, u'S': 4523, u'L': 3876, u'C': 3820, u';': 3628, u'W': 3530, u'U': 3313, u'H': 3068, u'M': 2840, u'B': 2761, u'?': 2462, u'G': 2399, u'!': 2172, u'D': 2089, u'-': 1897, u'F': 1797, u'Y': 1718, u'P': 1641, u'K': 1584, u'V': 798, u'j': 628, u'q': 609, u'x': 529, u'z': 356, u'J': 320, u'Q': 231, u'Z': 198, u'X': 112, u'3': 27, u'&': 3, u'$': 1})

In [34]:
count_pairs = sorted(counter.items(), key=lambda x: -x[1])
print (count_pairs)


[(u' ', 169892), (u'e', 94611), (u't', 67009), (u'o', 65798), (u'a', 55507), (u'h', 51310), (u's', 49696), (u'r', 48889), (u'n', 48529), (u'i', 45537), (u'\n', 40000), (u'l', 33339), (u'd', 31358), (u'u', 26584), (u'm', 22243), (u'y', 20448), (u',', 19846), (u'w', 17585), (u'f', 15770), (u'c', 15623), (u'g', 13356), (u'I', 11832), (u'b', 11321), (u'p', 10808), (u':', 10316), (u'.', 7885), (u'A', 7819), (u'v', 7793), (u'k', 7088), (u'T', 7015), (u"'", 6187), (u'E', 6041), (u'O', 5481), (u'N', 5079), (u'R', 4869), (u'S', 4523), (u'L', 3876), (u'C', 3820), (u';', 3628), (u'W', 3530), (u'U', 3313), (u'H', 3068), (u'M', 2840), (u'B', 2761), (u'?', 2462), (u'G', 2399), (u'!', 2172), (u'D', 2089), (u'-', 1897), (u'F', 1797), (u'Y', 1718), (u'P', 1641), (u'K', 1584), (u'V', 798), (u'j', 628), (u'q', 609), (u'x', 529), (u'z', 356), (u'J', 320), (u'Q', 231), (u'Z', 198), (u'X', 112), (u'3', 27), (u'&', 3), (u'$', 1)]

In [35]:
chars, _ = zip(*count_pairs)
print ('chars', chars)


chars (u' ', u'e', u't', u'o', u'a', u'h', u's', u'r', u'n', u'i', u'\n', u'l', u'd', u'u', u'm', u'y', u',', u'w', u'f', u'c', u'g', u'I', u'b', u'p', u':', u'.', u'A', u'v', u'k', u'T', u"'", u'E', u'O', u'N', u'R', u'S', u'L', u'C', u';', u'W', u'U', u'H', u'M', u'B', u'?', u'G', u'!', u'D', u'-', u'F', u'Y', u'P', u'K', u'V', u'j', u'q', u'x', u'z', u'J', u'Q', u'Z', u'X', u'3', u'&', u'$')

In [36]:
vocab_size = len(chars)
print (vocab_size)


65

In [37]:
vocab = dict(zip(chars, range(len(chars))))
print (vocab)


{u'\n': 10, u'!': 46, u' ': 0, u'$': 64, u"'": 30, u'&': 63, u'-': 48, u',': 16, u'.': 25, u'3': 62, u';': 38, u':': 24, u'?': 44, u'A': 26, u'C': 37, u'B': 43, u'E': 31, u'D': 47, u'G': 45, u'F': 49, u'I': 21, u'H': 41, u'K': 52, u'J': 58, u'M': 42, u'L': 36, u'O': 32, u'N': 33, u'Q': 59, u'P': 51, u'S': 35, u'R': 34, u'U': 40, u'T': 29, u'W': 39, u'V': 53, u'Y': 50, u'X': 61, u'Z': 60, u'a': 4, u'c': 19, u'b': 22, u'e': 1, u'd': 12, u'g': 20, u'f': 18, u'i': 9, u'h': 5, u'k': 28, u'j': 54, u'm': 14, u'l': 11, u'o': 3, u'n': 8, u'q': 55, u'p': 23, u's': 6, u'r': 7, u'u': 13, u't': 2, u'w': 17, u'v': 27, u'y': 15, u'x': 56, u'z': 57}

The vocab can then be used to look up the ID of any character:


In [38]:
print (vocab['a'])


4

This is equivalent to the following code by Karpathy: it associates a unique int with every char used in the file.


In [39]:
# Karpathy's original code seems to do the same:
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
vocab = { ch:i for i,ch in enumerate(chars) }
print (vocab)


{u'\n': 0, u'!': 1, u' ': 2, u'$': 3, u"'": 4, u'&': 5, u'-': 6, u',': 7, u'.': 8, u'3': 9, u';': 10, u':': 11, u'?': 12, u'A': 13, u'C': 14, u'B': 15, u'E': 16, u'D': 17, u'G': 18, u'F': 19, u'I': 20, u'H': 21, u'K': 22, u'J': 23, u'M': 24, u'L': 25, u'O': 26, u'N': 27, u'Q': 28, u'P': 29, u'S': 30, u'R': 31, u'U': 32, u'T': 33, u'W': 34, u'V': 35, u'Y': 36, u'X': 37, u'Z': 38, u'a': 39, u'c': 40, u'b': 41, u'e': 42, u'd': 43, u'g': 44, u'f': 45, u'i': 46, u'h': 47, u'k': 48, u'j': 49, u'm': 50, u'l': 51, u'o': 52, u'n': 53, u'q': 54, u'p': 55, u's': 56, u'r': 57, u'u': 58, u't': 59, u'w': 60, u'v': 61, u'y': 62, u'x': 63, u'z': 64}
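
As a quick check (this snippet is not in the original notebook), both ways of building the vocabulary index exactly the same character set; only the integer IDs differ because of the ordering:

# Toy check: both vocab-building approaches cover the same characters.
toy = "hello world"

# Frequency-sorted version, as in TextLoader.preprocess
toy_counter = collections.Counter(toy)
toy_pairs = sorted(toy_counter.items(), key=lambda x: -x[1])
toy_chars_a, _ = zip(*toy_pairs)
toy_vocab_a = dict(zip(toy_chars_a, range(len(toy_chars_a))))

# Karpathy-style version
toy_chars_b = list(set(toy))
toy_vocab_b = {ch: i for i, ch in enumerate(toy_chars_b)}

print(sorted(toy_vocab_a.keys()) == sorted(toy_vocab_b.keys()))  # True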

Now we have to make a tensor out of the data.

The tensor is created with this line:

tensor = np.array(list(map(vocab.get, data)))

Let's split the line to see in detail how it works:


In [40]:
data_in_array = map(vocab.get, data)
print (len(data_in_array))
print (data_in_array[0:200])


1115394
[19, 46, 57, 56, 59, 2, 14, 46, 59, 46, 64, 42, 53, 11, 0, 15, 42, 45, 52, 57, 42, 2, 60, 42, 2, 55, 57, 52, 40, 42, 42, 43, 2, 39, 53, 62, 2, 45, 58, 57, 59, 47, 42, 57, 7, 2, 47, 42, 39, 57, 2, 50, 42, 2, 56, 55, 42, 39, 48, 8, 0, 0, 13, 51, 51, 11, 0, 30, 55, 42, 39, 48, 7, 2, 56, 55, 42, 39, 48, 8, 0, 0, 19, 46, 57, 56, 59, 2, 14, 46, 59, 46, 64, 42, 53, 11, 0, 36, 52, 58, 2, 39, 57, 42, 2, 39, 51, 51, 2, 57, 42, 56, 52, 51, 61, 42, 43, 2, 57, 39, 59, 47, 42, 57, 2, 59, 52, 2, 43, 46, 42, 2, 59, 47, 39, 53, 2, 59, 52, 2, 45, 39, 50, 46, 56, 47, 12, 0, 0, 13, 51, 51, 11, 0, 31, 42, 56, 52, 51, 61, 42, 43, 8, 2, 57, 42, 56, 52, 51, 61, 42, 43, 8, 0, 0, 19, 46, 57, 56, 59, 2, 14, 46, 59, 46, 64, 42, 53, 11, 0, 19, 46, 57, 56, 59, 7, 2, 62, 52, 58]

In [41]:
print (data_in_array[0], 'means', data[0], 'which is the first letter in data')


19 means F which is the first letter in data

Then we create a numpy array out of it!


In [42]:
tensor = np.array(data_in_array)
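
Since chars[i] gives back the character for ID i, the mapping can be inverted to decode the tensor into text again (a small sanity check using the chars, tensor and data variables defined above):

# Decode the first 50 IDs back into characters to check the round trip.
decoded = ''.join(chars[i] for i in tensor[:50])
print(decoded)                # prints the beginning of the input text
print(decoded == data[:50])   # True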

Let's see how batching works:

Here is a reminder of the create_batches function:

def create_batches(self):
    self.num_batches = int(self.tensor.size / (self.batch_size *
                                               self.seq_length))

    # When the data (tensor) is too small, let's give them a better error message
    if self.num_batches==0:
        assert False, "Not enough data. Make seq_length and batch_size small."

    self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
    xdata = self.tensor
    ydata = np.copy(self.tensor)
    ydata[:-1] = xdata[1:]
    ydata[-1] = xdata[0]
    self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
    self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)
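
To see what the shifting, reshaping and splitting do on something small, here is a toy example (not from the original code) with 12 character IDs, batch_size = 2 and seq_length = 3:

toy_tensor = np.arange(12)            # pretend these are 12 character IDs
toy_batch_size, toy_seq_length = 2, 3
toy_num_batches = toy_tensor.size // (toy_batch_size * toy_seq_length)   # 2

toy_x = toy_tensor
toy_y = np.copy(toy_tensor)
toy_y[:-1] = toy_x[1:]                # y is x shifted one step to the left
toy_y[-1] = toy_x[0]                  # the last target wraps around to the start

toy_x_batches = np.split(toy_x.reshape(toy_batch_size, -1), toy_num_batches, 1)
toy_y_batches = np.split(toy_y.reshape(toy_batch_size, -1), toy_num_batches, 1)

print(toy_x_batches[0])   # [[0 1 2] [6 7 8]]
print(toy_y_batches[0])   # [[1 2 3] [7 8 9]]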

Let's try!


In [43]:
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
data_loader.create_batches()
x, y = data_loader.next_batch()
print ('x and y are matrix ', len(x), 'x', len(x[0]) )
print ('there are', len(x), 'batch that contains', len(x[0]), 'vector that have a size of', len(x[0][0]))


loading preprocessed files
x and y are matrix  1 x 50
there are 1 batch that contains 50 vector that have a size of 65

In [44]:
print ('x[0] is the first batch of input:')
print (x[0])
print ('x[0][0] is the first char:')
print (x[0][0])
print ('y[0][0] is the first expected char:')
print (y[0][0])


x[0] is the first batch of input:
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  1.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
x[0][0] is the first char:
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
y[0][0] is the first expected char:
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

In [45]:
print ('y[0] is x[0] shifted by one, in other words: y[0][x] == x[0][x+1]')
print ('y[0][10] ==', y[0][10])
print ('x[0][11] ==', x[0][11])


y[0] is x[0] shifted by one, in other words: y[0][x] == x[0][x+1]
y[0][10] == [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
x[0][11] == [ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
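
The vectorize method of TextLoader simply turns every character ID into a one-hot vector of length vocab_size, which is what the matrices above show. An equivalent NumPy shortcut (a sketch, not what the class uses) is np.eye(vocab_size)[ids]:

# One-hot encoding: each ID becomes a vector containing a single 1.
toy_ids = np.array([[19, 46, 57]])          # a fake batch of three character IDs
toy_one_hot = np.eye(65)[toy_ids]           # shape (1, 3, 65)
print(toy_one_hot.shape)
print(toy_one_hot[0, 0].argmax())           # 19, the original ID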

The Model


In [46]:
class Model():
    def __init__(self, args, infer=False):
        self.args = args
        if infer:
            '''Infer is true when the model is used for sampling'''
            args.seq_length = 1
   
        hidden_size = args.rnn_size
        vocab_size = args.vocab_size
        
        # define placeholders for the input data and the target.
        self.input_data = tf.placeholder(tf.float32, [args.batch_size, args.seq_length, vocab_size], name='input_data')
        self.target_data = tf.placeholder(tf.float32, [args.batch_size, args.seq_length, vocab_size], name='target_data') 
        # define the input xs
        one_batch_input = tf.squeeze(tf.slice(self.input_data, [0, 0, 0], [1, args.seq_length, vocab_size]),[0])
        xs = tf.split(0, args.seq_length, one_batch_input)
        # define the target
        one_batch_target = tf.squeeze(tf.slice(self.target_data, [0, 0, 0], [1, args.seq_length, vocab_size]),[0])
        targets = tf.split(0, args.seq_length, one_batch_target)  
        #initial_state
        self.initial_state = tf.zeros((hidden_size,1))
        #last_state = tf.placeholder(tf.float32, (hidden_size, 1))
        
        # model parameters
        Wxh = tf.Variable(tf.random_uniform((hidden_size, vocab_size))*0.01, name='Wxh') # input to hidden
        Whh = tf.Variable(tf.random_uniform((hidden_size, hidden_size))*0.01, name='Whh') # hidden to hidden
        Why = tf.Variable(tf.random_uniform((vocab_size, hidden_size))*0.01, name='Why') # hidden to output
        bh = tf.Variable(tf.zeros((hidden_size, 1)), name='bh') # hidden bias
        by = tf.Variable(tf.zeros((vocab_size, 1)), name='by') # output bias
        loss = tf.zeros([1], name='loss')
        
        hs, ys, ps = {}, {}, {}
        
        hs[-1] = self.initial_state
        # forward pass                                                                                                                                                                              
        for t in xrange(args.seq_length):
            xs_t = tf.transpose(xs[t])
            targets_t = tf.transpose(targets[t]) 
            
            hs[t] = tf.tanh(tf.matmul(Wxh, xs_t) + tf.matmul(Whh, hs[t-1]) + bh) # hidden state
            ys[t] = tf.matmul(Why, hs[t]) + by # unnormalized log probabilities for next chars
            ps[t] = tf.exp(ys[t]) / tf.reduce_sum(tf.exp(ys[t])) # probabilities for next chars
            
            loss += -tf.log(tf.reduce_sum(tf.mul(ps[t], targets_t))) # softmax (cross-entropy loss)

        self.probs = ps[t] # probabilities of the last time step, used by sample()
        self.cost = loss / args.batch_size / args.seq_length
        self.final_state = hs[args.seq_length-1]
        self.lr = tf.Variable(0.0, trainable=False, name='learning_rate')
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                args.grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))


    def sample(self, sess, chars, vocab, num=200, prime='The '):
        state = self.initial_state.eval()
        for char in prime[:-1]:
            x = np.zeros((1,1, 65))
            x[0,0, vocab[char]] = 1
            feed = {self.input_data: x, self.initial_state:state}
            [state] = sess.run([self.final_state], feed)

        def weighted_pick(weights):
            t = np.cumsum(weights)
            s = np.sum(weights)
            return(int(np.searchsorted(t, np.random.rand(1)*s)))

        ret = prime
        char = prime[-1]
        for n in range(num):
            x = np.zeros((1,1, 65))
            x[0,0, vocab[char]] = 1
            feed = {self.input_data: x, self.initial_state:state}
            [probs, state] = sess.run([self.probs, self.final_state], feed)
            #print ('p', probs.ravel())
            #print ('state', state.ravel())
            sample = weighted_pick(probs)
            #print ('sample', sample)
            pred = chars[sample]
            ret += pred
            char = pred
        return ret
    
    def inspect(self, draw=False):
        for var in tf.all_variables():
            if var in tf.trainable_variables():
                print ('t', var.name, var.eval().shape)
                if draw:
                    plt.figure(figsize=(1,1))
                    plt.figimage(var.eval())
                    plt.show()
            else:
                print ('nt', var.name, var.eval().shape)
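
The graph built above unrolls the same vanilla RNN step that Karpathy writes in NumPy. For reference, here is a minimal NumPy sketch of a single forward step with random weights (purely illustrative, not part of the model):

hidden_size, vocab_size = 128, 65

# Random parameters with the same shapes as Wxh, Whh, Why, bh, by above
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01
Whh = np.random.randn(hidden_size, hidden_size) * 0.01
Why = np.random.randn(vocab_size, hidden_size) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

h = np.zeros((hidden_size, 1))              # initial hidden state
x = np.zeros((vocab_size, 1)); x[19] = 1    # one-hot input char
target = 46                                 # ID of the expected next char

h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)   # new hidden state
y = np.dot(Why, h) + by                             # unnormalized log probabilities
p = np.exp(y) / np.sum(np.exp(y))                   # softmax probabilities
loss = -np.log(p[target, 0])                        # cross-entropy for this char
print(loss)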

Inspect the model variables

Looking at the shape of each variable can help to understand the flow.
A 't' prefix means trainable; an 'nt' prefix means not trainable.


In [47]:
tf.reset_default_graph()
args = Args()
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
args.vocab_size = data_loader.vocab_size
print (args.vocab_size)

model = Model(args)
print ("model created")

# Open a session to inspect the model
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    print('All variable initialized')
    model.inspect()
    '''
    saver = tf.train.Saver(tf.all_variables())
    ckpt = tf.train.get_checkpoint_state(args.save_dir)
    print (ckpt)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

        model.inspect()
        plt.figure(figsize=(1,1))
        plt.figimage(model.vectorize.eval())
        plt.show()'''


loading preprocessed files
65
model created
All variable initialized
t Wxh:0 (128, 65)
t Whh:0 (128, 128)
t Why:0 (65, 128)
t bh:0 (128, 1)
t by:0 (65, 1)
nt learning_rate:0 ()
nt beta1_power:0 ()
nt beta2_power:0 ()
nt Wxh/Adam:0 (128, 65)
nt Wxh/Adam_1:0 (128, 65)
nt Whh/Adam:0 (128, 128)
nt Whh/Adam_1:0 (128, 128)
nt Why/Adam:0 (65, 128)
nt Why/Adam_1:0 (65, 128)
nt bh/Adam:0 (128, 1)
nt bh/Adam_1:0 (128, 1)
nt by/Adam:0 (65, 1)
nt by/Adam_1:0 (65, 1)
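
From the trainable shapes printed above we can count the parameters of the model (a quick back-of-the-envelope check, not part of the original notebook):

#          Wxh        Whh         Why        bh    by
n_params = 128 * 65 + 128 * 128 + 65 * 128 + 128 + 65
print(n_params)   # 33217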

Visualize the graph

The following code comes from the DeepDream Jupyter tutorial.

It allows drawing the graph inside Jupyter. It looks cool, but I'm not sure how useful it is.


In [48]:
# this code from:
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = "<stripped %d bytes>"%size
    return strip_def
  
def rename_nodes(graph_def, rename_func):
    res_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = res_def.node.add() 
        n.MergeFrom(n0)
        n.name = rename_func(n.name)
        for i, s in enumerate(n.input):
            n.input[i] = rename_func(s) if s[0]!='^' else '^'+rename_func(s[1:])
    return res_def
  
def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))
  
    iframe = """
        <iframe seamless style="width:800px;height:620px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

In [49]:
# write the graph to a file to help visualize it
model_fn = 'model.pb'
tf.train.write_graph(sess.graph.as_graph_def(),'.', model_fn, as_text=False) 
    
# Visualizing the network graph. Be sure to expand the nodes to see their internal structure.
with tf.gfile.FastGFile(model_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

tmp_def = rename_nodes(graph_def, lambda s:"/".join(s.split('_',1)))
#show_graph(tmp_def)

Training

Load the data and preprocess it if needed


In [53]:
args = Args()

data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
args.vocab_size = data_loader.vocab_size

# check compatibility if training is continued from previously saved model
if args.init_from is not None:
    print ("need to load file from", args.init_from)
    # check if all necessary files exist 
    assert os.path.isdir(args.init_from)," %s must be a path" % args.init_from
    assert os.path.isfile(os.path.join(args.init_from,"config.pkl")),"config.pkl file does not exist in path %s"%args.init_from
    assert os.path.isfile(os.path.join(args.init_from,"chars_vocab.pkl")),"chars_vocab.pkl file does not exist in path %s" % args.init_from
    ckpt = tf.train.get_checkpoint_state(args.init_from)
    assert ckpt,"No checkpoint found"
    assert ckpt.model_checkpoint_path,"No model path found in checkpoint"

    # open old config and check if models are compatible
    with open(os.path.join(args.init_from, 'config.pkl')) as f:
        saved_model_args = cPickle.load(f)
    print (saved_model_args)
    need_be_same=["model","rnn_size","seq_length"]
    for checkme in need_be_same:
        assert vars(saved_model_args)[checkme]==vars(args)[checkme],"Command line argument and saved model disagree on '%s' "%checkme

    # open saved vocab/dict and check if vocabs/dicts are compatible
    with open(os.path.join(args.init_from, 'chars_vocab.pkl')) as f:
        saved_chars, saved_vocab = cPickle.load(f)
    assert saved_chars==data_loader.chars, "Data and loaded model disagree on character set!"
    assert saved_vocab==data_loader.vocab, "Data and loaded model disagree on dictionary mappings!"
    print ("config loaded")

with open(os.path.join(args.save_dir, 'config.pkl'), 'wb') as f:
    cPickle.dump(args, f)
with open(os.path.join(args.save_dir, 'chars_vocab.pkl'), 'wb') as f:
    cPickle.dump((data_loader.chars, data_loader.vocab), f)


loading preprocessed files
need to load file from save_vec
<__main__.Args instance at 0x1109f0488>
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-53-bd7d20759741> in <module>()
     21     need_be_same=["model","rnn_size","seq_length"]
     22     for checkme in need_be_same:
---> 23         assert vars(saved_model_args)[checkme]==vars(args)[checkme],"Command line argument and saved model disagree on '%s' "%checkme
     24 
     25     # open saved vocab/dict and check if vocabs/dicts are compatible

KeyError: 'model'

Note that the compatibility check above fails with a KeyError: this simplified Args class has no 'model' attribute (the original code uses it to choose between RNN, LSTM and GRU cells), so you can remove "model" from need_be_same or simply ignore the error.

Instantiate the model and train it.
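
The training loop below lowers the learning rate once per epoch with learning_rate * decay_rate ** e. A quick sketch of what that schedule looks like with the default settings (only epoch 0 is used here since num_epochs is 1):

# Learning-rate schedule used by the training loop below
for e in range(5):
    print(e, 0.002 * 0.98 ** e)
# the rate shrinks by 2% per epoch: 0.002, 0.00196, 0.0019208, ...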


In [ ]:
print (args.print_every)

In [52]:
tf.reset_default_graph()
model = Model(args)
print ("model created")

cost_optimisation = []

with tf.Session() as sess:
    tf.initialize_all_variables().run()
    print ("variable initialized")
    saver = tf.train.Saver(tf.all_variables())
    # restore model
    if args.init_from is not None:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print ("model restored")
    for e in range(args.num_epochs):
        sess.run(tf.assign(model.lr, args.learning_rate * (args.decay_rate ** e)))
        data_loader.reset_batch_pointer()
        state = model.initial_state.eval()
        for b in range(data_loader.num_batches):
            start = time.time()
            # Get learning data
            x, y = data_loader.next_batch()
            # Create the structure for the learning data
            feed = {model.input_data: x, model.target_data: y, model.initial_state: state}
            # Run a session using train_op
            [train_loss], state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
            end = time.time()
            if (e * data_loader.num_batches + b) % args.print_every == 0:
                cost_optimisation.append(train_loss)
                print("{}/{} (epoch {}), train_loss = {:.6f}, time/batch = {:.3f}" \
                    .format(e * data_loader.num_batches + b,
                            args.num_epochs * data_loader.num_batches,
                            e, train_loss, end - start))
            if (e * data_loader.num_batches + b) % args.save_every == 0\
                or (e==args.num_epochs-1 and b == data_loader.num_batches-1): # save for the last result
                checkpoint_path = os.path.join(args.save_dir, 'model.ckpt')
                saver.save(sess, checkpoint_path, global_step = e * data_loader.num_batches + b)
                print("model saved to {}".format(checkpoint_path))


model created
variable initialized
model restored
0/22307 (epoch 0), train_loss = 2.152187, time/batch = 0.446
model saved to save_vec/model.ckpt
100/22307 (epoch 0), train_loss = 2.011497, time/batch = 0.183
200/22307 (epoch 0), train_loss = 1.922973, time/batch = 0.181
300/22307 (epoch 0), train_loss = 2.083013, time/batch = 0.166
400/22307 (epoch 0), train_loss = 1.424449, time/batch = 0.173
500/22307 (epoch 0), train_loss = 1.849237, time/batch = 0.180
model saved to save_vec/model.ckpt
600/22307 (epoch 0), train_loss = 2.027658, time/batch = 0.167
700/22307 (epoch 0), train_loss = 1.564220, time/batch = 0.184
800/22307 (epoch 0), train_loss = 1.350735, time/batch = 0.212
900/22307 (epoch 0), train_loss = 1.986346, time/batch = 0.170
1000/22307 (epoch 0), train_loss = 1.977910, time/batch = 0.172
model saved to save_vec/model.ckpt
1100/22307 (epoch 0), train_loss = 1.676137, time/batch = 0.168
1200/22307 (epoch 0), train_loss = 2.327292, time/batch = 0.168
1300/22307 (epoch 0), train_loss = 1.953639, time/batch = 0.168
1400/22307 (epoch 0), train_loss = 2.030562, time/batch = 0.166
1500/22307 (epoch 0), train_loss = 1.457734, time/batch = 0.166
model saved to save_vec/model.ckpt
1600/22307 (epoch 0), train_loss = 1.670410, time/batch = 0.167
1700/22307 (epoch 0), train_loss = 1.279751, time/batch = 0.166
1800/22307 (epoch 0), train_loss = 1.817698, time/batch = 0.166
1900/22307 (epoch 0), train_loss = 1.899714, time/batch = 0.168
2000/22307 (epoch 0), train_loss = 1.634101, time/batch = 0.166
model saved to save_vec/model.ckpt
2100/22307 (epoch 0), train_loss = 1.883314, time/batch = 0.165
2200/22307 (epoch 0), train_loss = 2.067735, time/batch = 0.166
2300/22307 (epoch 0), train_loss = 1.454144, time/batch = 0.172
2400/22307 (epoch 0), train_loss = 1.761648, time/batch = 0.166
2500/22307 (epoch 0), train_loss = 1.983632, time/batch = 0.168
model saved to save_vec/model.ckpt
2600/22307 (epoch 0), train_loss = 1.444991, time/batch = 0.167
2700/22307 (epoch 0), train_loss = 2.040055, time/batch = 0.168
2800/22307 (epoch 0), train_loss = 1.978591, time/batch = 0.166
2900/22307 (epoch 0), train_loss = 1.602581, time/batch = 0.177
3000/22307 (epoch 0), train_loss = 1.740325, time/batch = 0.166
model saved to save_vec/model.ckpt
3100/22307 (epoch 0), train_loss = 1.931208, time/batch = 0.175
3200/22307 (epoch 0), train_loss = 1.958346, time/batch = 0.175
3300/22307 (epoch 0), train_loss = 2.273568, time/batch = 0.168
3400/22307 (epoch 0), train_loss = 2.046148, time/batch = 0.175
3500/22307 (epoch 0), train_loss = 1.816602, time/batch = 0.179
model saved to save_vec/model.ckpt
3600/22307 (epoch 0), train_loss = 1.481727, time/batch = 0.170
3700/22307 (epoch 0), train_loss = 1.712463, time/batch = 0.166
3800/22307 (epoch 0), train_loss = 1.283632, time/batch = 0.166
3900/22307 (epoch 0), train_loss = 1.632998, time/batch = 0.166
4000/22307 (epoch 0), train_loss = 1.939250, time/batch = 0.167
model saved to save_vec/model.ckpt
4100/22307 (epoch 0), train_loss = 1.920266, time/batch = 0.177
4200/22307 (epoch 0), train_loss = 2.337096, time/batch = 0.167
4300/22307 (epoch 0), train_loss = 2.235835, time/batch = 0.166
4400/22307 (epoch 0), train_loss = 1.818429, time/batch = 0.167
4500/22307 (epoch 0), train_loss = 1.695233, time/batch = 0.168
model saved to save_vec/model.ckpt
4600/22307 (epoch 0), train_loss = 1.997222, time/batch = 0.166
4700/22307 (epoch 0), train_loss = 1.921496, time/batch = 0.166
4800/22307 (epoch 0), train_loss = 1.569810, time/batch = 0.167
4900/22307 (epoch 0), train_loss = 2.019497, time/batch = 0.166
5000/22307 (epoch 0), train_loss = 2.062987, time/batch = 0.176
model saved to save_vec/model.ckpt
5100/22307 (epoch 0), train_loss = 1.905364, time/batch = 0.171
5200/22307 (epoch 0), train_loss = 1.254680, time/batch = 0.202
5300/22307 (epoch 0), train_loss = 1.730256, time/batch = 0.168
5400/22307 (epoch 0), train_loss = 1.341452, time/batch = 0.166
5500/22307 (epoch 0), train_loss = 2.073293, time/batch = 0.178
model saved to save_vec/model.ckpt
5600/22307 (epoch 0), train_loss = 2.273921, time/batch = 0.166
5700/22307 (epoch 0), train_loss = 1.696314, time/batch = 0.168
5800/22307 (epoch 0), train_loss = 1.418136, time/batch = 0.169
5900/22307 (epoch 0), train_loss = 1.196017, time/batch = 0.188
6000/22307 (epoch 0), train_loss = 2.451196, time/batch = 0.175
model saved to save_vec/model.ckpt
6100/22307 (epoch 0), train_loss = 1.426051, time/batch = 0.193
6200/22307 (epoch 0), train_loss = 1.325799, time/batch = 0.169
6300/22307 (epoch 0), train_loss = 1.681414, time/batch = 0.173
6400/22307 (epoch 0), train_loss = 2.123528, time/batch = 0.169
6500/22307 (epoch 0), train_loss = 2.171817, time/batch = 0.167
model saved to save_vec/model.ckpt
6600/22307 (epoch 0), train_loss = 2.069588, time/batch = 0.168
6700/22307 (epoch 0), train_loss = 1.916512, time/batch = 0.175
6800/22307 (epoch 0), train_loss = 0.959311, time/batch = 0.166
6900/22307 (epoch 0), train_loss = 2.311517, time/batch = 0.168
7000/22307 (epoch 0), train_loss = 1.767336, time/batch = 0.169
model saved to save_vec/model.ckpt
7100/22307 (epoch 0), train_loss = 1.356428, time/batch = 0.168
7200/22307 (epoch 0), train_loss = 1.982831, time/batch = 0.173
7300/22307 (epoch 0), train_loss = 2.049194, time/batch = 0.183
7400/22307 (epoch 0), train_loss = 1.503814, time/batch = 0.166
7500/22307 (epoch 0), train_loss = 2.450796, time/batch = 0.168
model saved to save_vec/model.ckpt
7600/22307 (epoch 0), train_loss = 1.941393, time/batch = 0.172
7700/22307 (epoch 0), train_loss = 1.607015, time/batch = 0.182
7800/22307 (epoch 0), train_loss = 1.754258, time/batch = 0.287
7900/22307 (epoch 0), train_loss = 1.516202, time/batch = 0.231
8000/22307 (epoch 0), train_loss = 2.130601, time/batch = 0.174
model saved to save_vec/model.ckpt
8100/22307 (epoch 0), train_loss = 1.641015, time/batch = 0.195
8200/22307 (epoch 0), train_loss = 2.134670, time/batch = 0.240
8300/22307 (epoch 0), train_loss = 1.457696, time/batch = 0.203
8400/22307 (epoch 0), train_loss = 1.624693, time/batch = 0.192
8500/22307 (epoch 0), train_loss = 1.328304, time/batch = 0.177
model saved to save_vec/model.ckpt
8600/22307 (epoch 0), train_loss = 1.696817, time/batch = 0.173
8700/22307 (epoch 0), train_loss = 1.989542, time/batch = 0.226
8800/22307 (epoch 0), train_loss = 1.889212, time/batch = 0.169
8900/22307 (epoch 0), train_loss = 1.897197, time/batch = 0.179
9000/22307 (epoch 0), train_loss = 1.526857, time/batch = 0.167
model saved to save_vec/model.ckpt
9100/22307 (epoch 0), train_loss = 2.190273, time/batch = 0.212
9200/22307 (epoch 0), train_loss = 1.914494, time/batch = 0.200
9300/22307 (epoch 0), train_loss = 1.140114, time/batch = 0.168
9400/22307 (epoch 0), train_loss = 2.063158, time/batch = 0.172
9500/22307 (epoch 0), train_loss = 1.720322, time/batch = 0.170
model saved to save_vec/model.ckpt
9600/22307 (epoch 0), train_loss = 1.272899, time/batch = 0.167
9700/22307 (epoch 0), train_loss = 1.670237, time/batch = 0.168
9800/22307 (epoch 0), train_loss = 1.753250, time/batch = 0.168
9900/22307 (epoch 0), train_loss = 1.366881, time/batch = 0.213
10000/22307 (epoch 0), train_loss = 1.583282, time/batch = 0.178
model saved to save_vec/model.ckpt
10100/22307 (epoch 0), train_loss = 2.029258, time/batch = 0.173
10200/22307 (epoch 0), train_loss = 1.527530, time/batch = 0.169
10300/22307 (epoch 0), train_loss = 2.092223, time/batch = 0.174
10400/22307 (epoch 0), train_loss = 1.933074, time/batch = 0.171
10500/22307 (epoch 0), train_loss = 1.831258, time/batch = 0.175
model saved to save_vec/model.ckpt
10600/22307 (epoch 0), train_loss = 1.515499, time/batch = 0.171
10700/22307 (epoch 0), train_loss = 1.329195, time/batch = 0.170
10800/22307 (epoch 0), train_loss = 1.362785, time/batch = 0.177
10900/22307 (epoch 0), train_loss = 2.287330, time/batch = 0.166
11000/22307 (epoch 0), train_loss = 1.296272, time/batch = 0.185
model saved to save_vec/model.ckpt
11100/22307 (epoch 0), train_loss = 1.972670, time/batch = 0.169
11200/22307 (epoch 0), train_loss = 1.835899, time/batch = 0.226
11300/22307 (epoch 0), train_loss = 2.067368, time/batch = 0.167
11400/22307 (epoch 0), train_loss = 2.213986, time/batch = 0.167
11500/22307 (epoch 0), train_loss = 1.338577, time/batch = 0.166
model saved to save_vec/model.ckpt
11600/22307 (epoch 0), train_loss = 2.010125, time/batch = 0.168
11700/22307 (epoch 0), train_loss = 1.544025, time/batch = 0.171
11800/22307 (epoch 0), train_loss = 1.546484, time/batch = 0.170
11900/22307 (epoch 0), train_loss = 1.958509, time/batch = 0.177
12000/22307 (epoch 0), train_loss = 1.873554, time/batch = 0.172
model saved to save_vec/model.ckpt
12100/22307 (epoch 0), train_loss = 1.667102, time/batch = 0.171
12200/22307 (epoch 0), train_loss = 1.651166, time/batch = 0.170
12300/22307 (epoch 0), train_loss = 1.473441, time/batch = 0.169
12400/22307 (epoch 0), train_loss = 1.648211, time/batch = 0.170
12500/22307 (epoch 0), train_loss = 2.067415, time/batch = 0.175
model saved to save_vec/model.ckpt
12600/22307 (epoch 0), train_loss = 1.935000, time/batch = 0.212
12700/22307 (epoch 0), train_loss = 1.752274, time/batch = 0.167
12800/22307 (epoch 0), train_loss = 2.012103, time/batch = 0.168
12900/22307 (epoch 0), train_loss = 1.146594, time/batch = 0.188
13000/22307 (epoch 0), train_loss = 1.726766, time/batch = 0.167
model saved to save_vec/model.ckpt
13100/22307 (epoch 0), train_loss = 1.319776, time/batch = 0.167
13200/22307 (epoch 0), train_loss = 1.950185, time/batch = 0.169
13300/22307 (epoch 0), train_loss = 1.564314, time/batch = 0.169
13400/22307 (epoch 0), train_loss = 1.577636, time/batch = 0.173
13500/22307 (epoch 0), train_loss = 2.038313, time/batch = 0.181
model saved to save_vec/model.ckpt
13600/22307 (epoch 0), train_loss = 1.777441, time/batch = 0.167
13700/22307 (epoch 0), train_loss = 1.617358, time/batch = 0.190
13800/22307 (epoch 0), train_loss = 1.384678, time/batch = 0.171
13900/22307 (epoch 0), train_loss = 1.936072, time/batch = 0.167
14000/22307 (epoch 0), train_loss = 1.410601, time/batch = 0.169
model saved to save_vec/model.ckpt
14100/22307 (epoch 0), train_loss = 2.279730, time/batch = 0.180
14200/22307 (epoch 0), train_loss = 1.708565, time/batch = 0.177
14300/22307 (epoch 0), train_loss = 1.899429, time/batch = 0.175
14400/22307 (epoch 0), train_loss = 2.112174, time/batch = 0.168
14500/22307 (epoch 0), train_loss = 1.556154, time/batch = 0.167
model saved to save_vec/model.ckpt
14600/22307 (epoch 0), train_loss = 1.979329, time/batch = 0.200
14700/22307 (epoch 0), train_loss = 1.518648, time/batch = 0.217
14800/22307 (epoch 0), train_loss = 1.710269, time/batch = 0.190
14900/22307 (epoch 0), train_loss = 1.602670, time/batch = 0.171
15000/22307 (epoch 0), train_loss = 1.835187, time/batch = 0.175
model saved to save_vec/model.ckpt
15100/22307 (epoch 0), train_loss = 1.789178, time/batch = 0.166
15200/22307 (epoch 0), train_loss = 1.767259, time/batch = 0.167
15300/22307 (epoch 0), train_loss = 2.010516, time/batch = 0.169
15400/22307 (epoch 0), train_loss = 2.775822, time/batch = 0.167
15500/22307 (epoch 0), train_loss = 1.834219, time/batch = 0.183
model saved to save_vec/model.ckpt
15600/22307 (epoch 0), train_loss = 1.881806, time/batch = 0.168
15700/22307 (epoch 0), train_loss = 1.970585, time/batch = 0.185
15800/22307 (epoch 0), train_loss = 1.371429, time/batch = 0.168
15900/22307 (epoch 0), train_loss = 1.817719, time/batch = 0.168
16000/22307 (epoch 0), train_loss = 1.816345, time/batch = 0.170
model saved to save_vec/model.ckpt
16100/22307 (epoch 0), train_loss = 1.672847, time/batch = 0.171
16200/22307 (epoch 0), train_loss = 1.931189, time/batch = 0.173
16300/22307 (epoch 0), train_loss = 2.031380, time/batch = 0.168
16400/22307 (epoch 0), train_loss = 2.023208, time/batch = 0.169
16500/22307 (epoch 0), train_loss = 1.818879, time/batch = 0.168
model saved to save_vec/model.ckpt
16600/22307 (epoch 0), train_loss = 1.819087, time/batch = 0.173
16700/22307 (epoch 0), train_loss = 2.126461, time/batch = 0.169
16800/22307 (epoch 0), train_loss = 1.774534, time/batch = 0.169
16900/22307 (epoch 0), train_loss = 1.809120, time/batch = 0.168
17000/22307 (epoch 0), train_loss = 2.099270, time/batch = 0.168
model saved to save_vec/model.ckpt
17100/22307 (epoch 0), train_loss = 2.027678, time/batch = 0.173
17200/22307 (epoch 0), train_loss = 1.799949, time/batch = 0.167
17300/22307 (epoch 0), train_loss = 1.443723, time/batch = 0.169
17400/22307 (epoch 0), train_loss = 1.161970, time/batch = 0.166
17500/22307 (epoch 0), train_loss = 0.952187, time/batch = 0.168
model saved to save_vec/model.ckpt
17600/22307 (epoch 0), train_loss = 1.548178, time/batch = 0.166
17700/22307 (epoch 0), train_loss = 1.709871, time/batch = 0.166
17800/22307 (epoch 0), train_loss = 2.455598, time/batch = 0.167
17900/22307 (epoch 0), train_loss = 2.147510, time/batch = 0.167
18000/22307 (epoch 0), train_loss = 1.981208, time/batch = 0.166
model saved to save_vec/model.ckpt
18100/22307 (epoch 0), train_loss = 1.923201, time/batch = 0.166
18200/22307 (epoch 0), train_loss = 1.864007, time/batch = 0.166
18300/22307 (epoch 0), train_loss = 1.936964, time/batch = 0.166
18400/22307 (epoch 0), train_loss = 1.929036, time/batch = 0.168
18500/22307 (epoch 0), train_loss = 1.852554, time/batch = 0.168
model saved to save_vec/model.ckpt
18600/22307 (epoch 0), train_loss = 1.593444, time/batch = 0.168
18700/22307 (epoch 0), train_loss = 1.972444, time/batch = 0.168
18800/22307 (epoch 0), train_loss = 1.929490, time/batch = 0.176
18900/22307 (epoch 0), train_loss = 1.991560, time/batch = 0.166
19000/22307 (epoch 0), train_loss = 1.720168, time/batch = 0.168
model saved to save_vec/model.ckpt
19100/22307 (epoch 0), train_loss = 1.724941, time/batch = 0.173
19200/22307 (epoch 0), train_loss = 1.597429, time/batch = 0.168
19300/22307 (epoch 0), train_loss = 1.656376, time/batch = 0.167
19400/22307 (epoch 0), train_loss = 1.709567, time/batch = 0.174
19500/22307 (epoch 0), train_loss = 2.068377, time/batch = 0.187
model saved to save_vec/model.ckpt
19600/22307 (epoch 0), train_loss = 2.032332, time/batch = 0.167
19700/22307 (epoch 0), train_loss = 1.721388, time/batch = 0.172
19800/22307 (epoch 0), train_loss = 1.685130, time/batch = 0.168
19900/22307 (epoch 0), train_loss = 1.984532, time/batch = 0.184
20000/22307 (epoch 0), train_loss = 1.562227, time/batch = 0.166
model saved to save_vec/model.ckpt
20100/22307 (epoch 0), train_loss = 1.476743, time/batch = 0.180
20200/22307 (epoch 0), train_loss = 2.248961, time/batch = 0.170
20300/22307 (epoch 0), train_loss = 1.911033, time/batch = 0.174
20400/22307 (epoch 0), train_loss = 1.624131, time/batch = 0.169
20500/22307 (epoch 0), train_loss = 1.646900, time/batch = 0.166
model saved to save_vec/model.ckpt
20600/22307 (epoch 0), train_loss = 2.127131, time/batch = 0.166
20700/22307 (epoch 0), train_loss = 1.555288, time/batch = 0.166
20800/22307 (epoch 0), train_loss = 1.778828, time/batch = 0.174
20900/22307 (epoch 0), train_loss = 1.734951, time/batch = 0.178
21000/22307 (epoch 0), train_loss = 2.114552, time/batch = 0.182
model saved to save_vec/model.ckpt
21100/22307 (epoch 0), train_loss = 2.110011, time/batch = 0.169
21200/22307 (epoch 0), train_loss = 1.459632, time/batch = 0.169
21300/22307 (epoch 0), train_loss = 1.669630, time/batch = 0.173
21400/22307 (epoch 0), train_loss = 1.543281, time/batch = 0.171
21500/22307 (epoch 0), train_loss = 1.839819, time/batch = 0.169
model saved to save_vec/model.ckpt
21600/22307 (epoch 0), train_loss = 1.824439, time/batch = 0.167
21700/22307 (epoch 0), train_loss = 1.796373, time/batch = 0.168
21800/22307 (epoch 0), train_loss = 1.653873, time/batch = 0.179
21900/22307 (epoch 0), train_loss = 1.539179, time/batch = 0.167
22000/22307 (epoch 0), train_loss = 1.986701, time/batch = 0.167
model saved to save_vec/model.ckpt
22100/22307 (epoch 0), train_loss = 1.658178, time/batch = 0.172
22200/22307 (epoch 0), train_loss = 1.280431, time/batch = 0.175
22300/22307 (epoch 0), train_loss = 1.188453, time/batch = 0.190
model saved to save_vec/model.ckpt

In [27]:
plt.figure(figsize=(12,5))
plt.plot(range(len(cost_optimisation)), cost_optimisation, label='cost')
plt.legend()
plt.show()


Check Learning


In [28]:
tf.reset_default_graph()
model_fn = 'model.pb'

with open(os.path.join(args.save_dir, 'config.pkl'), 'rb') as f:
    saved_args = cPickle.load(f)
with open(os.path.join(args.save_dir, 'chars_vocab.pkl'), 'rb') as f:
    chars, vocab = cPickle.load(f)
    
model = Model(saved_args, True)  # True to generate the model in sampling mode
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    saver = tf.train.Saver(tf.all_variables())
    ckpt = tf.train.get_checkpoint_state(args.save_dir)
    print (ckpt)
    
    model.inspect(draw=True)


model_checkpoint_path: "save_vec/model.ckpt-6500"
all_model_checkpoint_paths: "save_vec/model.ckpt-4500"
all_model_checkpoint_paths: "save_vec/model.ckpt-5000"
all_model_checkpoint_paths: "save_vec/model.ckpt-5500"
all_model_checkpoint_paths: "save_vec/model.ckpt-6000"
all_model_checkpoint_paths: "save_vec/model.ckpt-6500"

t Wxh:0 (128, 65)
t Whh:0 (128, 128)
t Why:0 (65, 128)
t bh:0 (128, 1)
t by:0 (65, 1)
nt learning_rate:0 ()
nt beta1_power:0 ()
nt beta2_power:0 ()
nt Wxh/Adam:0 (128, 65)
nt Wxh/Adam_1:0 (128, 65)
nt Whh/Adam:0 (128, 128)
nt Whh/Adam_1:0 (128, 128)
nt Why/Adam:0 (65, 128)
nt Why/Adam_1:0 (65, 128)
nt bh/Adam:0 (128, 1)
nt bh/Adam_1:0 (128, 1)
nt by/Adam:0 (65, 1)
nt by/Adam_1:0 (65, 1)

Sampling
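
sample first feeds the prime text through the network to warm up the hidden state, then repeatedly draws the next character with weighted_pick, which samples an index from the predicted distribution using its cumulative sum. A toy illustration of the picking step, independent of the model:

def toy_weighted_pick(weights):
    t = np.cumsum(weights)              # cumulative distribution
    s = np.sum(weights)
    return int(np.searchsorted(t, np.random.rand(1) * s))

# Index 2 has probability 0.6, so it should be picked most often
toy_probs = np.array([0.1, 0.2, 0.6, 0.1])
toy_picks = [toy_weighted_pick(toy_probs) for _ in range(1000)]
print([toy_picks.count(i) for i in range(4)])   # roughly [100, 200, 600, 100]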


In [29]:
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    saver = tf.train.Saver(tf.all_variables())
    ckpt = tf.train.get_checkpoint_state(args.save_dir)
    print (ckpt)
    
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        
        print(model.sample(sess, chars, vocab, args.n, args.prime))


model_checkpoint_path: "save_vec/model.ckpt-6500"
all_model_checkpoint_paths: "save_vec/model.ckpt-4500"
all_model_checkpoint_paths: "save_vec/model.ckpt-5000"
all_model_checkpoint_paths: "save_vec/model.ckpt-5500"
all_model_checkpoint_paths: "save_vec/model.ckpt-6000"
all_model_checkpoint_paths: "save_vec/model.ckpt-6500"

 this dibghamb'd bup the gando, oor, pent broth, our hantied no firs dingedpe,
My leefeak, cers werle ouk pram decturce.

KING RICHAOF I Evebstns orrenged by lick beage.
Threm ot what terpichoproad.

NONCHy
Whate good a pors,
Enstikes, of Norruenderey enf me to mears stoo,
Them by hade my'd dear.
The racthorg tham dow bus storant, the, his.

JONB:
I erold: thrord blooke ghes, dest;
Th ow thee chom hy hishaid, to this blood there bepe, and whot chim that upon beothen, be taTtoub. Muks; butcle I sh

That's it!

If you want to achieve better results, you can switch to an LSTM with 2 layers and add an embedding space. All of this is implemented in the original code; a rough sketch is shown below.
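
For reference only, here is a rough sketch of how the original code builds that part, assuming the TensorFlow 0.x tf.nn.rnn_cell API that was current at the time (the placeholder name input_ids is hypothetical, since this notebook feeds one-hot vectors instead of integer IDs):

# Sketch only: a 2-layer LSTM plus a learned embedding, as in the original code.
cell = tf.nn.rnn_cell.BasicLSTMCell(args.rnn_size)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * 2)                    # 2 layers

input_ids = tf.placeholder(tf.int32, [args.batch_size, args.seq_length])
embedding = tf.get_variable("embedding", [args.vocab_size, args.rnn_size])
inputs = tf.nn.embedding_lookup(embedding, input_ids)             # (batch, seq, rnn_size)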

Feedback welcome @dh7net