Minimal character-level Vanilla RNN model.

RNN stands for "Recurrent Neural Network".
To understand why RNNs are so hot, you must read this!

This notebook explains the Minimal character-level Vanilla RNN model written by Andrej Karpathy.
This code creates an RNN that generates text, char after char, by learning char after char from a text file.

I love this character-level Vanilla RNN code because it doesn't use any library except numpy. All the NN magic is in 112 lines of code, with no need to understand any dependency. Everything is there! I'll try to explain every line of it in detail. Disclaimer: I still need to use some external links for reference.

This notebook is for real beginners who want to understand the RNN concept by reading code.
Feedback welcome @dh7net

Let's start!

Let's see the original code and the results for the first 1000 iterations.


In [1]:
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np

# data I/O
data = open('methamorphosis.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print 'data has %d characters, %d unique.' % (data_size, vocab_size)
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }

# hyperparameters
hidden_size = 100 # size of hidden layer of neurons
seq_length = 25 # number of steps to unroll the RNN for
learning_rate = 1e-1

# model parameters
Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden
Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output
bh = np.zeros((hidden_size, 1)) # hidden bias
by = np.zeros((vocab_size, 1)) # output bias

def lossFun(inputs, targets, hprev):
  """
  inputs,targets are both list of integers.
  hprev is Hx1 array of initial hidden state
  returns the loss, gradients on model parameters, and last hidden state
  """
  xs, hs, ys, ps = {}, {}, {}, {}
  hs[-1] = np.copy(hprev)
  loss = 0
  # forward pass
  for t in xrange(len(inputs)):
    xs[t] = np.zeros((vocab_size,1)) # encode in 1-of-k representation
    xs[t][inputs[t]] = 1
    hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state
    ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars
    ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars
    loss += -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)
  # backward pass: compute gradients going backwards
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)
  dhnext = np.zeros_like(hs[0])
  for t in reversed(xrange(len(inputs))):
    dy = np.copy(ps[t])
    dy[targets[t]] -= 1 # backprop into y
    dWhy += np.dot(dy, hs[t].T)
    dby += dy
    dh = np.dot(Why.T, dy) + dhnext # backprop into h
    dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity
    dbh += dhraw
    dWxh += np.dot(dhraw, xs[t].T)
    dWhh += np.dot(dhraw, hs[t-1].T)
    dhnext = np.dot(Whh.T, dhraw)
  for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
    np.clip(dparam, -5, 5, out=dparam) # clip to mitigate exploding gradients
  return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]

def sample(h, seed_ix, n):
  """ 
  sample a sequence of integers from the model 
  h is memory state, seed_ix is seed letter for first time step
  """
  x = np.zeros((vocab_size, 1))
  x[seed_ix] = 1
  ixes = []
  for t in xrange(n):
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
    y = np.dot(Why, h) + by
    p = np.exp(y) / np.sum(np.exp(y))
    ix = np.random.choice(range(vocab_size), p=p.ravel())
    x = np.zeros((vocab_size, 1))
    x[ix] = 1
    ixes.append(ix)
  return ixes

n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad
smooth_loss = -np.log(1.0/vocab_size)*seq_length # loss at iteration 0
while n<=1000: # was while True: in original code
  # prepare inputs (we're sweeping from left to right in steps seq_length long)
  if p+seq_length+1 >= len(data) or n == 0: 
    hprev = np.zeros((hidden_size,1)) # reset RNN memory
    p = 0 # go from start of data
  inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
  targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

  # sample from the model now and then
  if n % 100 == 0:
    sample_ix = sample(hprev, inputs[0], 200)
    txt = ''.join(ix_to_char[ix] for ix in sample_ix)
    print '----\n %s \n----' % (txt, )

  # forward seq_length characters through the net and fetch gradient
  loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
  smooth_loss = smooth_loss * 0.999 + loss * 0.001
  if n % 100 == 0: print 'iter %d, loss: %f' % (n, smooth_loss) # print progress
  
  # perform parameter update with Adagrad
  for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                                [dWxh, dWhh, dWhy, dbh, dby], 
                                [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update

  p += seq_length # move data pointer
  n += 1 # iteration counter


data has 119163 characters, 61 unique.
----
 ,JLBhcuSwFdHYCU!i?mFSePjkl)v(t!M.ldrDV-OqSAWx:iBocJGHUUlw b.E i-gtiHnTOabv omHfcmAMw'(i:uo"iQY!."um'GA!W jYANMydLbbu!.a'MUHzxnsJDiAzULI,U!pWT)vaYCQwPx:;ulgoPqGD
mJxivk'alh,LiSJxDgS;r.yGeJnfCmDkD,EWQio 
----
iter 0, loss: 102.771854
----
 O hA  ieeaicmauytaas ovtf seGwayt as,ulf aot'batLbuc L idhassbac dbseoat aic  fp losnhegw ,PcdUcnna co-tW o  oAcMm
t  uyn ao bug dDsmhatqoAtliaseeap'ha.; ec desn eee ss,!ucttoc(a,c aoeelacnorn h.oSblo 
----
iter 100, loss: 102.940407
----
 ln h wsa a
afe vdbee bh tfveioyoloeeltsaliaWrlkadylclnruhpead wa 
i-ditx oab wts
  em.lsoesu t" H d  rsa wst wdteaidthss tsdlushl vinceG)eaeOz eops hieoesf
t ei WiaHb-e
n-slrwsh
sdroe,wedpn neresl rhg 
----
iter 200, loss: 101.077410
----
 Gned -nd  wt wftiwp so LardpfenWimeooWpd
u ieacetH   oponebpGa rgo"femnkoha,
"ie efngnkd fhoie gt eoteh Glt.vQOtueT sl tl  n.hiipge itth e islmktw ow g rhet hauoqun cwkblo rslpuCf shas cTgtoce.l,ts fe 
----
iter 300, loss: 99.188995
----
  mpnoftoefre wn fcrsrdoi dn,ldoilyn haOyttle l rrc  ownnitiiedr ihrm e bnnamsanehioeeli s
 tuhue veB aodAnopu onlcrfcwetuojdys eim ntautaar hann,letesa   ampc  esi fiaedsaelw ashi ur,otll
Aseeinntpuwe 
----
iter 400, loss: 97.071554
----
 d dyoeY anip
hwgihefhow mwiVefcwe I fe
nrollaaiticn
 ptlahwteasenm ie'Apeftp athabloe iwH ,nn de  A  ktwtthdhel d ka flfoo sosoie alreilemFtowurlatt  na nv c?nl slleo l btanmoBto
l"c,gc le bIechmdim,e 
----
iter 500, loss: 95.014480
----
 ewplos. onfeso sditginpcaepseolGr.s intin f,d ai tlcnemId tomefl e r nphe  mho u
 tbt gdoh t Ieni t evrnan ut psusbcnena ?lsiwik Gilcepfhees  Skyt nu, t edirels yep lfek  sanbshdApc,rs ieno  t  anrm
p 
----
iter 600, loss: 93.053803
----
 thixl yinodheglhigedr sor  ht heersod neben  gpit  Ar nrn ham "i d rt h mssutIhhcnnwte atrest y tvsow un, add atinodevor
v telrwengis ahbth
W.god enes ren dncrn cari fis mot Yo lfod tkBithu.diny resho 
----
iter 700, loss: 91.329670
----
 agn
 nliaedito f se ithiri Gathhhn rl? waffemn Gnuted  o"g wouroln
e.f mono  lha s ocf  pio s  Hm tere it weenvon wop d "irelnn siShed Ite matOas Caukt de medn  ,d ss cp s pebotr th
c trfentoraeccsa p 
----
iter 800, loss: 89.594637
----
 n w "Ficotalslothr lure l te m le r
. .dar;s mthlee va d, teal ,om coohehluny megeobinhitivy uodenalluterl eou nOw k:e th  hlrekgiwels ssa
the nIb se  k aisdS  "elo. n;cteNren, t dyrwiuu to'nllAy doyr 
----
iter 900, loss: 87.895688
----
 l hetimerwddrn h k thir y b fohfre h ith!yse th th fithedm sg , ttho niusn ay?hut uanved taachitl en rire on 
jhe n bl adanrt DAght en eodeuf. s nuoat rsihoy tarthe mhlen uasitel hl  crdocthe thcef oa 
----
iter 1000, loss: 86.080407

If you are not a NN expert, the code is not easy to understand.

If you look at the results, you can see that the code iterates 1000 times, calculates a loss that decreases over time, and outputs some text every 100 iterations. The output from the first iteration looks random.
After 1000 iterations, the NN is able to create words of plausible length, doesn't use too many caps, and can create correct small words like "the", "they", "be", "to".
If you let the code learn over a night, the NN will be able to create almost correct sentences:
"with home to get there was much hadinge everything and he could that ho women this tending applear space"
This is just a simple example, and there is no doubt this code can do much better.

Theory

This code builds a neural network that is able to predict the next char from the previous one.
In this example, it learns from a text file, so it can learn words and sentences; if you feed it HTML or XML during the training, it can produce valid HTML or XML sequences.
At each step it can use some results from the previous step to keep in memory what is going on.
For instance, if the previous chars are "hello worl", the model can guess that the next char is "d".

This model contains parameters that are initialized randomly, and the training phase tries to find optimal values for each of them. During the training process we do a "gradient descent" (sketched in code right after this list):

  • We give the model a pair of chars: the input char and the target char. The target char is the char the network should guess; it is the next char in our training text file.
  • We calculate the probability of every possible next char according to the state of the model, using the parameters (this is the forward pass).
  • We compute a distance (the loss) between the predicted probabilities and the target char.
  • We calculate gradients for each of our parameters to see which impact they have on the loss (a fast way to calculate all the gradients is called the backward pass).
  • We update all parameters in the direction that helps to minimise the loss.
  • We iterate until there is no more progress, and print a generated sentence from time to time.
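
Here is a tiny sketch of that loop on a made-up toy problem (a single weight w, not the real RNN): we fit w so that w * x gets close to a target y. The real code does the same thing with 22k parameters instead of one.

import numpy as np

w = np.random.randn()      # the parameter, initialized randomly
x, y = 2.0, 10.0           # one training pair: input and target
learning_rate = 0.1
for step in range(20):
    prediction = w * x                # forward pass
    loss = (prediction - y) ** 2      # distance between the prediction and the target
    dw = 2 * (prediction - y) * x     # backward pass: gradient of the loss w.r.t. w
    w -= learning_rate * dw           # update w in the direction that lowers the loss
print 'learned w =', w, '(the ideal value is', y / x, ')'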

Let's dive in!

The code contains five parts:

  • Load the training data
    • encode chars into vectors
  • Define the Network
  • Define a function to create sentences from the model
  • Define a loss function
    • Forward pass
    • Loss
    • Backward pass
  • Train the network
    • Feed the network
    • Calculate gradients and update the model parameters
    • Output a text to see the progress of the training

Let's have a closer look at every line of the code.
Disclaimer: the following code is cut and pasted from the original one, with some adaptations to make it clearer for this notebook, like adding some prints.

Load the training data

The network needs a big text file as input.

The content of the file will be used to train the network.

For this example, I used Metamorphosis by Kafka (Public Domain).


In [2]:
"""                                                                                                                                                                                           
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)                                                                                                             
BSD License                                                                                                                                                                                   
"""
import numpy as np

# data I/O                                                                                                                                                                                    
data = open('methamorphosis.txt', 'r').read() # should be simple plain text file
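
A quick peek at what was loaded (a sanity check added for this notebook, not part of the original code):

print 'the file contains', len(data), 'characters; here are the first 100:'
print data[:100]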

Encode/Decode char/vector

Neural networks can only work on vectors (a vector is an array of floats), so we need a way to encode and decode a char as a vector.

For this, we count the number of unique chars (vocab_size). It will be the size of the vector. The vector contains only zeros, except at the position of the char, where the value is 1.

First we calculate vocab_size:


In [3]:
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print 'data has %d characters, %d unique.' % (data_size, vocab_size)


data has 119163 characters, 61 unique.

Then we create two dictionaries, to encode a char into an int and to decode it back:


In [4]:
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
print char_to_ix
print ix_to_char


{'\n': 0, '!': 1, ' ': 2, '"': 3, "'": 4, ')': 5, '(': 6, '-': 7, ',': 8, '.': 9, ';': 10, ':': 11, '?': 12, 'A': 13, 'C': 14, 'B': 15, 'E': 16, 'D': 17, 'G': 18, 'F': 19, 'I': 20, 'H': 21, 'J': 22, 'M': 23, 'L': 24, 'O': 25, 'N': 26, 'Q': 27, 'P': 28, 'S': 29, 'U': 30, 'T': 31, 'W': 32, 'V': 33, 'Y': 34, 'a': 35, 'c': 36, 'b': 37, 'e': 38, 'd': 39, 'g': 40, 'f': 41, 'i': 42, 'h': 43, 'k': 44, 'j': 45, 'm': 46, 'l': 47, 'o': 48, 'n': 49, 'q': 50, 'p': 51, 's': 52, 'r': 53, 'u': 54, 't': 55, 'w': 56, 'v': 57, 'y': 58, 'x': 59, 'z': 60}
{0: '\n', 1: '!', 2: ' ', 3: '"', 4: "'", 5: ')', 6: '(', 7: '-', 8: ',', 9: '.', 10: ';', 11: ':', 12: '?', 13: 'A', 14: 'C', 15: 'B', 16: 'E', 17: 'D', 18: 'G', 19: 'F', 20: 'I', 21: 'H', 22: 'J', 23: 'M', 24: 'L', 25: 'O', 26: 'N', 27: 'Q', 28: 'P', 29: 'S', 30: 'U', 31: 'T', 32: 'W', 33: 'V', 34: 'Y', 35: 'a', 36: 'c', 37: 'b', 38: 'e', 39: 'd', 40: 'g', 41: 'f', 42: 'i', 43: 'h', 44: 'k', 45: 'j', 46: 'm', 47: 'l', 48: 'o', 49: 'n', 50: 'q', 51: 'p', 52: 's', 53: 'r', 54: 'u', 55: 't', 56: 'w', 57: 'v', 58: 'y', 59: 'x', 60: 'z'}

Finally, we create a vector from a char like this:

The dictionary defined above allows us to create a vector of size 61 instead of 256.
Here is an example for the char 'a':
The vector contains only zeros, except at position char_to_ix['a'], where we put a 1.


In [17]:
%matplotlib notebook

import matplotlib
import matplotlib.pyplot as plt

vector_for_char_a = np.zeros((vocab_size, 1))
vector_for_char_a[char_to_ix['a']] = 1
#print vector_for_char_a
print vector_for_char_a.ravel()

x = range(0,len(chars))
plt.figure(figsize=(10,2))
plt.bar(x, vector_for_char_a.ravel(), 0.3)
plt.xticks(x, chars)
plt.show()


[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.]
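
Decoding goes the other way: find the position of the 1 in the vector and look it up in ix_to_char (a small sketch, not in the original code):

ix = np.argmax(vector_for_char_a)   # position of the 1 in the one-hot vector
print 'decoded char:', ix_to_char[ix]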

Definition of the network

The neural network is made of 3 layers:

  • an input layer
  • a hidden layer
  • an output layer

All layers are fully connected to the next one: each node of a layer is connected to all the nodes of the next layer. The hidden layer is connected to the output and to itself: the values from one iteration are used for the next one.

To centralize the values that matter for the training (the hyperparameters), we also define the sequence length and the learning rate.


In [6]:
# hyperparameters                                                                                                                                                                             
hidden_size = 100 # size of hidden layer of neurons                                                                                                                                           
seq_length = 25 # number of steps to unroll the RNN for
learning_rate = 1e-1

In [7]:
# model parameters                                                                                                                                                                            
Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden
print 'Wxh contains', Wxh.size, 'parameters'
Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden
print 'Whh contains', Whh.size, 'parameters'
Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output
print 'Why contains', Why.size, 'parameters'
bh = np.zeros((hidden_size, 1)) # hidden bias
print 'bh contains', bh.size, 'parameters'
by = np.zeros((vocab_size, 1)) # output bias
print 'by contains', by.size, 'parameters'


Wxh contains 6100 parameters
Whh contains 10000 parameters
Why contains 6100 parameters
bh contains 100 parameters
by contains 61 parameters

The model parameters are adjusted during the training.

  • Wxh are the parameters that connect the input vector (one encoded char) to the hidden layer.
  • Whh are the parameters that connect the hidden layer to itself. This is the key of the RNN: the recursion is done by injecting the previous output of the hidden state back into it at the next iteration.
  • Why are the parameters that connect the hidden layer to the output.
  • bh contains the hidden bias
  • by contains the output bias

You'll see in the next section how these parameters are used to create a sentence.
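
To get a feel for how these matrices connect the layers, here is one forward step written by hand, using the parameters defined above (a small sketch; it is exactly the first step of the sample function shown in the next section):

x = np.zeros((vocab_size, 1))                      # one-hot encoded input char
x[char_to_ix['a']] = 1
h = np.zeros((hidden_size, 1))                     # previous hidden state (zeros to start with)
h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)  # new hidden state, shape (100, 1)
y = np.dot(Why, h) + by                            # output scores, shape (61, 1)
print 'hidden state shape:', h.shape, ' output shape:', y.shape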

Create a sentence from the model


In [23]:
def sample(h, seed_ix, n):
  """                                                                                                                                                                                         
  sample a sequence of integers from the model                                                                                                                                                
  h is memory state, seed_ix is seed letter for first time step                                                                                                                               
  """
  x = np.zeros((vocab_size, 1))
  x[seed_ix] = 1
  ixes = []
  for t in xrange(n):
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
    y = np.dot(Why, h) + by
    p = np.exp(y) / np.sum(np.exp(y))
    ix = np.random.choice(range(vocab_size), p=p.ravel())
    x = np.zeros((vocab_size, 1))
    x[ix] = 1
    ixes.append(ix)
  txt = ''.join(ix_to_char[ix] for ix in ixes)
  print '----\n %s \n----' % (txt, )

hprev = np.zeros((hidden_size,1)) # reset RNN memory  
sample(hprev,char_to_ix['a'],200)


----
 th there pseblfily to serm havestice."  Gregor with and be not intwing, sisticharry morey morcting enfureathid net it with trom breather he climale would pulling staclime, hadsing, but thears't eraide 
----

Define the loss function

The loss is a key concept in all neural network training. It is a value that describes how bad or good our model is.
It is always positive: the closer it is to zero, the better our model is.
(A good model is a model where the predicted output is close to the training output.)

During the training phase we want to minimize the loss.
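
For instance, the cross-entropy loss used below is -log(p), where p is the probability the model gives to the correct char. A quick illustration (not part of the original code):

for p in [0.99, 0.5, 0.1, 0.01]:
    print 'probability given to the correct char:', p, '-> loss:', -np.log(p)
# a near-perfect prediction gives a loss close to zero, a bad one gives a large loss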

The loss function calculates the loss, but also the gradients (see the backward pass):

  • It performs a forward pass: it calculates the probabilities of the next char given a char from the training set.
  • It calculates the loss by comparing the predicted probabilities to the target char (the target char is the char that follows the input char in the training set).
  • It runs the backward pass to calculate the gradients (see the backward pass paragraph).

This function takes as input:

  • a list of input chars
  • a list of target chars
  • and the previous hidden state

This function outputs:

  • the loss
  • the gradients for each of the parameters between layers
  • the last hidden state

Here is the code:


In [9]:
def lossFun(inputs, targets, hprev):
  """                                                                                                                                                                                         
  inputs,targets are both list of integers.                                                                                                                                                   
  hprev is Hx1 array of initial hidden state                                                                                                                                                  
  returns the loss, gradients on model parameters, and last hidden state                                                                                                                      
  """
  xs, hs, ys, ps = {}, {}, {}, {}
  hs[-1] = np.copy(hprev)
  loss = 0
  # forward pass                                                                                                                                                                              
  for t in xrange(len(inputs)):
    xs[t] = np.zeros((vocab_size,1)) # encode in 1-of-k representation                                                                                                                        
    xs[t][inputs[t]] = 1
    hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state                                                                                                            
    ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars                                                                                                           
    ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars                                                                                                              
    loss += -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)                                                                                                                       
  # backward pass: compute gradients going backwards                                                                                                                                          
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)
  dhnext = np.zeros_like(hs[0])
  for t in reversed(xrange(len(inputs))):
    dy = np.copy(ps[t])
    dy[targets[t]] -= 1 # backprop into y                                                                                                                                                     
    dWhy += np.dot(dy, hs[t].T)
    dby += dy
    dh = np.dot(Why.T, dy) + dhnext # backprop into h                                                                                                                                         
    dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity                                                                                                                     
    dbh += dhraw
    dWxh += np.dot(dhraw, xs[t].T)
    dWhh += np.dot(dhraw, hs[t-1].T)
    dhnext = np.dot(Whh.T, dhraw)
  for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
    np.clip(dparam, -5, 5, out=dparam) # clip to mitigate exploding gradients                                                                                                                 
  return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]

Forward pass

The forward pass uses the parameters of the model (Wxh, Whh, Why, bh, by) to calculate the probabilities of the next char given a char from the training set.

xs[t] is the vector that encodes the char at position t; ps[t] contains the probabilities for the next char:

hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state
ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars
ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars

or, in dirty pseudo code, for each char:

hs = input*Wxh + last_value_of_hidden_state*Whh + bh
ys = hs*Why + by
ps = normalized(ys)

To dive into the code, we'll work on one char only (we set t=0 instead of using the "for each" loop).


In [24]:
# uncomment the print to get some details
xs, hs, ys, ps = {}, {}, {}, {}
hs[-1] = np.copy(hprev)
# forward pass                                                                                                                                                                              
t=0 # for t in xrange(len(inputs)):
xs[t] = np.zeros((vocab_size,1)) # encode in 1-of-k representation
xs[t][inputs[t]] = 1 
# print xs[t]
hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state 
ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars
# print ys[t]
ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars  
# print ps[t].ravel()

# Let's build a dict to see which probability is associated with which char
probability_per_char =  { ch:ps[t].ravel()[i] for i,ch in enumerate(chars) }
# uncomment the next line to see the raw result
# print probability_per_char

# Print the probabilities in a way that is easier to read.
for x in range(vocab_size):
    print 'p('+ ix_to_char[x] + ")=", "%.4f" % ps[t].ravel()[x],
    if (x%7==0):
        print ""
    else:
        print "",

x = range(0,len(chars))
plt.figure(figsize=(10,5))
plt.bar(x, ps[t], 0.3)
plt.xticks(x, chars)
plt.show()


p(
)= 0.0046 
p(!)= 0.0011  p( )= 0.0443  p(")= 0.0006  p(')= 0.0020  p())= 0.0001  p(()= 0.0000  p(-)= 0.0001 
p(,)= 0.0028  p(.)= 0.0010  p(;)= 0.0019  p(:)= 0.0003  p(?)= 0.0006  p(A)= 0.0000  p(C)= 0.0000 
p(B)= 0.0000  p(E)= 0.0000  p(D)= 0.0000  p(G)= 0.0000  p(F)= 0.0000  p(I)= 0.0000  p(H)= 0.0000 
p(J)= 0.0000  p(M)= 0.0000  p(L)= 0.0000  p(O)= 0.0000  p(N)= 0.0000  p(Q)= 0.0000  p(P)= 0.0000 
p(S)= 0.0000  p(U)= 0.0000  p(T)= 0.0001  p(W)= 0.0000  p(V)= 0.0000  p(Y)= 0.0000  p(a)= 0.0051 
p(c)= 0.0018  p(b)= 0.0002  p(e)= 0.0075  p(d)= 0.1185  p(g)= 0.0011  p(f)= 0.3273  p(i)= 0.0000 
p(h)= 0.0001  p(k)= 0.0002  p(j)= 0.0002  p(m)= 0.0923  p(l)= 0.0028  p(o)= 0.0001  p(n)= 0.0222 
p(q)= 0.0006  p(p)= 0.0051  p(s)= 0.0191  p(r)= 0.0087  p(u)= 0.0000  p(t)= 0.0127  p(w)= 0.0040 
p(v)= 0.3057  p(y)= 0.0001  p(x)= 0.0046  p(z)= 0.0000 


In [11]:
# We can create the next char from the above distribution
ix = np.random.choice(range(vocab_size), p=ps[t].ravel())
print
print "Next char code is:", ix
print "Next char is:", ix_to_char[ix]


Next char code is: 41
Next char is: f

You can run the previous code several times. A char is generated according to the probability distribution.
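
To convince yourself that the sampling follows the distribution, you can draw many chars from the same ps[t] and count them (a quick check added for this notebook, not in the original code):

counts = {}
for _ in xrange(1000):
    sampled_ix = np.random.choice(range(vocab_size), p=ps[t].ravel())
    counts[sampled_ix] = counts.get(sampled_ix, 0) + 1
# the most frequently sampled chars should be the ones with the highest probability
for sampled_ix, count in sorted(counts.items(), key=lambda item: -item[1])[:5]:
    print repr(ix_to_char[sampled_ix]), 'sampled', count, 'times, p =', "%.4f" % ps[t].ravel()[sampled_ix]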

Loss

For each char in the input, the forward pass calculates the probabilities of the next char.
The loss is the sum, over all the chars of the sequence, of the cross-entropy:

loss += -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)

The loss is calculated using softmax (cross-entropy). More info here and here.


In [12]:
print 'Next char from training (target) was number', targets[t], 'which is "' + ix_to_char[targets[t]] + '"'
print 'Probability for this letter was', ps[t][targets[t],0]

loss = -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)
print 'loss for this input&target pair is', loss


Next char from training (target) was number 36 which is "c"
Probability for this letter was 0.016374230879
loss for this input&target pair is 4.11204646779

Backward pass

The goal of the backward pass is to calculate all gradients.
Gradients tell in which direction you have to move each parameter to make a better model.

The naive way to calculate all the gradients would be to recalculate the loss for a small variation of each parameter. This is possible, but would be time consuming: we have more than 20k parameters. There is a technique to calculate all the gradients for all the parameters at once: backpropagation.
Gradients are calculated in the opposite order of the forward pass, using simple techniques.

For instance, if we have:

loss = a.x + b

If we want to minimize the loss, we need to calculate d(loss)/dx and use it to calculate the new_x value.

new_x = x - d(loss)/dx * step_size

If new_loss is smaller than loss, it is a win: we succeeded in finding a better x.

Let's do the math:
d(loss)/dx = d(a.x)/dx + d(b)/dx
d(loss)/dx = (d(a)/dx).x + a.(d(x)/dx) + 0
d(loss)/dx = 0.x + a.1
d(loss)/dx = a


In [13]:
x = 10  
a = 3  
b = 7

loss = a*x + b
print 'initial loss =', loss
# dx stand for d(loss)/dx
dx = a #Calculate dx=d(loss)/dx analytically
step_size = 0.1
# use dx and step size to calculate new x
new_x = x - dx * step_size
new_loss = a*new_x + b
print 'new loss =',new_loss
if (new_loss<loss): print 'New loss is smaller, Yeah!'


initial loss = 37
new loss = 36.1
New loss is smaller, Yeah!
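
We can also double-check the analytic gradient with the naive approach mentioned earlier: nudge x by a tiny amount, recompute the loss, and look at how much it moves (a small sketch, not in the original code):

a, b, x = 3.0, 7.0, 10.0
eps = 1e-5
# numerical estimate of d(loss)/dx for loss = a*x + b
numerical_dx = ((a * (x + eps) + b) - (a * (x - eps) + b)) / (2 * eps)
print 'numerical d(loss)/dx =', numerical_dx, ' analytic d(loss)/dx =', a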

The goal is to calculate the gradients for the forward formulas:

hs = input*Wxh + last_value_of_hidden_state*Whh + bh  
ys = hs*Why + by

This part would need more work to fully explain the code, but here is a great source to understand this technique in detail.

# Backprop this: ys = hs*Why + by
dy=-1 # because the smaller the loss, the better the model is.
dWhy = np.dot(dy, hs.T)
dby = dy
dh = np.dot(Why.T, dy) + dhnext # backprop into h  

dhraw = (1 - hs * hs) * dh # backprop through tanh nonlinearity 

# Backprop this: hs = input*Wxh + last_value_of_hidden_state*Whh + bh
dbh += dhraw
dWxh += np.dot(dhraw, xs.T)
dWhh += np.dot(dhraw, hs.T)
dhnext = np.dot(Whh.T, dhraw)

In [14]:
# backward pass: compute gradients going backwards                                                                                                                                          
dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
dbh, dby = np.zeros_like(bh), np.zeros_like(by)
dhnext = np.zeros_like(hs[0])
t=0 #for t in reversed(xrange(len(inputs))):
dy = np.copy(ps[t])
dy[targets[t]] -= 1 # backprop into y   
#print dy.ravel()
dWhy += np.dot(dy, hs[t].T)
#print dWhy.ravel()
dby += dy
#print dby.ravel()
dh = np.dot(Why.T, dy) + dhnext # backprop into h                                                                                                                                         
dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity                                                                                                                     
dbh += dhraw
dWxh += np.dot(dhraw, xs[t].T)
dWhh += np.dot(dhraw, hs[t-1].T)
dhnext = np.dot(Whh.T, dhraw)
for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
  np.clip(dparam, -5, 5, out=dparam) # clip to mitigate exploding gradients
  #print dparam

Training

This last part of the code is the main training loop:

  • Feed the network with portions of the file. The size of each chunk is seq_length.
  • Use the loss function to:
    • Do the forward pass to calculate the predictions and the loss for the given inputs/targets pair
    • Do the backward pass to calculate all the gradients
  • Print a sentence generated from a seed char, using the current parameters of the network
  • Update the model using the Adaptive Gradient technique (Adagrad)

Feed the loss function with inputs and targets

We create two arrays of chars from the data file; the targets array is shifted by one compared to the inputs array.

For each char in the inputs array, the targets array gives the char that follows.


In [15]:
p=0  
inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
print "inputs", inputs
targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]
print "targets", targets


inputs [25, 49, 38, 2, 46, 48, 53, 49, 42, 49, 40, 8, 2, 56, 43, 38, 49, 2, 18, 53, 38, 40, 48, 53, 2]
targets [49, 38, 2, 46, 48, 53, 49, 42, 49, 40, 8, 2, 56, 43, 38, 49, 2, 18, 53, 38, 40, 48, 53, 2, 29]

Adagrad to update the parameters

The simplest technique to update the parameters of the model would be this:

param += -dparam * step_size

Adagrad is a more efficient technique where the step size gets smaller during the training.

It uses a memory variable that grows over time:

mem += dparam * dparam

and uses it to calculate the step size:

step_size = learning_rate / np.sqrt(mem + 1e-8)

In short:

mem += dparam * dparam
param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update
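
To see the effect, here is the effective step size for a parameter that always receives a gradient of 1 (a small sketch, not part of the original code): it shrinks at every update.

mem = 0.0
dparam = 1.0     # pretend this parameter always receives a gradient of 1
for i in range(5):
    mem += dparam * dparam
    print 'update', i, '-> effective step size =', learning_rate / np.sqrt(mem + 1e-8)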

Smooth_loss

Smooth_loss doesn't play any role in the training. It is just a low-pass filtered version of the loss:

smooth_loss = smooth_loss * 0.999 + loss * 0.001

It is a way to average the loss over the last iterations to better track the progress.
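
Here is the same filter applied to an artificial noisy series, just to show the low-pass effect (a sketch, not part of the training code):

noisy_losses = 50 + np.random.randn(5000) * 10   # fake loss values: around 50, very noisy
smooth = noisy_losses[0]
for value in noisy_losses:
    smooth = smooth * 0.999 + value * 0.001
print 'last raw value:', noisy_losses[-1]
print 'smoothed value:', smooth                  # typically much closer to 50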

So finally

Here is the code of the main loop that does both the training and, from time to time, the generation of some text:


In [16]:
n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad                                                                                                                
smooth_loss = -np.log(1.0/vocab_size)*seq_length # loss at iteration 0                                                                                                                        
while n<=1000*100:
  # prepare inputs (we're sweeping from left to right in steps seq_length long)
  # check the "Feed the loss function with inputs and targets" section above to see how this part works
  if p+seq_length+1 >= len(data) or n == 0:
    hprev = np.zeros((hidden_size,1)) # reset RNN memory                                                                                                                                      
    p = 0 # go from start of data                                                                                                                                                             
  inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
  targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

  # forward seq_length characters through the net and fetch gradient                                                                                                                          
  loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
  smooth_loss = smooth_loss * 0.999 + loss * 0.001

  # sample from the model now and then                                                                                                                                                        
  if n % 1000 == 0:
    print 'iter %d, loss: %f' % (n, smooth_loss) # print progress
    sample(hprev, inputs[0], 200)

  # perform parameter update with Adagrad                                                                                                                                                     
  for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                [dWxh, dWhh, dWhy, dbh, dby],
                                [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update                                                                                                                   

  p += seq_length # move data pointer                                                                                                                                                         
  n += 1 # iteration counter


iter 0, loss: 102.771859
----
 iOaGkb;?nUkdb(wAW :,a.ztetLhS:VNBzJE'mWDtLSU(?mrh'kEtlMPInjI
ae?oINmD'mWhpA"nuSf? sw;)QkOJgtFyvYgUdu:Ug-bC;L'sdd;M().,xiGrxVWqwb,H("bDdyyICCc)aCgNHgTzrrmGutrEBW
)m-?SgE,yPglzt))YfS,ll)Bc":
owsBJCqwNNB 
----
iter 1000, loss: 84.873661
----
  hee hsl, Hpooep, arr gethefy c lhor bomins ;l ithed ha, olebhe tas ncuf hershe chore ef heg tmrec
anonvm, gon tad ghe tipert apsr e.. wat the strent o. ce nn thet  hetifit, t sos, hetiu inhunrneo, wh 
----
iter 2000, loss: 70.122666
----
   finulvet hooncely coolta Wwa thein t hat mout on atkar, apt nwer so hod moosey thabd. whiuc authiut camoachaky, mom -han
tom wwosle mine too
wous rowol. forimeid aoumt ondlif - hheg
s of mabvaut ahi 
----
iter 3000, loss: 62.053303
----
 e lallen the Gicede nfar heath,
anpet
mite repurlraling
cur, ent
th des wrow fald fis sedey
stmint ite whal Gre ccounir, ingmand an hiyis ditd ang hecy, ,ot. to fashasbe
I ank ftane on t his Ob th ,,  
----
iter 4000, loss: 57.657486
----
 ulles owaid waw hikpist alutde llesistuld oloye un picthit
ofpelcdis kerid too
gh thot inth Gr. ted acige woosinle k aro fak freaf hey hatic
tasithens wit moomy.  Bat vit sir roo anfos bos ould shled
 
----
iter 5000, loss: 56.447184
----
 Thturtirs, h" hit to bros fowsprecovourdfor iut whe Develad, ther famet
ir fo thas bisnt chabl mit a'f for. out whe beins the k
on
of her Grourd atr chi,e sald th  Tule, t, ciok be, bud tork, verqne
n 
----
iter 6000, loss: 55.264441
----
 t
hir to -o, te the moreretilvere wiply adeg aifreertoog ust of the lavises that untreeal'ns thet whou cere, bas onto tot o'th.
 Io sir ios hick wasled"'t to sptell wathe
thit mooting wad ored brof as 
----
iter 7000, loss: 52.592921
----
 toused wnat lit for hey q, ould's iny, the thouglwinfing lpes dut ony and his , of rut Grase lf then.  Greterastaled com, th the sumt ever the fromethternth thewent the ven"y Groug mocto
thiny peen on 
----
iter 8000, loss: 51.449716
----
 es h, to a dond gat wasly to dist hiss bxisenfule "mow seve. "Sacgegon to keoble cpedel ent opve comen hacmuid himcenome iw ont asled paeve not backed hely stos room womed oom of leithet wher yow be n 
----
iter 9000, loss: 50.655520
----
 loully sion.  urmued ilw ond gor's sable pastowcowe. 
Int wis ntars at out. 
Grsat bo andingrog to ghe extertan
ezof thellen ver, hero saipes he noug his if;. :", le it ald, wad
sout out nret bacd as  
----
iter 10000, loss: 51.148408
----
 le
betor.  And co last nos buck ision to the siqulpinighin! ""esseroke wing
th pamat,
Thadrels sougha llols he crecroom soremsle hel peat mere, butn hitheld woraclae chack ont sibnt he bertrou llencem 
----
iter 11000, loss: 50.508766
----
 mupseat eftrwale be machefe stobid.  Nol, deed he, uveed ghat he of to gom sike of cas to wiim and ably , on with wero had arance he fored hit hit once thelk deide to exweacea
sussiznet h aist of the
 
----
iter 12000, loss: 49.002135
----
 hed
of the ghall tove proming.  He to pood aster it betope thacs of exporewap af that and
cough with in outhelco ind for had that had beer an.  fMle his nothe gouls w's deen no wenther
hark wele fisci 
----
iter 13000, loss: 48.449597
----
 and anderren whet pull boe, ras futh him
felcey yeadeace tod
bagesy all had maed
the  hat at aflentionf wameat was mothed homkiat himsy the simning to herrewimes arpien hord and fer of then, was nothi 
----
iter 14000, loss: 48.207952
----
 e Gregeh they she stored alr bace  aOd at chingiss
tathund shack.  Grecore that thenes was waslling ino bameem so efadl
ques is oumhend apenly dokd selibd
doflore enghing hipal as inse'mclwone nonk be 
----
iter 15000, loss: 48.907801
----
 nd she was The nom.  Hlyen
in was loor, seoo
stired
and sed, and couls alsly doim onid cas woid,, epidi"sr. , and to ke was take for seve
time the kid, am that he he has eaiborgong
opinathe whimsis la 
----
iter 16000, loss: 48.000070
----
 e in there he at him lehruine had yound that to thelo mouth of wast all ald futume the chenf alle sarn and wad himseatif's had on'them thew wnow buld and no
mum reen his tor'bole caid mas smyord haven 
----
iter 17000, loss: 47.052738
----
 ly though whewart"y way with he tios ad to caan bo to wave bithen  Mranedse fo fisuned the purtent,; aurivire the all spree,
here of this
faid thit to ever this was tut a, deven whetel thew had uile w 
----
iter 18000, loss: 46.613418
----
 fomserte the dowtor
he come him, thitas door ghen bae thime hed penalr, trersenter.  Thime chag, able
seacteat sheremerong chen'l ut hinge agly forve. whan -
qut the gooad hissen, sfed ofthart.", evis 
----
iter 19000, loss: 47.133525
----
 
bee. Seather litped shist and maked the mowing.  Jramelt ust th e, soo there tood sarwly his lonly, intad she alat stand suthe with thing hers ith
seliig moughtr non frmed
wad the rtellamy wMrk srond 
----
iter 20000, loss: 48.061885
----
 o enyenciartecllouch coul sidcing, able withen.
him erover fom wath mole
cerswtisizis was be whing  ood dast.  By to dordiif, sure?" - seen here he clims sfing thes that ure
whath fout leer was yoa so 
----
iter 21000, loss: 46.719345
----
 s luct wonk his cithtired
the
penout
deinod park - had his for in- he
haver, asly torbed aillurs, ceall to to
enathat hirply of tha
lide of so thes gishingith
had siscede frough Yom.  The sifser as he 
----
iter 22000, loss: 45.884816
----
 or
of plournew, the  him, unames, to the 
ild asly fid on intte to to not pee heat and there aof th sted his fawcowt the tiste suth.  Os, male Gregor
flowe furloor, reat had fither.  He seonenom, apem 
----
iter 23000, loss: 45.671542
----
 r obe hep the wathany that seremabuitisse thoy tot sosed to aid ouking
the otimed abin!edy it the kad toce bame tow refpesion than it hat sith aiding to seatremeresten tad
oud sald but it top hin to w 
----
iter 24000, loss: 46.853643
----
 ide wint that gone to hit
pat heve had ting the lore's tader ande, opecons nos dowprotor.  Mrow of
to purunlyeoulk; antistonrs anled had more not in wat wotl, to lightruge wour hor, her
suted hunan's  
----
iter 25000, loss: 47.061258
----
 It himpenaling lous
of tros mowisplwert was
cost alled and whed and in hamas andivelf to ha fend yo
ghe ny's if the sall, But thid, of he heps to
on ableap could to thot lever ono
gonk dompires null e 
----
iter 26000, loss: 45.746991
----
 etles at had dantictay Inded simpobewthand deild he
feat and catle topentiffing buthen they in moruspainted the sicaithtlight lachte'nd.  He domsand
wim leprly alcplentrigh hor
the reeny ut get wadnit 
----
iter 27000, loss: 45.419832
----
 n hich him caate, the
trematilg aly.  Buse ttine her aepay; of pueming and leing.  Hiatse, in to hit she ver furmith a
disilester hip
he hones Gregor to kod carefut, at thher eryas usming on wo
rut he 
----
iter 28000, loss: 45.144714
----
 ed tbow yoursiinn
wak hif to dinawintiweved his
reen to the
was bute negridiged refordly tloy
warligiming ur but terelalr, furnoughthed and reat aboow he coand to his was to the dagarboning of thed th 
----
iter 29000, loss: 46.233013
----
 ing intas "f am ofith to be
hars qua dound  rack blede somelf
his but.  He stack tuther
to suthep gef ted vis mleel doaike ford
not at firl, to he reenf he had withar whe conly nes mentirter refhan wi 
----
iter 30000, loss: 46.070603
----
 y now stortire. ." urive stilling enod to fidale lent to werifle he corsle ith erelfing really fare to have reath yer.  Their, to Greilly seat rourd oning roor
alke fullly wouly nof, mutred, would thi 
----
iter 31000, loss: 44.834632
----
 omsane was iths fory, door botind he ofill of
would hing beer and his singly at backay they wenfure halst been thed and whele endors if foring and to would slae hes ast erely jusmett conw aghe room in 
----
iter 32000, loss: 44.809857
----
 was was the
muth him no
bentelft alr any whe call it was of
gut way the onsiwting to donbaped and
in her ardene way hil thowing bletbullyos onitimetint!
him pad, would fald ers
wim as they was gather. 
----
iter 33000, loss: 44.672250
----
 handin; wo k", "e congen gentmentors the had he cresill sa that of the room on whios go flopaly her. 
He rangaking on chat
he thall and his
sa firs it, babad he hand alf they hust's roord," and fork a 
----
iter 34000, loss: 45.456074
----
 ad th uchinis allage pading to! gakethin't leave anve pohs thes thouse cloul "Gregor the choor.  And he now it hes to he.  Het everor be's for? Hous'th at himsing it rous it rosm camlicine liygon as i 
----
iter 35000, loss: 45.000613
----
 od
of
ereaar - werd eniwe reapple lond  Grona beef, fraintroding in dome.  Gregor it ontroage as door was wes veruge filly ureed he nild? bad gonthar tothen famed was to
his faread riol make dist nide 
----
iter 36000, loss: 44.181932
----
 she wauch lecke a bund chenheded to her
sapy chere comout it mot as been for the filgher of all cane have that cee
- peat out opes him simetage weile not ort
with
thas and, and to repcrunt homed heatl 
----
iter 37000, loss: 43.932254
----
 tsen somsimoustout to dist her, leak restenter the hopey herreave samelite,
the laf; soven Grest quitol, there comeltaighap mome in hak to had suster of
crazed aruttain, cistreced on,
been
fompen henc 
----
iter 38000, loss: 44.421262
----
  reay
as
he to seact omor torea flit thed a foul upt hind.  Id mos yor, boyd her and puldivaveway and her be reen as apleto was uncongtrem, was dr the misibreray, apdefprethen miak ard come and to hha 
----
iter 39000, loss: 45.395142
----
  soygoon, a bed alr was by, shis sfuld been ly theimssed, in core."  I'm, distof
he nablly to necr cong rNenly to diat that to ke him cead, every of go
you ill hel to been seven not
his parfatrus no g 
----
iter 40000, loss: 44.448459
----
 med
it there but whowe purlige sidand to frmers go that where side hood
to see him to to the chatt of the
ceachout his fadibed thene fockibe he we cest to get There he arseminghing nfugh the piouscess 
----
iter 41000, loss: 43.715473
----
 rout at he was restald there unt what 
bather lild door beltes fverov, the vrom uidt ayeped, not for the eas; wighrreay biturn butl uidastmen she save cound not becrel heit the pack dome
waspay, in to 
----
iter 42000, loss: 43.480933
----
 ick op and the lemplearas was as wherestemowo his mother alias more was ereat then the deshat.  ""'s alm the sable roun thay ibmeding hea dicung the ricing had they lade ar, ary
Gr good,, theims -efin 
----
iter 43000, loss: 44.505852
----
 on saly
bet? "Ow.  I' torder, "ees.  Gregor be non gentod he clald and alf and cist timand or a nove courr that
to and wank out perncite, way ceacher atand sumain am woulsented; the chisted to gas it, 
----
iter 44000, loss: 44.762261
----
 id gettleck of the, mowhanginess or
hour cis there hadrersolk be fling thompsars hirrinith.  Gregor whad to had sallidny wasle in., lead ond himsed
and bet
ho seadly pliy.  Mny cont, his heat the aid  
----
iter 45000, loss: 43.822472
----
 ely this ranfien and the fand now as some bexuled then, she of irto
houlm tor evening of cllasinckesfuld had had his sumnerainst com-spratither pvery soase sainy so had that his
now s, and with od sal 
----
iter 46000, loss: 43.471045
----
 had co therether at exublebone the enxireen hered appeect, wouved out had bed ir sible waplouse though, monemeritt cor hirrirnkyo to move to sleer thares, mormatpsennoat the
anding hit hith, at outher 
----
iter 47000, loss: 43.292716
----
 nting of his, Gregoud to it.  "Gy
hercedy Gres a frdetf out
woser
themstadr the sleache wanded. 
Thet weser tamutclered, areworf hey latharite,
brimettroursien aliged ith sporing whakercessly wrom his 
----
iter 48000, loss: 44.591429
----
 en, the
strong cas:
gat yous and nout in that were
ast the copt abouthed purrith, ho fat he wadreariceachion, pact whingeingore oulcarr Gregor's sisting's fakentone
ha ghersenateable but pat butw be G 
----
iter 49000, loss: 44.479155
----
 lemstelfing.  Gregor thay though hor sef, injuon, anding evenficher, everighere to prear leall of fire the vor gote hip had
pat tikentarte shach the she he weide and she roing by a descan to deswatoly 
----
iter 50000, loss: 43.319191
----
 the the rain, and howl fand a moms reither bang has hall hersetims his notiswould it.  Ontire leay, wtuld; of cricay, notlech have a hake on to she it bien fation thourneyst aided the ristale andy but 
----
iter 51000, loss: 43.290763
----
 t for every wele wanted clewe it to lespednanct becase "elly agentrong in
as cantile his falf his stactood net a comrintely as, out at ts explingarry ittinge even to Gres.  He heatiste bed into him, w 
----
iter 52000, loss: 43.135467
----
  othest if ly.

"Comereving he als
sinnsiungruther mocreding int concole srow, Gregor's beand wingone go blabreauthen of but hak even they hers that shear, was cat ast, in they yous mith that without
 
----
iter 53000, loss: 44.011039
----
  in wammike
 Mro, ne siot able the ops self aide the other, and the
played, foudared opery w"ime to that had ly theigher take the look thac usted quiped him, to bace
orenuled entwat the room thelf hea 
----
iter 54000, loss: 43.622811
----
 oure.  He weple his did quipt hise chat wimssono thithen  upmy oto at had bea laying in thing this been, the wised of the shis was frnimelfarmen.  Gregor had the meand, sumbowas nin siled a riour his  
----
iter 55000, loss: 42.689613
----
 queparen trist now lordemestast his lowny ursune
been
inh with the room out and sustly
Mneteffart rowng on alr to cired thaus in her all to merelfursets therrswent shicl and ale murpery aid, the say s 
----
iter 56000, loss: 42.712604
----
 otile and gork fiored sowd oth werle of him precisterd tell expliped in Grence, irwthea'ster had enough tudenountwon't he
couk
out he bed gote.  That had rissing mob hall her and he could almstroatsly 
----
iter 57000, loss: 43.096987
----
 n ged and enterids dess
been evenout on to
time led thenivion a difray the filt ingo
was arbiging the ltaver tum. s oveind his not rake dowrilind
mogher, himses, shour, wotly os rapered ouft the fle n 
----
iter 58000, loss: 43.931446
----
 xarm the bact invawently bonit, warned an have to de serswous
and.  "Ho shrelf, swa
fallthing
fatse a and gas eved his expery moukhit, of able in kivent whet iver threen
agethaiked had tordying as
kef 
----
iter 59000, loss: 43.196573
----
 f fomet the sideded
to havh hist sust beco rimm an it he reatusceter the furn the sisced with omse lformarte the werrying threlf he was stork formet his timeJned, was "eall
jumself
ans from he could a 
----
iter 60000, loss: 42.410234
----
 e cours
at Gregor of
how in
had whigher leally, wfury on to thewfilenands queet was Gregor shaded
in the chanit as wher even appreation aparetmentoos brise that  For wiod sa torse tot
ham.  He came hi 
----
iter 61000, loss: 42.353292
----
 owen vien a
doom wnaned tarkne her one stion als, him his pory for't bess, sinftrivereading festroon
Gregor's wakings; in onjeched aclayifive.  "Yor habewing early coulced wampear been", Gregor segete 
----
iter 62000, loss: 43.192527
----
 the somance't corgon
"food and the buct
of thit he simpe) much exive gehe gead, any sustion a roying corgissertent song the was
undecharesidn the t, at what it, and haver a freatcoom to mut dowfach le 
----
iter 63000, loss: 43.605465
----
 n how thom trom wird his vor hers be sillide
sned goishing, he cless
and there had out oply with he had of had that he would susst was and kited of wassing, while at lelf their convention? It not tad  
----
iter 64000, loss: 42.703089
----
  fhespleid lethel
his
sfar word room
soidn a ple reegoned
the
waund reons he the being reaniffelw dreaking falled
him
ago to mufterpennister hear a leef a doith and itlyer the flas reelald now",
andsi 
----
iter 65000, loss: 42.307132
----
 even havied an him by foud uut her abouts evy ungints awer aboly his
been he coul up aboe
mowe buseid nowoun, tha room now to drabpenine food, out be that dont; stions but seat mous
it ide whole up cl 
----
iter 66000, loss: 42.252442
----
 gon had hay sightre that she
doughis the she to dights would beand to even hapde reother thine would nindthers, out poosly dois bufredmy
not her in the stact on the
prolbeaver as keell his to
antsice, 
----
iter 67000, loss: 43.444681
----
 p.  At he weme heversed.

Ithabered, and she enterlying tles it.  I wise tond without proble each.  Shes with fancesum, for upjor the fletres much: "Jurilt here out it beas the lovely lork of the toin 
----
iter 68000, loss: 43.532603
----
 ilde liye than
him.

Breelvee slours his furing pack if the freen
derk hander wa pust thes all then you dest hin the coul bad than eat thing, so hand that thime- to asmy fly had oning bed outh pale ke 
----
iter 69000, loss: 42.511739
----
 n
had mase Gregor de sister the iny in, ththe wams hifr at lefs, the ais the restrooded out"'s had ast Withed was had not him fane be that quicore he wast bet with int with a toled, bvealent undentrou 
----
iter 70000, loss: 42.292701
----
 e"d but brease to peented" he had sherong ring effer wo dap, to abling that, that heinfile his dakias dis room she and id onsh they beed -wan sood did sisture how shitout, of Gregor opengur, whalk, Gr 
----
iter 71000, loss: 42.332440
----
  but was not dowe;
that wen tmanterned acarotlingentmaid the of orgeustionst say when be this to his morned youst of he
wotend ily coorce anyd as ighhaunst room it
mote to gen now wher - to She fily r 
----
iter 72000, loss: 43.056898
----
 riemw out
the rocite cle mole elf thele
wam drar to Gregor things the keet boyglas upsed fitgons so the been from himpeew had helt his forverned timem any of was become
wher lit whirring everyoin, and 
----
iter 73000, loss: 42.671558
----
 s sosh onely filid seen.

She diimempnith! It
heat, dorster would "othever
elvery and an
would stapced.  Gregor. 
UNra temprough and ather, as if he ver fees meren chice prowreasibut emp asour alt; an 
----
iter 74000, loss: 41.943240
----
 w even bed evingow living over alpisters formared whund premet had the
fert be o sisteat can
alven, bove, when mughat of side the doon
he caaked thing woveltily carrerided of the sime a sored tovequra 
----
iter 75000, loss: 41.960831
----
 his heave dige; ain sed to doot his
carent
strap had the chinded awteef mully whenry.  Grchas ustalssy elrood it's rate be
trage streatfor the how the her wos coudlenpy thicwant nowing sode liftlecour 
----
iter 76000, loss: 42.124438
----
 all litta letthoust had read now?d was "Yete
frong nowo stall her the fathen to him.  The onteely thow she it Gre meather
and was his shad ontins.
 Wivemed whe nack mach seathicuitury, bped ftaif the  
----
iter 77000, loss: 42.769596
----
 re tunithinur as had meremanly come in that thelr at whee
were wind he copes.  Nemore to a freatanal he wan evenegs mood e of of the oth
almsuped react without - on thre's ins beetinnibly.  Yovemer th 
----
iter 78000, loss: 42.181239
----
  the lived ore fremer, com.  Hhe lay outtousted une hingiverut for), buenssed from were he eadly with toisk of crack that Gregor heaf the had pubwought hal reratiest dove frung to could to you
on thec 
----
iter 79000, loss: 41.601897
----
 ut say, a tuted handpenther even of cuad a lowly andely, the
from him, in nown so dook and in thes, she was way", puther
he crajuol, nother mpact bem the ol's not it's disor been farry have made had d 
----
iter 80000, loss: 41.544162
----
  apparn,, her indid for the fady now he neapso eyck by Greg time ht he htraxinsting, in pleat erhabins
the door his foncy if his eppiss.
", or thep, expeceout
even even thard toor
on and he onot cast, 
----
iter 81000, loss: 42.151469
----
  operdound thearffomen and and lether. "But 
epeatits compare, "roms.  "It Paid was ur. yery ommed intiog thereven just a praate cerscicking wart:
"And
they
wastont ound,
wasreat.  Arm, was newore, ma 
----
iter 82000, loss: 42.878405
----
 lamay bed gould out somered as clell inhe to enty time had speciticisifage same reying whiling.  Mage
he hims
coom.  Agk,
"Gregire, ot
her sidt, lates if custrowike be be breevel?
Thad
no Mut", persoi 
----
iter 83000, loss: 41.986615
----
 ters
it.  Gregor out the ase
she had flott unkis.  "I was ronented, inthe
the sing and so the beced cly
sive her for the cery gould
hards, roornened of
streaturd a mowe
letf in overy inmast enproentio 
----
iter 84000, loss: 41.241120
----
 , had fike sings of Gregor fored even his foplow prom, of
dikide had un houch the of he himsebodem; nid he hall
desself
to shrement could way it was sistle louk as he her on he was yotay, no des for h 
----
iter 85000, loss: 41.489131
----
 mores ag
his learf.  Juther tole she puikes.  It
worenk Grete, Aat to kit murpentems hert buck undefst in then sishe hissid
to the win
soon's alurtelvelwto blieroon his able of hards of
the cours, he  
----
iter 86000, loss: 42.668925
----
 os
that the dister an, lovelown bed aper cle
and reeppeds.  By to ss?by bove of in roiching out
fell on the
save to by injom,
he
ceas
it a sullsspanible It
he lift, updone a ling his moths abroned we  
----
iter 87000, loss: 42.873488
----
  illuvine, all to Srow sape all enterupsh or.  Thelest of the daped eren expeatelf then leer waoned to a had room ain amayeve allly to fald prombe intont", irtutly, bet it would sistlelurd hady into m 
----
iter 88000, loss: 41.661146
----
  leen abe tunodeds mick that he stainy to thick, mecale rescall the here of at hifle was now fill con't all be to more had yortister and bent have to be that his seeding the
lery to do ons you sentore 
----
iter 89000, loss: 41.561881
----
 mole, the way nothing with.

inded her he pule sarss; hil other all forch? Sem in anyding. Yeule he leathis carly hit adelved fouve the biftaatly for had on rave operet out pall have had serplice, shi 
----
iter 90000, loss: 41.619195
----
 isting therd com.  But to cetcalo
fle be his slock
thatslyo to gher not now, enttssen, had reacked undous seavioutwnesing was bemed on torfummilf sumssse the chain
thay and liod methe on the carp thin 
----
iter 91000, loss: 42.458557
----
  craying he soid with the stremd and the'rresse quice the mocled in at emereabe befock the mory been conser
dwould dill a forks go
could nothing cterreen be whove soon's coodshen, be the tolk ha lough 
----
iter 92000, loss: 42.160907
----
  he had faked its - Gregor fing had was ento hese a she
stomed, and expan. 
"Ix Sithouf. "Could lid ceppostontars
stros llad but towis lepent, and agaif
fithing asked the chisple.  The giles morsto
st 
----
iter 93000, loss: 41.307235
----
 ? Wat every by and soundt!"
from some for's have even bed".
"Seche, sorded other the exped had lets the chenr to all that on, Gregor on cosped.  Gregor was prestliture uchions, reaseith, harde ad nead 
----
iter 94000, loss: 41.396381
----
 e cartraatemeve.  Bonicain, hither ancegal
was do that whing unbeemed re? and of entispolf his moth withsrounecine way
and
the ditere is Gregor's ceatlyed oring gosellespent theis; to awhon than had u 
----
iter 95000, loss: 41.388006
----
 de ard she him with the doom thingloiss daply anking now nex daund meadly other to
seresenther, loor that ent eachatily the alation he willowting ba the wquening im and not and beforess agaying chasie 
----
iter 96000, loss: 42.180598
----
 ing the get wofkard and stolchoutted wat ir
himnaze
or reawit would he no whom. 
Duthars on the sistrinove frocly mote
the clerk a labody iny
st. Shes stipred!", wnom wampoure crolruirs, all impaped h 
----
iter 97000, loss: 41.516028
----
  if the would, it
soul the dise, studing noor would
his inceed lugh. Samseat apped now in abode sids alress; arnbo- jushed awlall the nonediftled
ttartited was been
sive ute the sionetilted in wond bu 
----
iter 98000, loss: 41.140045
----
 the lithlily nom in the from his been moshed to
him.
 His tork's
noto but he at need the
stak on go knate will nes was of the would
rackked.  He had ferced on so there what cleay arither
the stave foi 
----
iter 99000, loss: 41.060023
----
 red wald; Grese, aventer, thad his clo
ceathing the room
thucusly
he time as tuch
but hely in ghe it he showemst to jubovely even and tace, on evenyy, yigigh sone rilemed theie thentay, cly.

Them
the 
----
iter 100000, loss: 41.516106
----
 ced it them
fore his soont but peicborand his manest rentabed with go to notht, in les with,t the lith a clas he evelece cerfite; even and greemaching roung now one britgon.  The makepars and to ir, w 
----

Feedback welcome @dh7net!