In [1]:
from __future__ import division, print_function
%matplotlib inline
from importlib import reload  # Python 3
import utils; reload(utils)
from utils import *


Using cuDNN version 5105 on context None
Mapped name None to device cuda0: GeForce GTX TITAN X (0000:04:00.0)
Using Theano backend.

In [2]:
from keras.layers import TimeDistributed, Activation
from numpy.random import choice

Setup

We haven't yet looked at the details of how this works - so it's provided here as self-study material for those who are interested. We'll look at it closely next week.


In [3]:
path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path).read().lower()
print('corpus length:', len(text))


Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
corpus length: 600893

In [4]:
!tail -n 25 {path}


are thinkers who believe in the saints.


144

It stands to reason that this sketch of the saint, made upon the model
of the whole species, can be confronted with many opposing sketches that
would create a more agreeable impression. There are certain exceptions
among the species who distinguish themselves either by especial
gentleness or especial humanity, and perhaps by the strength of their
own personality. Others are in the highest degree fascinating because
certain of their delusions shed a particular glow over their whole
being, as is the case with the founder of christianity who took himself
for the only begotten son of God and hence felt himself sinless; so that
through his imagination--that should not be too harshly judged since the
whole of antiquity swarmed with sons of god--he attained the same goal,
the sense of complete sinlessness, complete irresponsibility, that can
now be attained by every individual through science.--In the same manner
I have viewed the saints of India who occupy an intermediate station
between the christian saints and the Greek philosophers and hence are
not to be regarded as a pure type. Knowledge and science--as far as they
existed--and superiority to the rest of mankind by logical discipline
and training of the intellectual powers were insisted upon by the
Buddhists as essential to sanctity, just as they were denounced by the
christian world as the indications of sinfulness.

In [5]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1  # +1 leaves room for the null character inserted below
print('total chars:', vocab_size)


total chars: 58

In [6]:
chars.insert(0, "\0")  # reserve index 0 for a null/padding character

In [7]:
''.join(chars[1:-6])  # peek at the character set, skipping the null marker and the last few rare characters


Out[7]:
'\n !"\'(),-.0123456789:;=?[]_abcdefghijklmnopqrstuvwx'

In [8]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

In [9]:
idx = [char_indices[c] for c in text]  # encode the entire corpus as a list of character indices

In [10]:
idx[:10]


Out[10]:
[43, 45, 32, 33, 28, 30, 32, 1, 1, 1]

In [11]:
''.join(indices_char[i] for i in idx[:70])


Out[11]:
'preface\n\n\nsupposing that truth is a woman--what then? is there not gro'
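
As a quick sanity check, the two lookup tables invert each other - here's a minimal sketch of a round trip:

# Sketch: round-trip a short string through the two lookup tables
s = 'nietzsche'
encoded = [char_indices[c] for c in s]
decoded = ''.join(indices_char[i] for i in encoded)
assert decoded == s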

Preprocess and create model


In [12]:
maxlen = 40
sentences = []
next_chars = []
for i in range(0, len(idx) - maxlen+1):
    sentences.append(idx[i: i + maxlen])       # input: 40 consecutive character indices
    next_chars.append(idx[i+1: i+maxlen+1])    # target: the same window shifted one character ahead
print('nb sequences:', len(sentences))


nb sequences: 600854

In [13]:
sentences = np.stack(sentences[:-2])    # stack into a (num_sequences, maxlen) array,
next_chars = np.stack(next_chars[:-2])  # dropping the final windows whose targets run past the end of the text

In [14]:
sentences.shape, next_chars.shape


Out[14]:
((600852, 40), (600852, 40))
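
Each target sequence is just its input sequence shifted one character ahead, so the tail of an input should match the head of its target. A quick check, as a sketch using the arrays above:

# Sketch: at every timestep the target is the next character of the input
assert (sentences[0][1:] == next_chars[0][:-1]).all()
print(repr(''.join(indices_char[i] for i in sentences[0])))   # 'preface\n\n\nsupposing that truth is a woma'
print(repr(''.join(indices_char[i] for i in next_chars[0])))  # 'reface\n\n\nsupposing that truth is a woman'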

In [15]:
n_fac = 24

In [16]:
model=Sequential([
        Embedding(vocab_size, n_fac, input_length=maxlen),
        # The Embedding layer fixes the input shape, so the LSTMs infer theirs
        LSTM(512, return_sequences=True, dropout=0.2, recurrent_dropout=0.2,
             implementation=2),
        Dropout(0.2),
        LSTM(512, return_sequences=True, dropout=0.2, recurrent_dropout=0.2,
             implementation=2),
        Dropout(0.2),
        TimeDistributed(Dense(vocab_size)),
        Activation('softmax')
    ])
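
Both LSTM layers return full sequences, so the TimeDistributed Dense produces a vocab_size-way softmax at every one of the 40 timesteps rather than just at the end. model.summary() shows the per-layer output shapes; the expected ones are noted below:

# Expected output shapes (batch axis shown as None):
#   Embedding                          -> (None, 40, 24)
#   LSTMs and Dropouts                 -> (None, 40, 512)
#   TimeDistributed(Dense), Activation -> (None, 40, 58)
model.summary()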

In [17]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
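
sparse_categorical_crossentropy lets us keep the targets as integer character indices rather than one-hot vectors; Keras then expects the targets to carry a trailing axis of size 1, which is why every fit call below wraps them in np.expand_dims(next_chars, -1). A minimal shape check, as a sketch:

# Sketch: targets of shape (samples, maxlen, 1) pair with the model's
# (samples, maxlen, vocab_size) softmax output at each timestep
print(np.expand_dims(next_chars, -1).shape)  # expected: (600852, 40, 1)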

Train


In [18]:
def print_example():
    seed_string="ethics is a basic foundation of all that"
    for i in range(320):
        x=np.array([char_indices[c] for c in seed_string[-40:]])[np.newaxis,:]  # [-40:] takes the last 40 chars
        preds = model.predict(x, verbose=0)[0][-1]  # [-1] takes the prediction for the last position
        preds = preds/np.sum(preds)  # renormalize so the probabilities sum exactly to 1
        next_char = choice(chars, p=preds)  # sample the next character from the predicted distribution
        seed_string = seed_string + next_char
    print(seed_string)
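
print_example samples straight from the model's predicted distribution. A common variation (not used in this notebook) is to rescale the distribution with a temperature before sampling; here's a sketch reusing the same model and lookup tables, with print_example_temp as a hypothetical name:

def print_example_temp(temp=0.8, n=320):
    # Sketch: like print_example, but sharpen (temp < 1) or flatten (temp > 1)
    # the predicted distribution before sampling
    seed_string = "ethics is a basic foundation of all that"
    for i in range(n):
        x = np.array([char_indices[c] for c in seed_string[-40:]])[np.newaxis, :]
        preds = model.predict(x, verbose=0)[0][-1]
        preds = np.exp(np.log(preds + 1e-8) / temp)  # temperature rescaling
        preds = preds / np.sum(preds)                # renormalize to a valid distribution
        seed_string = seed_string + choice(chars, p=preds)
    print(seed_string)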

In [19]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)


Epoch 1/1
600852/600852 [==============================] - 656s - loss: 1.5087   
Out[19]:
<keras.callbacks.History at 0x7f4dbb55cf98>

In [20]:
print_example()


ethics is a basic foundation of all that all a process has hitherto standing (which now flows irresponsibility in morals, or must be congensed with his much tensive, believe among through attempt a sign of the following irpossible. this is also a far absolutely disagreeable, fourths of men to be,
he may these means of blood, including the negle in a fearful 

In [21]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)


Epoch 1/1
600852/600852 [==============================] - 659s - loss: 1.2929   
Out[21]:
<keras.callbacks.History at 0x7f4db4112668>

In [22]:
print_example()


ethics is a basic foundation of all that comes up with the note of god and virtue; perhaps, indeed,
the acted with which counterfeiging man, he himself always of his highest appearance, and
one
egoism, a mankind who again
and breaks so under the evil,
and
invention, whereas they have comprehended
that
they extended by the older, so that
it is only with that 

In [ ]:
import keras.backend as K
K.set_value(model.optimizer.lr, 0.001)  # setting the attribute directly wouldn't reach the compiled graph

In [ ]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)

In [ ]:
print_example()

In [ ]:
K.set_value(model.optimizer.lr, 0.0001)

In [ ]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)

In [ ]:
print_example()

In [23]:
model.save_weights('data/char_rnn.h5')
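
To pick this model up in a later session, rebuild the same architecture, compile it, and restore the saved weights - a sketch:

# Sketch: restore the trained weights into an identically-defined model
model.load_weights('data/char_rnn.h5')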

In [ ]:
K.set_value(model.optimizer.lr, 0.00001)

In [ ]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)

In [ ]:
print_example()

In [ ]:
model.fit(sentences, np.expand_dims(next_chars,-1), batch_size=64, epochs=1)

In [ ]:
print_example()

In [ ]:
print_example()

In [ ]:
model.save_weights('data/char_rnn.h5')
