Automatically Generating Text for 《默读》 (Silent Reading)


In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')


WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

WARNING (theano.sandbox.cuda): Ignoring call to use(1), GPU number 0 is already in use.

In [2]:
%matplotlib inline
import utils
from utils import *
from keras.layers import TimeDistributed, Activation
from keras.callbacks import ModelCheckpoint
from numpy.random import choice


Using Theano backend.

Setup


In [3]:
path = 'text/modu.txt'
text = open(path).read()
text = text.replace(' ', '')
text = text[-200000:]
print('corpus length:', len(text))


corpus length: 200000

In [4]:
!tail {path} -n10










  “每一天都是一个新的日子,走运当然是好的,不过我情愿做到分毫不差,这样,运气来的时候,你就有所准备了。”――《老人与海》

In [5]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars: ', vocab_size)


total chars:  3057

Sometimes it's useful to have a zero value in the dataset, e.g. for padding


In [6]:
chars.insert(0, "\0")
''.join(chars[:16])


Out[6]:
'\x00\n!%-0123456789='
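The reserved 0 index is what makes padding possible: a sequence shorter than the window can be left-padded with zeros without colliding with any real character. A minimal sketch (the helper name `pad_to` is ours, not part of the notebook):

```python
def pad_to(seq, length, pad_idx=0):
    # Left-pad a list of character indices with the reserved 0 index;
    # if the sequence is too long, keep only its last `length` items.
    return [pad_idx] * max(0, length - len(seq)) + seq[-length:]

print(pad_to([5, 9, 2], 6))  # [0, 0, 0, 5, 9, 2]
```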

In [7]:
char_indices = dict((c, i) for i,c in enumerate(chars))
indices_char = dict((i, c) for i,c in enumerate(chars))
idx = [char_indices[c] for c in text]

In [8]:
idx[:10]


Out[8]:
[3053, 315, 2707, 453, 1189, 2673, 2847, 2020, 989, 122]

In [9]:
''.join(indices_char[i] for i in idx[:20])


Out[9]:
',冲过去把车门砸开了,刚把人拖出来,那边'

Our LSTM RNN!

Now we will implement the typical "rolled" RNN structure.

That is, instead of feeding separate inputs c1, c2, c3, ..., we pass the network an array containing the whole input sequence at once.


In [10]:
seq_length = 100
dataX = []
dataY = []
for i in range(0, len(idx) - seq_length, 1):
    seq_in = idx[i:i+seq_length]
    seq_out = idx[i+seq_length]
    dataX.append(seq_in)
    dataY.append(seq_out)
n_patterns = len(dataX)
n_patterns


Out[10]:
199900

Now that we have prepared our training data, we need to transform it so that it is suitable for use with Keras.

First we must transform the list of input sequences into the form [samples, time steps, features] expected by an LSTM network.

Next, we rescale the integers to [0, 1] to make the patterns easier for the LSTM network to learn, since it uses the sigmoid activation function by default.

Finally, we convert the output patterns into a one-hot encoding. This lets us configure the network to predict the probability of each of the 3,057 characters in the vocabulary (an easier representation) rather than trying to force it to predict precisely the next character.


In [11]:
X = np.reshape(dataX, (n_patterns, seq_length, 1))
print(X.shape)
X = X / float(vocab_size)
y = np_utils.to_categorical(dataY)


(199900, 100, 1)

In [12]:
print(y.shape)


(199900, 3057)
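The one-hot conversion done by np_utils.to_categorical above is equivalent to indexing into an identity matrix; a small NumPy sketch of the same idea (the toy labels are ours):

```python
import numpy as np

def one_hot(labels, num_classes):
    # Each integer label becomes a row with a single 1 at that index
    return np.eye(num_classes, dtype=np.float32)[labels]

y_toy = one_hot([0, 2, 1], 3)
# array([[1., 0., 0.],
#        [0., 0., 1.],
#        [0., 1., 0.]], dtype=float32)
```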

We can now define our LSTM model. Here we define a single hidden LSTM layer with 512 memory units. The network uses dropout with a probability of 0.2. The output layer is a Dense layer using the softmax activation function to output a probability between 0 and 1 for each of the 3,000+ characters.


In [13]:
model = Sequential()
model.add(LSTM(512, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam())

The network is slow to train (about 420 seconds per epoch on a Tesla K80 GPU). Because training is slow and loss can fluctuate, we use model checkpointing to record the network weights to file each time an improvement in loss is observed at the end of an epoch. We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.


In [14]:
# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

In [15]:
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
lstm_1 (LSTM)                    (None, 512)           1052672     lstm_input_1[0][0]               
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 512)           0           lstm_1[0][0]                     
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 3057)          1568241     dropout_1[0][0]                  
====================================================================================================
Total params: 2,620,913
Trainable params: 2,620,913
Non-trainable params: 0
____________________________________________________________________________________________________

In [16]:
model.fit(X, y, nb_epoch=4, batch_size=256, callbacks=callbacks_list)


WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
Epoch 1/4
199680/199900 [============================>.] - ETA: 0s - loss: 6.1176Epoch 00000: loss improved from inf to 6.11768, saving model to weights-improvement-00-6.1177.hdf5
199900/199900 [==============================] - 423s - loss: 6.1177   
Epoch 2/4
199680/199900 [============================>.] - ETA: 0s - loss: 5.8995Epoch 00001: loss improved from 6.11768 to 5.89950, saving model to weights-improvement-01-5.8995.hdf5
199900/199900 [==============================] - 422s - loss: 5.8995   
Epoch 3/4
199680/199900 [============================>.] - ETA: 0s - loss: 5.7929Epoch 00002: loss improved from 5.89950 to 5.79290, saving model to weights-improvement-02-5.7929.hdf5
199900/199900 [==============================] - 422s - loss: 5.7929   
Epoch 4/4
199680/199900 [============================>.] - ETA: 0s - loss: 5.7023Epoch 00003: loss improved from 5.79290 to 5.70215, saving model to weights-improvement-03-5.7022.hdf5
199900/199900 [==============================] - 423s - loss: 5.7022   
Out[16]:
<keras.callbacks.History at 0x7f8ce9570ac8>

In [22]:
# pick a random seed
start = np.random.randint(0, len(dataX)-1)
# start=-1
pattern = dataX[start]
print("Seed:")
print("\"", ''.join([indices_char[value] for value in pattern]), "\"")


Seed:
" 叫……
作者有话要说:  注:“坏嘎嘎是好人削成的”――《骆驼祥子》老舍
另外,133章中郎乔日志的“1月16日”修改成“1月6日”,因为后面杀手证词中和老张接触的时间是11日,时间上有点小bug=w "

In [ ]:
# generate characters
import sys
for i in range(1000):
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(vocab_size)
    prediction = model.predict(x, verbose=0)
    index = np.argmax(prediction)
    sys.stdout.write(indices_char[index])
    # slide the window: append the prediction, drop the oldest index
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")
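The loop above always takes the argmax, which tends to get stuck repeating high-frequency characters. A common alternative is to sample from the predicted distribution with a temperature; this helper is our sketch, not part of the notebook:

```python
import numpy as np

def sample_index(probs, temperature=1.0):
    # Lower temperature sharpens the distribution toward the argmax;
    # temperature 1.0 samples from the model's raw probabilities.
    p = np.log(np.asarray(probs, dtype=np.float64) + 1e-8) / temperature
    p = np.exp(p - p.max())
    p /= p.sum()
    return np.random.choice(len(p), p=p)
```

Replacing `index = np.argmax(prediction)` with `index = sample_index(prediction[0], 0.8)` usually produces more varied text at the cost of occasional mistakes.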

Stateful model with Keras

stateful=True means that at the end of each sequence, the hidden activations are not reset to zero but carried over to the next batch. Make sure you also pass shuffle=False when training, so that consecutive batches follow each other in the text.

A stateful model is easy to create (just add stateful=True) but harder to train. We had to add batch normalization and use an LSTM to get reasonable results.

When using a stateful layer in Keras, you also have to pass 'batch_input_shape' to the first layer, which fixes the batch size there.


In [64]:
bs=64

In [65]:
model=Sequential([
        Embedding(vocab_size, n_fac, input_length=cs, batch_input_shape=(bs,cs)),
        BatchNormalization(),
        LSTM(n_hidden, return_sequences=True, stateful=True),
        TimeDistributed(Dense(vocab_size, activation='softmax')),
    ])


/home/ubuntu/anaconda2/envs/py36/lib/python3.6/site-packages/keras/engine/topology.py:368: UserWarning: The `regularizers` property of layers/models is deprecated. Regularization losses are now managed via the `losses` layer/model property.
  warnings.warn('The `regularizers` property of '

In [66]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())

Since we're using a fixed batch shape, we have to ensure the number of input and output samples is an exact multiple of the batch size.


In [67]:
mx = len(x_rnn)//bs*bs

In [68]:
model.fit(x_rnn[:mx], y_rnn[:mx], batch_size=bs, nb_epoch=10, shuffle=False)


Epoch 1/10
102272/102272 [==============================] - 86s - loss: 4.9404    
Epoch 2/10
102272/102272 [==============================] - 84s - loss: 4.2808    
Epoch 3/10
102272/102272 [==============================] - 84s - loss: 4.0796    
Epoch 4/10
102272/102272 [==============================] - 83s - loss: 3.9579    
Epoch 5/10
102272/102272 [==============================] - 83s - loss: 3.8693    
Epoch 6/10
102272/102272 [==============================] - 84s - loss: 3.7999    
Epoch 7/10
102272/102272 [==============================] - 84s - loss: 3.7424    
Epoch 8/10
102272/102272 [==============================] - 84s - loss: 3.6937    
Epoch 9/10
102272/102272 [==============================] - 84s - loss: 3.6515    
Epoch 10/10
102272/102272 [==============================] - 84s - loss: 3.6142    
Out[68]:
<keras.callbacks.History at 0x7fc6476da7b8>

Test model


In [69]:
def get_next_keras(inp):
    idxs = [char_indices[c] for c in inp]
    # np.newaxis adds a leading batch dimension
    arrs = np.array(idxs)[np.newaxis, :]
    p = model.predict(arrs)[0]
    return chars[np.argmax(p)]

In [73]:
model.predict(x_rnn[-64:])[0]


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-73-582c6a6027f4> in <module>()
----> 1 model.predict(x_rnn[-64:])[0]

...

ValueError: dimension mismatch in args to gemm (64,256)x(256,256)->(32,256)
Apply node that caused the error: GpuGemm{no_inplace}(GpuSubtensor{::, int64::}.0, TensorConstant{0.20000000298023224}, <CudaNdarrayType(float32, matrix)>, lstm_4_U_o_copy[cuda], TensorConstant{0.20000000298023224})
Inputs shapes: [(32, 256), (), (64, 256), (256, 256), ()]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
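The mismatch above happens because predict() defaults to batch_size=32, which collides with the stateful model's fixed batch size of 64 (note the (64,256) state against the (32,256) input batch in the gemm shapes). Passing batch_size=bs explicitly, e.g. model.predict(x_rnn[-64:], batch_size=64), should avoid it, along with trimming inputs to a multiple of bs as done before training. The trimming arithmetic, sketched (the helper name is ours):

```python
bs = 64

def trim_to_batch_multiple(n_samples, bs):
    # A stateful model with a fixed batch_input_shape can only consume
    # inputs whose sample count is an exact multiple of bs.
    return n_samples // bs * bs

mx = trim_to_batch_multiple(102300, bs)  # 102272, matching the training log
```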

In [ ]: