In [1]:
%pylab inline
figsize(10,5)
matplotlib.rcParams["image.interpolation"] = "none"
matplotlib.rcParams["image.cmap"] = "afmhot"
In [2]:
import clstm
import h5py
Let's start by getting the MNIST dataset. This HDF5 version of MNIST stores images in the sequence format used for training with the clstm command-line tools.
In [3]:
!test -f mnist_seq.h5 || curl -f -o mnist_seq.h5 http://www.tmbdev.net/ocrdata-hdf5/mnist_seq.h5 || rm -f mnist_seq.h5
In HDF5 data files for CLSTM, row t represents the input vector at time step t. For MNIST, we scan through the original image left-to-right over time.
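A minimal NumPy sketch of this convention, assuming the image is a 2D array indexed as (row, column). Scanning left to right makes column t of the image the input vector at time step t, which amounts to a transpose. The helper name image_to_sequence is hypothetical.

```python
import numpy as np

def image_to_sequence(image):
    """Turn an (H, W) image into a (W, H) sequence: row t of the result
    is column t of the image, so time runs from left to right."""
    image = np.asarray(image, dtype=np.float32)
    return image.T.copy()

# A 2x3 "image": 2 pixel rows, 3 columns -> 3 time steps of 2 features each.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
seq = image_to_sequence(img)
print(seq.shape)  # (3, 2)
print(seq[0])     # first image column: [1. 4.]
```

This is also why the cells below display a stored image with a trailing .T: transposing the sequence recovers the upright image.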
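Storing the images directly would require a rank 3 doubly ragged array (a list of 2D arrays whose sizes may differ), which HDF5 does not support. Instead, each image is stored flattened as one variable-length row of "images", and its dimensions are kept in the separate array "images_dims".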
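The same layout can be sketched in memory with plain NumPy (no h5py): a list of flat rows plus an array of shapes, recombined with reshape exactly as the next cell does with h5["images"] and h5["images_dims"]. The helpers pack_images and unpack_image are hypothetical illustrations, not part of clstm or h5py.

```python
import numpy as np

def pack_images(images):
    """Flatten each 2D image and record its shape in a separate array."""
    arrays = [np.asarray(im, dtype=np.float32) for im in images]
    flat = [a.ravel() for a in arrays]          # one variable-length row per image
    dims = np.array([a.shape for a in arrays])  # (n_images, 2) array of shapes
    return flat, dims

def unpack_image(flat, dims, i):
    """Recover image i by reshaping its flat row with its stored dimensions."""
    return flat[i].reshape(*dims[i])

# Two images with different numbers of rows.
a = np.arange(3 * 28).reshape(3, 28)
b = np.arange(5 * 28).reshape(5, 28)
flat, dims = pack_images([a, b])
print(dims.tolist())                # [[3, 28], [5, 28]]
restored = unpack_image(flat, dims, 1)
print(np.array_equal(restored, b))  # True
```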
In [4]:
h5 = h5py.File("mnist_seq.h5","r")
imshow(h5["images"][0].reshape(*h5["images_dims"][0]))
print(h5["images"].shape)
Let's use a bidirectional LSTM and a fairly high learning rate.
In [5]:
net = clstm.make_net_init("bidi","ninput=28:nhidden=10:noutput=11")
net.setLearningRate(1e-2,0.9)
print(clstm.network_info(net))
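The second argument to setLearningRate is presumably a momentum coefficient (0.9 here). As an illustration only, here is a generic momentum SGD update on a toy quadratic loss; clstm's internal update rule may differ in detail, and momentum_sgd is a hypothetical helper.

```python
import numpy as np

def momentum_sgd(grad_fn, w0, lr=1e-2, momentum=0.9, steps=500):
    """Generic momentum SGD: the velocity is a decaying sum of past steps."""
    w = np.array(w0, dtype=np.float64)
    velocity = np.zeros_like(w)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(w)
        w = w + velocity
    return w

# Toy loss 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -1.0])
w = momentum_sgd(lambda w: w - target, [0.0, 0.0])
print(np.round(w, 4))
```

For a gradient that stays constant, momentum 0.9 scales the effective step size to lr / (1 - momentum), ten times the nominal learning rate, which is one reason a learning rate of 1e-2 already counts as fairly high.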
The class labels in the dataset are such that digit 0 has been assigned class 10, since class 0 is reserved for epsilon states in CTC alignment.
In [6]:
print([chr(c) for c in h5["codec"]])
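A sketch of this label convention in plain Python, assuming digits 1 through 9 keep their own values as class indices (only the remapping of 0 to 10 is stated above). The helper names are hypothetical.

```python
def digit_to_class(digit):
    """Class 0 is reserved for the CTC epsilon, so digit 0 becomes class 10."""
    if not 0 <= digit <= 9:
        raise ValueError("expected a digit 0-9, got %r" % (digit,))
    return 10 if digit == 0 else digit

def class_to_digit(cls):
    """Inverse mapping; class 0 (epsilon) has no digit."""
    if not 1 <= cls <= 10:
        raise ValueError("class %r is epsilon or out of range" % (cls,))
    return 0 if cls == 10 else cls

print([digit_to_class(d) for d in range(10)])  # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```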
In [7]:
index = 0
xs = array(h5["images"][index].reshape(28,28,1),'f')
cls = h5["transcripts"][index][0]
print(cls)
imshow(xs.reshape(28,28).T,cmap=cm.gray)
Out[7]:
Forward propagation is quite simple: we take the input data and put it into the input sequence of the network, call the forward method, and take the result out of the output sequence.
Note that all sequences (including xs) in clstm are of rank 3, with indexes giving the time step, the feature dimension, and the batch index, in order.
The output from the network is a vector of posterior probabilities at each time step.
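The cell above builds xs with shape (28, 28, 1): 28 time steps, 28 features, and a batch of one. A NumPy sketch of stacking several equal-length sequences into one (time, feature, batch) array follows; it only illustrates the layout, and whether a particular clstm network accepts batches larger than one is an assumption not tested here. The helper stack_batch is hypothetical.

```python
import numpy as np

def stack_batch(sequences):
    """Stack equal-shape (T, d) sequences into a (T, d, B) array,
    with the batch index last as in clstm sequences."""
    return np.stack([np.asarray(s, dtype=np.float32) for s in sequences], axis=-1)

a = np.zeros((28, 28))
b = np.ones((28, 28))
batch = stack_batch([a, b])
print(batch.shape)              # (28, 28, 2)
print(stack_batch([a]).shape)   # (28, 28, 1), the same shape as xs
```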
In [8]:
net.inputs.aset(xs)
net.forward()
pred = net.outputs.array()
imshow(pred.reshape(28,11).T, interpolation='none')
Out[8]:
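To turn these per-time-step posteriors into a label, a common shortcut is best-path (greedy) CTC decoding: take the most probable class at each step, merge consecutive repeats, and drop class 0 (epsilon). A NumPy sketch, assuming the posteriors have been reshaped to (time steps, classes) as in the imshow call above, for example pred.reshape(28, 11). This is a generic decoder, not a clstm function.

```python
import numpy as np

def greedy_ctc_decode(posteriors, blank=0):
    """Best-path decoding of a (T, K) posterior matrix into a list of classes."""
    best = np.argmax(posteriors, axis=1).tolist()
    labels = []
    previous = blank
    for c in best:
        # A new label starts when the class changes and is not epsilon;
        # an epsilon between two equal classes separates them.
        if c != previous and c != blank:
            labels.append(c)
        previous = c
    return labels

# Toy posteriors whose per-step argmax is: eps, 3, 3, eps, eps, 3
toy = np.eye(11)[[0, 3, 3, 0, 0, 3]]
print(greedy_ctc_decode(toy))  # [3, 3]
```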
We now construct a "target" array and perform CTC alignment with the output.
In [9]:
target = zeros((3,11),'f')
target[0,0] = 1
target[2,0] = 1
target[1,cls] = 1
seq = clstm.Sequence()
seq.aset(target.reshape(3,11,1))
aligned = clstm.Sequence()
clstm.seq_ctc_align(aligned,net.outputs,seq)
aligned = aligned.array()
imshow(aligned.reshape(28,11).T, interpolation='none')
Out[9]:
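What an alignment like this computes can be sketched with a forward-backward pass over the target states. This is a minimal NumPy illustration of the idea, not clstm's seq_ctc_align implementation: it assumes the path must start in the first target state, end in the last one, and at each time step either stay or advance by one state. Unlike full CTC, that means a blank is required between labels and at both ends, which matches the three-row target built above. Both helper names are hypothetical.

```python
import numpy as np

def make_ctc_target(labels, nclasses, blank=0):
    """One-hot state rows for the path blank, l1, blank, l2, ..., blank."""
    states = [blank]
    for label in labels:
        states += [label, blank]
    return np.eye(nclasses)[states]

def ctc_align_sketch(outputs, targets, eps=1e-30):
    """Soft-align (T, K) outputs to (N, K) target states; returns (T, K)."""
    T, N = outputs.shape[0], targets.shape[0]
    match = outputs.dot(targets.T) + eps  # (T, N): score of state n at time t
    # Forward pass: start in state 0; at each step stay or advance by one.
    alpha = np.zeros((T, N))
    alpha[0, 0] = match[0, 0]
    for t in range(1, T):
        advance = np.concatenate([[0.0], alpha[t - 1, :-1]])
        alpha[t] = (alpha[t - 1] + advance) * match[t]
        alpha[t] /= alpha[t].sum()  # per-step rescaling avoids underflow
    # Backward pass: the path must end in the last state N - 1.
    beta = np.zeros((T, N))
    beta[T - 1, N - 1] = match[T - 1, N - 1]
    for t in range(T - 2, -1, -1):
        advance = np.concatenate([beta[t + 1, 1:], [0.0]])
        beta[t] = (beta[t + 1] + advance) * match[t]
        beta[t] /= beta[t].sum()
    # State posteriors: match[t] appears in both alpha and beta, so divide it out once.
    # Per-row normalization also cancels the rescaling factors.
    gamma = alpha * beta / match
    gamma /= np.maximum(gamma.sum(axis=1, keepdims=True), 1e-300)
    return gamma.dot(targets)  # map state posteriors back to class space

rng = np.random.RandomState(0)
logits = rng.normal(size=(28, 11))
outputs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
aligned = ctc_align_sketch(outputs, make_ctc_target([7], 11))
print(aligned.shape)  # (28, 11)
```

Each row of the result is a probability distribution over classes that puts mass only on epsilon and the target digit, the same kind of picture shown in the aligned plot above.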
Next, we take the aligned output, subtract the actual output, set the difference as the output deltas, and then propagate the error backwards and update the weights.
In [10]:
deltas = aligned - net.outputs.array()
net.d_outputs.aset(deltas)
net.backward()
net.update()
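Why is aligned minus output a sensible error signal? Assuming the network ends in a softmax layer trained with cross-entropy against the aligned targets (an assumption about the output layer that this notebook does not show), the gradient of the loss with respect to the pre-softmax activations is output minus target, so the deltas are the negative gradient, the direction that decreases the loss. A finite-difference check in NumPy for a single time step:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    """Loss -sum(y * log softmax(z)) for a soft target y that sums to 1."""
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.RandomState(0)
z = rng.normal(size=11)           # pre-softmax activations at one time step
y = softmax(rng.normal(size=11))  # a soft target row, like one row of "aligned"

analytic = softmax(z) - y         # gradient of the loss w.r.t. z
h = 1e-6
numeric = np.array([
    (cross_entropy(z + h * e, y) - cross_entropy(z - h * e, y)) / (2 * h)
    for e in np.eye(11)
])
print(np.max(np.abs(analytic - numeric)))
```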
If we repeat these steps for many randomly chosen training images, we eventually end up with a trained network. The loop below performs 60000 such updates.
In [11]:
for i in range(60000):
    # pick a random training image and its class label
    index = int(rand()*60000)
    xs = array(h5["images"][index].reshape(28,28,1),'f')
    cls = h5["transcripts"][index][0]
    # forward pass
    net.inputs.aset(xs)
    net.forward()
    pred = net.outputs.array()
    # target sequence: epsilon, digit class, epsilon
    target = zeros((3,11),'f')
    target[0,0] = 1
    target[2,0] = 1
    target[1,cls] = 1
    seq = clstm.Sequence()
    seq.aset(target.reshape(3,11,1))
    # CTC-align the target with the current outputs
    aligned = clstm.Sequence()
    clstm.seq_ctc_align(aligned,net.outputs,seq)
    aligned = aligned.array()
    # backpropagate the difference and update the weights
    deltas = aligned - net.outputs.array()
    net.d_outputs.aset(deltas)
    net.backward()
    net.update()
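The loop above draws indices uniformly with replacement, so 60000 draws do not visit every image: the expected fraction never drawn is (1 - 1/n)^n, which approaches 1/e (about 37%) for large n. A shuffled permutation visits each image exactly once per pass. A small NumPy comparison, assuming the training set has n = 60000 images as the loop does:

```python
import numpy as np

n = 60000
rng = np.random.RandomState(0)

# Sampling with replacement, as in the loop above.
with_replacement = (rng.rand(n) * n).astype(int)
unseen = n - len(np.unique(with_replacement))
print(float(unseen) / n)  # roughly 0.37 (compare 1/e = 0.368)

# One shuffled epoch: every index appears exactly once.
epoch_order = rng.permutation(n)
print(len(np.unique(epoch_order)) == n)  # True
```

Setting index = epoch_order[i] inside the loop would turn the 60000 updates into one pass over the data in shuffled order; both schemes are common, and with many passes the difference matters less.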
In [16]:
figsize(5,10)
subplot(211,aspect=1)
imshow(xs.reshape(28,28).T)
subplot(212,aspect=1)
imshow(pred.reshape(28,11).T, interpolation='none', vmin=0, vmax=1)
Out[16]: