We use data from the standard MNIST dataset of handwritten digits.
In [1]:
require 'torch'
require 'nn'
require 'optim'
mnist = require 'mnist'
In [2]:
fullset = mnist.traindataset()
testset = mnist.testdataset()
In [3]:
fullset
Out[3]:
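As a quick orientation (an illustrative check; it relies on the size, data and label fields of the mnist package, which we also use below), we can print the shape of the training set:

print(fullset.size)          -- 60000 examples
print(fullset.data:size())   -- 60000x28x28 tensor of pixel values
print(fullset.label:size())  -- 60000 labels, one digit (0-9) per image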
We inspect the data just to get an idea of its content.
In [4]:
itorch.image(fullset.data[1])
Out[4]:
In [5]:
fullset.label[1]
Out[5]:
We split the full dataset into a training component and a validation component; the validation set will be used to tune hyperparameters and to decide when to stop training. While doing so, we convert the data to double precision.
In [6]:
trainset = {
    size = 50000,
    data = fullset.data[{{1,50000}}]:double(),
    label = fullset.label[{{1,50000}}]
}
In [7]:
validationset = {
    size = 10000,
    data = fullset.data[{{50001,60000}}]:double(),
    label = fullset.label[{{50001,60000}}]
}
We use a model with a single hidden layer with a hyperbolic tangent activation and a log-softmax output. A first Reshape layer flattens the input - a 28x28 square of pixels - so that it fits into the linear layer.
In [8]:
model = nn.Sequential()
In [9]:
model:add(nn.Reshape(28*28))
model:add(nn.Linear(28*28, 30))
model:add(nn.Tanh())
model:add(nn.Linear(30, 10))
model:add(nn.LogSoftMax())
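As a quick sanity check (purely illustrative; the network is still untrained at this point), we can push a single training image through the model and verify that we get 10 log-probabilities which exponentiate to a probability distribution:

out = model:forward(trainset.data[1])
print(out:size())            -- 10 log-probabilities, one per digit class
print(torch.exp(out):sum())  -- approximately 1, since exp of a log-softmax sums to one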
We also define a loss function, using the negative log-likelihood criterion.
In [10]:
criterion = nn.ClassNLLCriterion()
As explained in the documentation, the NLL criterion requires the output of the neural network to contain the log-probabilities of each class, which is why we added the LogSoftMax layer above.
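As an aside (not used in this notebook), the same effect can be obtained by ending the network with the plain linear layer and letting the criterion handle the log-softmax: nn.CrossEntropyCriterion combines LogSoftMax and ClassNLLCriterion internally.

-- equivalent alternative: drop the nn.LogSoftMax() layer above and use
-- criterion = nn.CrossEntropyCriterion()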
We will make use of the optim package to train the network. optim contains several optimization algorithms, and they all share the same interface: they take a closure feval that, given a parameter vector x, returns the value of the loss at x and its gradient with respect to x; the current parameter vector itself; and a table of algorithm-specific settings such as the learning rate and momentum. A toy illustration of this interface is sketched below; after that, we set the SGD hyperparameters, flatten the model parameters, and define a step function that performs training for a single epoch and returns the average loss.
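Here is a minimal, made-up example of this interface (not part of the training code): we use optim.sgd to minimize the one-dimensional function f(x) = (x - 3)^2.

toy_x = torch.Tensor{0}                        -- starting point
toy_state = {learningRate = 0.1}
toy_feval = function(p)
    local loss = (p[1] - 3)^2                  -- value of f at p
    local grad = torch.Tensor{2 * (p[1] - 3)}  -- gradient of f at p
    return loss, grad
end
for i = 1,100 do
    optim.sgd(toy_feval, toy_x, toy_state)     -- one SGD step: toy_x is updated in place
end
print(toy_x[1])                                -- close to 3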
In [11]:
sgd_params = {
    learningRate = 1e-2,
    learningRateDecay = 1e-4,
    weightDecay = 1e-3,
    momentum = 1e-4
}
In [12]:
x, dl_dx = model:getParameters()
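getParameters flattens all the model's weights and biases into the single vector x, with dl_dx the corresponding flat gradient. As a quick check (illustrative), the parameter count should be 28*28*30 + 30 + 30*10 + 10 = 23860:

print(x:size(1))      -- 23860 parameters
print(dl_dx:size(1))  -- gradient vector of the same size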
In [13]:
step = function(batch_size)
    local current_loss = 0
    local count = 0
    local shuffle = torch.randperm(trainset.size)
    batch_size = batch_size or 200
    for t = 1,trainset.size,batch_size do
        -- set up inputs and targets for this mini-batch
        local size = math.min(t + batch_size - 1, trainset.size) - t + 1
        local inputs = torch.Tensor(size, 28, 28)
        local targets = torch.Tensor(size)
        for i = 1,size do
            local input = trainset.data[shuffle[t + i - 1]]
            local target = trainset.label[shuffle[t + i - 1]]
            -- if target == 0 then target = 10 end
            inputs[i] = input
            targets[i] = target
        end
        -- ClassNLLCriterion expects class indices in 1..10, while MNIST labels are 0..9
        targets:add(1)
        local feval = function(x_new)
            -- reset data
            if x ~= x_new then x:copy(x_new) end
            dl_dx:zero()
            -- perform mini-batch gradient descent
            local loss = criterion:forward(model:forward(inputs), targets)
            model:backward(inputs, criterion:backward(model.output, targets))
            return loss, dl_dx
        end
        _, fs = optim.sgd(feval, x, sgd_params)
        -- fs is a table containing the value of the loss function
        -- (just 1 value for the SGD optimization)
        count = count + 1
        current_loss = current_loss + fs[1]
    end
    -- normalize loss over the number of mini-batches
    return current_loss / count
end
Before starting the training, we also need to be able to evaluate accuracy on a separate dataset, in order to decide when to stop.
In [14]:
eval = function(dataset, batch_size)
    local count = 0
    batch_size = batch_size or 200
    for i = 1,dataset.size,batch_size do
        local size = math.min(i + batch_size - 1, dataset.size) - i + 1
        local inputs = dataset.data[{{i,i+size-1}}]
        local targets = dataset.label[{{i,i+size-1}}]:long()
        local outputs = model:forward(inputs)
        local _, indices = torch.max(outputs, 2)
        -- predicted classes are in 1..10; shift back to the 0..9 digit range
        indices:add(-1)
        local guessed_right = indices:eq(targets):sum()
        count = count + guessed_right
    end
    return count / dataset.size
end
We are now ready to perform the actual training. After each epoch, we evaluate the accuracy on the validation dataset, in order to decide whether to stop
In [15]:
max_iters = 30
In [16]:
do
    local last_accuracy = 0
    local decreasing = 0
    local threshold = 1 -- how many decreasing epochs we allow
    for i = 1,max_iters do
        local loss = step()
        print(string.format('Epoch: %d Current loss: %4f', i, loss))
        local accuracy = eval(validationset)
        print(string.format('Accuracy on the validation set: %4f', accuracy))
        if accuracy < last_accuracy then
            if decreasing > threshold then break end
            decreasing = decreasing + 1
        else
            decreasing = 0
        end
        last_accuracy = accuracy
    end
end
Out[16]:
Let us test the model accuracy on the test set. First we convert the test images to double precision, since that is what the model expects.
In [17]:
testset.data = testset.data:double()
In [18]:
eval(testset)
Out[18]:
The paths module can be used to manipulate filesystem paths
In [19]:
paths = require 'paths'
In [20]:
filename = paths.concat(paths.cwd(), 'model.net')
In [21]:
filename
Out[21]:
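Besides concat and cwd, the paths package provides a few other helpers worth knowing (a quick illustration, not needed for the rest of the notebook):

print(paths.basename(filename))  -- 'model.net'
print(paths.dirname(filename))   -- the current working directory
print(paths.filep(filename))     -- false until we actually save the model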
We can then save our model to file like this
In [22]:
help(torch.save)
Out[22]:
In [23]:
torch.save(filename, model)
Out[23]:
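torch.save serializes the whole model object, including its parameters, in a binary format by default. An ASCII format is also available if a human-readable (but larger) file is preferred; for example, with a hypothetical second filename:

-- torch.save(paths.concat(paths.cwd(), 'model.ascii.net'), model, 'ascii')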
Let us check that restoring from file works as expected
In [24]:
help(torch.load)
Out[24]:
In [25]:
model1 = torch.load(filename)
We define a second evaluation function that uses the loaded model, processing one image at a time.
In [26]:
eval1 = function(dataset)
    local count = 0
    for i = 1,dataset.size do
        local output = model1:forward(dataset.data[i])
        local _, index = torch.max(output, 1) -- index of the largest log-probability
        -- class indices run from 1 to 10, so the predicted digit is index - 1
        local digit = index[1] - 1
        if digit == dataset.label[i] then count = count + 1 end
    end
    return count / dataset.size
end
In [27]:
eval1(testset)
Out[27]: