There are a few packages that can help with implementing RNNs and meeting their need for high-performance computation. I like Caffe the most, but it can be challenging, especially when it comes to adding new code, as you need to deal with C++ and CUDA. There is also Theano, but I am not a great fan of its heavy computation graph optimisation, especially during the evaluation stage. There is also Torch, based on Lua, which is ... well, I don't know what Torch can do at this stage ...

Hence this post will be about implementing linear regression using CUDA tensors and Torch 7. The example is loosely based on the Torch7 and iTorch demos.

In [2]:
require 'cutorch';
require 'cunn';
require 'optim';

torch.setdefaulttensortype( 'torch.FloatTensor' )
logger = optim.Logger( paths.concat('.', 'train.log') )

For this exercise we will use a fairly large table:

In [3]:
x_len = 1000000
x_width = 2

X = torch.CudaTensor( x_len, x_width ):normal()
A = torch.CudaTensor{ {1}, {2} }
-- targets: linear combination of the inputs plus Gaussian noise (mean 3, std 1)
Y = torch.mm( X, A ) + torch.CudaTensor( x_len, 1 ):normal( 3.0, 1.0 )

Let's define a linear layer to express our regression. The nn package will take care of gradient derivation as well as the forward and backward passes.
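
For reference, here is a sketch of what the layer and the criterion compute on a batch of N rows. It assumes the default, size-averaged behaviour of nn.MSECriterion; the symbols W, b and N are notation introduced here, not names from the code.

```latex
\begin{align}
\hat{y}_i &= W x_i + b \\
L(W, b) &= \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2 \\
\frac{\partial L}{\partial W} &= \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right) x_i^\top \\
\frac{\partial L}{\partial b} &= \frac{2}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)
\end{align}
```

These are the quantities that end up in the flattened gradient vector returned alongside the parameters.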

In [4]:
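lin_layer = nn.Linear( (#X)[2], (#Y)[2] )
model = nn.Sequential()
model:add( lin_layer )
-- move the model and the criterion to the GPU so they accept CudaTensor inputs
model:cuda()
criterion = nn.MSECriterion()
criterion:cuda()
params, dl_dparams = model:getParameters()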

In [5]:
sgd_params = {
    learningRate = 1e-3,
    learningRateDecay = 1e-4,
    weightDecay = 0,
    momentum = 0
}
epochs = 100
batch_size = 50000

In [6]:
function train( X, Y )
    local current_loss = 0

    -- mini input / target
    local inputs = torch.CudaTensor( batch_size, x_width )
    local targets = torch.CudaTensor( batch_size, 1 )
    -- we won't shuffle here, as a for loop is too slow in Lua;
    -- instead we start from a random offset
    local offset = math.floor( torch.uniform( 0, batch_size-1 ) )
    -- for each mini batch
    for t = 1,(#X)[1], batch_size do

        local x_start = t + offset
        local x_end = math.min( t + offset + batch_size - 1, (#X)[1] )
        -- the last batch may be shorter; rows past its end keep the previous batch's data
        inputs[ {{1, x_end - x_start + 1}} ] = X[ {{x_start, x_end}} ]:clone()
        targets[ {{1, x_end - x_start + 1}} ] = Y[ {{x_start, x_end}} ]:clone()
        -- eval function to minimise
        local feval = function( params_new )
            -- clean up
            collectgarbage()

            if params ~= params_new then
                params:copy( params_new )
            end

            -- reset gradients (gradients are always accumulated, to accommodate batch methods)
            dl_dparams:zero()

            -- evaluate the loss function and its derivative wrt the parameters
            local outputs = model:forward( inputs )
            local loss = criterion:forward( outputs, targets )
            local backprop = criterion:backward( outputs, targets )
            model:backward( inputs, backprop )

            -- return loss and dloss/dparams
            return loss, dl_dparams
        end

        -- run SGD
        local _, fs = optim.sgd( feval, params, sgd_params )
        current_loss = current_loss + fs[1]
    end
    -- note: the summed per-batch losses are divided by batch_size, not by the
    -- number of batches, so the logged value is a scaled loss
    current_loss = current_loss / batch_size
    logger:add{['training_error'] = current_loss }
    return current_loss
end

In [7]:
time = sys.clock()
local cumm_loss = 0.
for i = 1, epochs do
    cumm_loss = train( X, Y )
end

print( 'Final loss = ' .. cumm_loss )

-- time taken
time = sys.clock() - time
print( "Time per epoch = " .. (time / epochs) .. '[s]')

Final loss = 0.00040390299797058	
Time per epoch = 0.15690538883209[s]	

Let's take a look at the recovered parameters. The weights should be close to matrix A, and the bias close to the mean of the noise (3):
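
To spell out why the bias should absorb the noise mean, here is a short derivation under the data-generating setup above. The symbols W, b and ε are notation introduced here, and the ordering of the flattened parameters (weights first, then bias) is an assumption about how getParameters lays them out.

```latex
\begin{align}
y_i &= A^\top x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(3, 1), \qquad A = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \\
\mathbb{E}\left[ y_i \mid x_i \right] &= A^\top x_i + 3 \\
\hat{y}_i &= W x_i + b \quad \Longrightarrow \quad W \approx \begin{pmatrix} 1 & 2 \end{pmatrix}, \quad b \approx 3
\end{align}
```

So the three printed entries should come out at roughly 1, 2 and 3. Since the noise has unit variance, the per-sample mean squared error cannot be expected to drop much below 1.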


In [8]:
print( params )


[torch.CudaTensor of size 3]

Not bad. Here's the chart of the logged training error as a function of epoch:

In [9]:
Plot = require 'itorch.Plot'

for name, list in pairs( logger.symbols ) do
    y = torch.Tensor( list )
    x = torch.linspace( 1, #list, #list )
    plot = Plot():line( x, y ,'blue', name ):legend(true):title('MSE'):draw()
end