Building an RNN with Torch

In Torch, the nngraph package can be used to build RNNs; nngraph makes it possible to construct directed networks of complex structure, with multiple inputs and multiple outputs.


In [1]:
require('nngraph')


Out[1]:
true	

Constructing a multi-input, single-output network

Here we construct a simple network with two input vectors and one output vector, as shown below:


In [2]:
-- construct each individual component of the network
linearLayer1 = nn.Linear(3,2)
linearLayer2 = nn.Linear(2,2)
addLayer = nn.CAddTable()
tanhLayer = nn.Tanh()
linearLayer3 = nn.Linear(2,1)

In [3]:
-- build the multi-input, single-output network
local inNode1 = linearLayer1()    -- empty parentheses: the input is supplied at runtime
local inNode2 = linearLayer2()
local addNode = addLayer({inNode1,inNode2})  -- parentheses wrap the input nodes
local tanhNode = tanhLayer(addNode)
local outNode = linearLayer3(tanhNode)

model = nn.gModule({inNode1, inNode2}, {outNode})

In [4]:
x1 = torch.Tensor({0.1, 1.5, -1.0})
x2 = torch.Tensor({-1, 0})
local y = model:forward({x1,x2})
print(y)


Out[4]:
0.01 *
 6.7211
[torch.DoubleTensor of size 1]


In [5]:
-- verify by hand
local l1 = linearLayer1:forward(x1)
local l2 = linearLayer2:forward(x2)
local add = addLayer:forward({l1,l2})
local yp = tanhLayer:forward(add)
local y = linearLayer3:forward(yp)
print(y)


Out[5]:
0.01 *
 6.7211
[torch.DoubleTensor of size 1]
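
As an aside, nngraph can also render the computation graph, which helps when debugging larger structures. Assuming graphviz is installed, a call along the following lines should draw the forward graph (the file name 'model_fg' is an arbitrary choice):

-- visualize the forward graph (needs graphviz); 'model_fg' is an arbitrary file name
graph.dot(model.fg, 'model', 'model_fg')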

Multi-input to multi-output

Same as the previous example, except that one more output is added; the resulting network is shown below:


In [6]:
-- build the multi-input, multi-output network
local inNode1 = linearLayer1()    -- empty parentheses: the input is supplied at runtime
local inNode2 = linearLayer2()
local addNode = addLayer({inNode1,inNode2})  -- parentheses wrap the input nodes
local tanhNode = tanhLayer(addNode)
local outNode = linearLayer3(tanhNode)

model = nn.gModule({inNode1, inNode2}, {outNode, addNode})    -- one more output added

In [7]:
y = model:forward({x1,x2})
print(y)


Out[7]:
{
  1 : DoubleTensor - size: 1
  2 : DoubleTensor - size: 2
}

In [8]:
print(y[1], y[2])


Out[8]:
0.01 *
 6.7211
[torch.DoubleTensor of size 1]

-0.3817
-0.2642
[torch.DoubleTensor of size 2]
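
Note that with two outputs, model:backward expects a table containing one gradient per output. A minimal sketch, using placeholder gradients rather than the gradients of a real loss:

-- gModule:backward takes one gradOutput per network output
local gradOut1 = torch.ones(1)    -- placeholder gradient w.r.t. outNode
local gradOut2 = torch.zeros(2)   -- placeholder gradient w.r.t. addNode
local gradIn = model:backward({x1, x2}, {gradOut1, gradOut2})
print(gradIn[1], gradIn[2])       -- gradients w.r.t. x1 and x2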

Constructing a recurrent structure

Since the intermediate variable is now exposed as an output, it can be fed back in as part of the input at the next time step.


In [9]:
local h0 = torch.rand(2)
local x_t0 = torch.rand(3)
local out_t1 = model:forward({x_t0, h0})

print(out_t1[1])

local x_t1 = torch.rand(3)
local h1 = out_t1[2]                -- take the new hidden state h1
local out_t2 = model:forward({x_t1, h1}) 

print(out_t2[1])


Out[9]:
-0.1551
[torch.DoubleTensor of size 1]

-0.6919
[torch.DoubleTensor of size 1]
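
The two-step feed above generalizes to a loop over an arbitrary sequence. A minimal forward-only sketch (a single module is not enough for backpropagation, as the next section explains):

-- unroll the recurrence over a short random sequence (forward only)
local h = torch.rand(2)
for t = 1, 4 do
  local out = model:forward({torch.rand(3), h})
  print(out[1])
  h = out[2]:clone()   -- keep a copy: the next forward() reuses the module's buffers
end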

Implementing BPTT with weight-sharing cloned networks

To implement the BP-through-time (BPTT) algorithm on top of the backpropagation machinery provided by the nn module, the usual approach is to build several "clone" networks that share the same W matrices, and compute the gradients through repeated BP passes.

1. First, build a group of networks that share W (including b)

The group is sized according to the sequence length T of the application.


In [10]:
function createRNNCell()
    local xInput = nn.Identity()()
    local prev_h = nn.Identity()()
    
    local x2h = nn.Linear(2, 4)(xInput)
    local h2h = nn.Linear(4, 4)(prev_h)
    
    local new_h = nn.CAddTable()({x2h,h2h})
    local yout = nn.Sigmoid()(nn.Linear(4,1)(new_h))
    
    return nn.gModule({xInput, prev_h}, {yout, new_h})
end

-- create the group of RNN cells
sequenceLength = 5
sequenceRNN = {}
for i=1,sequenceLength do
  sequenceRNN[i] = createRNNCell()  
end

-- share the parameters across the group
local sharedPar,sharedGrad = sequenceRNN[1]:parameters()
for i=2,sequenceLength do
  local cloneParams, cloneGradParams = sequenceRNN[i]:parameters()
  for j=1,#sharedPar do
      cloneParams[j]:set(sharedPar[j])
      cloneGradParams[j]:set(sharedGrad[j])
  end
end
collectgarbage()


local temp,_ = sequenceRNN[2]:parameters()
print( sharedPar[1] )

sharedPar[1][{1,2}] = -1

print( temp[1])


Out[10]:
-0.5674  0.2732
 0.3461 -0.1795
-0.1179 -0.3103
-0.4642  0.5598
[torch.DoubleTensor of size 4x2]

-0.5674 -1.0000
 0.3461 -0.1795
-0.1179 -0.3103
-0.4642  0.5598
[torch.DoubleTensor of size 4x2]
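
As an aside, the manual :set() loop above can also be expressed with nn's built-in sharing, by cloning a prototype cell with share flags. A sketch of the same construction (protoRNN and sequenceRNN2 are illustrative names):

-- equivalent construction via clone-with-sharing
protoRNN = createRNNCell()
sequenceRNN2 = {}
for i = 1, sequenceLength do
  -- clone the prototype, sharing weights, biases and their gradient buffers
  sequenceRNN2[i] = protoRNN:clone('weight', 'bias', 'gradWeight', 'gradBias')
end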

2. Forward computation

Prepare the input and target samples


In [11]:
H0 = torch.rand(4)     -- initial hidden state h0
Xs = {}                -- input sequence
for i=1,#sequenceRNN do Xs[i] = torch.rand(2) end
Ys = torch.Tensor({1.0})       -- target value
criterion = nn.MSECriterion()  -- a simple squared-error regression loss

In [14]:
forwardRNN = function()
  local loss = 0.0
  local prev_h = H0

  for i=1,#sequenceRNN do
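    -- evaluate() puts the clone in evaluation mode; harmless here, as no dropout or batch norm is used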
    sequenceRNN[i]:evaluate()
    local outs = sequenceRNN[i]:forward({Xs[i], prev_h})
    loss = loss +  criterion:forward(outs[1], Ys)   
    prev_h = outs[2]
  end
    
  return loss
end

print(  forwardRNN() )


Out[14]:
1.1404420811755	

3. Implementing BPTT by running BP step by step


In [16]:
BackPropThroughTime = function()
  local loss = 0.0
  local prev_h = H0

    
  -- forward
  hs = {}
  outs = {}
  for i=1,#sequenceRNN do
    sequenceRNN[i]:training()
    local ys = sequenceRNN[i]:forward({Xs[i], prev_h})
    hs[i] = ys[2]
    outs[i] = ys[1]
    prev_h = ys[2]     
  end
  
    
  -- backward
  -- first zero out the gradient buffers (shared across all clones)
  local _,grad = sequenceRNN[1]:parameters()
  for i=1,#grad do
    grad[i]:zero()
  end

  hs[0] = H0
  dh = torch.zeros(4)
  for i=#sequenceRNN,1,-1 do
    local dy = criterion:backward(outs[i], Ys)
    local dout = sequenceRNN[i]:backward({Xs[i], hs[i-1]}, {dy, dh})
    dh = dout[2]
  end
  
  return grad
end

bpttGrad = BackPropThroughTime()
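
With the gradients accumulated in the shared buffers, a plain gradient-descent update is straightforward. A sketch only (not run here, since the numerical check below assumes unchanged parameters; the learning rate 0.1 is arbitrary):

-- one SGD step on the shared parameters
local par, grad = sequenceRNN[1]:parameters()
for j = 1, #par do
  par[j]:add(-0.1, grad[j])   -- p <- p - lr * dL/dp
end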

4. Verifying with a numerical gradient check


In [18]:
local pra,_ = sequenceRNN[1]:parameters()

pra[1][1][1] = pra[1][1][1] + 0.0001   -- w11 + eps
local lossRight = forwardRNN()
pra[1][1][1] = pra[1][1][1] - 0.0002   -- w11 - eps
local lossLeft = forwardRNN()
pra[1][1][1] = pra[1][1][1] + 0.0001   -- restore the original w11

local dw11 = (lossRight - lossLeft) / (2*0.0001)

print (lossRight, lossLeft, dw11)

print ( bpttGrad[1] )


Out[18]:
1.1404314635073	1.1404377979398	-0.031672162723595	
-0.0317 -0.0438
 0.1004  0.0745
 0.3385  0.2773
-0.1741 -0.1248
[torch.DoubleTensor of size 4x2]
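
The same central-difference check extends to every entry of a parameter tensor, which makes the comparison against the analytic gradient systematic. A sketch for the first weight matrix, with eps = 1e-4 as above:

-- numerical gradient for every entry of the first weight matrix
local pra, _ = sequenceRNN[1]:parameters()
local eps = 1e-4
local numGrad = torch.Tensor(pra[1]:size())
for r = 1, pra[1]:size(1) do
  for c = 1, pra[1]:size(2) do
    local orig = pra[1][r][c]
    pra[1][r][c] = orig + eps
    local lossR = forwardRNN()
    pra[1][r][c] = orig - eps
    local lossL = forwardRNN()
    pra[1][r][c] = orig                        -- restore
    numGrad[r][c] = (lossR - lossL) / (2 * eps)
  end
end
print(numGrad)   -- should closely match bpttGrad[1]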

