Part 1: Introduction to the Torch Machine Learning Framework

Torch is a scientific computing framework centered on machine learning. Because it is built on the efficient scripting language LuaJIT and provides a flexible interactive computing environment, Torch is particularly well suited to developing machine learning tasks. It also has excellent C/CUDA support, which makes it equally capable of large-scale computation such as deep learning training.

Key features:

  • A powerful tensor library (Tensor, an N-dimensional array) supporting rich indexing, slicing, and transformations (see the short sketch after this list)
  • Full linear algebra support, with a Matlab-like interface
  • State-of-the-art implementations of neural networks and probabilistic graphical models
  • Built-in routines for common numerical optimization
  • Friendly interactive computing and visualization support
  • Very easy integration of native code written in C
  • Excellent CUDA support
  • Easy to port to mobile platforms such as iOS and Android
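
As a quick taste of the tensor API, here is a minimal sketch of indexing and slicing (all of these are standard torch.Tensor operations):

t = torch.rand(3, 4)        -- 3x4 tensor with uniform [0,1) entries
print(t[2])                 -- second row (a 1D view)
print(t[{2, 3}])            -- single element: row 2, column 3
print(t[{{1, 2}, {2, 4}}])  -- sub-tensor: rows 1-2, columns 2-4 (a view)
print(t:t())                -- transpose (also a view, no copy)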

$ th

  ______             __   |  Torch7                                   
 /_  __/__  ________/ /   |  Scientific computing for Lua.         
  / / / _ \/ __/ __/ _ \  |                                           
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch   
                          |  http://torch.ch

1. Basic Matrix Computation Demo

Below are the most commonly used matrix functions:

  • rand(), which creates a tensor drawn from a uniform distribution
  • t(), which transposes a tensor (note that it returns a new view, not a copy)
  • dot(), which computes the dot product of two tensors
  • eye(), which returns an identity matrix (dot() and eye() are shown in the sketch after this list)
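
The cells below exercise rand() and t(); for completeness, here is a minimal sketch of dot() and eye():

a = torch.rand(5)
b = torch.rand(5)
print(torch.dot(a, b))  -- scalar dot product of two 1D tensors
print(torch.eye(3))     -- 3x3 identity matrix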

In [1]:
-- construct a 5x5 random matrix
N = 5
A = torch.rand(N, N)
print(A)


Out[1]:
 0.0734  0.7825  0.7946  0.3120  0.6766
 0.5306  0.2862  0.0987  0.6700  0.5826
 0.4186  0.2620  0.1156  0.7416  0.3870
 0.3132  0.3228  0.0061  0.9173  0.1982
 0.1515  0.0401  0.1978  0.1964  0.1802
[torch.DoubleTensor of size 5x5]

The following code replaces A with the symmetric matrix $$ A = A A^T $$ (any product of the form $A A^T$ is symmetric, since $(A A^T)^T = A A^T$).


In [2]:
A = A*A:t()
print(A)


Out[2]:
 1.8044  0.9445  0.8209  0.7008  0.3829
 0.9445  1.1614  1.0309  0.9892  0.3480
 0.8209  1.0309  0.9571  0.9734  0.3122
 0.7008  0.9892  0.9734  1.0831  0.2775
 0.3829  0.3480  0.3122  0.2775  0.1348
[torch.DoubleTensor of size 5x5]

The following code computes a matrix-vector product, which can be read as a linear combination of the columns of A weighted by the entries of v:

$$ B = A v $$

In [3]:
v = torch.rand(5,1)
B = A*v
print("v=")
print(v)
print("B=")
print(B)


Out[3]:
v=	
 0.3119
 0.4010
 0.3277
 0.2618
 0.9361
[torch.DoubleTensor of size 5x1]

B=	
 1.7524
 1.6828
 1.5301
 1.4775
 0.5601
[torch.DoubleTensor of size 5x1]
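
To make the "linear combination of columns" reading concrete, B can be rebuilt column by column; a minimal check using the standard narrow() and torch.dist() operations:

-- rebuild B as the v-weighted sum of A's columns
B2 = torch.zeros(N, 1)
for i = 1, N do
  B2 = B2 + A:narrow(2, i, 1) * v[i][1]  -- i-th column times i-th weight
end
print(torch.dist(B, B2))  -- should print ~0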

Computing the inverse matrix:

$$ C = A^{-1} $$

In [4]:
C = torch.inverse(A)
print(C)
print(C*A)


Out[4]:
     8.1091   -181.8864    440.3903   -186.1445   -190.4053
  -181.8864   4957.2267 -11984.0207   5067.9718   5046.0608
   440.3903 -11984.0207  29020.9692 -12283.7823 -12250.0796
  -186.1445   5067.9718 -12283.7823   5203.8056   5187.0017
  -190.4053   5046.0608 -12250.0796   5187.0017   5219.6247
[torch.DoubleTensor of size 5x5]

 1.0000 -0.0000  0.0000 -0.0000  0.0000
-0.0000  1.0000 -0.0000  0.0000 -0.0000
-0.0000 -0.0000  1.0000 -0.0000 -0.0000
-0.0000 -0.0000  0.0000  1.0000 -0.0000
 0.0000  0.0000  0.0000  0.0000  1.0000
[torch.DoubleTensor of size 5x5]
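The entries of C are large because $A = A A^T$ built from a random 5x5 matrix is often poorly conditioned; the inverse can still be verified numerically:

print(torch.dist(C*A, torch.eye(N)))  -- should be close to zero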

2. Visualization Demo


In [1]:
require('image')
itorch.image(image.lena())



In [6]:
require ('nn')
-- build a randomly initialized 3x3 convolution over the 3 RGB channels (a random 3x3x3 filter) and apply it to the image
m=nn.SpatialConvolution(3,1,3,3)
n=m:forward(image.lena())
itorch.image(n)
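
With no padding, SpatialConvolution shrinks each spatial dimension by kernelSize - 1, which is easy to check (assuming the standard 3x512x512 lena image, the output is 1x510x510):

print(n:size())  -- expected: 1x510x510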



In [7]:
Plot = require 'itorch.Plot'
x1 = torch.randn(40):mul(100)
y1 = torch.randn(40):mul(100)
x2 = torch.randn(40):mul(100)
y2 = torch.randn(40):mul(100)
x3 = torch.randn(40):mul(200)
y3 = torch.randn(40):mul(200)
plot = Plot():circle(x1, y1, 'red', 'hi'):circle(x2, y2, 'blue', 'bye'):draw()
plot:circle(x3,y3,'green', 'yolo'):redraw()
plot:title('Scatter Plot Demo'):redraw()
plot:xaxis('length'):yaxis('width'):redraw()
plot:legend(true)
plot:redraw()


3. Gradient Descent Demo

Torch's optim package supports a variety of numerical optimization algorithms, including SGD (optim.sgd), Adagrad (optim.adagrad), Conjugate Gradient (optim.cg), L-BFGS (optim.lbfgs), Rprop (optim.rprop), and more.

This package contains several optimization routines for Torch. Each optimization algorithm is based on the same interface (a minimal usage sketch follows the parameter list below):

x*, {f}, ... = optim.method(func, x, state)

where:

  • func: a user-defined closure that respects this API: f, df/dx = func(x)
  • x: the current parameter vector (a 1D torch.Tensor)
  • state: a table of parameters and state variables, dependent upon the algorithm
  • x*: the new parameter vector that minimizes f, x* = argmin_x f(x)
  • {f}: a table of all f values, in the order they've been evaluated (for some simple algorithms, like SGD, #f == 1)
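
As a minimal sketch of this interface, here is optim.sgd minimizing f(p) = p², where the closure returns both f and df/dp (the variable names are illustrative):

require('optim')
local feval = function(p)
  return p[1] * p[1], p * 2  -- f(p) = p^2 and its gradient 2p
end
local p0 = torch.Tensor{5.0}
local cfg = { learningRate = 0.1 }
for i = 1, 100 do
  optim.sgd(feval, p0, cfg)
end
print(p0[1])  -- converges toward 0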

Here we set up a one-dimensional linear regression, i.e., an example of fitting a straight line.


In [8]:
-- construct training samples: y = 0.7*x + 5.0 plus uniform noise in [-0.5, 0.5]
N = 32
x = {}
y = {}
for i=1, N do
    x[i] = (math.random() - 0.5) * 20
    y[i] = 0.7*x[i] + 5.0 + (math.random()-0.5) 
end

Plot = require 'itorch.Plot'
local plot = Plot()
plot:circle(x,y,'black', 'yolo'):draw()
plot:title('Line Fitting'):redraw()
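
The objective minimized in the next cell, and the gradient computed in batchFunc, are:

$$ f(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (a\,x_i + b - y_i)^2 $$

$$ \frac{\partial f}{\partial a} = \frac{1}{N} \sum_{i=1}^{N} (a\,x_i + b - y_i)\,x_i, \qquad \frac{\partial f}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (a\,x_i + b - y_i) $$

where a = parameter[1] is the slope and b = parameter[2] is the intercept.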



In [9]:
require('optim')

-- log the objective value at each evaluation
batchLog = {}
batchLog.value = {}
batchLog.seq = {}

parameter = torch.Tensor(2)
parameter[1] = 0
parameter[2] = 0

-- first, build func(x): returns f(x) and df/dx
function batchFunc(inParameter) 
  
  local sum = 0.0
  local deltaP = torch.Tensor(2)
    
  deltaP[1] = 0.0
  deltaP[2] = 0.0
  for i=1,#x do
    sum = sum + math.pow(inParameter[1] * x[i] + inParameter[2] - y[i],2)
    deltaP[1] = deltaP[1] + (inParameter[1] * x[i] + inParameter[2] - y[i]) * x[i]
    deltaP[2] = deltaP[2] + (inParameter[1] * x[i] + inParameter[2] - y[i])
  end
  sum = 0.5 * sum / #x
  deltaP = deltaP / #x

  batchLog.value[#batchLog.value+1] = sum
  batchLog.seq[#batchLog.seq+1] = #batchLog.seq+1
    
  return sum , deltaP
end


local state = {
   learningRate = 1.0e-2,
}

for i = 1,500 do
  optim.sgd(batchFunc, parameter ,state)
end

local plot = Plot()
plot:line(batchLog.seq, batchLog.value,'black', 'yolo'):draw()
plot:title('BGD'):redraw()



In [13]:
-- draw the fitted line
drawResultLine = function()
  local resultValue = {}
  local resultSeq = {}
  for i=-10,10,0.1 do
    resultSeq[#resultSeq+1] = i
    resultValue[#resultValue+1] = i*parameter[1] + parameter[2]
  end
  local plot = Plot()
  plot:circle(x,y,'red', 'yolo'):draw()
  plot:line(resultSeq, resultValue,'black', 'yolo'):redraw()
  plot:title('Line Fitting'):redraw()
    
end
drawResultLine()


As the loss curve above shows, the setting of learningRate is very important. Below we demonstrate the SGD algorithm, which evaluates the gradient on one sample at a time.
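
One quick way to see this sensitivity is to re-run the batch objective with a deliberately large step (learningRate = 0.5 here is an assumed value, chosen to be too large for this data):

-- reset the parameters and the log, then take oversized steps
parameter[1] = 0
parameter[2] = 0
batchLog.value = {}
batchLog.seq = {}
local bigState = { learningRate = 0.5 }
for i = 1, 50 do
  optim.sgd(batchFunc, parameter, bigState)
end
print(batchLog.value[#batchLog.value])  -- typically blows up at this step size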


In [12]:
require('optim')

-- log the objective value at each evaluation
sgdLog = {}
sgdLog.value = {}
sgdLog.seq = {}

parameter[1] = 0
parameter[2] = 0

local sgdNumber = 0

-- first, build func(x): returns f(x) and df/dx, evaluated on a single sample
function sgdFunc(inParameter) 
  
  local sum = 0.0
  local deltaP = torch.Tensor(2)
    
    
  sgdNumber = (sgdNumber + 1) % #x
  local i = sgdNumber + 1
    
  sum = 0.5 * math.pow(inParameter[1] * x[i] + inParameter[2] - y[i],2)
  deltaP[1] = (inParameter[1] * x[i] + inParameter[2] - y[i]) * x[i]
  deltaP[2] = (inParameter[1] * x[i] + inParameter[2] - y[i])
    
  sgdLog.value[#sgdLog.value+1] = sum
  sgdLog.seq[#sgdLog.seq+1] = #sgdLog.seq+1
    
  return sum , deltaP
end


local state = {
   learningRate = 1.0e-2,
}

for i = 1,200 do
  optim.sgd(sgdFunc, parameter ,state)
end

local plot = Plot()
plot:line(sgdLog.seq, sgdLog.value,'black', 'yolo'):draw()
plot:title('SGD'):redraw()

drawResultLine()
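
Since the samples were generated from y = 0.7*x + 5.0 plus noise, the fitted parameters can be sanity-checked directly:

print(parameter[1], parameter[2])  -- should land near the generating 0.7 and 5.0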