Part 1: Introduction to the Torch Machine Learning Framework

Torch is a scientific computing framework centered on machine learning. Because it is built on the efficient scripting language LuaJIT and provides a flexible interactive computing environment, Torch is particularly well suited to developing machine learning tasks. It also has excellent C/CUDA support, which makes it equally capable of large-scale computation such as deep learning training.

Key features:

  • A powerful tensor library (Tensor, an N-dimensional array) supporting rich indexing, slicing, and transformations (see the short sketch after this list)
  • Full linear algebra support, with a Matlab-like interface
  • State-of-the-art implementations of neural networks and probabilistic graphical models
  • Built-in routines for common numerical optimization
  • Friendly interactive computing and visualization support
  • Very easy integration of native code written in C
  • Excellent CUDA support
  • Easy to port to mobile platforms such as iOS and Android
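
As a quick taste of the tensor API, here is a minimal sketch of indexing and slicing (all of these are standard torch.Tensor operations):

t = torch.rand(3, 4)        -- 3x4 tensor with uniform [0,1) entries
print(t[2])                 -- second row (a 1D view)
print(t[{2, 3}])            -- single element: row 2, column 3
print(t[{{1, 2}, {2, 4}}])  -- sub-tensor: rows 1-2, columns 2-4 (a view)
print(t:t())                -- transpose (also a view, no copy)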

$ th

  ______             __   |  Torch7                                   
 /_  __/__  ________/ /   |  Scientific computing for Lua.         
  / / / _ \/ __/ __/ _ \  |                                           
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch   
                          |  http://torch.ch

1. Basic Matrix Computation Demo

Below are the most commonly used matrix functions:

  • rand(), which creates a tensor drawn from a uniform distribution
  • t(), which transposes a tensor (note that it returns a new view, not a copy)
  • dot(), which computes the dot product of two tensors
  • eye(), which returns an identity matrix (dot() and eye() are shown in the sketch after this list)
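
The cells below exercise rand() and t(); for completeness, here is a minimal sketch of dot() and eye():

a = torch.rand(5)
b = torch.rand(5)
print(torch.dot(a, b))  -- scalar dot product of two 1D tensors
print(torch.eye(3))     -- 3x3 identity matrix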

In [1]:
-- construct a 5x5 random matrix
N = 5
A = torch.rand(N, N)
print(A)


Out[1]:
 0.0734  0.7825  0.7946  0.3120  0.6766
 0.5306  0.2862  0.0987  0.6700  0.5826
 0.4186  0.2620  0.1156  0.7416  0.3870
 0.3132  0.3228  0.0061  0.9173  0.1982
 0.1515  0.0401  0.1978  0.1964  0.1802
[torch.DoubleTensor of size 5x5]

The following code replaces A with the symmetric matrix $$ A = A A^T $$ (any product of the form $A A^T$ is symmetric, since $(A A^T)^T = A A^T$).


In [2]:
A = A*A:t()
print(A)


Out[2]:
 1.8044  0.9445  0.8209  0.7008  0.3829
 0.9445  1.1614  1.0309  0.9892  0.3480
 0.8209  1.0309  0.9571  0.9734  0.3122
 0.7008  0.9892  0.9734  1.0831  0.2775
 0.3829  0.3480  0.3122  0.2775  0.1348
[torch.DoubleTensor of size 5x5]

The following code computes a matrix-vector product, which can be read as a linear combination of the columns of A weighted by the entries of v:

$$ B = A v $$

In [3]:
v = torch.rand(5,1)
B = A*v
print("v=")
print(v)
print("B=")
print(B)


Out[3]:
v=	
 0.3119
 0.4010
 0.3277
 0.2618
 0.9361
[torch.DoubleTensor of size 5x1]

B=	
 1.7524
 1.6828
 1.5301
 1.4775
 0.5601
[torch.DoubleTensor of size 5x1]
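
To make the "linear combination of columns" reading concrete, B can be rebuilt column by column; a minimal check using the standard narrow() and torch.dist() operations:

-- rebuild B as the v-weighted sum of A's columns
B2 = torch.zeros(N, 1)
for i = 1, N do
  B2 = B2 + A:narrow(2, i, 1) * v[i][1]  -- i-th column times i-th weight
end
print(torch.dist(B, B2))  -- should print ~0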

Computing the inverse matrix:

$$ C = A^{-1} $$

In [4]:
C = torch.inverse(A)
print(C)
print(C*A)


Out[4]:
     8.1091   -181.8864    440.3903   -186.1445   -190.4053
  -181.8864   4957.2267 -11984.0207   5067.9718   5046.0608
   440.3903 -11984.0207  29020.9692 -12283.7823 -12250.0796
  -186.1445   5067.9718 -12283.7823   5203.8056   5187.0017
  -190.4053   5046.0608 -12250.0796   5187.0017   5219.6247
[torch.DoubleTensor of size 5x5]

 1.0000 -0.0000  0.0000 -0.0000  0.0000
-0.0000  1.0000 -0.0000  0.0000 -0.0000
-0.0000 -0.0000  1.0000 -0.0000 -0.0000
-0.0000 -0.0000  0.0000  1.0000 -0.0000
 0.0000  0.0000  0.0000  0.0000  1.0000
[torch.DoubleTensor of size 5x5]
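The entries of C are large because $A = A A^T$ built from a random 5x5 matrix is often poorly conditioned; the inverse can still be verified numerically:

print(torch.dist(C*A, torch.eye(N)))  -- should be close to zero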

2. Visualization Demo


In [1]:
require('image')
itorch.image(image.lena())



In [6]:
require ('nn')
-- build a randomly initialized 3x3 convolution over the 3 RGB channels (a random 3x3x3 filter) and apply it to the image
m=nn.SpatialConvolution(3,1,3,3)
n=m:forward(image.lena())
itorch.image(n)
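
With no padding, SpatialConvolution shrinks each spatial dimension by kernelSize - 1, which is easy to check (assuming the standard 3x512x512 lena image, the output is 1x510x510):

print(n:size())  -- expected: 1x510x510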



In [7]:
Plot = require 'itorch.Plot'
x1 = torch.randn(40):mul(100)
y1 = torch.randn(40):mul(100)
x2 = torch.randn(40):mul(100)
y2 = torch.randn(40):mul(100)
x3 = torch.randn(40):mul(200)
y3 = torch.randn(40):mul(200)
plot = Plot():circle(x1, y1, 'red', 'hi'):circle(x2, y2, 'blue', 'bye'):draw()
plot:circle(x3,y3,'green', 'yolo'):redraw()
plot:title('Scatter Plot Demo'):redraw()
plot:xaxis('length'):yaxis('width'):redraw()
plot:legend(true)
plot:redraw()


3. Gradient Descent Demo

Torch's optim package supports a variety of numerical optimization algorithms, including SGD (optim.sgd), Adagrad (optim.adagrad), Conjugate Gradient (optim.cg), L-BFGS (optim.lbfgs), Rprop (optim.rprop), and more.

This package contains several optimization routines for Torch. Each optimization algorithm is based on the same interface (a minimal usage sketch follows the parameter list below):

x*, {f}, ... = optim.method(func, x, state)

where:

  • func: a user-defined closure that respects this API: f, df/dx = func(x)
  • x: the current parameter vector (a 1D torch.Tensor)
  • state: a table of parameters and state variables, dependent upon the algorithm
  • x*: the new parameter vector that minimizes f, x* = argmin_x f(x)
  • {f}: a table of all f values, in the order they've been evaluated (for some simple algorithms, like SGD, #f == 1)
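
As a minimal sketch of this interface, here is optim.sgd minimizing f(p) = p², where the closure returns both f and df/dp (the variable names are illustrative):

require('optim')
local feval = function(p)
  return p[1] * p[1], p * 2  -- f(p) = p^2 and its gradient 2p
end
local p0 = torch.Tensor{5.0}
local cfg = { learningRate = 0.1 }
for i = 1, 100 do
  optim.sgd(feval, p0, cfg)
end
print(p0[1])  -- converges toward 0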

Here we set up a one-dimensional linear regression, i.e., an example of fitting a straight line.


In [8]:
-- construct training samples: y = 0.7*x + 5.0 plus uniform noise in [-0.5, 0.5]
N = 32
x = {}
y = {}
for i=1, N do
    x[i] = (math.random() - 0.5) * 20
    y[i] = 0.7*x[i] + 5.0 + (math.random()-0.5) 
end

Plot = require 'itorch.Plot'
local plot = Plot()
plot:circle(x,y,'black', 'yolo'):draw()
plot:title('Line Fitting'):redraw()
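
The objective minimized in the next cell, and the gradient computed in batchFunc, are:

$$ f(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (a\,x_i + b - y_i)^2 $$

$$ \frac{\partial f}{\partial a} = \frac{1}{N} \sum_{i=1}^{N} (a\,x_i + b - y_i)\,x_i, \qquad \frac{\partial f}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (a\,x_i + b - y_i) $$

where a = parameter[1] is the slope and b = parameter[2] is the intercept.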



In [9]:
require('optim')

-- log the objective value at each evaluation
batchLog = {}
batchLog.value = {}
batchLog.seq = {}

parameter = torch.Tensor(2)
parameter[1] = 0
parameter[2] = 0

-- first, build func(x): returns f(x) and df/dx
function batchFunc(inParameter) 
  
  local sum = 0.0
  local deltaP = torch.Tensor(2)
    
  deltaP[1] = 0.0
  deltaP[2] = 0.0
  for i=1,#x do
    sum = sum + math.pow(inParameter[1] * x[i] + inParameter[2] - y[i],2)
    deltaP[1] = deltaP[1] + (inParameter[1] * x[i] + inParameter[2] - y[i]) * x[i]
    deltaP[2] = deltaP[2] + (inParameter[1] * x[i] + inParameter[2] - y[i])
  end
  sum = 0.5 * sum / #x
  deltaP = deltaP / #x

  batchLog.value[#batchLog.value+1] = sum
  batchLog.seq[#batchLog.seq+1] = #batchLog.seq+1
    
  return sum , deltaP
end


local state = {
   learningRate = 1.0e-2,
}

for i = 1,500 do
  optim.sgd(batchFunc, parameter ,state)
end

local plot = Plot()
plot:line(batchLog.seq, batchLog.value,'black', 'yolo'):draw()
plot:title('BGD'):redraw()



In [13]:
-- draw the fitted line
drawResultLine = function()
  local resultValue = {}
  local resultSeq = {}
  for i=-10,10,0.1 do
    resultSeq[#resultSeq+1] = i
    resultValue[#resultValue+1] = i*parameter[1] + parameter[2]
  end
  local plot = Plot()
  plot:circle(x,y,'red', 'yolo'):draw()
  plot:line(resultSeq, resultValue,'black', 'yolo'):redraw()
  plot:title('Line Fitting'):redraw()
    
end
drawResultLine()


As the loss curve above shows, the setting of learningRate is very important. Below we demonstrate the SGD algorithm, which evaluates the gradient on one sample at a time.
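
One quick way to see this sensitivity is to re-run the batch objective with a deliberately large step (learningRate = 0.5 here is an assumed value, chosen to be too large for this data):

-- reset the parameters and the log, then take oversized steps
parameter[1] = 0
parameter[2] = 0
batchLog.value = {}
batchLog.seq = {}
local bigState = { learningRate = 0.5 }
for i = 1, 50 do
  optim.sgd(batchFunc, parameter, bigState)
end
print(batchLog.value[#batchLog.value])  -- typically blows up at this step size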


In [12]:
require('optim')

-- log the objective value at each evaluation
sgdLog = {}
sgdLog.value = {}
sgdLog.seq = {}

parameter[1] = 0
parameter[2] = 0

local sgdNumber = 0

-- first, build func(x): returns f(x) and df/dx, evaluated on a single sample
function sgdFunc(inParameter) 
  
  local sum = 0.0
  local deltaP = torch.Tensor(2)
    
    
  sgdNumber = (sgdNumber + 1) % #x
  local i = sgdNumber + 1
    
  sum = 0.5 * math.pow(inParameter[1] * x[i] + inParameter[2] - y[i],2)
  deltaP[1] = (inParameter[1] * x[i] + inParameter[2] - y[i]) * x[i]
  deltaP[2] = (inParameter[1] * x[i] + inParameter[2] - y[i])
    
  sgdLog.value[#sgdLog.value+1] = sum
  sgdLog.seq[#sgdLog.seq+1] = #sgdLog.seq+1
    
  return sum , deltaP
end


local state = {
   learningRate = 1.0e-2,
}

for i = 1,200 do
  optim.sgd(sgdFunc, parameter ,state)
end

local plot = Plot()
plot:line(sgdLog.seq, sgdLog.value,'black', 'yolo'):draw()
plot:title('SGD'):redraw()

drawResultLine()
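
Since the samples were generated from y = 0.7*x + 5.0 plus noise, the fitted parameters can be sanity-checked directly:

print(parameter[1], parameter[2])  -- should land near the generating 0.7 and 5.0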