In [1]:
from __future__ import print_function
import torch as T
import torch.autograd
from torch.autograd import Variable
import numpy as np
One of the cores of implementing a neural-network algorithm is back-propagating the derivatives of the cost function. Theano and TensorFlow both define symbolic differentiation functions; likewise, automatic differentiation (autograd) plays a central role in PyTorch as a deep-learning platform. The difference is that PyTorch's dynamic graph makes it more flexible (define by run): the attributes of a PyTorch Variable can be changed even on a per-iteration basis, letting it join or leave the backward graph on the fly. This is particularly useful in some applications; for example, in the later stage of training we may only need to update the parameters of the later layers, in which case we simply set the Variables of the earlier layers to not require gradients, as sketched below.
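As a minimal sketch of that freezing idea (the layer sizes, learning rate and the nn.Sequential model here are made up purely for illustration), setting requires_grad = False on the front layer's parameters removes them from the backward graph, and only the remaining parameters are handed to the optimizer:
In [ ]:
import torch.nn as nn
import torch.optim as optim

# A toy two-layer network; the sizes 20 -> 50 -> 10 are arbitrary.
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))

# Freeze the first Linear layer: its parameters leave the backward graph.
for p in model[0].parameters():
    p.requires_grad = False

# Only optimize the parameters that still require gradients.
optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)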
autograd.Variable is one of the core design concepts of PyTorch. It wraps a tensor in a Variable, supports most tensor operations on it, and gives it two crucial flags: requires_grad and volatile. A Variable also carries three attributes: .data stores the Variable's values; .grad is itself a Variable that stores the gradient; and .grad_fn (formerly creator) is the function that produced the Variable, which is None for user-created Variables. See the Variable source code for details.
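As a quick illustration of these attributes (the tiny graph below exists only for demonstration; it reuses the torch as T and Variable imports from the first cell):
In [ ]:
v = Variable(T.ones(3), requires_grad=True)
w = v * 2
print(v.data)     # the underlying tensor holding the values
print(v.grad)     # not populated until a backward pass has run
print(v.grad_fn)  # None, because v was created by the user
print(w.grad_fn)  # the backward function (a multiplication) that produced w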
In [13]:
x = Variable(T.ones(2,2), requires_grad=True)
print(x)
In [14]:
y = T.exp(x + 2)
yy = T.exp(-x-2)
print(y)
In [15]:
z = (y + yy)/2
out = z.mean()
print(z, out)
In [16]:
# make_dot is assumed to come from an external graph-visualization helper (e.g. the torchviz package); it is not defined in this notebook.
make_dot(out)
Out[16]:
In [29]:
out.backward(T.ones(1), retain_graph=True)  # seed the backward pass with a gradient of 1
In [30]:
x.grad
Out[30]:
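For reference, $z = \big(e^{x+2}+e^{-x-2}\big)/2 = \cosh(x+2)$ and out is the mean over the four entries of $z$, so each entry of x.grad should be $\sinh(x+2)/4 = \sinh(3)/4 \approx 2.504$ for $x=1$.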
In [31]:
T.randn(1,1)
Out[31]:
In [44]:
xx = Variable(T.randn(1, 1), requires_grad=True)
print(xx)
yy = 3*xx
zz = yy**2
#yy.register_hook(print)
zz.backward(T.FloatTensor([[0.1]]))  # seed gradient of 0.1, matching zz's 1x1 shape
print(xx.grad)
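For reference, zz $= (3\,\mathrm{xx})^2 = 9\,\mathrm{xx}^2$, so seeding the backward pass with 0.1 gives xx.grad $= 0.1 \times 18\,\mathrm{xx} = 1.8\,\mathrm{xx}$; uncommenting the register_hook line would print the intermediate gradient flowing into yy, which is $0.1 \times 2\,\mathrm{yy} = 0.6\,\mathrm{xx}$. A minimal standalone sketch of the hook (the aa/bb/cc names and the lambda are chosen here purely for illustration):
In [ ]:
aa = Variable(T.ones(1, 1), requires_grad=True)
bb = 3 * aa
cc = bb ** 2
# The hook prints the gradient flowing into bb during backward.
bb.register_hook(lambda grad: print('grad wrt bb:', grad))
cc.backward(T.FloatTensor([[0.1]]))
print(aa.grad)  # 1.8 * aa = 1.8 here, since aa = 1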
A simple numpy implementation of a one-hidden-layer neural network.
In this implementation, for each update of $w_i$, both the forward and the backward pass have to be computed by hand.
In [4]:
# y_pred = relu(x @ w1) @ w2
# loss = 0.5 * sum((y_pred - y)^2)
import numpy as np
N, D_in, D_hidden, D_out = 50, 40, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, D_hidden)
w2 = np.random.randn(D_hidden, D_out)
learning_rate = 0.0001
for t in range(100):
    ### forward pass
    h = x.dot(w1)              # 50x40 times 40x100 produces 50x100
    h_relu = np.maximum(h, 0)  # np.maximum takes the element-wise max against 0, 50x100
    y_pred = h_relu.dot(w2)    # 50x100 times 100x10 produces 50x10
    # print(y_pred.shape)

    ### loss function
    loss = 0.5 * np.sum(np.square(y_pred - y))

    ### backward pass
    grad_y_pred = y_pred - y             # 50x10
    grad_w2 = h_relu.T.dot(grad_y_pred)  # 100x50 times 50x10 produces 100x10, so transpose h_relu
    grad_h_relu = grad_y_pred.dot(w2.T)  # 50x10 times 10x100 produces 50x100, so transpose w2
    grad_h = grad_h_relu.copy()          # make a copy before masking
    grad_h[h < 0] = 0                    # ReLU backward: zero the gradient where the pre-activation is negative
    grad_w1 = x.T.dot(grad_h)            # 40x50 times 50x100 produces 40x100
    w1 = w1 - learning_rate * grad_w1
    w2 = w2 - learning_rate * grad_w2
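The manual gradients above are just the chain rule: writing the forward pass as $h = x w_1$, $h_{\mathrm{relu}} = \max(h, 0)$, $\hat{y} = h_{\mathrm{relu}} w_2$ and $L = \tfrac{1}{2}\sum(\hat{y}-y)^2$, we get
$$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y,\qquad \frac{\partial L}{\partial w_2} = h_{\mathrm{relu}}^{\top}(\hat{y}-y),\qquad \frac{\partial L}{\partial w_1} = x^{\top}\Big[\big((\hat{y}-y)\,w_2^{\top}\big)\odot \mathbf{1}[h>0]\Big],$$
which is exactly what grad_w2, grad_h_relu, grad_h and grad_w1 compute.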
With very slight modifications, we end up with an implementation of the same algorithm using PyTorch tensors.
In [7]:
import torch
N, D_in, D_hidden, D_out = 50, 40, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, D_hidden)
w2 = torch.randn(D_hidden, D_out)
learning_rate = 0.0001
for t in range(100):
    h = x.mm(w1)             # 50x40 times 40x100 produces 50x100
    # h = x.matmul(w1)       # matmul also works here, kept for comparison
    h_relu = h.clamp(min=0)  # clamp(min=0) is the tensor counterpart of np.maximum(h, 0), 50x100
    y_pred = h_relu.mm(w2)   # 50x100 times 100x10 produces 50x10
    # print(y_pred.size())

    loss = 0.5 * (y_pred - y).pow(2).sum()

    grad_y_pred = y_pred - y              # 50x10
    grad_w2 = h_relu.t().mm(grad_y_pred)  # 100x50 times 50x10 produces 100x10, so transpose h_relu
    grad_h_relu = grad_y_pred.mm(w2.t())  # 50x10 times 10x100 produces 50x100, so transpose w2
    grad_h = grad_h_relu.clone()          # make a copy before masking
    grad_h[h < 0] = 0                     # ReLU backward: zero the gradient where the pre-activation is negative
    grad_w1 = x.t().mm(grad_h)            # 40x50 times 50x100 produces 40x100
    w1 = w1 - learning_rate * grad_w1
    w2 = w2 - learning_rate * grad_w2
Now, with the autograd functionality in PyTorch, we can see how easy backpropagation becomes: computing the gradients of a two-layer network by hand is not a big deal, but it gets much more complicated as the number of layers grows. A sketch of the autograd version follows.
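A minimal sketch of the same training loop using autograd (the xv/yv/w1v/w2v names are new here to avoid clobbering the tensors above, and the loop mirrors the style of the official PyTorch examples; only loss.backward() is needed to obtain the gradients):
In [ ]:
N, D_in, D_hidden, D_out = 50, 40, 100, 10
xv = Variable(torch.randn(N, D_in))
yv = Variable(torch.randn(N, D_out))
w1v = Variable(torch.randn(D_in, D_hidden), requires_grad=True)
w2v = Variable(torch.randn(D_hidden, D_out), requires_grad=True)
learning_rate = 0.0001
for t in range(100):
    # Forward pass, written exactly as in the manual version.
    y_pred = xv.mm(w1v).clamp(min=0).mm(w2v)
    loss = 0.5 * (y_pred - yv).pow(2).sum()
    # Backward pass: autograd fills in w1v.grad and w2v.grad.
    loss.backward()
    # Gradient step on the underlying tensors, then reset the accumulated gradients.
    w1v.data -= learning_rate * w1v.grad.data
    w2v.data -= learning_rate * w2v.grad.data
    w1v.grad.data.zero_()
    w2v.grad.data.zero_()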
PyTorch provides torch.dot(), torch.mm(), torch.matmul() and * for basic matrix multiplication. It is worth noting the differences among them: torch.dot(a, b) gives the inner product of 1-D vectors $a$ and $b$; torch.mm(a, b) gives the matrix product of 2-D matrices; and torch.matmul() operates on tensors of general shape, so torch.matmul() can replace both torch.dot() and torch.mm(), but not vice versa. Finally, * simply computes the elementwise product, i.e. the Hadamard product. A small sketch contrasting them follows.
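A minimal sketch contrasting the four operations (the shapes are arbitrary):
In [ ]:
a = torch.randn(3)
b = torch.randn(3)
A = torch.randn(2, 3)
B = torch.randn(3, 4)
print(torch.dot(a, b))     # inner product of two 1-D vectors
print(torch.mm(A, B))      # 2x4 matrix product of two 2-D matrices
print(torch.matmul(a, b))  # same result as torch.dot here
print(torch.matmul(A, B))  # same result as torch.mm here
print(a * b)               # elementwise (Hadamard) product, shape 3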
torch.bmm(A, B): batch matrix multiplication for 3-D tensors; $A_{b\times n\times p}$ and $B_{b\times p\times m}$ produce a 3-D tensor of shape $b\times n\times m$
torch.baddbmm(M, A, B): batch matrix multiplication of $A$ and $B$ with $M$ added to each product, i.e. $M + A_i B_i$ per batch element, giving shape $b\times n\times m$
torch.addbmm(M, A, B): batch matrix multiplication summed over the batch dimension and added to $M$, i.e. $M + \sum_i A_i B_i$, giving shape $n\times m$
torch.addmm(M, A, B): ordinary 2-D matrix multiplication added to $M$, i.e. $M + AB$
A small sketch of the batched variants follows.
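A minimal sketch of the batched variants (the shapes are arbitrary; the optional beta/alpha scaling arguments are left at their defaults):
In [ ]:
A = torch.randn(4, 2, 3)    # batch of 4 matrices, each 2x3
B = torch.randn(4, 3, 5)    # batch of 4 matrices, each 3x5
M = torch.randn(2, 5)
Mb = torch.randn(4, 2, 5)
print(torch.bmm(A, B).size())             # 4x2x5: one matrix product per batch element
print(torch.baddbmm(Mb, A, B).size())     # 4x2x5: Mb + A[i] @ B[i] for each i
print(torch.addbmm(M, A, B).size())       # 2x5: M + sum_i A[i] @ B[i]
print(torch.addmm(M, A[0], B[0]).size())  # 2x5: M + A[0] @ B[0]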