In [1]:
from __future__ import print_function
import torch as T
import torch.autograd
from torch.autograd import Variable
import numpy as np

One of the core pieces of any neural-network implementation is back-propagating derivatives of the cost function. Theano and TensorFlow both define symbolic differentiation, and likewise automatic differentiation (autograd) plays a central role in PyTorch. The difference is that PyTorch's dynamic graph (define-by-run) makes it more flexible: for example, even within each iteration you can change a Variable's attributes so that it joins or leaves the backward graph. This is particularly useful in some applications; for instance, in the later stage of training, when only the parameters of the later layers need to be updated, we simply set the Variables of the earlier layers not to require gradients, as sketched below.
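
A minimal sketch of this freezing trick with plain Variables (the weights and sizes below are made up purely for illustration, reusing the torch-as-T alias from the first cell): the earlier "layer" w1 is created with requires_grad=False, so it never enters the backward graph, while w2 still receives gradients.


In [ ]:
w1 = Variable(T.randn(40, 100), requires_grad=False)  # "frozen" earlier layer
w2 = Variable(T.randn(100, 10), requires_grad=True)   # later layer, still trainable

x = Variable(T.randn(50, 40))
loss = x.mm(w1).clamp(min=0).mm(w2).pow(2).sum()
loss.backward()

print(w1.grad)          # None: w1 never entered the backward graph
print(w2.grad.size())   # torch.Size([100, 10]): w2 still receives a gradient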

Variables

autograd.Variable is one of the core design concepts of PyTorch. It wraps a tensor into a Variable and supports the vast majority of tensor operations, while giving it two crucial attributes: requires_grad and volatile. A Variable also carries three fields: .data stores the Variable's value, .grad is itself a Variable and stores the gradient, and .grad_fn (called creator in earlier versions) is the function that produced the Variable; for a user-created Variable it is None. See the Variable source code for details.
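
A quick look at these three fields (a minimal sketch, reusing the imports from the first cell): for a user-created Variable, .grad_fn is None, and .grad stays None until a backward pass has filled it.


In [ ]:
v = Variable(T.ones(3), requires_grad=True)
print(v.data)      # the wrapped FloatTensor of size 3
print(v.grad)      # None: no backward pass has run yet
print(v.grad_fn)   # None: v was created by the user, not by an operation

w = (v * 2).sum()
print(w.grad_fn)   # a backward function object, since w was produced by operations
w.backward()
print(v.grad)      # now a Variable holding d(w)/dv = [2, 2, 2]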


In [13]:
x = Variable(T.ones(2,2), requires_grad=True)
print(x)


Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]


In [14]:
y = T.exp(x + 2)
yy = T.exp(-x-2)
print(y)


Variable containing:
 20.0855  20.0855
 20.0855  20.0855
[torch.FloatTensor of size 2x2]


In [15]:
z = (y + yy)/2
out = z.mean()
print(z, out)


Variable containing:
 10.0677  10.0677
 10.0677  10.0677
[torch.FloatTensor of size 2x2]
 Variable containing:
 10.0677
[torch.FloatTensor of size 1]


In [16]:
make_dot(out)  # assumes make_dot has been imported from a graph-visualization helper such as the torchviz package


Out[16]:
[graphviz Digraph of the backward graph: a single MeanBackward node]

In [29]:
# T.FloatTensor(1) allocates a one-element tensor with uninitialized memory, so the
# gradients printed below are garbage values; pass T.ones(1) to get the true d(out)/dx.
out.backward(T.FloatTensor(1), retain_graph=True)

In [30]:
x.grad


Out[30]:
Variable containing:
-1.2072e+21 -1.2072e+21
-1.2072e+21 -1.2072e+21
[torch.FloatTensor of size 2x2]

In [31]:
T.randn(1,1)


Out[31]:
-0.7466
[torch.FloatTensor of size 1x1]

In [44]:
from __future__ import print_function
xx = Variable(T.randn(1,1), requires_grad=True)
print(xx)
yy = 3*xx
zz = yy**2

#yy.register_hook(print)
zz.backward(T.FloatTensor([0.1]))
print(xx.grad)


Variable containing:
 1.1988
[torch.FloatTensor of size 1x1]

Variable containing:
 2.1578
[torch.FloatTensor of size 1x1]

A simple NumPy implementation of a one-hidden-layer neural network.

In this implementation the gradients are derived by hand, so for each update of $w_i$ both the forward and the backward pass have to be written out explicitly.


In [4]:
# y_pred = w2*(relu(w1*x))
# loss = 0.5*sum (y_pred - y)^2
import numpy as np

N, D_in, D_hidden, D_out = 50, 40, 100, 10

x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, D_hidden)
w2 = np.random.randn(D_hidden, D_out)

learning_rate = 0.0001
for t in range(100):
    ### forward pass
    h = x.dot(w1) #50x40 and 40x100 produce 50x100
    h_relu = np.maximum(h, 0)  #this has to be np.maximum, which does element-wise max against 0, 50x100
    y_pred = h_relu.dot(w2) #50x100 and 100x10 produce 50x10
    #print y_pred.shape
    
    ### loss function
    loss = 0.5 * np.sum(np.square(y_pred - y))
    
    
    ### backward pass
    grad_y_pred = y_pred - y #50x10
    grad_w2 = h_relu.T.dot(grad_y_pred) #50x100 and 50x10 should produce 100x10, so transpose h_relu
    grad_h_relu = grad_y_pred.dot(w2.T) #50x10 and 100x10 should produce 50x100, so transpose w2
    grad_h = grad_h_relu.copy() #make a copy so that the ReLU mask does not modify grad_h_relu
    grad_h[h < 0] = 0      #ReLU gradient: zero wherever the pre-activation h was negative
    grad_w1 = x.T.dot(grad_h)     #50x100 and 50x40 should produce 40x100
    
    w1 = w1 - learning_rate * grad_w1
    w2 = w2 - learning_rate * grad_w2

With very slight modifications, we end up with an implementation of the same algorithm in PyTorch (still using plain tensors, without autograd).


In [1]:
import torch

N, D_in, D_hidden, D_out = 50, 40, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, D_hidden)
w2 = torch.randn(D_hidden, D_out)

learning_rate = 0.0001
for t in range(100):
    h = x.mm(w1) #50x40 and 40x100 produce 50x100
    #h = x.matmul(w1) #50x40 and 40x100 produce 50x100, matmul for checking
    h_relu = h.clamp(min=0)  #clamp(min=0) plays the role of np.maximum(h, 0): element-wise max against 0, 50x100
    y_pred = h_relu.mm(w2) #50x100 and 100x10 produce 50x10
    #print y_pred.shape
    
    loss = 0.5 * (y_pred - y).pow(2).sum()
    
    grad_y_pred = y_pred - y #50x10
    grad_w2 = h_relu.t().mm(grad_y_pred) #50x100 and 50x10 should produce 100x10, so transpose h_relu
    grad_h_relu = grad_y_pred.mm(w2.t()) #50x10 and 100x10 should produce 50x100, so transpose w2
    grad_h = grad_h_relu.clone() #make a copy so that the ReLU mask does not modify grad_h_relu
    grad_h[h < 0] = 0      #ReLU gradient: zero wherever the pre-activation h was negative
    grad_w1 = x.t().mm(grad_h)     #50x100 and 50x40 should produce 40x100
    
    w1 = w1 - learning_rate * grad_w1
    w2 = w2 - learning_rate * grad_w2

Now, with the autograd functionality in PyTorch, we can see how easy backpropagation becomes. Deriving the gradients by hand for a two-layer network is not a big deal, but it gets much more complicated as the number of layers grows.


In [3]:
import torch
from torch.autograd import Variable
N, D_in, D_hidden, D_out = 50, 40, 100, 10

x = Variable(torch.randn(N, D_in), requires_grad=False)
y = Variable(torch.randn(N, D_out), requires_grad=False)

w1 = Variable(torch.randn(D_in, D_hidden), requires_grad=True)
w2 = Variable(torch.randn(D_hidden, D_out), requires_grad=True)

learning_rate = 0.0001
for t in range(100):
    
    y_pred = x.mm(w1).clamp(min=0).mm(w2) #50x40 40x100 100x10 --> 50x10
    loss = 0.5 * (y_pred - y).pow(2).sum()
    
    loss.backward()
    
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    
    w1.grad.data.zero_()
    w2.grad.data.zero_()

Basic matrix multiplication in PyTorch

PyTorch provides

torch.dot(),

torch.mm(),

torch.matmul(),

*,

for basic matrix multiplication. It is worth noting the differences among them: torch.dot(a, b) gives the inner product of the 1-D vectors $a$ and $b$, torch.mm(a, b) gives the matrix product of two 2-D matrices, and torch.matmul() works on tensors more generally, dispatching on the dimensionality of its arguments. torch.matmul() can therefore replace both torch.dot() and torch.mm(), but not vice versa. Finally, * simply computes the element-wise product, i.e. the Hadamard product.
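
A quick illustration of these differences (a minimal sketch, reusing the torch-as-T alias from the first cell):


In [ ]:
a = T.randn(3)
b = T.randn(3)
print(T.dot(a, b))            # inner product of two 1-D vectors
print(T.matmul(a, b))         # same value: matmul of two 1-D tensors is the dot product

A = T.randn(2, 3)
B = T.randn(3, 4)
print(T.mm(A, B).size())      # torch.Size([2, 4])
print(T.matmul(A, B).size())  # torch.Size([2, 4]): same as mm for 2-D inputs

C = T.randn(2, 3)
print((A * C).size())         # torch.Size([2, 3]): element-wise (Hadamard) product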

Advanced matrix multiplication

torch.bmm(A, B): batch matrix multiplication for 3-D tensors; $A_{b\times n\times p}$ and $B_{b\times p\times m}$ produce a 3-D tensor of shape $b\times n\times m$

torch.baddbmm(M, A, B): batch matrix multiplication of $A$ and $B$ as in bmm, with the tensor $M_{b\times n\times m}$ added to each batch product, roughly $M_i + A_i B_i$

torch.addbmm(M, A, B): batch matrix multiplication of $A$ and $B$ summed over the batch dimension and added to the 2-D matrix $M_{n\times m}$, roughly $M + \sum_i A_i B_i$

torch.addmm(M, A, B): ordinary matrix multiplication of 2-D matrices $A$ and $B$ added to the matrix $M$, roughly $M + AB$ (the add* variants also accept optional beta/alpha scaling factors), as in the shape check below

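
A small shape check for these batched variants (a minimal sketch; M3 and M2 are made-up names for the added matrices):


In [ ]:
b, n, p, m = 4, 2, 3, 5
A = T.randn(b, n, p)
B = T.randn(b, p, m)
M3 = T.randn(b, n, m)                  # one matrix per batch element
M2 = T.randn(n, m)                     # a single matrix

print(T.bmm(A, B).size())              # torch.Size([4, 2, 5])
print(T.baddbmm(M3, A, B).size())      # torch.Size([4, 2, 5]): M3[i] + A[i] @ B[i]
print(T.addbmm(M2, A, B).size())       # torch.Size([2, 5]): M2 + sum_i A[i] @ B[i]
print(T.addmm(M2, A[0], B[0]).size())  # torch.Size([2, 5]): M2 + A[0] @ B[0]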


In [3]:
import tensorflow as tf
import numpy as np

N, D_in, D_hidden, D_out = 50, 40, 100, 10

x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

w1 = tf.Variable(tf.random_normal((D_in, D_hidden)))
w2 = tf.Variable(tf.random_normal((D_hidden, D_out)))

y_pred = tf.matmul(tf.maximum(tf.matmul(x, w1), tf.zeros(1)), w2)

loss = tf.reduce_sum((y - y_pred) ** 2)

grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

learning_rate = 0.001
new_w1, new_w2 = w1.assign(w1 - learning_rate * grad_w1), w2.assign(w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    
    for i in range(100):
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})


