To get more familiar with PyTorch and to compare it with other frameworks, I rewrote the example programs from the official website myself and added more detailed comments. The examples only implement the core of a two-layer neural network in a quick way, so the training data is randomly generated, and only plain parameter updates are performed, without going further into the optimization of the cost function. Working through the full set of examples gives a better understanding of a network's forward pass and backpropagation. The implementations are:

  1. NumPy implementation (CPU)
  2. PyTorch tensor implementation (CPU and GPU)
  3. PyTorch autograd implementation
  4. TensorFlow implementation, comparing static and dynamic computation graphs

1. NumPy implementation


In [3]:
import numpy as np

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

#Initialize the weights of the two layers
w1 = np.random.randn(D_in, D_hidden)
w2 = np.random.randn(D_hidden, D_out)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    h_linear = x.dot(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = np.maximum(h_linear, 0) #note that one has to use np.maximum here, not np.max; 10x30
    y_pred = h_relu.dot(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * np.sum(np.square(y_pred - y))  #sum squared error as loss
    
    # Backward pass: compute the gradients by hand
    grad_y_pred = y_pred - y   #10x5
    grad_w2 = h_relu.T.dot(grad_y_pred)    #30x10 and 10x5 produce the dimension of w2: 30x5
    grad_h_relu = grad_y_pred.dot(w2.T)      #10x5 and 5x30 produce the dimension of h_relu: 10x30
    grad_h = grad_h_relu.copy()
    grad_h[h_linear < 0] = 0     #zero the gradient where the hidden pre-activation is negative (ReLU derivative)
    grad_w1 = x.T.dot(grad_h)    #20x10 and 10x30 produce 20x30 
    
    #Update the parameters with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
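
For reference, the manual backward pass above is just the chain rule applied to this loss: with loss = 0.5 * sum((y_pred - y)**2), the gradient with respect to the prediction is grad_y_pred = y_pred - y (the factor 0.5 cancels the 2 from the square); since y_pred = h_relu.dot(w2), we get grad_w2 = h_relu.T.dot(grad_y_pred) and grad_h_relu = grad_y_pred.dot(w2.T); finally, the ReLU passes the gradient through where h_linear > 0 and zeroes it where h_linear < 0, which is exactly what the masking of grad_h implements.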

2. PyTorch tensor implementation

Only minor changes to the NumPy program are needed to implement it with PyTorch tensors, which then allows the computation to run on a GPU.


In [ ]:
import torch as T

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data
x = T.randn(N, D_in)
y = T.randn(N, D_out)

#Initialize the weights of the two layers
w1 = T.randn(D_in, D_hidden)
w2 = T.randn(D_hidden, D_out)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    #mm should also work as x is a matrix. The matrix multiplication will be summarized in another post
    h_linear = x.matmul(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = h_linear.clamp(min=0) #clamp(min=0) implements ReLU, the counterpart of np.maximum; 10x30
    y_pred = h_relu.matmul(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * (y_pred - y).pow(2).sum() #sum squared error as loss
    
    # Backward pass: compute the gradients by hand
    grad_y_pred = y_pred - y   #10x5
    grad_w2 = h_relu.t().mm(grad_y_pred)    #30x10 and 10x5 produce the dimension of w2: 30x5
    grad_h_relu = grad_y_pred.mm(w2.t())      #10x5 and 5x30 produce 10x30, the shape of h_relu; dot() only works on 1-D tensors, so mm() is used
    grad_h = grad_h_relu.clone()
    grad_h[h_linear < 0] = 0     #zero the gradient where the hidden pre-activation is negative (ReLU derivative)
    grad_w1 = x.t().mm(grad_h)    #20x10 and 10x30 produce 20x30 
    
    #Update the parameters with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
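
Since the tensor version no longer depends on NumPy, running the same computation on a GPU only requires moving the tensors there. A minimal sketch, assuming a CUDA-capable GPU is available:


In [ ]:
if T.cuda.is_available():
    # Allocate the data and weights on the GPU; the training loop itself is unchanged
    x, y = x.cuda(), y.cuda()
    w1, w2 = w1.cuda(), w2.cuda()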

3. PyTorch Tensor and autograd implementation

The backward pass of a two-layer network is easy to derive by hand, but with more layers manual differentiation quickly becomes complicated. Deep learning frameworks therefore provide automatic differentiation, and PyTorch's autograd makes the backward pass both concise and flexible. Note that to build the computational graph, the variables that should become part of the graph have to be wrapped in autograd's Variable, with the relevant attributes set.


In [2]:
import torch as T
from torch.autograd import Variable

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data and wrap them in Variable; no gradients are required for them in the graph
x = Variable(T.randn(N, D_in), requires_grad=False)
y = Variable(T.randn(N, D_out), requires_grad=False)

#Initialize the weights of the two layers and wrap them in Variable with gradients required
w1 = Variable(T.randn(D_in, D_hidden), requires_grad=True)
w2 = Variable(T.randn(D_hidden, D_out), requires_grad=True)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    #mm should also work as x is a matrix. The matrix multiplication will be summarized in another post
    h_linear = x.matmul(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = h_linear.clamp(min=0) #clamp(min=0) implements ReLU, the counterpart of np.maximum; 10x30
    y_pred = h_relu.matmul(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * (y_pred - y).pow(2).sum() #sum squared error as loss
    
    # Autograd builds the graph during the forward pass and here computes d(loss)/dw1 and d(loss)/dw2
    loss.backward()
    
    
    #Update the parameters with gradient descent
    w1.data -= learning_rate * w1.grad.data  #note that we are updating the 'data' of Variable w1
    w2.data -= learning_rate * w2.grad.data
    
    #PyTorch accumulates gradients in .grad across iterations; zero them when accumulation is not wanted
    w1.grad.data.zero_()
    w2.grad.data.zero_()

4. TensorFlow implementation

The core difference between TensorFlow and PyTorch is that the former uses a static graph: the complete computational graph is defined before any data is fed in, and training only updates the parameters of this fixed graph. The latter uses a dynamic graph: the graph can be changed in every iteration, for example by adding or removing nodes.
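
As a concrete illustration, below is a minimal sketch of the same two-layer network written as a TensorFlow static graph (it assumes the TensorFlow 1.x graph-and-session API): the forward pass, the gradients and the update ops are all defined once up front, and the session then only feeds data into this fixed graph.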


In [ ]:
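import numpy as np
import tensorflow as tf   # assumes the TensorFlow 1.x static-graph API

# Same network structure as above
N, D_in, D_hidden, D_out = 10, 20, 30, 5

# Placeholders: the complete graph is defined before any real data is fed in
x = tf.placeholder(tf.float32, shape=(N, D_in))
y = tf.placeholder(tf.float32, shape=(N, D_out))

# The weights are graph variables that persist between session runs
w1 = tf.Variable(tf.random_normal((D_in, D_hidden)))
w2 = tf.Variable(tf.random_normal((D_hidden, D_out)))

# Forward pass, defined symbolically
h_linear = tf.matmul(x, w1)
h_relu = tf.maximum(h_linear, 0.0)
y_pred = tf.matmul(h_relu, w2)

loss = 0.5 * tf.reduce_sum(tf.square(y_pred - y))

# TensorFlow adds the backward pass to the same static graph
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

learning_rate = 0.001
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_value = np.random.randn(N, D_in).astype(np.float32)
    y_value = np.random.randn(N, D_out).astype(np.float32)
    for i in range(50):
        # The graph is fixed; each run only feeds data and executes the update ops
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})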

