To get more familiar with PyTorch and to compare it with other frameworks, I rewrote the example programs from the official website myself and added more detailed comments. The examples only implement the core of a two-layer neural network in a quick way, so the training data is randomly generated, and only plain parameter updates are performed, without going further into the optimization of the cost function. Working through the full set of examples gives a better understanding of a network's forward pass and backpropagation. The implementations are:

  1. NumPy implementation (CPU)
  2. PyTorch tensor implementation (CPU and GPU)
  3. PyTorch autograd implementation
  4. TensorFlow implementation, comparing static and dynamic computation graphs

1. NumPy implementation


In [3]:
import numpy as np

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

#Initialize the weights of the two layers
w1 = np.random.randn(D_in, D_hidden)
w2 = np.random.randn(D_hidden, D_out)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    h_linear = x.dot(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = np.maximum(h_linear, 0) #note that one has to use np.maximum here, not np.max; 10x30
    y_pred = h_relu.dot(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * np.sum(np.square(y_pred - y))  #sum squared error as loss
    
    # Backward pass: compute the gradients by hand
    grad_y_pred = y_pred - y   #10x5
    grad_w2 = h_relu.T.dot(grad_y_pred)    #30x10 and 10x5 produce the dimension of w2: 30x5
    grad_h_relu = grad_y_pred.dot(w2.T)      #10x5 and 5x30 produce the dimension of h_relu: 10x30
    grad_h = grad_h_relu.copy()
    grad_h[h_linear < 0] = 0     #zero the gradient where the hidden pre-activation is negative (ReLU derivative)
    grad_w1 = x.T.dot(grad_h)    #20x10 and 10x30 produce 20x30 
    
    #Update the parameters with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
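
For reference, the manual backward pass above is just the chain rule applied to this loss: with loss = 0.5 * sum((y_pred - y)**2), the gradient with respect to the prediction is grad_y_pred = y_pred - y (the factor 0.5 cancels the 2 from the square); since y_pred = h_relu.dot(w2), we get grad_w2 = h_relu.T.dot(grad_y_pred) and grad_h_relu = grad_y_pred.dot(w2.T); finally, the ReLU passes the gradient through where h_linear > 0 and zeroes it where h_linear < 0, which is exactly what the masking of grad_h implements.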

2. PyTorch tensor implementation

Only minor changes to the NumPy program are needed to implement it with PyTorch tensors, which then allows the computation to run on a GPU.


In [ ]:
import torch as T

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data
x = T.randn(N, D_in)
y = T.randn(N, D_out)

#Initialize the weights of the two layers
w1 = T.randn(D_in, D_hidden)
w2 = T.randn(D_hidden, D_out)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    #mm should also work as x is a matrix. The matrix multiplication will be summarized in another post
    h_linear = x.matmul(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = h_linear.clamp(min=0) #clamp(min=0) implements ReLU, the counterpart of np.maximum; 10x30
    y_pred = h_relu.matmul(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * (y_pred - y).pow(2).sum() #sum squared error as loss
    
    # Backward pass: compute the gradients by hand
    grad_y_pred = y_pred - y   #10x5
    grad_w2 = h_relu.t().mm(grad_y_pred)    #30x10 and 10x5 produce the dimension of w2: 30x5
    grad_h_relu = grad_y_pred.mm(w2.t())      #10x5 and 5x30 produce 10x30, the shape of h_relu; dot() only works on 1-D tensors, so mm() is used
    grad_h = grad_h_relu.clone()
    grad_h[h_linear < 0] = 0     #zero the gradient where the hidden pre-activation is negative (ReLU derivative)
    grad_w1 = x.t().mm(grad_h)    #20x10 and 10x30 produce 20x30 
    
    #Update the parameters with gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
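
Since the tensor version no longer depends on NumPy, running the same computation on a GPU only requires moving the tensors there. A minimal sketch, assuming a CUDA-capable GPU is available:


In [ ]:
if T.cuda.is_available():
    # Allocate the data and weights on the GPU; the training loop itself is unchanged
    x, y = x.cuda(), y.cuda()
    w1, w2 = w1.cuda(), w2.cuda()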

3. PyTorch Tensor and autograd implementation

The backward pass of a two-layer network is easy to derive by hand, but with more layers manual differentiation quickly becomes complicated. Deep learning frameworks therefore provide automatic differentiation, and PyTorch's autograd makes the backward pass both concise and flexible. Note that to build the computational graph, the variables that should become part of the graph have to be wrapped in autograd's Variable, with the relevant attributes set.


In [2]:
import torch as T
from torch.autograd import Variable

#Define the network structure first: batch_size, Input Dimension, Hidden Dimension, Output Dimension
N, D_in, D_hidden, D_out = 10, 20, 30, 5  

#Randomly generate the input and output data and wrap them in Variable; no gradients are required for them in the graph
x = Variable(T.randn(N, D_in), requires_grad=False)
y = Variable(T.randn(N, D_out), requires_grad=False)

#Initialize the weights of the two layers and wrap them in Variable with gradients required
w1 = Variable(T.randn(D_in, D_hidden), requires_grad=True)
w2 = Variable(T.randn(D_hidden, D_out), requires_grad=True)

learning_rate = 0.001

#Update the parameters in a loop; each iteration runs one forward and one backward pass
for i in range(50):
    
    # Forward pass
    #mm should also work as x is a matrix. The matrix multiplication will be summarized in another post
    h_linear = x.matmul(w1)     #10x20 and 20x30 produce 10x30, which is the shape of h_linear
    h_relu = h_linear.clamp(min=0) #clamp(min=0) implements ReLU, the counterpart of np.maximum; 10x30
    y_pred = h_relu.matmul(w2)  #10x30 and 30x5 produce 10x5
    
    #Define the cost function
    loss = 0.5 * (y_pred - y).pow(2).sum() #sum squared error as loss
    
    # Autograd builds the graph during the forward pass and here computes d(loss)/dw1 and d(loss)/dw2
    loss.backward()
    
    
    #Update the parameters with gradient descent
    w1.data -= learning_rate * w1.grad.data  #note that we are updating the 'data' of Variable w1
    w2.data -= learning_rate * w2.grad.data
    
    #PyTorch accumulates gradients in .grad across iterations; zero them when accumulation is not wanted
    w1.grad.data.zero_()
    w2.grad.data.zero_()

4. TensorFlow implementation

The core difference between TensorFlow and PyTorch is that the former uses a static graph: the complete computational graph is defined before any data is fed in, and training only updates the parameters of this fixed graph. The latter uses a dynamic graph: the graph can be changed in every iteration, for example by adding or removing nodes.
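
As a concrete illustration, below is a minimal sketch of the same two-layer network written as a TensorFlow static graph (it assumes the TensorFlow 1.x graph-and-session API): the forward pass, the gradients and the update ops are all defined once up front, and the session then only feeds data into this fixed graph.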


In [ ]:
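import numpy as np
import tensorflow as tf   # assumes the TensorFlow 1.x static-graph API

# Same network structure as above
N, D_in, D_hidden, D_out = 10, 20, 30, 5

# Placeholders: the complete graph is defined before any real data is fed in
x = tf.placeholder(tf.float32, shape=(N, D_in))
y = tf.placeholder(tf.float32, shape=(N, D_out))

# The weights are graph variables that persist between session runs
w1 = tf.Variable(tf.random_normal((D_in, D_hidden)))
w2 = tf.Variable(tf.random_normal((D_hidden, D_out)))

# Forward pass, defined symbolically
h_linear = tf.matmul(x, w1)
h_relu = tf.maximum(h_linear, 0.0)
y_pred = tf.matmul(h_relu, w2)

loss = 0.5 * tf.reduce_sum(tf.square(y_pred - y))

# TensorFlow adds the backward pass to the same static graph
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

learning_rate = 0.001
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_value = np.random.randn(N, D_in).astype(np.float32)
    y_value = np.random.randn(N, D_out).astype(np.float32)
    for i in range(50):
        # The graph is fixed; each run only feeds data and executes the update ops
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})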

