Gradient Descent Algorithm Demo

Example:

$$y = 3x_1 + 4x_2$$
| $x_1$ | $x_2$ | $y$ |
|:-:|:-:|:-:|
| 1 | 4 | 19 |
| 2 | 5 | 26 |
| 5 | 1 | 19 |
| 4 | 2 | 20 |

Hypothesis function: $h_\theta(x) = \theta_1x_1 + \theta_2x_2$

Loss function: $J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^m{\big[ h_\theta(x^i) - y^i \big]^2}$, where the factor $\dfrac{1}{2}$ is included so that it cancels when the partial derivative is taken.

The goal is to minimize the loss function, so take the partial derivative with respect to each parameter:

$\dfrac{\partial{J(\theta)}}{\partial{\theta_j}} = \dfrac{1}{m}\sum\limits_{i=1}^{m}\big[h_\theta(x^i) - y^i \big] x_j^i $

$m$: the number of samples

$i$: the index of the $i$-th sample, conventionally written as a superscript

$j$: the index of the parameter, here $\theta_1, \theta_2$

Since the goal is to minimize the loss function, each parameter $\theta_j$ is updated along the direction of its negative gradient:

$\theta_j' = \theta_j - \alpha \dfrac{\partial{J(\theta)}}{\partial{\theta_j}}$, where $\alpha$ is the learning rate.
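To make the update concrete, here is a single hand-checked batch step starting from $\theta = (1, 1)$ (a minimal sketch added for illustration; it is not a cell from the original notebook):

xs = [[1, 4], [2, 5], [5, 1], [4, 2]]
ys = [19, 26, 19, 20]
theta = [1.0, 1.0]
alpha = 0.01   # learning rate
m = len(xs)

grad = [0.0, 0.0]
for (x1, x2), y in zip(xs, ys):
    err = theta[0] * x1 + theta[1] * x2 - y   # h(x^i) - y^i
    grad[0] += err * x1 / m
    grad[1] += err * x2 / m

theta = [theta[0] - alpha * grad[0], theta[1] - alpha * grad[1]]
print(theta)   # ~[1.4325, 1.48]: both partials are negative, so theta moves toward (3, 4)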


In [1]:
## Global variables

import random

xs = [[1, 4], [2, 5], [5, 1], [4, 2]]
ys = [19, 26, 19, 20]

eps = 0.0001       # convergence tolerance on the loss
max_iters = 10000  # cap on the number of iterations
step_size = 0.01   # learning rate

In [2]:
## BGD (Batch gradient descent): every iteration uses all samples

# m = 4 (all samples)
def bgd():
    iter_count = 0
    theta = [1, 1]
    m = 4
    while True:
        err1sum = 0
        err2sum = 0
        loss = 0
        # accumulate the gradient over the full batch
        for i in range(4):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            err1sum += (pred_y - ys[i]) * xs[i][0]
            err2sum += (pred_y - ys[i]) * xs[i][1]
        # step along the negative gradient
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        iter_count += 1

        # loss over all samples
        for i in range(4):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            loss += (1/(2*m)) * (pred_y - ys[i])**2
        if loss < eps or iter_count > max_iters:
            print("loss = ", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

bgd()


loss =  9.428456066652548e-05
iter_count: 97
y = 3.00*x1 + 4.00*x2
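For comparison, the same batch update can be written in vectorized form (a sketch added here, not part of the original notebook; it assumes NumPy is available):

import numpy as np

X = np.array([[1, 4], [2, 5], [5, 1], [4, 2]], dtype=float)
y = np.array([19, 26, 19, 20], dtype=float)

def bgd_vec(lr=0.01, eps=1e-4, max_iters=10000):
    theta = np.ones(2)
    m = len(y)
    for it in range(1, max_iters + 1):
        grad = X.T @ (X @ theta - y) / m               # batch gradient dJ/dtheta
        theta -= lr * grad
        loss = np.sum((X @ theta - y) ** 2) / (2 * m)  # same J as above
        if loss < eps:
            break
    return theta, it

theta, iters = bgd_vec()
print(theta, iters)   # should converge near (3, 4), like bgd()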

In [3]:
## SGD (Stochastic gradient descent): every iteration uses a single sample

def sgd():
    iter_count = 0
    theta = [1, 1]
    m = 1  # gradient is estimated from one sample
    while True:
        loss = 0
        # pick one sample at random and step on its gradient
        i = random.randint(0, 3)
        pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        err1sum = (pred_y - ys[i]) * xs[i][0]
        err2sum = (pred_y - ys[i]) * xs[i][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m

        iter_count += 1

        # loss over all samples; note m = 1 here, so this stopping
        # criterion is 4x stricter than the one used in bgd()
        for k in range(4):
            pred_y = theta[0]*xs[k][0] + theta[1]*xs[k][1]
            loss += (1/(2*m)) * (pred_y - ys[k])**2
        if loss < eps or iter_count > max_iters:
            print("loss = ", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

sgd()


loss =  9.603971783298928e-05
iter_count: 99
y = 3.00*x1 + 4.00*x2
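A common refinement of SGD (my sketch, not part of the original notebook; it reuses xs, ys, and random from the first cell) is to shuffle the index order once per epoch instead of sampling with replacement, so every sample is visited equally often:

def sgd_epochs(lr=0.01, eps=1e-4, max_epochs=10000):
    theta = [1.0, 1.0]
    n = len(xs)
    for epoch in range(max_epochs):
        order = list(range(n))
        random.shuffle(order)          # visit each sample exactly once per epoch
        for i in order:
            err = theta[0] * xs[i][0] + theta[1] * xs[i][1] - ys[i]
            theta[0] -= lr * err * xs[i][0]
            theta[1] -= lr * err * xs[i][1]
        loss = sum((theta[0] * x[0] + theta[1] * x[1] - y) ** 2
                   for x, y in zip(xs, ys)) / (2 * n)
        if loss < eps:
            break
    return theta, epoch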

In [4]:
## MBGD (Mini-batch gradient descent): every iteration uses b samples
def mbgd():
    iter_count = 0
    theta = [1, 1]
    m = 2  # mini-batch size b = 2
    while True:
        loss = 0
        err1sum = 0
        err2sum = 0
        # the mini-batch here is two consecutive samples: i and (i+1) mod 4
        i = random.randint(0, 3)
        j = (i + 1) % 4
        pred_yi = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        pred_yj = theta[0]*xs[j][0] + theta[1]*xs[j][1]
        err1sum += (pred_yi - ys[i]) * xs[i][0]
        err2sum += (pred_yi - ys[i]) * xs[i][1]
        err1sum += (pred_yj - ys[j]) * xs[j][0]
        err2sum += (pred_yj - ys[j]) * xs[j][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m

        iter_count += 1

        # loss over all samples; note m = 2 here, so this stopping
        # criterion is 2x stricter than the one used in bgd()
        for k in range(4):
            pred_y = theta[0]*xs[k][0] + theta[1]*xs[k][1]
            loss += (1/(2*m)) * (pred_y - ys[k])**2
        if loss < eps or iter_count > max_iters:
            print("loss = ", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

mbgd()


loss =  9.814308420142368e-05
iter_count: 105
y = 3.00*x1 + 4.00*x2
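The mini-batch above is always two consecutive samples, i and (i+1) mod 4. A more typical choice (my sketch, not the author's; it reuses xs, ys, and random from the first cell) draws b distinct indices at random with random.sample:

def mbgd_sampled(b=2, lr=0.01, eps=1e-4, max_iters=10000):
    theta = [1.0, 1.0]
    n = len(xs)
    for it in range(1, max_iters + 1):
        batch = random.sample(range(n), b)   # b distinct sample indices
        g1 = g2 = 0.0
        for i in batch:
            err = theta[0] * xs[i][0] + theta[1] * xs[i][1] - ys[i]
            g1 += err * xs[i][0] / b
            g2 += err * xs[i][1] / b
        theta[0] -= lr * g1
        theta[1] -= lr * g2
        loss = sum((theta[0] * x[0] + theta[1] * x[1] - y) ** 2
                   for x, y in zip(xs, ys)) / (2 * n)
        if loss < eps:
            break
    return theta, it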