Example:
$$y = 3x_1 + 4x_2$$

| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 1 | 4 | 19 |
| 2 | 5 | 26 |
| 5 | 1 | 19 |
| 4 | 2 | 20 |
Hypothesis function: $h_\theta(x) = \theta_1x_1 + \theta_2x_2$
Loss function: $J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^m{\big[ h_\theta(x^i) - y^i \big]^2}$, where the factor $\dfrac{1}{2}$ is included so that it cancels when taking the partial derivative.
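To see the cancellation, differentiate a single squared-error term with the chain rule; the exponent $2$ cancels the $\dfrac{1}{2}$ and leaves exactly the term that appears in the gradient below:
$$\dfrac{\partial}{\partial\theta_j}\,\dfrac{1}{2}\big[h_\theta(x^i) - y^i\big]^2 = \big[h_\theta(x^i) - y^i\big]\cdot\dfrac{\partial h_\theta(x^i)}{\partial\theta_j} = \big[h_\theta(x^i) - y^i\big]\,x_j^i$$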
The goal is to minimize the loss function, so take its partial derivatives:
$\dfrac{\partial{J(\theta)}}{\partial{\theta_j}} = \dfrac{1}{m}\sum\limits_{i=1}^{m}\big[h_\theta(x^i) - y^i \big] x_j^i $
m: the number of samples
i: the index of the $i$-th sample, conventionally written as a superscript
j: the index of the parameter: $\theta_1, \theta_2$
Since we want to minimize the loss function, each parameter $\theta_j$ is updated along the negative gradient direction:
$\theta_j' = \theta_j - \alpha \dfrac{\partial{J(\theta)}}{\partial{\theta_j}}$, where $\alpha$ is the learning rate.
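As a sanity check on the gradient formula, the analytical gradient can be compared against a central finite-difference approximation of $J(\theta)$. A minimal sketch; the names `J`, `grad`, `data_x`, and `data_y` are illustrative, not part of the notebook cells below:

```python
# Compare the analytical gradient with a central finite-difference estimate.
def J(theta, xs, ys):
    m = len(xs)
    return sum((theta[0]*x[0] + theta[1]*x[1] - y)**2 for x, y in zip(xs, ys)) / (2*m)

def grad(theta, xs, ys):
    m = len(xs)
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        err = theta[0]*x[0] + theta[1]*x[1] - y
        g[0] += err * x[0] / m
        g[1] += err * x[1] / m
    return g

data_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
data_y = [19, 26, 19, 20]
t = [1.0, 1.0]
h = 1e-6
numeric = [
    (J([t[0]+h, t[1]], data_x, data_y) - J([t[0]-h, t[1]], data_x, data_y)) / (2*h),
    (J([t[0], t[1]+h], data_x, data_y) - J([t[0], t[1]-h], data_x, data_y)) / (2*h),
]
print(grad(t, data_x, data_y))  # analytical: [-43.25, -48.0]
print(numeric)                  # should agree to ~1e-6
```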
In [1]:
    
## Global variables
import random 
xs = [[1,4], [2,5], [5,1], [4,2]]
ys = [19,26,19,20] 
eps = 0.0001       # convergence tolerance on the loss
max_iters = 10000  # iteration cap
step_size = 0.01   # learning rate
    
In [2]:
    
## BGD (Batch gradient descent): each iteration uses all of the samples
# m = 4 (all samples)
def bgd():
    iter_count = 0
    theta = [1, 1]
    m = len(xs)  # batch size = all 4 samples
    while True:
        err1sum = 0
        err2sum = 0
        loss = 0
        # accumulate the gradient over every sample
        for i in range(m):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            err1sum += (pred_y - ys[i]) * xs[i][0]
            err2sum += (pred_y - ys[i]) * xs[i][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        iter_count += 1
        
        # loss function J(theta), evaluated over all samples
        for i in range(m):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            loss += (pred_y - ys[i])**2 / (2*m)
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))
    
bgd()
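Because this toy problem has only two parameters, the least-squares optimum can also be obtained in closed form from the normal equations $X^\top X\,\theta = X^\top y$. A small verification sketch (not part of the original notebook) confirming that gradient descent should converge toward $\theta = (3, 4)$:

```python
# Closed-form least-squares solution via the normal equations:
# a 2x2 linear system, solved here with Cramer's rule.
a11 = sum(x[0]*x[0] for x in xs)           # sum of x1^2
a12 = sum(x[0]*x[1] for x in xs)           # sum of x1*x2
a22 = sum(x[1]*x[1] for x in xs)           # sum of x2^2
b1  = sum(x[0]*y for x, y in zip(xs, ys))  # sum of x1*y
b2  = sum(x[1]*y for x, y in zip(xs, ys))  # sum of x2*y
det = a11*a22 - a12*a12
theta1 = (b1*a22 - a12*b2) / det
theta2 = (a11*b2 - a12*b1) / det
print("exact: y = %.2f*x1 + %.2f*x2" % (theta1, theta2))  # 3.00, 4.00
```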
    
    
In [3]:
    
## SGD (Stochastic gradient descent): each iteration uses a single random sample
def sgd():
    iter_count = 0
    theta = [1, 1]
    m = 1            # one sample per parameter update
    n = len(xs)      # total number of samples
    while True:
        loss = 0
        # pick one random sample and step along its gradient
        i = random.randint(0, n - 1)
        pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        err1sum = (pred_y - ys[i]) * xs[i][0]
        err2sum = (pred_y - ys[i]) * xs[i][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        
        iter_count += 1
        
        # loss function J(theta), evaluated over all n samples
        # (normalized by n rather than m so it is comparable to bgd's loss)
        for k in range(n):
            pred_y = theta[0]*xs[k][0] + theta[1]*xs[k][1]
            loss += (pred_y - ys[k])**2 / (2*n)
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))
    
sgd()
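A common refinement of plain random sampling is to shuffle the sample order once per epoch, so every sample is visited before any repeats. A sketch under the same globals `xs`, `ys`, and `step_size`; the function name `sgd_shuffled` and the epoch count are hypothetical:

```python
def sgd_shuffled(epochs=2000, lr=step_size):
    theta = [1.0, 1.0]
    order = list(range(len(xs)))
    for _ in range(epochs):
        random.shuffle(order)  # new visiting order each epoch
        for i in order:
            err = theta[0]*xs[i][0] + theta[1]*xs[i][1] - ys[i]
            theta[0] -= lr * err * xs[i][0]
            theta[1] -= lr * err * xs[i][1]
    return theta

print("y = %.2f*x1 + %.2f*x2" % tuple(sgd_shuffled()))
```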
    
    
In [4]:
    
## MBGD (Mini-batch gradient descent): each iteration uses a mini-batch of b samples
def mbgd():
    iter_count = 0
    theta = [1, 1]
    m = 2            # mini-batch size: two samples per update
    n = len(xs)      # total number of samples
    while True:
        loss = 0
        err1sum = 0
        err2sum = 0
        # pick two samples: a random one plus its successor, wrapping around
        i = random.randint(0, n - 1)
        j = (i + 1) % n
        pred_yi = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        pred_yj = theta[0]*xs[j][0] + theta[1]*xs[j][1]
        err1sum += (pred_yi - ys[i]) * xs[i][0]
        err2sum += (pred_yi - ys[i]) * xs[i][1]
        err1sum += (pred_yj - ys[j]) * xs[j][0]
        err2sum += (pred_yj - ys[j]) * xs[j][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        
        iter_count += 1
        
        # loss function J(theta), evaluated over all n samples
        # (normalized by n rather than m so it is comparable to bgd's loss)
        for k in range(n):
            pred_y = theta[0]*xs[k][0] + theta[1]*xs[k][1]
            loss += (pred_y - ys[k])**2 / (2*n)
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))
    
mbgd()
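Because sgd() and mbgd() draw samples at random, their iteration counts and final losses vary from run to run. Seeding the generator first (a usage sketch; the seed value is arbitrary) makes runs repeatable:

```python
random.seed(42)  # fix the random sample sequence for reproducibility
sgd()
random.seed(42)
mbgd()
```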