Example:
$$y = 3x_1 + 4x_2$$

| $x_1$ | $x_2$ | $y$ |
|---|---|---|
| 1 | 4 | 19 |
| 2 | 5 | 26 |
| 5 | 1 | 19 |
| 4 | 2 | 20 |
Hypothesis function: $h_\theta(x) = \theta_1x_1 + \theta_2x_2$
Loss function: $J(\theta) = \dfrac{1}{2m} \sum\limits_{i=1}^m{\big[ h_\theta(x^i) - y^i \big]^2}$, where the factor $\dfrac{1}{2}$ is there so that it cancels when we take the partial derivative.
The goal is to minimize the loss function, so take the partial derivative with respect to each parameter:
$\dfrac{\partial{J(\theta)}}{\partial{\theta_j}} = \dfrac{1}{m}\sum\limits_{i=1}^{m}\big[h_\theta(x^i) - y^i \big] x_j^i $
$m$: the number of samples
$i$: the index of the $i$-th sample, conventionally written as a superscript
$j$: the index of the parameter: $\theta_1, \theta_2$
Since we want to minimize the loss function, each parameter $\theta_j$ is updated along its negative gradient:
$\theta_j' = \theta_j - \alpha \dfrac{\partial{J(\theta)}}{\partial{\theta_j}}$, where $\alpha$ is the learning rate.
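As a concrete check, take the data above with the initial guess $\theta = (1, 1)$ and $\alpha = 0.01$ used in the code below. The prediction errors $h_\theta(x^i) - y^i$ are $-14, -19, -13, -14$, so one batch update is:

$$\dfrac{\partial J}{\partial \theta_1} = \dfrac{1}{4}\big[(-14)(1) + (-19)(2) + (-13)(5) + (-14)(4)\big] = -43.25, \qquad \theta_1' = 1 - 0.01 \times (-43.25) = 1.4325$$

$$\dfrac{\partial J}{\partial \theta_2} = \dfrac{1}{4}\big[(-14)(4) + (-19)(5) + (-13)(1) + (-14)(2)\big] = -48, \qquad \theta_2' = 1 - 0.01 \times (-48) = 1.48$$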
In [1]:
```python
## Global variables
import random

xs = [[1, 4], [2, 5], [5, 1], [4, 2]]
ys = [19, 26, 19, 20]
eps = 0.0001       # convergence tolerance on the loss
max_iters = 10000
step_size = 0.01   # learning rate
```
In [2]:
```python
## BGD (batch gradient descent): every iteration uses all m samples
def bgd():
    iter_count = 0
    theta = [1, 1]
    m = 4  # all samples
    while True:
        err1sum = 0
        err2sum = 0
        loss = 0
        # accumulate the gradient over the whole data set
        for i in range(m):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            err1sum += (pred_y - ys[i]) * xs[i][0]
            err2sum += (pred_y - ys[i]) * xs[i][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        iter_count += 1
        # loss function over all samples
        for i in range(4):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            loss += (1/(2*m)) * (pred_y - ys[i])**2
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

bgd()
```
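For reference, the same batch update can be written in vectorized form. This is a minimal NumPy sketch (the names `bgd_numpy`, `X`, and `y` are illustrative, not part of the original):

```python
import numpy as np

def bgd_numpy(alpha=0.01, eps=0.0001, max_iters=10000):
    X = np.array(xs, dtype=float)   # shape (4, 2): one row per sample
    y = np.array(ys, dtype=float)   # shape (4,)
    theta = np.ones(2)
    m = len(y)
    for it in range(1, max_iters + 1):
        err = X @ theta - y                # h_theta(x^i) - y^i for every sample
        theta -= alpha * (X.T @ err) / m   # update both parameters at once
        loss = ((X @ theta - y)**2).sum() / (2*m)
        if loss < eps:
            break
    return theta, it

theta, iters = bgd_numpy()
print("y = %.2f*x1 + %.2f*x2 after %d iterations" % (theta[0], theta[1], iters))
```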
In [3]:
```python
## SGD (stochastic gradient descent): every iteration uses a single random sample
def sgd():
    iter_count = 0
    theta = [1, 1]
    m = 1  # one sample per update
    n = len(xs)
    while True:
        loss = 0
        # gradient from one randomly chosen sample
        i = random.randint(0, 3)
        pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        err1sum = (pred_y - ys[i]) * xs[i][0]
        err2sum = (pred_y - ys[i]) * xs[i][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        iter_count += 1
        # loss function over all samples (normalized by n, not the batch size,
        # so that eps means the same thing as in bgd)
        for i in range(n):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            loss += (1/(2*n)) * (pred_y - ys[i])**2
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

sgd()
```
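One practical caveat worth showing: with a fixed step size, the single-sample gradient keeps injecting noise, so SGD tends to bounce around the minimum rather than settle. A common remedy is to decay the learning rate over time. The sketch below is illustrative only (the name `sgd_decay` and the particular decay schedule are assumptions, not from the original):

```python
def sgd_decay(base_lr=0.05, eps=0.0001, max_iters=100000):
    theta = [1.0, 1.0]
    n = len(xs)
    for t in range(1, max_iters + 1):
        i = random.randint(0, n - 1)
        err = theta[0]*xs[i][0] + theta[1]*xs[i][1] - ys[i]
        lr = base_lr / (1 + 0.001 * t)   # simple 1/t-style decay (illustrative)
        theta[0] -= lr * err * xs[i][0]
        theta[1] -= lr * err * xs[i][1]
        loss = sum((theta[0]*x1 + theta[1]*x2 - y)**2
                   for (x1, x2), y in zip(xs, ys)) / (2 * n)
        if loss < eps:
            break
    return theta, t

theta, iters = sgd_decay()
print("y = %.2f*x1 + %.2f*x2 after %d iterations" % (theta[0], theta[1], iters))
```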
In [4]:
```python
## MBGD (mini-batch gradient descent): every iteration uses b samples (here b = 2)
def mbgd():
    iter_count = 0
    theta = [1, 1]
    m = 2  # mini-batch size
    n = len(xs)
    while True:
        loss = 0
        err1sum = 0
        err2sum = 0
        # pick a random sample and its neighbour as the mini-batch
        i = random.randint(0, 3)
        j = (i + 1) % 4
        pred_yi = theta[0]*xs[i][0] + theta[1]*xs[i][1]
        pred_yj = theta[0]*xs[j][0] + theta[1]*xs[j][1]
        err1sum += (pred_yi - ys[i]) * xs[i][0]
        err2sum += (pred_yi - ys[i]) * xs[i][1]
        err1sum += (pred_yj - ys[j]) * xs[j][0]
        err2sum += (pred_yj - ys[j]) * xs[j][1]
        theta[0] = theta[0] - step_size * err1sum / m
        theta[1] = theta[1] - step_size * err2sum / m
        iter_count += 1
        # loss function over all samples (normalized by n so eps is comparable across methods)
        for i in range(n):
            pred_y = theta[0]*xs[i][0] + theta[1]*xs[i][1]
            loss += (1/(2*n)) * (pred_y - ys[i])**2
        if loss < eps or iter_count > max_iters:
            print("loss =", loss)
            break
    print("iter_count:", iter_count)
    print("y = %.2f*x1 + %.2f*x2" % (theta[0], theta[1]))

mbgd()
```
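Note that the mini-batch above is not random in the usual sense: it always pairs sample `i` with its neighbour `(i + 1) % 4`. A more typical formulation draws `b` distinct indices at random each iteration, e.g. with `random.sample`. The generalized version below is a sketch under that assumption (the name `mbgd_general` is illustrative):

```python
def mbgd_general(b=2, alpha=0.01, eps=0.0001, max_iters=10000):
    theta = [1.0, 1.0]
    n = len(xs)
    for it in range(1, max_iters + 1):
        batch = random.sample(range(n), b)   # b distinct sample indices
        g1 = g2 = 0.0
        for i in batch:
            err = theta[0]*xs[i][0] + theta[1]*xs[i][1] - ys[i]
            g1 += err * xs[i][0]
            g2 += err * xs[i][1]
        theta[0] -= alpha * g1 / b
        theta[1] -= alpha * g2 / b
        loss = sum((theta[0]*x1 + theta[1]*x2 - y)**2
                   for (x1, x2), y in zip(xs, ys)) / (2 * n)
        if loss < eps:
            break
    return theta, it

theta, iters = mbgd_general()
print("y = %.2f*x1 + %.2f*x2 after %d iterations" % (theta[0], theta[1], iters))
```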