Gradient Descent

$$\displaystyle \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = f'(x)$$
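As a quick worked instance of this limit (added here for illustration), take $f(x) = x^2$; the difference_quotient function in the code cell below computes exactly this ratio for a small fixed $h$:
$$\frac{(x + \Delta x)^2 - x^2}{\Delta x} = \frac{2x\,\Delta x + \Delta x^2}{\Delta x} = 2x + \Delta x \;\longrightarrow\; 2x = f'(x) \quad \text{as } \Delta x \to 0$$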
  • Linear approximation of a function via Taylor's formula
$$f(x+p) = f(x) + f'(x + \theta p)\, p ,\hspace{1pc} \theta \in (0, 1) ,\hspace{1pc} x,p \in \mathbb{R}^N$$
  • Quadratic approximation of a function via Taylor's formula
$$f(x+p) = f(x) + f'(x)\, p + \frac{1}{2}p^T f''(x + \theta p)\, p ,\hspace{1pc} \theta \in (0, 1) ,\hspace{1pc} x,p \in \mathbb{R}^N$$
- Newton-type methods
- Trust Region methods
- both require f(x) to be twice differentiable (a minimal Newton-step sketch follows this list)
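Minimizing the quadratic model above gives the Newton step $f''(x_k)\, p = -f'(x_k)^T$. A minimal one-dimensional sketch, added here for illustration (newton_1d, f_prime, f_second are hypothetical names, not part of the original notes):

def newton_1d(f_prime, f_second, x0, tolerance=1e-10, max_iter=100):
    """Minimize a twice-differentiable 1-D function by repeatedly
    minimizing its local quadratic Taylor model."""
    x = x0
    for _ in range(max_iter):
        p = -f_prime(x) / f_second(x)   # minimizer of the quadratic model around x
        x = x + p
        if abs(p) < tolerance:          # the step has become negligible
            break
    return x

# example: f(x) = (x - 3)**2 + 1 has its minimum at x = 3
print(newton_1d(lambda x: 2 * (x - 3), lambda x: 2.0, x0=10.0))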

  • Gradient Descent is an iterative method that builds a sequence of points $\{x_k\} \to x^*$ according to the rule (implemented in the code cell below):
$$x_{k+1} = x_k + \alpha_k p_k$$
$p_k = -f'(x_k)^T$ is the direction (of steepest descent, hence the name gradient descent)
$\alpha_k \in \mathbb{R},\ \alpha_k > 0$ is the step size
  • Line Search (searching for an optimal step size $\alpha_k$ along the direction $p_k$)
    • Wolfe conditions for the line-search subproblem (a backtracking sketch that enforces the Armijo condition follows this list) $$ \begin{equation*} \begin{aligned} & \underset{\alpha_k > 0 \, \in \, \mathbb{R}}{\text{minimize}} & f(x_k + \alpha_k p_k) \\ \end{aligned} \end{equation*} $$
$$ \begin{equation*} \begin{aligned} & f(x_{k+1}) \leq f(x_k) + c_1 \alpha_k f'(x_k) p_k & \text{(Armijo, sufficient decrease)} \\ & f'(x_{k+1}) p_k \geq c_2 f'(x_k) p_k & \text{(curvature)} \\ & 0 < c_1 < c_2 < 1 \end{aligned} \end{equation*} $$
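The code cell below picks the step from a fixed list of candidate step sizes; a common alternative is backtracking line search, which shrinks $\alpha$ until the Armijo sufficient-decrease condition above holds. A minimal sketch, added for illustration (backtracking_line_search, rho, and the example values are hypothetical, not from the original cell):

def backtracking_line_search(f, grad, x, p, alpha=1.0, c1=1e-4, rho=0.5):
    """shrink alpha until f(x + alpha*p) <= f(x) + c1*alpha*f'(x)p (Armijo)"""
    fx = f(x)
    slope = sum(g_i * p_i for g_i, p_i in zip(grad(x), p))    # directional derivative f'(x) p
    while f([x_i + alpha * p_i for x_i, p_i in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= rho                                           # not enough decrease: shrink the step
    return alpha

# example on f(v) = v_1^2 + v_2^2, descending along p = -gradient(x)
f = lambda v: sum(v_i ** 2 for v_i in v)
grad = lambda v: [2 * v_i for v_i in v]
x = [3.0, -4.0]
p = [-g_i for g_i in grad(x)]
alpha = backtracking_line_search(f, grad, x, p)
print(alpha, [x_i + alpha * p_i for x_i, p_i in zip(x, p)])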

In [1]:
from collections import Counter
from functools import reduce
import math, random

def dot(v, w):
    """v_1 * w_1 + ... + v_n * w_n"""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sum_of_squares(v):
    """v_1 * v_1 + ... + v_n * v_n"""
    return dot(v, v)

def squared_distance(v, w):
    return sum_of_squares(vector_subtract(v, w))

def distance(v, w):
    return math.sqrt(squared_distance(v, w))

def vector_subtract(v, w):
    """subtracts two vectors componentwise"""
    return [v_i - w_i for v_i, w_i in zip(v,w)]
def scalar_multiply(c, v):
    return [c * v_i for v_i in v]

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

def partial_difference_quotient(f, v, i, h):
    # add h to just the i-th element of v
    w = [v_j + (h if j == i else 0)
         for j, v_j in enumerate(v)]

    return (f(w) - f(v)) / h

def estimate_gradient(f, v, h=0.00001):
    return [partial_difference_quotient(f, v, i, h)
            for i, _ in enumerate(v)]

def step(v, direction, step_size):
    """move step_size in the direction from v"""
    return [v_i + step_size * direction_i
            for v_i, direction_i in zip(v, direction)]

def sum_of_squares_gradient(v):
    return [2 * v_i for v_i in v]

def rosenbrock(v, a=1, b=100):
    return (a - v[0])**2 + b*(v[1] - v[0]**2)**2

def rosenbrock_gradient(v, a=1, b=100):
    return [-2*a + 4*b*v[0]**3 - 4*b*v[0]*v[1] + 2*v[0], 2*b*(v[1] - v[0]**2)]

def safe(f):
    """return a new function that is the same as f, except that
    it returns infinity whenever f raises an exception"""
    def safe_f(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:
            return float('inf')         # float('inf') means "infinity" in Python
    return safe_f

def minimize_batch(target_fn, gradient_fn, theta_0, tolerance=0.000001):
    """use gradient descent to find theta that minimizes target function"""

    step_sizes = [100, 10, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001]

    theta = theta_0                           # set theta to initial value
    target_fn = safe(target_fn)               # safe version of target_fn
    value = target_fn(theta)                  # value we're minimizing

    while True:
        gradient = gradient_fn(theta)
        next_thetas = [step(theta, gradient, -step_size)
                       for step_size in step_sizes]

        # choose the one that minimizes the error function
        next_theta = min(next_thetas, key=target_fn)
        next_value = target_fn(next_theta)

        # stop if we're "converging"
        if abs(value - next_value) < tolerance:
            return theta
        else:
            theta, value = next_theta, next_value

def negate(f):
    """return a function that for any input x returns -f(x)"""
    return lambda *args, **kwargs: -f(*args, **kwargs)

def negate_all(f):
    """the same when f returns a list of numbers"""
    return lambda *args, **kwargs: [-y for y in f(*args, **kwargs)]

def maximize_batch(target_fn, gradient_fn, theta_0, tolerance=0.000001):
    return minimize_batch(negate(target_fn),
                          negate_all(gradient_fn),
                          theta_0,
                          tolerance)

print("using the gradient")

v = [random.randint(-10,10) for i in range(3)]

tolerance = 0.0000001

while True:
    # print(v, sum_of_squares(v))
    gradient = sum_of_squares_gradient(v)   # compute the gradient at v
    next_v = step(v, gradient, -0.01)       # take a negative gradient step
    if distance(next_v, v) < tolerance:     # stop if we're converging
        break
    v = next_v                              # continue if we're not

print("minimum v", v)
print("minimum value", sum_of_squares(v))
print()


print("using minimize_batch")

v = [random.randint(-10,10) for i in range(3)]

v = minimize_batch(sum_of_squares, sum_of_squares_gradient, v)

print("minimum v", v)
print("minimum value", sum_of_squares(v))

v = minimize_batch(rosenbrock, rosenbrock_gradient, [0.1, 1.4])
print("\nRosenbrock minimum batch point", v)
print("Rosenbrock minimum batch value", rosenbrock(v))


using the gradient
minimum v [2.222617517893592e-06, -2.963490023858121e-06, 3.33392627684039e-06]
minimum value 2.4837366171760904e-11

using minimize_batch
minimum v [-0.0007788445287802246, -0.0007788445287802246, -0.001038459371706966]
minimum value 2.2915954667078066e-06

Rosenbrock minimum batch point [0.9799559824593174, 0.9601685501281534]
Rosenbrock minimum batch value 0.00040387028777924636
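
The cell defines estimate_gradient but never calls it. As a quick sanity check, sketched under the assumption that the definitions above are still in scope (the point [0.5, -1.0] is an arbitrary illustrative choice), the finite-difference estimate can be compared with the analytic gradient:

# the numerical estimate should be close to the analytic gradient
v = [0.5, -1.0]
print(estimate_gradient(rosenbrock, v))   # finite-difference approximation
print(rosenbrock_gradient(v))             # analytic gradient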