Chapter 13: Mathematical optimization: finding minima of functions

13.1 Knowing your problem

Dimensionality of the problem: The scale of an optimization problem is essentially set by its dimensionality, i.e. the number of scalar variables over which the search is performed

Optimizing convex functions is easy; optimizing non-convex functions can be very hard

Optimizing smooth functions is easier than optimizing non-smooth ones

Noisy gradients: Many optimization methods rely on gradients of the objective function. If the gradient function is not given, it is computed numerically, which induces errors. In such a situation, even if the objective function is not noisy, a gradient-based optimization may be a noisy optimization.

Optimization under constraints, for example box constraints such as $-1 < x_1 < 1$ (a sketch follows)
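As a minimal sketch (not part of the original notebook; the objective and bounds are illustrative), box constraints like this can be handled by scipy.optimize.minimize through its bounds argument with a method that supports them, such as L-BFGS-B:

from scipy import optimize
import numpy as np

def objective(x):
    # Illustrative objective; its unconstrained minimum is at (2, 2)
    return (x[0] - 2) ** 2 + (x[1] - 2) ** 2

res = optimize.minimize(objective, [0.0, 0.0], method="L-BFGS-B",
                        bounds=[(-1, 1), (-1, 1)])
print(res.x)  # the constrained minimum lies on the boundary, close to [1, 1]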

13.2 A review of the different optimizers

13.2.1 1D Optimization

Let's find the minimum of the scalar function $f(x) = -\exp[-(x - 0.7)^2]$


In [14]:
from scipy import optimize
import numpy as np

In [15]:
def f(x):
    return -np.exp(-(x - 0.7) ** 2)

In [16]:
optimize.minimize_scalar(f)


Out[16]:
     fun: -1.0
    nfev: 13
     nit: 9
 success: True
       x: 0.6999999997839409

13.2.2 Gradient-based methods

Gradient descent basically consists of taking small steps in the direction opposite to the gradient, that is, the direction of steepest descent.
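As an illustration only (a minimal sketch with an arbitrary fixed step size, not a scipy routine), plain gradient descent on a simple quadratic looks like this:

import numpy as np

def quad(x):
    # Simple quadratic with minimum at (1, -0.5)
    return (x[0] - 1) ** 2 + 2 * (x[1] + 0.5) ** 2

def quad_grad(x):
    return np.array([2 * (x[0] - 1), 4 * (x[1] + 0.5)])

x = np.array([2.0, 2.0])  # starting point
step = 0.1                # fixed step size: too large diverges, too small is slow
for _ in range(200):
    x = x - step * quad_grad(x)  # step opposite to the gradient
print(x, quad(x))         # converges towards [1, -0.5]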

Conjugate gradient descent: this improves on plain gradient descent by choosing successive search directions that are conjugate to one another, which avoids the zig-zagging of steepest descent in narrow valleys.


In [17]:
def f(x):  # The Rosenbrock function (a scaled variant)
    return .5 * (1 - x[0]) ** 2 + (x[1] - x[0]**2)**2

In [18]:
optimize.minimize(f, [2, -1], method="CG")


Out[18]:
     fun: 1.6503729082243953e-11
     jac: array([-6.15347610e-06,  2.53804028e-07])
 message: 'Optimization terminated successfully.'
    nfev: 108
     nit: 13
    njev: 27
  status: 0
 success: True
       x: array([0.99999426, 0.99998863])

Gradient methods need the Jacobian (the gradient) of the function. They can compute it numerically, but they will perform better if you can pass them the gradient explicitly.


In [19]:
def jacobian(x):
    # Analytical gradient of the Rosenbrock variant defined above
    return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2),
                     2*(x[1] - x[0]**2)))

In [20]:
optimize.minimize(f, [2, -1], method="CG", jac=jacobian)


Out[20]:
     fun: 1.0912121775174348e-11
     jac: array([-5.25283405e-06,  2.92893689e-07])
 message: 'Optimization terminated successfully.'
    nfev: 27
     nit: 13
    njev: 27
  status: 0
 success: True
       x: array([0.99999533, 0.99999081])

Note that the function has only been evaluated 27 times, compared to 108 without the gradient.

13.2.3 Newton and quasi-Newton methods


In [21]:
def f(x):  # The Rosenbrock function (a scaled variant)
    return .5 * (1 - x[0]) ** 2 + (x[1] - x[0]**2)**2

In [22]:
def jacobian(x):
    # Analytical gradient of the Rosenbrock variant defined above
    return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2),
                     2*(x[1] - x[0]**2)))

In [23]:
optimize.minimize(f, [2, -1], method="Newton-CG", jac=jacobian)


Out[23]:
     fun: 1.9199019645744888e-15
     jac: array([ 1.14154539e-07, -8.15795229e-08])
 message: 'Optimization terminated successfully.'
    nfev: 11
    nhev: 0
     nit: 10
    njev: 52
  status: 0
 success: True
       x: array([0.99999994, 0.99999987])

Newton-CG needs fewer function evaluations, but more gradient evaluations, since the gradient is used to approximate the Hessian. Let's compute the Hessian analytically and pass it to the algorithm.


In [24]:
def hessian(x):
    # Analytical Hessian of the Rosenbrock variant defined above
    return np.array(((1 - 4*x[1] + 12*x[0]**2, -4*x[0]),
                     (-4*x[0], 2)))

In [25]:
optimize.minimize(f, [2, -1], method="Newton-CG", jac=jacobian, hess=hessian)


Out[25]:
     fun: 1.6277298383706738e-15
     jac: array([ 1.11044158e-07, -7.78093352e-08])
 message: 'Optimization terminated successfully.'
    nfev: 11
    nhev: 10
     nit: 10
    njev: 20
  status: 0
 success: True
       x: array([0.99999994, 0.99999988])

In [26]:
optimize.brute


Out[26]:
<function scipy.optimize.optimize.brute>
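The cell above merely displays scipy.optimize.brute, which performs a brute-force grid search: it evaluates the function on a grid over the given ranges and then polishes the best grid point with a local optimizer. A minimal usage sketch on the Rosenbrock variant above (the grid ranges and resolution are arbitrary choices):

from scipy import optimize

def f(x):  # the same Rosenbrock variant as above
    return .5 * (1 - x[0]) ** 2 + (x[1] - x[0]**2)**2

xmin = optimize.brute(f, ((-2, 2), (-2, 2)), Ns=25)
print(xmin)  # close to [1, 1]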

13.6.1 Minimizing the norm of a vector function


In [28]:
def f(x):
    # Residuals: we want arctan(x) to match arctan of evenly spaced targets
    return np.arctan(x) - np.arctan(np.linspace(0, 1, len(x)))

x0 = np.zeros(10)

optimize.leastsq(f, x0)


Out[28]:
(array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
        0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ]), 2)
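As a side note (a hedged sketch, not in the original notebook), the same problem can be solved with the newer scipy.optimize.least_squares interface, which takes the same residual function and starting point:

import numpy as np
from scipy import optimize

def residuals(x):  # same residual function as f above
    return np.arctan(x) - np.arctan(np.linspace(0, 1, len(x)))

res = optimize.least_squares(residuals, np.zeros(10))
print(res.x)  # essentially linspace(0, 1, 10), as with leastsq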

In [29]:
def g(x):
    # Scalar objective: squared norm of the residual vector
    return np.sum(f(x) ** 2)

optimize.minimize(g, x0, method="BFGS")


Out[29]:
      fun: 2.6940806765226745e-11
 hess_inv: array([[ 1.00000000e+00,  1.51476375e-06, -9.52938038e-07,
         8.10489577e-07, -8.70991890e-07, -2.38913216e-07,
         7.65778362e-08, -1.22683179e-07, -2.78963914e-07,
         3.30632792e-07],
       [ 1.51476375e-06,  5.22614114e-01,  4.31229121e-03,
        -2.77309862e-03, -2.20157681e-03, -2.43622607e-03,
        -5.48382371e-03, -6.66806633e-04, -6.81066028e-04,
        -6.35640504e-03],
       [-9.52938038e-07,  4.31229121e-03,  5.64454482e-01,
         3.44359095e-03,  7.50395426e-03, -1.94770178e-02,
        -8.05017387e-03, -7.92117628e-03, -9.80727145e-03,
         3.50617432e-03],
       [ 8.10489577e-07, -2.77309862e-03,  3.44359095e-03,
         6.24376042e-01, -5.23908109e-03,  2.09245054e-02,
        -9.17188524e-03,  2.33290991e-03, -1.94598581e-03,
        -9.59957240e-03],
       [-8.70991890e-07, -2.20157681e-03,  7.50395426e-03,
        -5.23908109e-03,  7.06384459e-01, -2.96921361e-02,
        -7.30974852e-03, -1.98820765e-02,  4.53601351e-02,
         3.35150308e-03],
       [-2.38913216e-07, -2.43622607e-03, -1.94770178e-02,
         2.09245054e-02, -2.96921361e-02,  9.82326555e-01,
        -2.74635555e-02, -7.74369474e-03,  4.05006892e-02,
        -1.09859854e-02],
       [ 7.65778362e-08, -5.48382371e-03, -8.05017387e-03,
        -9.17188524e-03, -7.30974852e-03, -2.74635555e-02,
         9.86851227e-01,  6.99212440e-03,  1.86241067e-02,
         7.19597227e-03],
       [-1.22683179e-07, -6.66806633e-04, -7.92117628e-03,
         2.33290991e-03, -1.98820765e-02, -7.74369474e-03,
         6.99212440e-03,  1.05286148e+00,  1.25196163e-01,
        -4.02652049e-02],
       [-2.78963914e-07, -6.81066028e-04, -9.80727145e-03,
        -1.94598581e-03,  4.53601351e-02,  4.05006892e-02,
         1.86241067e-02,  1.25196163e-01,  1.34158406e+00,
        -7.03041146e-02],
       [ 3.30632792e-07, -6.35640504e-03,  3.50617432e-03,
        -9.59957240e-03,  3.35150308e-03, -1.09859854e-02,
         7.19597227e-03, -4.02652049e-02, -7.03041146e-02,
         1.92714741e+00]])
      jac: array([ 1.32042360e-10, -1.57257183e-07,  1.23570752e-06, -5.33389991e-07,
       -1.53039138e-06, -3.48637014e-06, -4.53222693e-07, -3.17692241e-06,
        4.09300089e-06,  5.73595439e-07])
  message: 'Optimization terminated successfully.'
     nfev: 144
      nit: 11
     njev: 12
   status: 0
  success: True
        x: array([-7.38455942e-09,  1.11111023e-01,  2.22222895e-01,  3.33332997e-01,
        4.44443340e-01,  5.55552563e-01,  6.66666186e-01,  7.77773679e-01,
        8.88895440e-01,  1.00000114e+00])

In [31]:
f(x0)


Out[31]:
array([ 0.        , -0.11065722, -0.21866895, -0.32175055, -0.41822433,
       -0.5070985 , -0.5880026 , -0.66104317, -0.72664234, -0.78539816])

BFGS needs more function calls and gives a less precise result: unlike leastsq, which uses a Levenberg-Marquardt algorithm tailored to sums of squares, BFGS does not exploit the least-squares structure of the problem.

