See also the materials on the GitHub course page.
where $S \subseteq \mathbb{R}^n$, $f_j: S \rightarrow \mathbb{R}, \; j = 0,\ldots,m$, $g_k: S \rightarrow \mathbb{R}, \; k=1,\ldots,p$
All functions are at least continuous.
Important fact: a nonlinear optimization problem in its general form is numerically intractable!
if $x^*$ is a local minimum point of the differentiable function $f(x)$, then $$ f'(x^*) = 0 $$
if $x^*$ is a local minimum point of the twice differentiable function $f(x)$, then
$$ f'(x^*) = 0 \quad \text{and} \quad f''(x^*) \succeq 0 $$
Assume $f(x)$ is a twice differentiable function and $x^*$ satisfies the conditions
$$ f'(x^*) = 0, \quad f''(x^*) \succ 0; $$
then $x^*$ is a strict local minimum point of the function $f(x)$.
Remark: check that you can prove these claims!
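As a quick numerical illustration (this sketch is my addition, not part of the original notes), one can check the conditions for the quadratic function $f(x) = \frac{1}{2}x^\top A x - b^\top x$ with a symmetric positive definite $A$: the stationary point solves $Ax = b$, and the Hessian is $A$ everywhere.
import numpy as np

# A minimal sketch (illustration only): verify the optimality conditions
# for f(x) = 0.5 * x^T A x - b^T x with a symmetric positive definite A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x_star = np.linalg.solve(A, b)           # stationary point: f'(x*) = A x* - b = 0
grad_at_x_star = A @ x_star - b          # first-order condition
hessian_eigvals = np.linalg.eigvalsh(A)  # Hessian of f is A everywhere

print(np.linalg.norm(grad_at_x_star))    # ~ 0
print(hessian_eigvals)                   # all positive => f''(x*) ≻ 0, strict minimum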
But we don't know $x^*$!
Then $$ \|x_{k+1} - x_k \| = \|x_{k+1} - x_k + x^* - x^* \| \leq \|x_{k+1} - x^* \| + \| x_k - x^* \| \leq 2\varepsilon $$
The same is true for the convergence in $f$, but sometimes $f^*$ can be estimated!
Remark: a better practice is to use the relative difference in the argument and in the function value, for example $\dfrac{\|x_{k+1} - x_k \|_2}{\| x_k \|_2}$
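For illustration, a stopping test based on these relative quantities could look like the following sketch (the function name and tolerances are my own, not from the original notes):
import numpy as np

# A minimal sketch of a stopping test based on relative changes
# of the iterate and of the function value (illustrative only).
def should_stop(x_new, x_old, f_new, f_old, tol=1e-8):
    arg_change = np.linalg.norm(x_new - x_old) / max(np.linalg.norm(x_old), 1e-16)
    fun_change = abs(f_new - f_old) / max(abs(f_old), 1e-16)
    return arg_change <= tol or fun_change <= tol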
Definition: an oracle is an abstract machine that responds to the method's sequential requests
OOP analogy:
Black box concept
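In OOP terms the oracle is just an object with a request interface; the method sees only the responses and never the internal structure of $f$. A minimal sketch of such a black box (the class and its interface are illustrative, not part of the original notes):
# A minimal sketch of an oracle as a black box (illustrative interface).
class Oracle:
    def __init__(self, f, grad=None):
        self._f = f              # hidden inside the black box
        self._grad = grad
        self.num_requests = 0    # the oracle can count the method's requests

    def func(self, x):
        self.num_requests += 1
        return self._f(x)

    def gradient(self, x):
        self.num_requests += 1
        return self._grad(x)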
Questions:
In this course we consider only line search methods!
However, sometimes the concept of trust region methods will be helpful.
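Schematically, a line search method generates iterates $x_{k+1} = x_k + \alpha_k h_k$: first a descent direction $h_k$ is chosen, then a step size $\alpha_k$ along it. Below is a minimal runnable sketch with the simplest possible choices (antigradient direction, constant step); it only illustrates the general scheme and is my addition, not part of the original notes.
import numpy as np

# A minimal sketch of the line search scheme x_{k+1} = x_k + alpha_k * h_k
# with the simplest choices: h_k = -f'(x_k), constant alpha (illustrative only).
def simple_line_search(grad, x0, alpha=0.1, max_iter=1000, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        h = -grad(x)              # descent direction
        x_next = x + alpha * h    # step along the direction
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x = x_next
    return x

# Usage: minimize f(x) = ||x||^2 / 2, whose gradient is x
print(simple_line_search(lambda x: x, [5.0, -3.0]))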
For a given class of problems, one can compare the following quantities:
Linear (geometric progression) $$ \| x_{k+1} - x^* \|_2 \leq Cq^k, $$ where $q \in (0, 1)$ and $ 0 < C < \infty$
Superlinear $$ \| x_{k+1} - x^* \|_2 \leq Cq^{k^p}, $$ where $q \in (0, 1)$, $ 0 < C < \infty$ and $p > 1$
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
USE_COLAB = False
if not USE_COLAB:
    plt.rc("text", usetex=True)
import numpy as np

# Upper bounds on the error for different convergence rates
C = 10
alpha = -0.5
q = 0.9
num_iter = 7
sublinear = np.array([C * k**alpha for k in range(1, num_iter + 1)])
linear = np.array([C * q**k for k in range(1, num_iter + 1)])
superlinear = np.array([C * q**(k**2) for k in range(1, num_iter + 1)])
quadratic = np.array([C * q**(2**k) for k in range(1, num_iter + 1)])

plt.figure(figsize=(12, 8))
plt.semilogy(np.arange(1, num_iter + 1), sublinear,
             label=r"Sublinear, $\alpha = -0.5$")
plt.semilogy(np.arange(1, num_iter + 1), superlinear,
             label=r"Superlinear, $q = 0.9$, $p = 2$")
plt.semilogy(np.arange(1, num_iter + 1), linear,
             label=r"Linear, $q = 0.9$")
plt.semilogy(np.arange(1, num_iter + 1), quadratic,
             label=r"Quadratic, $q = 0.9$")
plt.xlabel("Number of iterations, $k$", fontsize=24)
plt.ylabel("Error rate upper bound", fontsize=24)
plt.legend(loc="best", fontsize=28)
plt.xticks(fontsize=28)
_ = plt.yticks(fontsize=28)
What answers do convergence theorems give?
What answers do convergence theorems not give?
Main point: use common sense and be careful when applying convergence theorems!
Zero-order method: the oracle returns only the objective function value $f(x)$
First-order method: the oracle returns the objective function value $f(x)$ and its gradient $f'(x)$
Second-order method: the oracle returns the objective function value $f(x)$, its gradient $f'(x)$, and its Hessian $f''(x)$ (a sketch of all three orders is given below).
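For example (an illustration of mine, not from the original notes), oracles of the three orders for the quadratic $f(x) = \frac{1}{2}x^\top A x - b^\top x$ could look like this:
import numpy as np

# Illustrative oracles of different orders for f(x) = 0.5 * x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def zero_order_oracle(x):
    return 0.5 * x @ A @ x - b @ x              # only f(x)

def first_order_oracle(x):
    return zero_order_oracle(x), A @ x - b      # f(x) and f'(x)

def second_order_oracle(x):
    f_val, g = first_order_oracle(x)
    return f_val, g, A                          # f(x), f'(x) and f''(x)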
Q: do methods of higher order exist?
Idea from the first-term CS course: divide the given interval $[a, b]$ into two equal parts until the minimum of the unimodal function is localized.
Denote by $N$ the number of evaluations of the function $f$; then one can perform $K = \frac{N - 1}{2}$ iterations, and the following estimate holds: $$ |x_{K+1} - x^*| \leq \frac{b_{K+1} - a_{K+1}}{2} = \left( \frac{1}{2} \right)^{\frac{N-1}{2}} (b - a) \approx 0.5^{K} (b - a) $$
In [2]:
def binary_search(f, a, b, epsilon, callback=None):
    # Minimize a unimodal function f on [a, b] by repeatedly halving the segment
    c = (a + b) / 2.0
    while abs(b - a) > epsilon:
        # Check left subsegment
        y = (a + c) / 2.0
        if f(y) <= f(c):
            # Minimum lies in [a, c]
            b = c
            c = y
        else:
            # Check right subsegment
            z = (b + c) / 2.0
            if f(c) <= f(z):
                # Minimum lies in [y, z]
                a = y
                b = z
            else:
                # Minimum lies in [c, b]
                a = c
                c = z
        if callback is not None:
            callback(a, b)
    return c
In [3]:
def my_callback(a, b, left_bound, right_bound, approximation):
    # Store the current segment endpoints and the midpoint approximation
    left_bound.append(a)
    right_bound.append(b)
    approximation.append((a + b) / 2.0)
In [4]:
# %matplotlib inline
import numpy as np
# import matplotlib.pyplot as plt
left_boud_bs = []
right_bound_bs = []
approximation_bs = []
callback_bs = lambda a, b: my_callback(a, b,
left_boud_bs, right_bound_bs, approximation_bs)
# Target unimodal function on given segment
f = lambda x: (x - 2) * x * (x + 2)**2 # np.power(x+2, 2)
# f = lambda x: -np.sin(x)
x_true = -2
# x_true = np.pi / 2.0
a = -3
b = -1.5
epsilon = 1e-8
x_opt = binary_search(f, a, b, epsilon, callback_bs)
print(np.abs(x_opt - x_true))
plt.figure(figsize=(10,6))
plt.plot(np.linspace(a,b), f(np.linspace(a,b)))
plt.title("Objective function", fontsize=20)
plt.xticks(fontsize=20)
_ = plt.yticks(fontsize=20)
Idea: divide the interval $[a, b]$ not into two equal parts, but in the golden ratio.
The convergence rate is estimated similarly to the bisection method: $$ |x_{K+1} - x^*| \leq b_{K+1} - a_{K+1} = \left( \frac{1}{\tau} \right)^{N-1} (b - a) \approx 0.618^K(b-a), $$ where $\tau = \frac{\sqrt{5} + 1}{2}$.
In [5]:
def golden_search(f, a, b, tol=1e-5, callback=None):
    # Minimize a unimodal function f on [a, b] using the golden ratio split
    tau = (np.sqrt(5) + 1) / 2.0
    y = a + (b - a) / tau**2
    z = a + (b - a) / tau
    while b - a > tol:
        if f(y) <= f(z):
            # Minimum lies in [a, z]: the old y becomes the new right inner point
            b = z
            z = y
            y = a + (b - a) / tau**2
        else:
            # Minimum lies in [y, b]: the old z becomes the new left inner point
            a = y
            y = z
            z = a + (b - a) / tau
        if callback is not None:
            callback(a, b)
    return (a + b) / 2.0
In [6]:
left_boud_gs = []
right_bound_gs = []
approximation_gs = []
cb_gs = lambda a, b: my_callback(a, b, left_boud_gs, right_bound_gs, approximation_gs)
x_gs = golden_search(f, a, b, epsilon, cb_gs)
print(f(x_opt))
print(f(x_gs))
print(np.abs(x_gs - x_true))
In [7]:
plt.figure(figsize=(10,6))
plt.semilogy(np.arange(1, len(approximation_bs) + 1), np.abs(x_true - np.array(approximation_bs, dtype=np.float64)), label="Binary search")
plt.semilogy(np.arange(1, len(approximation_gs) + 1), np.abs(x_true - np.array(approximation_gs, dtype=np.float64)), label="Golden search")
plt.xlabel(r"Number of iterations, $k$", fontsize=24)
plt.ylabel("Error rate upper bound", fontsize=24)
plt.legend(loc="best", fontsize=24)
plt.xticks(fontsize = 24)
_ = plt.yticks(fontsize = 24)
In [8]:
%timeit binary_search(f, a, b, epsilon)
%timeit golden_search(f, a, b, epsilon)
In [9]:
f = lambda x: np.sin(np.sin(np.sin(np.sqrt(x))))
x_true = (3 * np.pi / 2)**2
a = 2
b = 60
epsilon = 1e-8
plt.plot(np.linspace(a,b), f(np.linspace(a,b)))
plt.xticks(fontsize = 24)
_ = plt.yticks(fontsize = 24)
plt.title("Objective function", fontsize=20)
In [10]:
left_boud_bs = []
right_bound_bs = []
approximation_bs = []
callback_bs = lambda a, b: my_callback(a, b,
left_boud_bs, right_bound_bs, approximation_bs)
x_opt = binary_search(f, a, b, epsilon, callback_bs)
print(np.abs(x_opt - x_true))
In [11]:
left_boud_gs = []
right_bound_gs = []
approximation_gs = []
cb_gs = lambda a, b: my_callback(a, b, left_boud_gs, right_bound_gs, approximation_gs)
x_gs = golden_search(f, a, b, epsilon, cb_gs)
print(np.abs(x_gs - x_true))
In [12]:
plt.figure(figsize=(8,6))
plt.semilogy(np.abs(x_true - np.array(approximation_bs, dtype=np.float64)), label="Binary")
plt.semilogy(np.abs(x_true - np.array(approximation_gs, dtype=np.float64)), label="Golden")
plt.legend(fontsize=24)
plt.xticks(fontsize=24)
_ = plt.yticks(fontsize=24)
plt.xlabel(r"Number of iterations, $k$", fontsize=24)
plt.ylabel("Error rate upper bound", fontsize=24)
In [13]:
%timeit binary_search(f, a, b, epsilon)
%timeit golden_search(f, a, b, epsilon)