Relation between dual and conjugate functions

\begin{equation*} \begin{split} g(\mu) & = -b^{\top}\mu + \inf_x(f(x) + \mu^{\top}Ax) \\ & = -b^{\top}\mu - \sup_x((-A^{\top}\mu)^{\top}x -f(x)) \\ & = -b^{\top}\mu - f^*(-A^{\top}\mu) \end{split} \end{equation*}
Dual problem
$$ \max_\mu -b^{\top}\mu - f^*(-A^{\top}\mu) $$

Approach 1: find the conjugate function and solve the unconstrained optimization problem.
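For instance (a small illustration, not from the original notes), for $f(x) = \frac{1}{2}\|x\|_2^2$ the conjugate is $f^*(y) = \frac{1}{2}\|y\|_2^2$, so the dual becomes the unconstrained concave maximization

$$ \max_\mu \; -b^{\top}\mu - \frac{1}{2}\|A^{\top}\mu\|_2^2, $$

which in this case can even be solved in closed form: $\mu^* = -(AA^{\top})^{-1}b$.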
Issues
or
$$ \begin{bmatrix} f' & A^{\top} \\ A & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \mu^* \end{bmatrix} = \begin{bmatrix} 0 \\ b \end{bmatrix} $$

Approach 2: solve the generally nonlinear system with Newton's method.
Q: in what case does the system become linear?
From the optimality condition it follows that

$$ \begin{bmatrix} f''(x) & A^{\top} \\ A & 0 \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} -f'(x) \\ 0 \end{bmatrix} $$

The Newton direction $v$ is defined only when this matrix is non-singular!
Q: how can the direction $w$ be interpreted?
Exercise.
Estimate the number of iterations required for convergence of Newton's method for a quadratic objective and linear equality constraints.
Important note: the initial point has to lie in the feasible set!
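One simple way to obtain such a point (an assumption, not from the original notes) is to take any solution of the linear system $Ax_0 = b$, for example the least-norm solution $x_0 = A^{\top}(AA^{\top})^{-1}b$.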
import numpy as np

def NewtonEqualityFeasible(f, gradf, hessf, A, b,
                           stop_crit, line_search, x0,
                           tol, **kwargs):
    x = x0
    n = x.shape[0]
    p = A.shape[0]
    while True:
        # Assemble and solve the KKT system for the Newton direction
        newton_matrix = np.block([[hessf(x), A.T],
                                  [A, np.zeros((p, p))]])
        rhs = np.concatenate([-gradf(x), np.zeros(p)])
        w = np.linalg.solve(newton_matrix, rhs)
        h = w[:n]
        if stop_crit(x, h, gradf(x), **kwargs) < tol:
            break
        alpha = line_search(x, h, f, gradf(x), **kwargs)
        x = x + alpha * h
    return x
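A minimal sketch of how this function could be called, with a Newton-decrement stopping rule and Armijo backtracking as the callbacks (the helper names and the toy problem below are illustrative assumptions, not from the original notes):

def newton_decrement_crit(x, h, grad_x, **kwargs):
    # -f'(x)^T h equals h^T f''(x) h for the Newton direction h
    return -grad_x @ h

def backtracking(x, h, f, grad_x, rho=0.5, c=1e-4, **kwargs):
    alpha = 1.0
    # Armijo sufficient decrease condition
    while f(x + alpha * h) > f(x) + c * alpha * grad_x @ h:
        alpha *= rho
    return alpha

# Toy problem: min 0.5 x^T x + q^T x  s.t.  Ax = b, with a feasible start
rng = np.random.default_rng(0)
n, p = 5, 2
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
q = rng.standard_normal(n)
f = lambda x: 0.5 * x @ x + q @ x
gradf = lambda x: x + q
hessf = lambda x: np.eye(n)
x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # least-norm solution of Ax0 = b
x_opt = NewtonEqualityFeasible(f, gradf, hessf, A, b,
                               newton_decrement_crit, backtracking,
                               x0, tol=1e-8)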
Estimate the following difference
$$ f(x) - \inf_v(\hat{f}(x + v) \; | \; A(x+v) = b), $$

where $\hat{f}$ is the quadratic approximation of the function $f$.
To do this, multiply both sides by $h^{\top}$ from the left

$$ \langle h^{\top} \rvert \cdot \quad f''(x)h + A^{\top}w = -f'(x) $$

and use the constraint $Ah = 0$:
$$ h^{\top}f''(x)h = -f'(x)^{\top}h $$

Then

$$ \inf_v(\hat{f}(x + v) \; | \; A(x+v) = b) = f(x) - \frac{1}{2}h^{\top}f''(x)h $$

Summary: the value of $h^{\top}f''(x)h$ is the most natural stopping criterion for Newton's method.
Convergence of the equality-constrained Newton method is equivalent to the convergence of the classical Newton method for the unconstrained optimization problem.
Theorem. Assume the following conditions hold:
Then Newton's method converges to the pair $(x^*, \mu^*)$
where $r_p(x, \mu) = Ax - b$ and $r_d(x, \mu) = f'(x) + A^{\top}\mu$
or, in more detail,
$$ \begin{bmatrix} f''(x) & A^{\top}\\ A & 0 \end{bmatrix} \begin{bmatrix} z_p\\ z_d \end{bmatrix} = - \begin{bmatrix} r_d(x, \mu)\\ r_p(x, \mu) \end{bmatrix} = - \begin{bmatrix} f'(x) + A^{\top}\mu\\ Ax - b \end{bmatrix} $$

def NewtonEqualityInfeasible(f, grad, hessf, A, b,
                             stop_crit, line_search, x0,
                             mu0, tol, **kwargs):
    x = x0
    mu = mu0
    n = x.shape[0]
    p = A.shape[0]
    while True:
        # Solve the KKT system above for the primal and dual steps
        kkt_matrix = np.block([[hessf(x), A.T],
                               [A, np.zeros((p, p))]])
        rhs = -np.concatenate([grad(x) + A.T @ mu, A @ x - b])
        z = np.linalg.solve(kkt_matrix, rhs)
        z_p, z_d = z[:n], z[n:]
        if stop_crit(x, z_p, z_d, grad(x), **kwargs) < tol:
            break
        alpha = line_search(x, mu, z_p, z_d,
                            f, grad(x), **kwargs)
        x = x + alpha * z_p
        mu = mu + alpha * z_d
    return x, mu
def linesearch(r, x, mu, z_p, z_d, c, beta):
    # Backtrack until the norm of the residual r(x, mu) sufficiently decreases
    alpha = 1
    while (np.linalg.norm(r(x + alpha * z_p, mu + alpha * z_d)) >=
           (1 - c * alpha) * np.linalg.norm(r(x, mu))):
        alpha *= beta
    return alpha
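For completeness, a small sketch of the residual $r(x, \mu)$ that this backtracking rule expects (the helper make_residual is an assumption for this example, not from the original notes):

def make_residual(grad, A, b):
    # r(x, mu) stacks the dual and primal residuals r_d and r_p defined above
    def r(x, mu):
        return np.concatenate([grad(x) + A.T @ mu, A @ x - b])
    return r

Backtracking on $\|r\|$ rather than on $f$ is what allows the method to start from an infeasible point.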
The result of the theorem is similar to the case of a feasible starting point.
Theorem. Assume that
Then convergence is
Consider the problem

\begin{equation*} \begin{split} & \min f_0(x)\\ \text{s.t. } & f_i(x) \leq 0, \quad i = 1, \dots, m\\ & Ax = b, \end{split} \end{equation*}

where $f_i$ are convex and twice continuously differentiable, $A \in \mathbb{R}^{p \times n}$ and $\mathrm{rank} \; A = p < n$.
Assume that the problem is strictly feasible, i.e. Slater's condition is satisfied.
The problem can be equivalently rewritten as

\begin{equation*} \begin{split} & \min f_0(x) + \sum_{i=1}^m I_-(f_i(x))\\ \text{s.t. } & Ax = b, \end{split} \end{equation*}

where $I_-$ is the indicator function

$$ I_-(u) = \begin{cases} 0, & u \leq 0\\ \infty, & u > 0 \end{cases} $$

Issue: now the objective function is not differentiable.
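A standard remedy, illustrated by the plot below, is to approximate $I_-(u)$ by the smooth function $-t\log(-u)$ with parameter $t > 0$: the smaller $t$, the closer the approximation to the indicator.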
%matplotlib inline
import matplotlib.pyplot as plt
plt.rc("text", usetex=True)
import numpy as np
x = np.linspace(-2, 0, 100000, endpoint=False)
plt.figure(figsize=(10, 6))
for t in [0.1, 0.5, 1, 1.5, 2]:
plt.plot(x, -t * np.log(-x), label=r"$t = " + str(t) + "$")
plt.legend(fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
_ = plt.xlabel("$u$", fontsize=26)
Function

$$ \phi(x) = -\sum_{i=1}^m \log(-f_i(x)) $$

is called the logarithmic barrier.
Its domain is the set of points at which the inequality constraints are strictly satisfied.
Exercise. Find the gradient and Hessian of $\phi(x)$.
where $\lambda = \lambda^*(t)$ and $\mu = \mu^*$.
The dual function $g(\lambda^*(t), \mu^*)$ is finite and can be written as \begin{equation*} \begin{split} g(\lambda^*(t), \mu^*) & = f_0(x^*(t)) + \sum_{i=1}^m \lambda^*_i(t)f_i(x^*(t)) + (\mu^*)^{\top}(Ax^*(t) - b)\\ & = f_0(x^*(t)) - mt \end{split} \end{equation*}
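(Assuming the standard central-path multipliers $\lambda^*_i(t) = -t / f_i(x^*(t))$, each term in the sum equals $\lambda^*_i(t) f_i(x^*(t)) = -t$, and $Ax^*(t) = b$ removes the last term, which yields the $-mt$.)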
Duality gap: $f_0(x^*(t)) - g(\lambda^*(t), \mu^*) = mt$, so $x^*(t)$ is no more than $mt$-suboptimal.
Simple method to find strictly feasible point
\begin{equation*} \begin{split} & \min s\\ \text{s.t. } & f_i(x) \leq s\\ & Ax = b \end{split} \end{equation*}
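As an illustration, a minimal sketch of this phase-I problem for the special case of linear inequality constraints $Fx \leq g$ (the names F, g and strictly_feasible_point are assumptions for this example, not from the original notes); for general convex $f_i$ the same problem can be solved with any convex solver:

import numpy as np
from scipy.optimize import linprog

def strictly_feasible_point(F, g, A, b):
    # Phase-I: min_{x, s} s  s.t.  Fx - s*1 <= g,  Ax = b.
    # If the optimal s is negative, the returned x is strictly feasible.
    m, n = F.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                       # objective: s
    A_ub = np.hstack([F, -np.ones((m, 1))])           # Fx - s <= g
    A_eq = np.hstack([A, np.zeros((A.shape[0], 1))])  # Ax = b
    res = linprog(c, A_ub=A_ub, b_ub=g, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]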