Seminar 20

Interior point method

Reminder

  • Projected gradient descent
  • Frank-Wolfe method

Convex optimization problem with equality constraints

\begin{equation*} \begin{split} &\min f(x) \\ \text{s.t. } & Ax = b, \end{split} \end{equation*}

where $f$ is convex and twice differentiable, $A \in \mathbb{R}^{p \times n}$ and $\mathrm{rank} \; A = p < n$

Dual problem

Relation between dual and conjugate functions \begin{equation*} \begin{split} g(\mu) & = -b^{\top}\mu + \inf_x(f(x) + \mu^{\top}Ax) \\ & = -b^{\top}\mu - \sup_x((-A^{\top}\mu)^{\top}x -f(x)) \\ & = -b^{\top}\mu - f^*(-A^{\top}\mu) \end{split} \end{equation*}

Dual problem

$$ \max_\mu -b^{\top}\mu - f^*(-A^{\top}\mu) $$

Approach 1: find the conjugate function and solve the resulting unconstrained (dual) optimization problem.
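
A short worked example (illustrative, for a specific choice of $f$): take $f(x) = \frac{1}{2}\|x\|_2^2$, so $f^*(y) = \frac{1}{2}\|y\|_2^2$ and

$$ g(\mu) = -b^{\top}\mu - \frac{1}{2}\|A^{\top}\mu\|_2^2, \qquad \mu^* = -(AA^{\top})^{-1}b, \qquad x^* = -A^{\top}\mu^* = A^{\top}(AA^{\top})^{-1}b, $$

i.e. the dual is an unconstrained concave quadratic, and the primal solution recovered from it is the least-norm solution of $Ax = b$.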

Issues

  • it may not be easy to recover the solution of the primal problem from the solution of the dual one
  • the conjugate function $f^*$ has to be twice differentiable to solve the dual problem fast, but this does not always hold

Optimality conditions

  • $Ax^* = b$
  • $f'(x^*) + A^{\top}\mu^* = 0$

or

$$ \begin{bmatrix} f'(x^{*}) + A^{\top}\mu^{*} \\ Ax^{*} \end{bmatrix} = \begin{bmatrix} 0 \\ b \end{bmatrix} $$

Approach 2: solve this generally non-linear system with Newton's method.

Q: in what case does this system become linear?
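
For instance, if the objective is quadratic, $f(x) = \frac{1}{2}x^{\top}Px + q^{\top}x$, the optimality conditions form a linear system. A minimal sketch with assumed toy data ($P$, $q$, $A$, $b$ below are illustrative):

import numpy as np

# assumed toy data: min 1/2 x^T P x + q^T x  s.t.  A x = b
P = np.array([[2.0, 0.5], [0.5, 1.0]])
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
# KKT system: [[P, A^T], [A, 0]] [x; mu] = [-q; b]
kkt = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(kkt, rhs)
x_opt, mu_opt = sol[:n], sol[n:]
print(x_opt, A @ x_opt - b)   # constraint residual should be ~0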

Newton method for convex optimization problem with equality constraints

\begin{equation*} \begin{split} & \min_v f(x) + f'(x)^{\top}v + \frac{1}{2}v^{\top}f''(x)v\\ \text{s.t. } & A(x + v) = b \end{split} \end{equation*}

From the optimality conditions it follows that

$$ \begin{bmatrix} f''(x) & A^{\top} \\ A & 0 \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} -f'(x) \\ 0 \end{bmatrix} $$

The Newton direction $v$ is defined only if the KKT matrix is non-singular!

Q: how can the direction $w$ be interpreted?

Exercise. Estimate the number of iterations required for convergence of Newton's method for a quadratic objective with linear equality constraints.

Linearization of optimality conditions

  • $A(x + v) = b \rightarrow Av = 0$
  • $f'(x + v) + A^{\top}w \approx f'(x) + f''(x)v + A^{\top}w = 0$

or

  • $f''(x)v + A^{\top}w = -f'(x)$

Pseudocode

Important note: the initial point has to lie inside the feasible set!

import numpy as np

def NewtonEqualityFeasible(f, gradf, hessf, A, b,
                           stop_crit, line_search, x0,
                           tol):
    x = x0
    n = x.shape[0]
    p = A.shape[0]
    while True:
        # assemble and solve the KKT system for the Newton direction
        newton_matrix = np.block([[hessf(x), A.T],
                                  [A, np.zeros((p, p))]])
        rhs = np.concatenate([-gradf(x), np.zeros(p)])
        w = np.linalg.solve(newton_matrix, rhs)
        h = w[:n]
        if stop_crit(x, h, gradf(x)) < tol:
            break
        alpha = line_search(x, h, f, gradf(x))
        x = x + alpha * h
    return x
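
A minimal usage sketch of the routine above on a toy equality-constrained quadratic; the problem data and the particular stop_crit and line_search below are illustrative assumptions:

import numpy as np

# assumed toy problem: min 1/2 x^T P x  s.t.  x1 + x2 + x3 = 1
P = np.diag([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

f = lambda x: 0.5 * x @ P @ x
gradf = lambda x: P @ x
hessf = lambda x: P

# Newton decrement h^T f''(x) h as the stopping criterion (see below)
stop_crit = lambda x, h, g: h @ hessf(x) @ h
# constant unit step: exact for a quadratic objective
line_search = lambda x, h, f, g: 1.0

x0 = np.array([1.0, 0.0, 0.0])   # feasible: A @ x0 = b
x_opt = NewtonEqualityFeasible(f, gradf, hessf, A, b,
                               stop_crit, line_search, x0, tol=1e-10)
print(x_opt, A @ x_opt - b)       # constraint residual ~ 0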

Stopping criterion

Estimate the following difference

$$ f(x) - \inf_v(\hat{f}(x + v) \; | \; A(x+v) = b), $$

where $\hat{f}$ is the quadratic approximation of the function $f$.

To do this, multiply the equation

$$ f''(x)h + A^{\top}w = -f'(x) $$

by $h^{\top}$ from the left and use the constraint $Ah = 0$:

$$ h^{\top}f''(x)h = -f'(x)^{\top}h $$

Then

$$ \inf_v(\hat{f}(x + v) \; | \; A(x+v) = b) = f(x) - \frac{1}{2}h^{\top}f''(x)h $$

Summary: the value of $h^{\top}f''(x)h$ is the most natural stopping criterion for Newton's method.

Convergence theorem

Convergence of the equality-constrained Newton method is equivalent to the convergence of the classical Newton method for the unconstrained optimization problem.

Theorem. Assume the following conditions hold

  • the level set $S = \{ x \; | \; x \in D(f), \; f(x) \leq f(x_0), \; Ax = b \}$ is closed and $x_0 \in D(f), \; Ax_0 = b$
  • the Hessian $f''(x)$ is Lipschitz continuous on $S$
  • on the set $S$, $\|f''(x)\|_2 \leq M$ and the norm of the inverse matrix of the KKT system is bounded above

Then Newton's method converges to the pair $(x^*, \mu^*)$

  • linearly when the iterates are far from the solution
  • quadratically in a sufficiently small neighbourhood of the solution

Infeasible starting point

  • Newton's method requires a feasible starting point
  • But what to do if this requirement is violated? A hard case is, for example, a problem in which the domain of the objective function is not $\mathbb{R}^n$.
  • Assume the starting point is infeasible; then the KKT conditions can be written as
$$ \begin{bmatrix} f''(x) & A^{\top}\\ A & 0 \end{bmatrix} \begin{bmatrix} v\\ w \end{bmatrix} = - \begin{bmatrix} f'(x)\\ {\color{red}{Ax - b}} \end{bmatrix} $$
  • If $x$ is feasible, then the system reduces to the system for Newton's method with a feasible starting point

Primal-dual interpretation

  • A method is called primal-dual if every iteration updates both primal and dual variables
  • In particular, re-write the optimality conditions in the following form
$$ r(x^*, \mu^*) = (r_p(x^*, \mu^*), r_d(x^*, \mu^*)) = 0, $$

where $r_p(x, \mu) = Ax - b$ and $r_d(x, \mu) = f'(x) + A^{\top}\mu$

  • Solve the system $r(y) = 0$, where $y = (x, \mu)$, with Newton's method:
$$ r(y + z) \approx r(y) + Dr(y)z = 0 $$
  • Primal-dual direction in Newton method is defined as the solution of the following system
$$ Dr(y)z = -r(y) $$

or more detailed

$$ \begin{bmatrix} f''(x) & A^{\top}\\ A & 0 \end{bmatrix} \begin{bmatrix} z_p\\ z_d \end{bmatrix} = - \begin{bmatrix} r_d(x, \mu)\\ r_p(x, \mu) \end{bmatrix} = - \begin{bmatrix} f'(x) + A^{\top}\mu\\ Ax - b \end{bmatrix} $$
  • Substitute $z_d^+ = \mu + z_d$ and obtain
$$ \begin{bmatrix} f''(x) & A^{\top}\\ A & 0 \end{bmatrix} \begin{bmatrix} z_p\\ z_d^+ \end{bmatrix} = - \begin{bmatrix} f'(x)\\ Ax - b \end{bmatrix} $$
  • This system is equivalent to the previous one under the following notation
$$ v = z_p \qquad w = z_d^+ = \mu + z_d $$
  • Newton's method gives the direction for updating the primal variable and the updated value of the dual variable

Pseudocode

import numpy as np

def NewtonEqualityInfeasible(f, gradf, hessf, A, b,
                             stop_crit, line_search, x0,
                             mu0, tol):
    x = x0
    mu = mu0
    n = x.shape[0]
    p = A.shape[0]
    while True:
        # solve the KKT system with primal and dual residuals on the right-hand side
        kkt_matrix = np.block([[hessf(x), A.T],
                               [A, np.zeros((p, p))]])
        rhs = -np.concatenate([gradf(x) + A.T @ mu, A @ x - b])
        z = np.linalg.solve(kkt_matrix, rhs)
        z_p, z_d = z[:n], z[n:]
        if stop_crit(x, mu, z_p, z_d, gradf(x)) < tol:
            break
        alpha = line_search(x, mu, z_p, z_d, f, gradf(x))
        x = x + alpha * z_p
        mu = mu + alpha * z_d
    return x, mu

Stopping criterion and backtracking

  • Update of $r_p$ after a step of length $\alpha$ along $z_p$ (using $Az_p = b - Ax$, which follows from the KKT system)
$$ A(x + \alpha z_p) - b = Ax + \alpha(b - Ax) - b = (1 - \alpha)(Ax - b) $$
  • Primal residual after $k$ steps
$$ r_p^{(k)} = \prod_{i=0}^{k-1}(1 - \alpha^{(i)})\, r_p^{(0)} $$
  • Stopping criterion: $Ax = b$ and $\|r(x, \mu)\|_2 \leq \varepsilon$
  • Backtracking: $c \in (0, 1/2)$, $\beta \in (0, 1)$
    from numpy.linalg import norm

    def linesearch(r, x, mu, z_p, z_d, c, beta):
        # backtrack until the residual norm decreases sufficiently
        alpha = 1
        while (norm(r(x + alpha * z_p, mu + alpha * z_d)) >=
               (1 - c * alpha) * norm(r(x, mu))):
            alpha *= beta
        return alpha
    

Convergence theorem

The result is similar to the case of a feasible starting point.

Theorem. Assume that

  • the sublevel set $S = \{(x, \mu) \; | \; x \in D(f), \; \| r(x, \mu) \|_2 \leq \| r(x_0, \mu_0) \|_2 \}$ is closed
  • on the set $S$, the norm of the inverse KKT matrix is bounded
  • the Hessian is Lipschitz continuous on $S$.

Then convergence is

  • linear far from the solution and
  • quadratic in a sufficiently small neighbourhood of the solution.

General convex optimization problem

\begin{equation*} \begin{split} & \min_{x \in \mathbb{R}^n} f_0(x)\\ \text{s.t. } & f_i (x) \leq 0 \qquad i=1,\ldots,m\\ & Ax = b, \end{split} \end{equation*}

where $f_i$ are convex and twice continuously differentiable, $A \in \mathbb{R}^{p \times n}$ and $\mathrm{rank} \; A = p < n$.

Assume that the problem is strictly feasible, i.e. Slater's condition is satisfied.

Optimality conditions

  • Primal feasibility
$$ Ax^* = b, \; f_i(x^*) \leq 0, \; i = 1,\ldots,m $$
  • Dual feasibility
$$ \lambda^* \geq 0 $$
  • Lagrangian stationarity
$$ f'_0(x^*) + \sum_{i=1}^m \lambda^*_if'_i(x^*) + A^{\top}\mu^* = 0 $$
  • Complementary slackness condition
$$ \lambda^*_i f_i(x^*) = 0, \qquad i = 1,\ldots, m $$

Idea

  • Reduce the problem with inequality constraints to a sequence of equality-constrained problems
  • Use methods for solving equality-constrained problems
\begin{equation*} \begin{split} & \min f_0(x) + \sum_{i=1}^m I_-(f_i(x))\\ \text{s.t. } & Ax = b, \end{split} \end{equation*}

where $I_-$ is an indicator function

$$ I_-(u) = \begin{cases} 0, & u \leq 0\\ \infty, & u > 0 \end{cases} $$

Issue. Now the objective function is not differentiable.

Logarithmic barrier

Idea. Approximate the function $I_-(u)$ with the function

$$ \hat{I}_-(u) = -t\log(-u), $$

where $t > 0$ is a fixed parameter.

  • Both $I_-(u)$ and $\hat{I}_-(u)$ are convex and non-decreasing
  • But $\hat{I}_-(u)$ is differentiable and approximates $I_-(u)$ as $t \to 0$

%matplotlib inline
import matplotlib.pyplot as plt
plt.rc("text", usetex=True)
import numpy as np

x = np.linspace(-2, 0, 100000, endpoint=False)
plt.figure(figsize=(10, 6))
for t in [0.1, 0.5, 1, 1.5, 2]:
    plt.plot(x, -t * np.log(-x), label=r"$t = " + str(t) + "$")
plt.legend(fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
_ = plt.xlabel("$u$", fontsize=26)


"Constrained" problem

\begin{equation*} \begin{split} & \min f_0(x) + \sum_{i=1}^m -t \log(-f_i(x))\\ \text{s.t. } & Ax = b, \end{split} \end{equation*}
  • The problem is still convex
  • Function
$$ \phi(x) = -\sum\limits_{i=1}^m \log(-f_i(x)) $$

is called the logarithmic barrier.

Its domain is the set of points at which the inequality constraints are strictly satisfied.

Exercise. Find the gradient and Hessian of $\phi(x)$.
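
The exercise can be sanity-checked numerically; a small sketch (the constraints below are illustrative assumptions) comparing a hand-derived gradient of $\phi$ with central finite differences:

import numpy as np

def phi(x, fs):
    # logarithmic barrier for the constraints f_i(x) <= 0
    return -sum(np.log(-fi(x)) for fi in fs)

def numgrad(func, x, eps=1e-6):
    # central finite differences
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (func(x + e) - func(x - e)) / (2 * eps)
    return g

fs = [lambda x: x[0] - 1, lambda x: x[1]**2 - 2]   # toy constraints f_i(x) <= 0
x = np.array([0.0, 0.5])
print(numgrad(lambda z: phi(z, fs), x))            # compare with the analytic gradient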

Central path

For every $t > 0$ the "constrained" problem has a unique solution $x^*(t)$.

Definition. The set of points $x^*(t)$, $t > 0$, forms the central path.

Optimality conditions for "constrained" problem

  • Primal feasibility
$$ Ax^*(t) = b, \; f_i(x^*(t)) < 0, \; i = 1,\ldots,m $$
  • Lagrangian stationarity
\begin{equation*} \begin{split} & f'_0(x^*(t)) + t\phi'(x^*(t)) + A^{\top}\hat{\mu} = \\ & = f'_0(x^*(t)) - t\sum_{i=1}^m \frac{f_i'(x^*(t))}{f_i(x^*(t))} + A^{\top}\hat{\mu} = 0 \end{split} \end{equation*}
  • Introduce the following notation
$$ \lambda^*_i(t) = -\frac{t}{f_i(x^*(t))}, \; i=1,\ldots,m, \text{ and } \mu^* = \hat{\mu} $$
  • Then the optimality condition can be re-written as
$$ f'_0(x^*(t)) + \sum_{i=1}^m \lambda^*_i(t)f_i'(x^*(t)) + A^{\top}\mu^* = 0 $$
  • Then $x^*(t)$ is the minimizer of the following Lagrangian
$$ L = f_0(x) + \sum_{i=1}^m \lambda_if_i(x) + \mu^{\top}(Ax - b) $$

where $\lambda = \lambda^*(t)$ and $\mu = \mu^*$.

Duality gap

  • The dual function $g(\lambda^*(t), \mu^*)$ is finite and can be written as \begin{equation*} \begin{split} g(\lambda^*(t), \mu^*) & = f_0(x^*(t)) + \sum_{i=1}^m \lambda^*_i(t)f_i(x^*(t)) + (\mu^*)^{\top}(Ax^*(t) - b)\\ & = f_0(x^*(t)) - mt \end{split} \end{equation*}

  • Duality gap

$$ f_0(x^*(t)) - p^* \leq mt $$
  • As $t \to 0$, the duality gap goes to $0$ and the central path converges to the solution of the original problem (see the worked example below)
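
A one-dimensional worked example (illustrative): $\min x$ subject to $1 - x \leq 0$. The "constrained" problem is $\min_x \; x - t\log(x - 1)$, whose solution is $x^*(t) = 1 + t$, with $\lambda^*(t) = -t/f_1(x^*(t)) = 1$. The duality gap is $f_0(x^*(t)) - p^* = (1 + t) - 1 = t = mt$ with $m = 1$, so $x^*(t) \to x^* = 1$ as $t \to 0$.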

KKT interpretation

The optimality conditions for the "constrained" problem are equivalent to the optimality conditions for the original problem if the complementary slackness condition $\lambda_i f_i(x) = 0$ is replaced by its perturbed version

$$ -\lambda_i f_i(x) = t, \quad i = 1,\ldots, m $$

Physical interpretation

  • Assume that there are no equality constraints
  • Consider a classical particle in a field of forces
  • Every constraint $f_i(x) \leq 0$ corresponds to the force
$$ F_i(x) = -\nabla(-\log(-f_i(x))) = \frac{f'_i(x)}{f_i(x)} $$
  • The objective function also corresponds to a force
$$ F_0(x) = -\frac{f'_0(x)}{t} $$
  • Every point of the central path $x^*(t)$ is an equilibrium state of the particle, where the sum of forces is zero
  • As $t$ decreases, the force $F_0(x)$ dominates the forces $F_i(x)$, and the particle moves towards the state corresponding to the optimal value of the objective
  • Since the forces $F_i(x)$ go to infinity as the particle approaches the boundary, the particle never leaves the feasible region

Barrier method

  • $x_0$ has to be feasible
  • $t_0 > 0$ is the initial value of the parameter $t$
  • $\alpha \in (0, 1)$ is the multiplier for decreasing $t$
def BarrierMethod(f, x0, t0, tol, alpha, m, **kwargs):
    # m is the number of inequality constraints;
    # SolveBarrierProblem solves the "constrained" problem for fixed t,
    # e.g. with the Newton method for equality-constrained problems
    x = x0
    t = t0
    while True:
        x = SolveBarrierProblem(f, t, x, **kwargs)
        if m * t < tol:   # duality gap bound
            break
        t *= alpha
    return x
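
A minimal end-to-end sketch tying the pieces together (assumed toy data, not a robust implementation): the barrier method for a small LP $\min c^{\top}x$ s.t. $Ax = b$, $x \geq 0$, where each centering problem is solved with the feasible-start Newton method described earlier:

import numpy as np

def centering_step(c, A, x, t, max_iter=50, tol=1e-10):
    # Minimize c^T x - t * sum(log(x)) over {x : A x = b} by Newton's method;
    # x is assumed to satisfy A x = b, so the step only has to keep A v = 0.
    p = A.shape[0]
    for _ in range(max_iter):
        g = c - t / x                        # gradient of the barrier objective
        H = np.diag(t / x**2)                # Hessian of the barrier objective
        K = np.block([[H, A.T], [A, np.zeros((p, p))]])
        v = np.linalg.solve(K, np.concatenate([-g, np.zeros(p)]))[:x.size]
        if v @ H @ v < tol:                  # Newton decrement as stopping rule
            break
        psi = lambda z: c @ z - t * np.sum(np.log(z))
        alpha = 1.0                          # backtracking: stay positive, decrease psi
        while (np.any(x + alpha * v <= 0) or
               psi(x + alpha * v) > psi(x) + 0.25 * alpha * (g @ v)):
            alpha *= 0.5
        x = x + alpha * v
    return x

def barrier_lp(c, A, x0, t0=1.0, mult=0.5, tol=1e-8):
    # Outer loop: decrease t until the duality gap bound m * t is small
    x, t = x0, t0
    m = x0.size                              # one inequality constraint per coordinate
    while m * t >= tol:
        x = centering_step(c, A, x, t)
        t *= mult
    return x

# Toy problem: min x1 + 2 x2 + 3 x3  s.t.  x1 + x2 + x3 = 1, x >= 0
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
x0 = np.array([1 / 3, 1 / 3, 1 / 3])         # strictly feasible starting point
print(barrier_lp(c, A, x0))                  # approaches [1, 0, 0]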

Parameters selection

  • Multiplier $\alpha$
    • In the case $\alpha \sim 1$, each "constrained" problem is solved in a small number of iterations, but the central path consists of a large number of points
    • In the case $\alpha \sim 10^{-5}$ the situation is the opposite: a large number of iterations for solving each "constrained" problem, but a small number of points on the central path
  • Initialization $t_0$
    • The trade-off is similar to that for the parameter $\alpha$
    • The parameter $t_0$ determines the initial point of the central path

Almost convergence theorem

  • As shown above, as $t \to 0$ the barrier method converges to the solution of the original problem
  • The convergence speed is directly affected by the parameters $\alpha$ and $t_0$, as discussed above
  • The main difficulty is solving the auxiliary problems fast with Newton's method

Problem of finding feasible starting point

  • The barrier method requires a feasible starting point
  • The method is split into two phases
    • The first phase finds a feasible starting point
    • The second phase uses this starting point to run the barrier method

The first phase method

A simple method to find a strictly feasible point

\begin{equation*} \begin{split} & \min s\\ \text{s.t. } & f_i(x) \leq s\\ & Ax = b \end{split} \end{equation*}
  • this problem always has a strictly feasible starting point (see the sketch after this list)
  • if $s^* < 0$, then $x^*$ is strictly feasible and can be used in the barrier method
  • if $s^* > 0$, then the original problem is infeasible and its feasible set is empty
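
A minimal sketch of how such a starting point can be constructed (the helper name and the construction are illustrative assumptions):

import numpy as np

def phase1_start(fs, A, b):
    # any solution of A x = b, e.g. the least-norm one (rank A = p)
    x0 = A.T @ np.linalg.solve(A @ A.T, b)
    # choosing s0 strictly above every f_i(x0) makes (x0, s0) strictly feasible
    s0 = max(fi(x0) for fi in fs) + 1.0
    return x0, s0

A = np.array([[1.0, 1.0]])
b = np.array([2.0])
fs = [lambda x: x @ x - 10.0]      # single constraint f_1(x) = ||x||^2 - 10 <= 0
print(phase1_start(fs, A, b))      # x0 = [1, 1], s0 = -8 + 1 = -7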

Sum of inconsistencies

\begin{equation*} \begin{split} & \min s_1 + \ldots + s_m\\ \text{s.t. } & f_i(x) \leq s_i\\ & Ax = b\\ & s \geq 0 \end{split} \end{equation*}
  • the optimal objective value is 0 and it is attained when the inequality constraints are consistent
  • if the problem is infeasible, it is possible to identify which constraints are violated, i.e. those with $s_i > 0$

Second phase

  • After a starting point $x_0$ is found, one runs the barrier method, solving its equality-constrained subproblems with the standard Newton method

Primal-dual method

It is similar to the barrier method, but

  • every iteration updates both primal and dual variables
  • the Newton direction is taken from the modified KKT system (see the residual below)
  • iterates of the primal-dual method may be infeasible
  • it works even if the problem is not strictly feasible
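
For reference, a standard form of this modified system (stated here as an assumption consistent with the perturbed complementary slackness $-\lambda_i f_i(x) = t$ above): the primal-dual direction is the Newton step for the residual

$$ r_t(x, \lambda, \mu) = \begin{bmatrix} f_0'(x) + Df(x)^{\top}\lambda + A^{\top}\mu \\ -\mathrm{diag}(\lambda) f(x) - t\mathbf{1} \\ Ax - b \end{bmatrix}, $$

where $f(x) = (f_1(x), \ldots, f_m(x))$ and $Df(x)$ is its Jacobian.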

Recap

  • Newton's method for convex optimization problems with equality constraints
  • The case of an infeasible initial point
  • Primal barrier method
  • Primal-dual method