The original equality-constrained problem can be reformulated as the unconstrained optimization problem
$$ \min_x Q(x, \mu), $$where
$$ Q(x, \mu) = f(x) + \frac{\mu}{2}\sum\limits_{i=1}^m g^2_i(x), \quad \mu > 0 $$
def QuadraticPenaltyEquality(Q, gradQ, x0, get_mu, get_tau, **kwargs):
    while True:
        # Stop when the norm of the gradient of Q is less than the current tau
        x = MinimizeQ(Q, gradQ, x0, get_tau)
        if global_convergence(x, Q, **kwargs):
            break
        mu = get_mu()
        Q, gradQ = UpdateQ(mu)
        x0 = UpdateStartPoint(x, Q)
    return x
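The routines MinimizeQ, UpdateQ, and UpdateStartPoint above are placeholders. Below is a minimal runnable sketch of the same loop; the inner solver (scipy.optimize.minimize) and the toy problem are my own choices for illustration, not part of the original scheme.
import numpy as np
from scipy.optimize import minimize

# Toy problem: min x1^2 + x2^2  s.t.  x1 + x2 = 1 (solution: [0.5, 0.5])
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: np.array([x[0] + x[1] - 1.0])

def quadratic_penalty_equality(f, g, x0, mu0=1.0, rho=10.0, tol=1e-6, max_iter=20):
    mu, x = mu0, np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Penalty subproblem for the current mu
        Q = lambda z, mu=mu: f(z) + 0.5 * mu * np.sum(g(z)**2)
        # Warm-start the inner solver from the previous subproblem solution
        x = minimize(Q, x, method="BFGS").x
        if np.linalg.norm(g(x)) < tol:  # stop once constraints are nearly satisfied
            break
        mu *= rho  # increase the penalty parameter
    return x

print(quadratic_penalty_equality(f, g, x0=np.zeros(2)))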
Theorem 1. Assume that for every $\mu$ the unconstrained subproblem has a finite global solution. Then the sequence of subproblem solutions converges to the global solution of the original problem as $\mu \to \infty$.
Theorem 2. Assume that $\tau_k \to 0$, $\mu_k \to \infty$, and $\| Q'(x^*_k, \mu_k) \| \leq \tau_k$. Then for any convergent subsequence $x^*_k \to x^*, \; k \in \mathcal{C}$, the following holds:
$$ \lim_{k \in \mathcal{C}} \mu_k g_i(x^*_k) = \lambda_i^* $$
for all $i = 1,\ldots,m$, where $\lambda_i^*$ are the Lagrange multipliers that satisfy the KKT conditions.
Example.
$$ \begin{split} & \min -5x_1^2 + x_2^2\\ \text{s.t. }& x_1 = 1 \end{split} $$
Penalty function:
$$ Q(x, \mu) = -5x_1^2 + x_2^2 + \frac{\mu}{2}(x_1 - 1)^2 $$
Note: while $\mu < 10$, the function $Q(x, \mu)$ is unbounded below in $x$, and the subproblem does not have a finite solution.
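To see the unboundedness explicitly, set $x_2 = 0$ and let $x_1 = t \to \infty$:
$$ Q((t, 0), \mu) = \left(\frac{\mu}{2} - 5\right)t^2 - \mu t + \frac{\mu}{2} \to -\infty \quad \text{for } \mu < 10. $$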
The gradient of the penalty function is
$$ Q'(x, \mu) = f'(x) + \mu (g'(x))^{\top} g(x), $$
where $g'(x)$ is the Jacobian of the vector function representing the equality constraints. From Theorem 2, the following approximation of the Hessian holds near the optimal point:
$$ Q''(x, \mu) \approx L''(x, \lambda^*) + \mu (g'(x))^{\top} g'(x) $$
Summary: some of the eigenvalues of the Hessian $Q''(x, \mu)$ are of order $\mu$; therefore, increasing $\mu$ makes the Hessian severely ill-conditioned.
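For the example above this is easy to check numerically: the Hessian is diagonal, $Q''(x, \mu) = \mathrm{diag}(\mu - 10,\; 2)$, so its condition number grows linearly with $\mu$. A quick sketch:
import numpy as np

# Q''(x, mu) for the example above: diag(mu - 10, 2)
for mu in [1e2, 1e4, 1e6, 1e8]:
    hessQ = np.diag([mu - 10.0, 2.0])
    print("mu = {:.0e}, cond(Q'') = {:.1e}".format(mu, np.linalg.cond(hessQ)))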
Corollary: the search direction computed in Newton's method becomes very inaccurate!
Let us introduce a new variable $\xi = \mu g'(x) p$ and rewrite the Newton system $Q''(x, \mu) p = -Q'(x, \mu)$ in the form
$$ \begin{bmatrix} f''(x) + \mu \sum\limits_{i=1}^m g_i(x)g_i''(x) & (g'(x))^{\top}\\ g'(x) & -\frac{1}{\mu} I \end{bmatrix} \begin{bmatrix} p \\ \xi \end{bmatrix} = \begin{bmatrix} -Q'(x, \mu) \\ 0 \end{bmatrix}, $$
where $\mu > 0$. In contrast to $Q''(x, \mu)$, the condition number of this block matrix stays bounded as $\mu \to \infty$.
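A quick numerical check of this reformulation (my own sketch, assuming a quadratic objective $f(x) = \frac{1}{2}x^{\top}Px$ and linear constraints $g(x) = Ax - b$, so that $g_i'' = 0$): the condition number of $Q''$ blows up with $\mu$, while that of the block matrix stays bounded.
import numpy as np

np.random.seed(0)
n, m = 20, 5
L = np.random.randn(n, n)
P = L @ L.T + np.eye(n)    # Hessian of f, positive definite
A = np.random.randn(m, n)  # Jacobian g'(x) of the linear constraints

for mu in [1e2, 1e4, 1e6, 1e8]:
    hessQ = P + mu * A.T @ A              # Q''(x, mu)
    K = np.block([[P, A.T],
                  [A, -np.eye(m) / mu]])  # block matrix above
    print("mu = {:.0e}: cond(Q'') = {:8.1e}, cond(K) = {:8.1e}"
          .format(mu, np.linalg.cond(hessQ), np.linalg.cond(K)))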
In [1]:
import cvxpy
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rc("text", usetex=True)
num_iters = 10
n = 20
m = 10
A = np.random.randn(m, n)
b = np.random.randn(m, 1)
# Initialize problem
x = cvxpy.Variable(shape=(n, 1))
f = cvxpy.norm(x, 2)
# Solve with CVXPY
cvxpy.Problem(cvxpy.Minimize(f), [A @ x == b]).solve()
cvxpy_f = f.value
print("Optimal value from CVXPY =", f.value)
In [2]:
# Solve with the quadratic penalty method
resid = A @ x - b
mu = cvxpy.Parameter(value=1, nonneg=True)
penal_f = f + (mu/2.)*cvxpy.sum_squares(resid)
res_conv = []
f_conv = []
for t in range(num_iters):
    cvxpy.Problem(cvxpy.Minimize(penal_f)).solve()
    mu.value = mu.value * 2  # increase the penalty parameter
    res_conv.append(np.linalg.norm(resid.value))
    f_conv.append(f.value)
print("Optimal value from the quadratic penalty method =", f.value)
In [3]:
plt.figure(figsize=(8, 6))
fontsize=20
plt.plot(f_conv, label="Penalty")
plt.plot(np.ones(num_iters) * cvxpy_f, label="CVXPY")
plt.legend(fontsize=fontsize)
plt.xlabel("Number of iterations, $k$", fontsize=fontsize)
plt.ylabel("Objective, $f(x_k)$", fontsize=fontsize)
plt.xticks(fontsize=fontsize)
_ = plt.yticks(fontsize=fontsize)
In [4]:
plt.figure(figsize=(8, 6))
fontsize=20
plt.semilogy(res_conv)
plt.xlabel("Number of iterations, $k$", fontsize=fontsize)
plt.ylabel("Norm of residuals, $\|Ax_k - b\|_2$", fontsize=fontsize)
plt.xticks(fontsize=fontsize)
_ = plt.yticks(fontsize=fontsize)
Pro
- The constrained problem is replaced by a sequence of smooth unconstrained subproblems.

Contra
- For small $\mu$ the subproblem can be unbounded below (see the example above).
- The Hessian of $Q$ becomes severely ill-conditioned as $\mu \to \infty$.
- The constraints are satisfied only in the limit $\mu_k \to \infty$.
Motivation: in the penalty method, the solutions of the subproblems can violate the constraints, and it is only known that
$$ g_i(x^*_k) \approx \frac{\lambda_i^*}{\mu_k} \to 0, \quad \mu_k \to \infty. $$
Is it possible to modify $Q(x, \mu)$ in a way that prevents constraint violation?
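For the earlier example this rate is explicit: solving $Q'(x, \mu) = 0$ gives $x_1(\mu) = \frac{\mu}{\mu - 10}$, so that $g(x(\mu)) = x_1(\mu) - 1 = \frac{10}{\mu - 10} \approx \frac{\lambda^*}{\mu}$ with $\lambda^* = 10$.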
Idea: add the penalty not to the objective function but to the Lagrangian:
$$ M(x, \lambda, \mu) = f(x) + \sum\limits_{i=1}^m \lambda_i g_i(x) + \frac{\mu}{2}\sum\limits_{i=1}^m g^2_i(x). $$
This is an analogue of primal-dual methods, because one iteration updates both the primal and the dual variables.
The necessary optimality condition for $M(x_k, \lambda^k, \mu_k)$ is
$$ f'(x_k) + \sum\limits_{i=1}^m (\lambda^k_i + \mu_k g_i(x_k) ) g'_i(x_k) \approx 0. $$
Comparing it with the KKT condition $f'(x^*) + \sum_i \lambda_i^* g_i'(x^*) = 0$ yields the multiplier update
$$ \lambda^{k+1}_i = \lambda^k_i + \mu_k g_i(x_k). $$
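A tiny illustration of this update on the earlier example ($\min -5x_1^2 + x_2^2$ s.t. $x_1 = 1$, for which $\lambda^* = 10$): minimizing $M$ in closed form gives $x_1 = \frac{\mu - \lambda}{\mu - 10}$, $x_2 = 0$, and the multiplier iteration converges with $\mu$ held fixed (here $\mu = 30$, chosen so the update is a contraction), without driving $\mu \to \infty$:
mu, lam = 30.0, 0.0
for k in range(15):
    x1 = (mu - lam) / (mu - 10.0)  # argmin of M(x, lam, mu) over x (x2 = 0)
    lam += mu * (x1 - 1.0)         # multiplier update
print("x1 =", x1, "lambda =", lam)  # x1 -> 1, lambda -> 10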
Here $\lambda$ is the vector of dual variables corresponding to the equality constraints, and $\nu$ is the vector of dual variables corresponding to the inequality constraints.
In [1]:
import cvxpy
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rc("text", usetex=True)
num_iters = 10
mu = 1.0
n = 20
m = 10
A = np.random.randn(m, n)
b = np.random.randn(m, 1)
# Initialize problem
x = cvxpy.Variable(shape=(n, 1))
f = cvxpy.norm(x, 2)
# Solve with CVXPY.
cvxpy.Problem(cvxpy.Minimize(f), [A @ x == b]).solve()
print("Optimal value from CVXPY =", f.value)
In [2]:
# Solve with method of augmented Lagrangian
resid = A @ x - b
y = cvxpy.Parameter(shape=(m, 1))
y.value = np.zeros((m, 1))
aug_lagr = f + y.T @ resid + (mu/2.)*cvxpy.sum_squares(resid)
for t in range(num_iters):
    cvxpy.Problem(cvxpy.Minimize(aug_lagr)).solve()
    y.value += mu*resid.value  # multiplier update
print("Optimal value from method of augmented Lagrangian =", f.value)
In [3]:
mus = [0.1, 1, 10, 100]
conv_res = {}
conv_obj = {}
for mu in mus:
    conv_res[mu] = np.zeros(num_iters)
    conv_obj[mu] = np.zeros(num_iters)
    x = cvxpy.Variable(shape=(n, 1))
    f = cvxpy.norm(x, 2)
    resid = A @ x - b
    y = cvxpy.Parameter(shape=(m, 1))
    y.value = np.zeros((m, 1))
    aug_lagr = f + y.T @ resid + (mu/2.)*cvxpy.sum_squares(resid)
    for t in range(num_iters):
        cvxpy.Problem(cvxpy.Minimize(aug_lagr)).solve()
        y.value += mu*resid.value
        conv_res[mu][t] = np.linalg.norm(resid.value)
        conv_obj[mu][t] = aug_lagr.value
In [4]:
plt.figure(figsize=(8, 6))
fontsize=20
for mu in mus:
    plt.plot(conv_obj[mu], label=r"$\mu = {}$".format(mu))
plt.legend(fontsize=fontsize)
plt.xlabel("Number of iterations, $k$", fontsize=fontsize)
plt.ylabel("Augmented Lagrangian, $M_k$", fontsize=fontsize)
plt.xticks(fontsize=fontsize)
_ = plt.yticks(fontsize=fontsize)
In [5]:
plt.figure(figsize=(8, 6))
fontsize=20
for mu in mus:
    plt.semilogy(conv_res[mu], label=r"$\mu = {}$".format(mu))
plt.legend(fontsize=fontsize)
plt.xlabel("Number of iterations, $k$", fontsize=fontsize)
plt.ylabel(r"Norm of residuals, $\|Ax_k - b\|_2$", fontsize=fontsize)
plt.xticks(fontsize=fontsize)
_ = plt.yticks(fontsize=fontsize)