I love convex optimization. You tell "qp" to minimize a (non-negative definite) quadratic form over a polyhedron of feasible points and it does it every time.
This notebook implements a BFGS method using the quadratic program solver from the CVXOPT package.
The QP problem is specified as follows: \begin{align} \mathrm{minimize}\;&\; \frac{1}{2} x^{T} P x + q^{T} x\\ \mathrm{subject\; to}\;&\; G x \leq h\\ \;&\; A x = b \end{align}
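As a minimal sketch of that solver interface (a toy problem, not part of the algorithm below), minimizing $\frac{1}{2}(x_{1}^{2}+x_{2}^{2}) - x_{1}$ subject to $x \geq 0$ looks like this:

from cvxopt import matrix, solvers
solvers.options['show_progress'] = False
# Toy QP: minimize 1/2*(x1^2 + x2^2) - x1  subject to  x >= 0
P = matrix([[1.0, 0.0], [0.0, 1.0]])    # quadratic term (positive definite)
q = matrix([-1.0, 0.0])                 # linear term
G = matrix([[-1.0, 0.0], [0.0, -1.0]])  # G x <= h with G = -I encodes x >= 0
h = matrix([0.0, 0.0])
sol = solvers.qp(P, q, G, h)
print(sol['x'])                         # minimizer is approximately [1, 0]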
The implementation of BFGS here is based on setting up a QP to solve the sub-problem of finding a step that will decrease the value of a non-linear cost function (here the Rosenbrock function). The Taylor approximation is written as follows: \begin{align} f(x + s) \approx f(x) + \nabla f(x)^{T}\, s + \frac{1}{2} s^{T} \,\nabla\nabla f(x)\, s \end{align}
To use the Taylor approximation formula I need the gradient and the Hessian. The gradient can be computed, but the Hessian is too expensive. Instead, BFGS uses an approximate Hessian built from previous evaluations of the step, $s$, and the gradient, $\nabla f(x)$.
In this method I start by assuming that the Hessian is the identity matrix scaled by a large factor, $\gamma=1000$. Since minimizing the model with $H_{0}=\gamma I$ gives a step of roughly $-\nabla f(x_{0})/\gamma$, a large $\gamma$ keeps the initial steps small.
\begin{align}
H_{0} = \gamma I
\end{align}
Assuming we already have the first step, $s_{0}$, the Hessian will be updated as follows. \begin{align} y_{0} &= \nabla f(x_{0} + s_{0}) - \nabla f(x_{0})\\ h_{01} &= \frac{H_{0} s_{0}}{\sqrt{s_{0}^{T}H_{0}s_{0}}}\\ h_{02} &= \frac{y_{0}}{\sqrt{y_{0}^{T} s_{0}}}\\ H_{1} &= H_{0} - h_{01}\,h_{01}^{T} + h_{02}\,h_{02}^{T} \end{align} One necessary condition is positivity of the dot product $y^{T}s>0$. Without this condition we cannot guarantee that the resulting matrix will be positive definite.
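As a quick numerical sanity check of that claim (a numpy sketch, not part of the notebook's code), the rank-two update keeps the matrix positive definite whenever $y^{T}s>0$:

import numpy as np
rng = np.random.default_rng(0)
n = 5
H = 1000.0 * np.eye(n)                    # H_0 = gamma * I
s = rng.standard_normal(n)                # some step
y = s + 0.1 * rng.standard_normal(n)      # gradient change chosen so that y.T @ s > 0
assert y @ s > 0
Hs = H @ s
H1 = H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)
print(np.linalg.eigvalsh(H1).min() > 0)   # True: the update preserved positive definiteness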
Using the above rule we are guaranteed to generate a positive definite $H$, which we can substitute into Taylor's formula. This gives us a quadratic form to minimize. \begin{align} \Psi_{k}(s) = \nabla f(x_{k})^{T} s + \frac{1}{2} s^{T}H_{k}s. \end{align} Comparing this formula with the notation used by the cvxopt solver, we can see that the matrix $P=H_{k}$ and the vector $q = \nabla f(x_{k})$. I add positivity constraints on the updated parameter, $x_{k+1} = x_{k} + s_{k}$, through the inequality constraints with $G=-I$ and $h=x_{k}$.
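To make that mapping concrete, a single sub-problem at the starting point could be set up like this (a standalone sketch with illustrative variable names; the full iteration lives in the my_bfgs function defined below):

import numpy as np
from cvxopt import matrix, solvers
from scipy.optimize import rosen_der

solvers.options['show_progress'] = False
x_k = np.array([1.5, 1.5])
H_k = 1000.0 * np.eye(x_k.size)            # initial Hessian approximation, gamma * I
P = matrix(H_k)                            # quadratic term of the local model
q = matrix(rosen_der(x_k))                 # gradient of the cost at x_k
G = matrix(-np.eye(x_k.size))              # G s <= h with G = -I, h = x_k keeps x_k + s >= 0
h = matrix(x_k)
s_k = np.array(solvers.qp(P, q, G, h)['x']).ravel()
print(x_k + s_k)                           # the next iterate, x_{k+1}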
The procedure assembles the approximate Hessian as a dense matrix. A better option would be to represent $H$ in some sparse way. I think this can be done by storing the vectors $h_{k1}$ and $h_{k2}$; I'd need to add unknowns to the problem, for example $z_{k,1} = h_{k1}^{T}\,s$, defined through the equality constraints.
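A rough sketch of that storage idea, using a hypothetical ImplicitHessian helper (not something implemented in this notebook): keep $\gamma I$ plus the update vectors and apply $H_{k}$ to a vector without ever forming the dense matrix.

import numpy as np

class ImplicitHessian:
    "Hypothetical low-rank representation: H_k = gamma*I - sum h1 h1^T + sum h2 h2^T."
    def __init__(self, gamma=1000.0):
        self.gamma = gamma
        self.h1 = []                      # vectors h_{k1}, subtracted as outer products
        self.h2 = []                      # vectors h_{k2}, added as outer products
    def dot(self, v):
        "Apply H_k to a vector using only the stored update vectors."
        out = self.gamma * v
        for u in self.h1:
            out = out - u * (u @ v)
        for u in self.h2:
            out = out + u * (u @ v)
        return out
    def update(self, s, y):
        "Record the two update vectors produced by a step s and gradient change y."
        Hs = self.dot(s)
        self.h1.append(Hs / np.sqrt(s @ Hs))
        self.h2.append(y / np.sqrt(y @ s))

Since cvxopt's qp interface still expects dense matrices, actually exploiting this representation would need the extra unknowns and equality constraints mentioned above.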
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.optimize import minimize, rosen, rosen_der
from cvxopt import blas, lapack, matrix, solvers
solvers.options['show_progress'] = False
solvers.options['abstol'] = 1e-9   # tight tolerance for the QP sub-problems
EPS = np.finfo(float).eps
print("machine epsilon: {0}".format(EPS))
In [2]:
x0 = np.array([1.5, 1.5])#, 0.8, 1.9, 1.2])
y0 = rosen(x0)
dy0 = rosen_der(x0)
res = minimize(rosen, x0, method='nelder-mead',
               options={'xatol': 1e-8, 'disp': True})
print(res.keys())
print('\nfun = {0}'.format(res['fun']))
print('\nxmin = {0}'.format(res['x']))
In [3]:
def my_descent(x0, max_steps=100, alpha=.0001, f=rosen, df=rosen_der):
    "Steepest descent algorithm."
    # Define the starting point, cost, and gradient
    xnew = x0
    fnew = f(x0)
    dfnew = df(x0)
    xvec = np.zeros((max_steps+1, x0.size))
    xvec[0] = x0
    for i in range(max_steps):
        xold = xnew
        fold = fnew
        dfold = dfnew
        # Take a step along the negative gradient
        xnew = xold - alpha * dfold
        fnew = f(xnew)
        dfnew = df(xnew)
        # Store the steps
        xvec[i+1] = xnew
    return xvec
def my_qp(P, q, G, h, full_output=False):
    "Interface to cvxopt.solvers.qp from numpy arrays."
    _P = matrix(P)
    _q = matrix(q)
    _G = matrix(G)
    _h = matrix(h)
    sol = solvers.qp(_P, _q, _G, _h)
    if full_output:
        return sol
    else:
        return np.array(sol['x']).reshape(-1)
def update_hessian(H, s, y):
    "Return an updated Hessian approximation (rank-two BFGS update)."
    # Remove curvature along the old H*s direction, add curvature along y
    R1 = np.outer(np.dot(H, s), np.dot(s, H)) / np.dot(s, np.dot(H, s))
    R2 = np.outer(y, y) / np.dot(y, s)
    return H - R1 + R2
def my_bfgs(x0, f=rosen, df=rosen_der, max_steps=20, gamma=1000, abstol=1e-9):
    "My coding of a BFGS method."
    # Define the starting quadratic problem: H_0 = gamma * I, q_0 = grad f(x_0)
    P0 = gamma * np.eye(x0.size)
    q0 = df(x0)
    G = -np.eye(x0.size)    # -s <= x_k keeps the iterates positive
    h0 = x0
    s0 = my_qp(P0, q0, G, h0)
    x1 = x0 + s0
    q1 = df(x1)
    y0 = q1 - q0
    P1 = update_hessian(P0, s0, y0)
    # Loop for several more local problems
    xvec = np.zeros((max_steps+2, x0.size))
    xvec[(0, 1), :] = x0, x1
    for i in range(max_steps):
        x = x1
        P = P1
        q = q1
        h = x1
        s = my_qp(P, q, G, h)
        x1 = x + s
        q1 = df(x1)
        y = q1 - q
        P1 = update_hessian(P, s, y)
        xvec[i+2] = x1
        # Stop once the cost stops changing appreciably
        diff = f(x1) - f(x)
        if abs(diff) <= abstol:
            xvec[i+3:] = x1
            break
    return xvec
## Solve the problem
xvec_descent = my_descent(x0, max_steps=30)
xvec_bfgs = my_bfgs(x0, max_steps=30)
xtrue_descent = np.ones_like(xvec_descent)
xtrue_bfgs = np.ones_like(xvec_bfgs)
err_descent = abs(xvec_descent - xtrue_descent).sum(1)
err_bfgs = abs(xvec_bfgs - xtrue_bfgs).sum(1)
In [4]:
x = np.linspace(.25, 2.75)
y = np.linspace(.25, 2.75)
xx, yy = np.meshgrid(x, y, indexing='ij')
vals = rosen(np.array([xx, yy]))
levels = np.linspace(0, 100, 20)
fig = plt.figure(0, (10, 8))
ax0 = fig.add_subplot(2,1,1)
ax1 = fig.add_subplot(2,1,2)
ax0.contourf(xx, yy, vals, levels, cmap='gray')
ax0.plot(xvec_descent[:,0], xvec_descent[:,1], c='magenta', marker='.')
ax0.plot(xvec_bfgs[:,0], xvec_bfgs[:,1], c='cyan', marker='.')
ax1.semilogy(err_descent, color='magenta', linewidth=2.0)
ax1.semilogy(err_bfgs, color='cyan', linewidth=2.0)
Out[4]:
In [5]:
xvec_bfgs
Out[5]:
In [6]:
## Try it out on a large problem
nx = 1000
x0 = .75 + .5 * np.random.rand(nx)
xvec_bfgs = my_bfgs(x0, gamma=1000, max_steps=50)
xvec_descent = my_descent(x0, alpha=1.0/1000, max_steps=50)
xtrue = np.ones(nx)
In [7]:
err_bfgs = abs(xvec_bfgs - xtrue).sum(1)
err_descent = abs(xvec_descent - xtrue).sum(1)
fvals_bfgs = np.array([rosen(_x) for _x in xvec_bfgs])
fvals_descent = np.array([rosen(_x) for _x in xvec_descent])
fig = plt.figure(0, (8, 8))
ax0 = fig.add_subplot(2,1,1)
ax1 = fig.add_subplot(2,1,2)
ax0.semilogy(err_bfgs, '-b', linewidth=3, label='bfgs')
ax0.semilogy(err_descent, '-r', linewidth=3, label='descent')
ax0.set_title('Error in x. Problem size, N = {0}'.format(nx))
ax0.legend(loc='best')
ax0.set_xbound(0, xvec_bfgs.shape[0])
ax1.semilogy(fvals_bfgs, '-b', linewidth=3, label='bfgs')
ax1.semilogy(fvals_descent, '-r', linewidth=3, label='descent')
ax1.set_title('Cost function value. Problem size, N = {0}'.format(nx))
ax1.legend(loc='best')
ax1.set_xbound(0, xvec_bfgs.shape[0])