In [57]:
include("../optimizers/linesearch.jl")
include("../utils/functions.jl")
using Gadfly

Constrained optimization problem with linear constraints

The goal of this notebook is to show how to convert an optimization problem with linear equality constraints into an unconstrained optimization problem.

The problem of interest is $$ \min_x f(x) \text{ subject to } Ax = b,$$ where $A \in \mathbb{R}^{m \times n}$ is the constraint matrix, $x \in \mathbb{R}^n$ is the vector of unknowns, and $b \in \mathbb{R}^m$ is a known vector.

For any particular solution $x_0$ of $Ax = b$, every other solution $x$ can be written as $x = x_0 + v$, where $v \in \text{ker}(A)$. Thus, the constrained minimization problem reduces to $$\min_{v \in \text{ker}(A)} f(x_0 + v),$$ which is an unconstrained optimization problem.

We can then apply our favorite unconstrained method, e.g., Newton's method with a line search.
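
Concretely, if $B \in \mathbb{R}^{n \times k}$ is a matrix whose columns form a basis for $\text{ker}(A)$ (with $k = \dim \text{ker}(A)$) and $x_0$ is any particular solution of $Ax = b$, we can write $x = x_0 + By$ and minimize $g(y) = f(x_0 + By)$ over the free variable $y$. The chain rule gives the reduced gradient and Hessian used in the code below: $$ \nabla g(y) = B^\top \nabla f(x_0 + By), \qquad \nabla^2 g(y) = B^\top \nabla^2 f(x_0 + By)\, B. $$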


In [109]:
function optimize_linear_constraints(f, A, f_g, f_h, yinit=0)
    """
    Convert the constrained optimization problem min f(x) s.t. Ax = b
        to the unconstrained problem min g(y) = f(B*y + x0), where x0 is a solution to Ax = b
        and B is a basis for the kernel of A.
    
    Defaults to using Newton's line search algorithm.
    
    f: constrained objective
    A: constraint matrix
    f_g: gradient of f
    f_h: Hessian of f
    yinit: initialize y with a vector of all yinit values
    
    Note: this notebook only considers b = 0, so b is fixed to the zero vector below.
    """
    
    m, n = size(A)
    b = zeros(m)        # homogeneous constraints Ax = 0
    x0 = A \ b          # particular solution of Ax = b
    B = nullspace(A)    # columns span ker(A)
    
    y0 = ones(size(B, 2)) .* yinit
    
    # reduced objective, gradient, and Hessian via the chain rule
    g(y) = f(B*y + x0)
    g_g(y) = B' * f_g(B*y + x0)
    g_h(y) = B' * f_h(B*y + x0) * B
    
    yvals = line_search(g, y0, g_g, g_h, "newton", 5000)
    xvals = [B*y + x0 for y in yvals]
    # recover Lagrange multipliers from the stationarity condition ∇f(x) ≈ A'λ
    lambdas = [A' \ f_g(x) for x in xvals]
    
    return yvals, xvals, lambdas
end


Out[109]:
optimize_linear_constraints (generic function with 2 methods)
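
The last line of the function recovers approximate Lagrange multipliers. At a constrained minimizer $x^*$, stationarity of the Lagrangian gives $$ \nabla f(x^*) = A^\top \lambda^*, $$ so at each iterate $x$ we estimate $\lambda$ by solving $A^\top \lambda = \nabla f(x)$ in the least-squares sense, which is what the expression A' \ f_g(x) computes.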

Example

Suppose we have the problem $$ \min_x f_n(x) \text{ subject to } \sum_{i=1}^n x_i = 0 \text{ and } \sum_{i=1}^n (-1)^i x_i = 0,$$ where $f_n$ is the cute function (implemented as cute in functions.jl): $$ f_n(x) = \sum_{i=1}^{n-4} \left[ (-4x_i+3)^2 + (x_i^2 + 2x_{i+1}^2 + 3x_{i+2}^2 + 4 x_{i+3}^2 + 5x_n^2)^2 \right]. $$

Suppose $n$ is even. We can express these constraints with the matrix \begin{align} A = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ -1 & 1 & -1 & \cdots & 1 \end{bmatrix}, \end{align} whose number of columns is $n$, the dimension of $x$; the first row encodes $\sum_i x_i = 0$ and the second row, alternating $-1$ and $1$, encodes $\sum_i (-1)^i x_i = 0$.


In [110]:
function generate_constraint_matrix(n)
    # row 1: all ones (encodes sum_i x_i = 0)
    # row 2: alternating -1, +1 (encodes sum_i (-1)^i x_i = 0)
    A = ones(2,n)
    for i in 1:n
        if i % 2 == 1
            A[2,i] = -1
        end
    end
    return A
end


Out[110]:
generate_constraint_matrix (generic function with 1 method)
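
For example (a quick illustration, not a cell from this notebook), generate_constraint_matrix(4) should return

 1.0   1.0   1.0  1.0
-1.0   1.0  -1.0  1.0

so that A*x stacks $\sum_i x_i$ and $\sum_i (-1)^i x_i$.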

In [141]:
A = generate_constraint_matrix(30)
b = zeros(2)
x0 = A \ b; B = nullspace(A)
f = cute
f_g = cute_g
f_h = cute_h
g(y) = f(B*y + x0)
g_g(y) = B' * f_g(B*y + x0)
g_h(y) = B' * f_h(B*y + x0) * B


Out[141]:
g_h (generic function with 1 method)

In [142]:
yvals, xvals, lambdas = optimize_linear_constraints(f, A, f_g, f_h, 0);


Using method newton
Number of indefinite fixes 1
Number of iterations 7
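
Since every iterate has the form $x_0 + By$ with $Ax_0 = b$ and $AB = 0$, the linear constraints are satisfied at every iterate up to rounding error. A quick sanity check (not run in this notebook) is

norm(A*xvals[end] - b)   # should be on the order of machine epsilon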

In [143]:
niters = length(xvals)
fx = [f(x) for x in xvals]
Gadfly.plot(x=1:niters, y=fx, Geom.line, 
    Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
    Scale.x_log10, Scale.y_log10)


Out[143]:
[Gadfly plot: "Value of function", f(x) vs. iteration (log-log axes)]

In [144]:
grads = [norm(g_g(y),2) for y in yvals]
ratios = grads[2:niters] ./ grads[1:niters-1]
Gadfly.plot(x=1:niters-1, y=ratios, Geom.line, 
    Guide.xlabel("iteration"), Guide.ylabel("f_g(x)"), Guide.title("Gradient norm ratios"),
    Scale.x_log10, Scale.y_log10)


Out[144]:
[Gadfly plot: "Gradient norm ratios" vs. iteration (log-log axes)]

In [145]:
Gadfly.plot(x=1:niters, y=grads, Geom.line, 
Guide.xlabel("iteration"), Guide.ylabel("f_g(x)"), Guide.title("Gradient norm"),
    Scale.x_log10, Scale.y_log10)


Out[145]:
[Gadfly plot: "Gradient norm" vs. iteration (log-log axes)]

In [108]:
# Lagrange multipliers
lambdas[end]


Out[108]:
2-element Array{Float64,1}:
 -21.6941  
  -0.576006

Now suppose $n = 100$.


In [165]:
A = generate_constraint_matrix(100)
x0 = A \ b; B = nullspace(A)
f = cute
f_g = cute_g
f_h = cute_h
g(y) = f(B*y + x0)
g_g(y) = B' * f_g(B*y + x0)
g_h(y) = B' * f_h(B*y + x0) * B

yvals, xvals, lambdas = optimize_linear_constraints(f, A, f_g, f_h, 0);
# Lagrange multipliers
println(lambdas[end])
# Final value
println(f(xvals[end]))


Using method newton
Number of indefinite fixes 1
Number of iterations 6
[-23.421506254823036,-0.17473514560678483]
835.4634756579534

In [166]:
niters = length(xvals)
fx = [f(x) for x in xvals]
#gy = [g(y) for y in yvals]

Gadfly.plot(x=1:niters, y=fx, Geom.line, 
    Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
    Scale.x_log10, Scale.y_log10)


Out[166]:
[Gadfly plot: "Value of function", f(x) vs. iteration (log-log axes)]

In [167]:
grads = [norm(g_g(y),2) for y in yvals]
ratios = grads[2:niters] ./ grads[1:niters-1]
Gadfly.plot(x=1:niters-1, y=ratios, Geom.line, 
    Guide.xlabel("iteration"), Guide.ylabel("f_g(x)"), Guide.title("Gradient norm ratios"),
    Scale.x_log10, Scale.y_log10)


Out[167]:
[Gadfly plot: "Gradient norm ratios" vs. iteration (log-log axes)]

In [168]:
Gadfly.plot(x=1:niters, y=grads, Geom.line, 
Guide.xlabel("iteration"), Guide.ylabel("f_g(x)"), Guide.title("Gradient norm"),
    Scale.x_log10, Scale.y_log10)


Out[168]:
[Gadfly plot: "Gradient norm" vs. iteration (log-log axes)]
