Optimization Problem Definition

An optimization problem has three main elements:

  • design variables
  • objective
  • constraints

Design variables are the variables that we allow the optimizer to change. We denote them as $x$, where $x$ is an array (i.e., there can be many design variables). For example, if we were designing the shape of the spoiler on a car, the design variables might be things like the length of the spoiler, its distance from the car body, and parameters that define its curved profile.

The objective is the metric we want to minimize. We usually denote this function by $J$ or sometimes $f$. It is a scalar, meaning a single value (at times it makes sense to have multiple objectives, but the approach is different; we will explore this later in the semester). By convention we say minimize because we often minimize things in engineering, but in other disciplines, like finance, the default convention may be to maximize. This does not mean that the algorithms we explore can only perform minimization; to maximize, you simply minimize the negative of the function ($-J(x)$). In our car example, the objective of our spoiler design may be to maximize (downward) lift.
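
For instance (a hypothetical sketch; downforce is not a function defined in these notes), maximizing downforce is the same as minimizing its negative:


In [ ]:
function [J] = negJ(x)

    J = -downforce(x);  % minimizing -downforce(x) maximizes downforce(x)

end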

Constraints exist in almost any engineering problem; they are conditions that we require the optimizer to satisfy. We denote them with $c$, where $c$ is an array because we may have many constraints. By convention we will define constraints in this form: $c(x) \le 0$. This is completely general, as a constraint of the form $q(x) \ge 0$ could be rewritten as $c(x) = -q(x) \le 0$. In our car example the constraints may be things like a constraint on bending stress, manufacturing tolerances, etc. Without these constraints, our spoiler may produce a lot of lift but may snap in half or be too difficult to manufacture.

In short, we can write all of our optimization problems in this form:

\begin{align*} \text{minimize} \quad & J(x) \\ \text{with respect to} \quad & x \\ \text{subject to} \quad & c(x) \le 0 \\ \end{align*}

Example Problem

Let's consider a simple problem called the Rosenbrock problem. It has two design variables: $x$ and $y$, and an objective function of: $$J(x, y) = (1 - x)^2 + 100(y - x^2)^2$$

It's often used as a starting problem for testing optimization algorithms because it is only 2D (so it is easy to visualize), but it has a long valley that is steep in one direction and shallow in the other, making it somewhat challenging to optimize. Let's visualize the function space with a contour plot (which we can do because there are only 2 variables). First let's create the function. Note that x will be passed into this function as a vector. This will be necessary later when using the optimization algorithms.


In [ ]:
function [J] = obj(x)

    J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2; 
    
end

We can also do this as an inline function (called an anonymous function in MATLAB) because it is so simple. Anonymous functions can only be one-liners:


In [1]:
obj = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
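
Either version can be called the same way. As a quick sanity check, the function should return 0 at the known minimum (1, 1) and 1 at the origin:


In [ ]:
obj([1, 1])  % expect 0: both terms vanish
obj([0, 0])  % expect 1: only the (1 - x)^2 term contributes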

Now let's create a contour plot for that function.


In [6]:
% --- setup grid ---
nx = 200;  % number of points in x-direction
ny = 150;  % number of points in y-direction
x = linspace(-1, 2, nx);  % nx points equally spaced between -1 and 2
y = linspace(-1, 2, ny);  % ny points equally spaced between -1 and 2
[X, Y] = ndgrid(x, y);  % 2D array (matrix) of points across x and y
Z = zeros(nx, ny);  % initialize output of size (nx, ny)

% --- evaluate across grid ---
for i = 1:nx
    for j = 1:ny
        Z(i, j) = obj([X(i, j), Y(i, j)]);
    end
end

% --- contour plot ---
figure();  % start a new figure
contour(X, Y, Z, 100);  % using 100 contour lines.  
colorbar();  % add a colorbar
xlabel('x');  % labels for axes
ylabel('y');
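
As an aside, because this particular objective is built from simple elementwise operations, the double loop above could be replaced by a single vectorized expression that produces the same Z (a sketch specific to this function):


In [ ]:
Z = (1 - X).^2 + 100*(Y - X.^2).^2;  % elementwise Rosenbrock over the whole grid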


Note the flat U-shaped valley. The minimum is actually at (1, 1). A couple of other notes on contour plots:

  • You can also use meshgrid instead of ndgrid. meshgrid works exactly the same, except that the resulting arrays are of size (ny, nx) (the reverse of ndgrid). This is intended to mimic the physical space, where the columns go along the x-direction. I prefer ndgrid because I think it’s clearer to use x as the first index and y as the second (see the short check after this list).
  • contourf is the same as contour, except that it creates filled contours.
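
A short check of the size convention (x and y here are the vectors defined for the contour plot above):


In [ ]:
[Xn, Yn] = ndgrid(x, y);    % size(Xn) is (nx, ny) = (200, 150)
[Xm, Ym] = meshgrid(x, y);  % size(Xm) is (ny, nx) = (150, 200)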

Using fminunc

Let's find the minimum of the Rosenbrock function. This problem is unconstrained, nonlinear, and differentiable. The corresponding MATLAB algorithm for that type of problem is fminunc. fminunc has a pretty basic function signature (type 'help fminunc' or 'doc fminunc').


In [ ]:
[xopt, fopt, exitflag, output] = fminunc(@obj, x0, options);

We already created the objective function obj earlier. We now need to provide a starting point (I've chosen (0, 0), arbitrarily).


In [1]:
x0 = [0, 0];

Finally, we can set various options. I've set a few common ones below, but check out the documentation on fminunc for other options that you can set.


In [2]:
options = optimoptions(@fminunc,...
    'display', 'iter-detailed', ...  % display information on each iteration
    'MaxIterations', 1000, ...  % maximum number of iterations
    'MaxFunctionEvaluations', 10000, ...  % maximum number of function calls
    'OptimalityTolerance', 1e-6 ...  % convergence tolerance
);

In [3]:
[xopt, fopt, exitflag, output] = fminunc(@obj, x0, options);


Warning: Gradient must be provided for trust-region algorithm; using quasi-newton algorithm instead.
> In fminunc (line 395)
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
     0           3                1                             2
     1          12         0.771192      0.0817341           5.34  
     2          15         0.610658              1           6.73  
     3          18         0.522451              1           7.11  
     4          24         0.261629         0.7075           1.88  
     5          30         0.248996            0.5           3.44  
     6          33         0.207486              1           2.94  
     7          36         0.125351              1            1.5  
     8          39        0.0893498              1           3.93  
     9          42        0.0308666              1           1.23  
    10          48        0.0200762       0.322023           1.95  
    11          51        0.0138484              1           1.57  
    12          54       0.00441549              1          0.303  
    13          60       0.00268684            0.5           1.14  
    14          63      0.000276522              1           0.28  
    15          66      4.21032e-05              1          0.122  
    16          69      1.37272e-06              1        0.00796  
    17          72      7.48614e-07              1         0.0303  
    18          75      3.34372e-09              1        0.00221  
    19          78      6.71309e-11              1       7.22e-05  
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
    20          81      1.94746e-11              1       1.06e-06  

Optimization completed: The first-order optimality measure, 3.522486e-07, is less 
than options.OptimalityTolerance = 1.000000e-06.

Optimization Metric                                          Options
relative norm(gradient) =   3.52e-07             OptimalityTolerance =   1e-06 (selected)

Problem solved! Always read the output message to make sure it actually solved the problem (as opposed to just reaching a maximum iteration limit, having a convergence failure, etc.). The relevant message is: "Optimization completed: The first-order optimality measure, 3.522486e-07, is less than options.OptimalityTolerance = 1.000000e-06." Let's print out the optimal x value and the corresponding function value.


In [4]:
xopt
fopt


xopt =

    1.0000    1.0000


fopt =

   1.9475e-11

We have found the optimal solution: $x^* = (1, 1)$, $f^* = 0$.
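
Note the warning at the top of the iteration log: because we did not supply a gradient, fminunc fell back to its quasi-Newton algorithm with finite-difference gradients. A minimal sketch of how one could supply the analytic gradient instead (the name obj_with_grad is ours; 'SpecifyObjectiveGradient' is the current option name, while older releases use 'GradObj', 'on'):


In [ ]:
function [J, gradJ] = obj_with_grad(x)

    % Rosenbrock objective
    J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;

    % analytic gradient, only computed when the optimizer asks for it
    if nargout > 1
        gradJ = [-2*(1 - x(1)) - 400*x(1)*(x(2) - x(1)^2);
                 200*(x(2) - x(1)^2)];
    end

end

Passing this function to fminunc with optimoptions(@fminunc, 'SpecifyObjectiveGradient', true) should remove the warning and avoid the extra function evaluations used for finite differencing.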

Using fmincon

Let's now add a couple of constraints. The first is that the solution must be within the unit disk: $$ x^2 + y^2 \le 1$$ The second is that $$x + 3y \le 5 $$

We need to rewrite the constraints in our standard format: \begin{align} x^2 + y^2 - 1 &\le 0 \\ x + 3y - 5 &\le 0 \end{align}

This problem is constrained, nonlinear, and differentiable. The MATLAB algorithm that is equipped to solve that type of problem is fmincon. If we look at the documentation for fmincon we can see that it has a function signature like this:


In [ ]:
[xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);

Take a look at the documentation to see what each input and output means (use doc fmincon or help fmincon). First we need to set our starting point and bounds. We will set our bounds as the (-5, -5) to (5, 5) box. As a starting point let's just pick (0, 0) somewhat arbitrarily.


In [8]:
% -------- starting point and bounds ----------
x0 = [0, 0];
ub = [5, 5];  % upper bound, the largest values we will allow x to achieve
lb = [-5, -5];  % lower bound
% ---------------------------------------------

We have no linear inequality constraints (A, b) or linear equality constraints (Aeq, beq) so we will set those as empty arrays.


In [7]:
% ------ linear constraints ----------------
A = [];
b = [];
Aeq = [];
beq = [];
% ------------------------------------------

Next let's define our objective and constraints in the function below. Note that we need to use the convention that J will be minimized, and c is of the form $c \le 0$.


In [ ]:
% ---- Objective and Constraints -------------
function [J, c, ceq] = objcon(x)

    % objective
    J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
    
    % inequality constraints
    c = zeros(2, 1);  % there are 2 of them
    c(1) = x(1)^2 + x(2)^2 - 1;
    c(2) = x(1) + 3*x(2) - 5;
    
    % equality constraints
    ceq = [];  % there are none

end
% -------------------------------------------
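
A quick check of this function at the starting point (0, 0): the objective should be 1 and both inequality constraints should be satisfied (negative):


In [ ]:
[J0, c0, ceq0] = objcon([0, 0])  % expect J0 = 1, c0 = [-1; -5], ceq0 = []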

We can set various options. I've included some of the more common ones we might use below.


In [9]:
options = optimoptions(@fmincon,...
    'Algorithm', 'active-set', ...  % choose one of: 'interior-point', 'sqp', 'active-set', 'trust-region-reflective'
    'AlwaysHonorConstraints', 'bounds', ...  % forces interior point algorithm to always honor bounds
    'display', 'iter-detailed', ...  % display more information
    'MaxIter', 1000, ...  % maximum number of iterations
    'MaxFunEvals', 10000, ...  % maximum number of function calls
    'TolCon', 1e-6, ...  % convergence tolerance on constraints
    'TolFun', 1e-6, ...  % convergence tolerance on function value
    'FinDiffType', 'forward', ...  % if finite differencing, can also use central
    'GradObj', 'off', ...  % set to 'on' if you supply the gradient of the objective
    'GradConstr', 'off', ...  % set to 'on' if you supply the gradients of the constraints
    'DerivativeCheck', 'off', ...  % on if you want to check your supplied gradients against finite differencing
    'Diagnostics', 'on');  % display diagnostic information

Note that fmincon asks for your objective and constraints separately (whereas we have provided them both in one function). In most engineering analyses the objective and constraints are computed in the same script. As a convenience I've written a template for you that does the separation without requiring any extra function calls. Download the template optimize.m from our course website. If you fill it in with the pieces defined above, we can run fmincon.


In [13]:
[xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);


____________________________________________________________
   Diagnostic Information

Number of variables: 2

Functions 
Objective:                            obj
Gradient:                             finite-differencing
Hessian:                              finite-differencing (or Quasi-Newton)
Nonlinear constraints:                con
Nonlinear constraints gradient:       finite-differencing

Constraints
Number of nonlinear inequality constraints: 4
Number of nonlinear equality constraints:   0
 
Number of linear inequality constraints:    0
Number of linear equality constraints:      0
Number of lower bound constraints:          2
Number of upper bound constraints:          2

Algorithm selected
   active-set


____________________________________________________________
   End diagnostic information

                                Max     Line search  Directional  First-order 
 Iter F-count        f(x)   constraint   steplength   derivative   optimality Procedure 
    0      3            1            0                                         
    1      9     0.953127            0        0.125           -2         12.5   
    2     16     0.809721            0       0.0625        -2.38         10.9   
    3     21     0.460457            0         0.25        -12.5         4.38   
    4     24     0.355064            0            1        -3.15        0.823   
    5     27     0.320358            0            1       -0.939         4.78   
    6     30     0.279293            0            1        -5.16         2.43   
    7     33       0.2015            0            1       -0.957        0.951   
    8     36     0.201081            0            1       -0.586         9.03   
    9     39     0.131593            0            1        -2.41        0.716   
   10     42     0.102735            0            1        -0.45        0.483   
   11     46    0.0746674            0          0.5       -0.373         2.46   
   12     50     0.063857            0          0.5       -0.287         2.63   
   13     54    0.0526215            0          0.5       -0.627         1.44   
   14     58    0.0485194            0          0.5       -0.453        0.746   
   15     62    0.0469352            0          0.5       -0.339         0.38   
   16     66    0.0462639            0          0.5       -0.279        0.192   
   17     70     0.045959            0          0.5       -0.248       0.0963  Hessian modified  
   18     74    0.0458143            0          0.5       -0.232       0.0482  Hessian modified  
   19     78    0.0457439            0          0.5       -0.224       0.0241  Hessian modified  
   20     82    0.0457092            0          0.5        -0.22       0.0121  Hessian modified  
   21     86     0.045692            0          0.5       -0.218      0.00604  Hessian modified  
   22     90    0.0456834            0          0.5       -0.217      0.00302  Hessian modified  
   23     94    0.0456791            0          0.5       -0.217      0.00151  Hessian modified  
   24     98    0.0456769            0          0.5       -0.217     0.000755  Hessian modified  
   25    102    0.0456759            0          0.5       -0.217     0.000378  Hessian modified  

Optimization stopped because the predicted change in the objective function,
5.349514e-07, is less than options.FunctionTolerance = 1.000000e-06, and the maximum constraint
violation, 2.436318e-11, is less than options.ConstraintTolerance = 1.000000e-06.

Optimization Metric                                                    Options
abs(steplength*directional derivative) =   5.35e-07          FunctionTolerance =   1e-06 (selected)
max(constraint violation) =   2.44e-11                     ConstraintTolerance =   1e-06 (selected)

Active inequalities (to within options.ConstraintTolerance = 1e-06):
  lower      upper     ineqlin   ineqnonlin
                                     1
                                     3
                                     4

We have met the function tolerance, and the constraint violation is within tolerance. Problem solved! Let's print out the optimal solution, look at the constraints, and also print out how many iterations this took.


In [17]:
xopt
fopt
con(xopt)
output.iterations


xopt =

    0.7864    0.6177


fopt =

    0.0457


ans =

    0.0000
   -2.3605


ans =

    26

Note that the first constraint is active (right at 0 for $c(x) \le 0$). The second constraint is inactive ($< 0$). In other words, if we removed this second constraint it would not change the solution at all.
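
One way to confirm which constraints are active is to also request the Lagrange multiplier structure from fmincon (a standard output; see doc fmincon). A sketch:


In [ ]:
[xopt, fopt, exitflag, output, lambda] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);
lambda.ineqnonlin  % nonzero multiplier for the active constraint, (near) zero for the inactive one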

Finally, let's create a convergence plot. The metric to track is the first order optimality. We will learn later what this means. We could just copy/paste the column in the table above under first-order optimality and plot this, but we will also explore a programmatic approach.

In the documentation for fmincon look for output functions. An output function is of the form:

function stop = outfun(x, optimValues, state)

It is called every iteration. In the output function we will simply append the current firstorderopt value to the end of an array.


In [ ]:
function stop = outfun(x, optimValues, state)
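    % note: firstorderopt refers to a variable shared with the enclosing
    % function's workspace (nested-function scoping); see the full script below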

    firstorderopt = [firstorderopt, optimValues.firstorderopt];
    stop = false;

end

In the outer scope we need to initialize the array. Also we need to set the options so that the optimizer knows there is an output function.


In [ ]:
firstorderopt = [];
options = optimoptions(options, 'OutputFcn', @outfun);

The full script is shown below:


In [ ]:
function [xopt, fopt, exitflag, output, firstorderopt] = optimize()

    % -------- starting point and bounds ----------
    x0 = [0, 0];
    ub = [5, 5];  % upper bound, the largest values we will allow x to achieve
    lb = [-5, -5];  % lower bound
    % ---------------------------------------------

    % ------ linear constraints ----------------
    A = [];
    b = [];
    Aeq = [];
    beq = [];
    % ------------------------------------------

    % ---- Objective and Constraints -------------
    function [J, c, ceq] = objcon(x)

        % set objective/constraints here
        % objective
        J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;

        % inequality constraints
        c = zeros(2, 1);  % there are 2 of them
        c(1) = x(1)^2 + x(2)^2 - 1;
        c(2) = x(1) + 3*x(2) - 5;

        % equality constraints
        ceq = [];  % there are none

    end
    % -------------------------------------------

    % ----------- options ----------------------------
    options = optimoptions(@fmincon, ...
        'Algorithm', 'active-set', ...  % choose one of: 'interior-point', 'sqp', 'active-set', 'trust-region-reflective'
        'AlwaysHonorConstraints', 'bounds', ...  % forces interior point algorithm to always honor bounds
        'display', 'iter-detailed', ...  % display more information
        'MaxIter', 1000, ...  % maximum number of iterations
        'MaxFunEvals', 10000, ...  % maximum number of function calls
        'TolCon', 1e-6, ...  % convergence tolerance on constraints
        'TolFun', 1e-6, ...  % convergence tolerance on function value
        'FinDiffType', 'forward', ...  % if finite differencing, can also use central
        'GradObj', 'off', ...  % set to 'on' if you supply the gradient of the objective
        'GradConstr', 'off', ...  % set to 'on' if you supply the gradients of the constraints
        'DerivativeCheck', 'off', ...  % on if you want to check your supplied gradients against finite differencing
        'Diagnostics', 'on');  % display diagnostic information
    % -------------------------------------------

    firstorderopt = [];
    options = optimoptions(options, 'OutputFcn', @outfun);
    
    function stop = outfun(x, optimValues, state)

        firstorderopt = [firstorderopt, optimValues.firstorderopt];
        stop = false;
        
    end

    % -- NOTE: no need to change anything below --

    % ------- shared variables -----------
    xlast = [];
    Jlast = [];  % last objective
    clast = []; % last nonlinear inequality constraint
    ceqlast = []; % last nonlinear equality constraint
    % --------------------------------------


    % ------ separate obj/con  --------
    function [J] = obj(x)

        % check if computation is necessary
        if ~isequal(x, xlast)
            [Jlast, clast, ceqlast] = objcon(x);
            xlast = x;
        end

        J = Jlast;
    end

    function [c, ceq] = con(x)

        % check if computation is necessary
        if ~isequal(x, xlast)
            [Jlast, clast, ceqlast] = objcon(x);
            xlast = x;
        end

        % set constraints
        c = clast;
        ceq = ceqlast;
    end
    % ------------------------------------------------

    % call fmincon
    [xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);
end

We now call this function and plot firstorderopt as a function of iteration on a semilog plot.


In [1]:
[xopt, fopt, exitflag, output, firstorderopt] = optimize();

figure;
n = length(firstorderopt);
semilogy(1:n, firstorderopt);
xlabel('iteration');
ylabel('first order optimality')


____________________________________________________________
   Diagnostic Information

Number of variables: 2

Functions 
Objective:                            optimize/obj
Gradient:                             finite-differencing
Hessian:                              finite-differencing (or Quasi-Newton)
Nonlinear constraints:                optimize/con
Nonlinear constraints gradient:       finite-differencing

Constraints
Number of nonlinear inequality constraints: 2
Number of nonlinear equality constraints:   0
 
Number of linear inequality constraints:    0
Number of linear equality constraints:      0
Number of lower bound constraints:          2
Number of upper bound constraints:          2

Algorithm selected
   active-set


____________________________________________________________
   End diagnostic information

                                Max     Line search  Directional  First-order 
 Iter F-count        f(x)   constraint   steplength   derivative   optimality Procedure 
    0      3            1           -1                                         
    1      9     0.953127      -0.9375        0.125           -2         12.5   
    2     16     0.809721      -0.8694       0.0625        -2.38         10.9   
    3     21     0.460457      -0.8504         0.25        -12.5         4.38   
    4     24     0.355064      -0.8103            1        -3.15        0.823   
    5     27     0.320358      -0.7176            1       -0.939         4.78   
    6     30     0.279293      -0.7141            1        -5.16         2.43   
    7     33       0.2015      -0.6024            1       -0.957        0.951   
    8     36     0.201081      -0.2823            1       -0.586         9.03   
    9     39     0.131593      -0.4291            1        -2.41        0.716   
   10     42     0.102735       -0.325            1        -0.45        0.483   
   11     46    0.0746674      -0.1516          0.5       -0.373         2.46   
   12     50     0.063857     -0.07389          0.5       -0.287         2.63   
   13     54    0.0526215     -0.03649          0.5       -0.627         1.44   
   14     58    0.0485194     -0.01813          0.5       -0.453        0.746   
   15     62    0.0469352    -0.009041          0.5       -0.339         0.38   
   16     66    0.0462639    -0.004514          0.5       -0.279        0.192   
   17     70     0.045959    -0.002255          0.5       -0.248       0.0963  Hessian modified  
   18     74    0.0458143    -0.001127          0.5       -0.232       0.0482  Hessian modified  
   19     78    0.0457439   -0.0005635          0.5       -0.224       0.0241  Hessian modified  
   20     82    0.0457092   -0.0002817          0.5        -0.22       0.0121  Hessian modified  
   21     86     0.045692   -0.0001409          0.5       -0.218      0.00604  Hessian modified  
   22     90    0.0456834   -7.043e-05          0.5       -0.217      0.00302  Hessian modified  
   23     94    0.0456791   -3.521e-05          0.5       -0.217      0.00151  Hessian modified  
   24     98    0.0456769   -1.761e-05          0.5       -0.217     0.000755  Hessian modified  
   25    102    0.0456759   -8.804e-06          0.5       -0.217     0.000378  Hessian modified  

Optimization stopped because the predicted change in the objective function,
5.349514e-07, is less than options.FunctionTolerance = 1.000000e-06, and the maximum constraint
violation, 2.436318e-11, is less than options.ConstraintTolerance = 1.000000e-06.

Optimization Metric                                                    Options
abs(steplength*directional derivative) =   5.35e-07          FunctionTolerance =   1e-06 (selected)
max(constraint violation) =   2.44e-11                     ConstraintTolerance =   1e-06 (selected)

Active inequalities (to within options.ConstraintTolerance = 1e-06):
  lower      upper     ineqlin   ineqnonlin
                                     1