An optimization problem has three main elements:

- design variables
- objective
- constraints

**Design variables** are the variables that we allow the optimizer to change. We denote these variables as $x$, where $x$ is an array (i.e., there can be many design variables). For example if we were designing the shape of the spoiler on a car, the design variables might be things like the length of the spoiler, the distance away from the car body, and parameters that define its curved profile.

The **objective** is the metric we want to minimize. We usually denote this function by $J$ or sometimes $f$. It is a scalar, meaning a single value (at times it makes sense to have multiple objectives, but the approach is different—we will explore this later in the semester). By convention we say minimize because we often minimize things in engineering, but in other disciplines, like finance, the default convention may be to maximize. This does not mean that the algorithms we explore can can only perform minimization, to maximize you simply minimize the negative of the function ($-J(x)$). In our car example, the objective of our spoiler design may be to maximize (downward) lift.

**Constraints** exist in almost any engineering problem, they are conditions that we require the optimizer to satisfy. We denote them with $c$, where $c$ is an array because we may have many constraints. By convention we will define constraints in this form: $c(x) \le 0$. This is completely general as a constraint of the form $q(x) \ge 0$ could be rewritten as $c(x) = -q(x) \le 0$. In our car example the constraints may be things like a constraint on bending stress, manufacturing tolerances, etc. Without these constraints, our spoiler may produce a lot of lift but may snap in half or be too difficult to manufacture.

In short, we can write all of our optimization problems in this form:

\begin{align*} \text{minimize} \quad & J(x) \\ \text{with respect to} \quad & x \\ \text{subject to} \quad & c(x) \le 0 \\ \end{align*}Let's consider a simple problem called the Rosenbrock problem. It has two design variables: $x$ and $y$, and an objecive function of: $$J(x, y) = (1 - x)^2 + 100(y - x^2)^2$$

It's often used as a starting problem for testing optimization algorithms because it is only 2D (so it is easy to visualize), but it has a long valley that is steep in one direction and shallow in the other making it somewhat challenging to optimize. In fact, let's visualize the function space with a contour plot (which we can do because there are only 2 variables). First let's create the function. Note that x will be passed into this function as a vector. This will be necessary later when using the optimization algorithms.

```
In [ ]:
```function [J] = obj(x)
J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
end

```
In [1]:
```obj = @(x) (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;

Now let's create a contour plot for that function.

```
In [6]:
```% --- setup grid ---
nx = 200; % number of points in x-direction
ny = 150; % number of points in y-direction
x = linspace(-1, 2, nx); % nx points equally spaced between -5...5
y = linspace(-1, 2, ny); % ny points equally spaced between -6...6
[X, Y] = ndgrid(x, y); % 2D array (matrix) of points across x and y
Z = zeros(nx, ny); % initialize output of size (nx, ny)
% --- evaluate across grid ---
for i = 1:nx
for j = 1:ny
Z(i, j) = obj([X(i, j), Y(i, j)]);
end
end
% --- contour plot ---
figure(); % start a new figure
contour(X, Y, Z, 100); % using 100 contour lines.
colorbar(); % add a colorbar
xlabel('x'); % labels for axes
ylabel('y');

```
```

Note the flat U-shaped valley. The minimum is actually at (1, 1). A couple of other notes on contour plots:

- You can also meshgrid instead of ndgrid. meshgrid works exactly the same, except for the size of the array will be (ny, nx) (backwards from ndgrid) . This is intended to mimic the physical space where the columns go along the x-direction. I prefer ndgrid because I think it’s clearer to use x as the first index and y as the second.
- contourf is the same as contour, except for it creates filled colorbars.

```
In [ ]:
```[xopt, fopt, exitflag, output] = fminunc(@obj, x0, options);

```
In [1]:
```x0 = [0, 0];

```
In [2]:
```options = optimoptions(@fminunc,...
'display', 'iter-detailed', ... % display information on each iteration
'MaxIterations', 1000, ... % maximum number of iterations
'MaxFunctionEvaluations', 10000, ... % maximum number of function calls
'OptimalityTolerance', 1e-6 ... % convergence tolerance
);

```
In [3]:
```[xopt, fopt, exitflag, output] = fminunc(@obj, x0, options);

```
```

```
In [4]:
```xopt
fopt

```
```

We have found the optimal solution: $x^* = (1, 1)$, $f^* = 0$.

Let's now add a couple of constraints. The first, is that the solution must be within the unit disk. $$ x^2 + y^2 \le 1$$ The second is that $$x + 3y \le 5 $$

We need to rewite the constraints in our standard format: \begin{align} x^2 + y^2 -1 &\le 0 \\ x + 3y -5 &\le 0 \end{align}

This problem is constrained, nonlinear, and differentiable. The matlab algorithm that is equipped to solve that type of problem is fmincon. If we look at the documentation for fmincon we can see that it has a function signature like this:

```
In [ ]:
```[xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);

```
In [8]:
```% -------- starting point and bounds ----------
x0 = [0, 0];
ub = [5, 5]; % upper bound, the largest values we will allow x to acheive
lb = [-5, -5]; % lower bound
% ---------------------------------------------

```
In [7]:
```% ------ linear constraints ----------------
A = [];
b = [];
Aeq = [];
beq = [];
% ------------------------------------------

```
In [ ]:
```% ---- Objective and Constraints -------------
function [J, c, ceq] = objcon(x)
% objective
J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
% inequality constraints
c = zeros(2, 1); % there are 2 of them
c(1) = x(1)^2 + x(2)^2 - 1;
c(2) = x(1) + 3*x(2) - 5;
% equality constraints
ceq = []; % there are none
end
% -------------------------------------------

We can set various options. I've included some of the more common ones we might use below.

```
In [9]:
```options = optimoptions(@fmincon,...
'Algorithm', 'active-set', ... % choose one of: 'interior-point', 'sqp', 'active-set', 'trust-region-reflective'
'AlwaysHonorConstraints', 'bounds', ... % forces interior point algorithm to always honor bounds
'display', 'iter-detailed', ... % display more information
'MaxIter', 1000, ... % maximum number of iterations
'MaxFunEvals', 10000, ... % maximum number of function calls
'TolCon', 1e-6, ... % convergence tolerance on constraints
'TolFun', 1e-6, ... % convergence tolerance on function value
'FinDiffType', 'forward', ... % if finite differencing, can also use central
'GradObj', 'off', ... % supply gradients of objective
'GradConstr', 'off', ... % supply gradients of constraints
'DerivativeCheck', 'off', ... % on if you want to check your supplied gradients against finite differencing
'Diagnostics', 'on'); % display diagnotic information

```
In [13]:
```[xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);

```
```

```
In [17]:
```xopt
fopt
con(xopt)
output.iterations

```
```

Note that the first constraint is active (right at 0 for $c(x) \le 0$). The second constraint is inactive ($< 0$). In other words, if we removed this second constraint it would not change the solution at all.

Finally, let's create a convergence plot. The metric to track is the first order optimality. We will learn later what this means. We could just copy/paste the column in the table above under first-order optimality and plot this, but we will also explore a programmatic approach.

In the documentation for fmincon look for output functions. An output function is of the form:

function stop = outfun(x, optimValues, state)

It is called every iteration. In output function we will just append the firstorderopt to the end of an array.

```
In [ ]:
```function stop = outfun(x, optimValues, state)
firstorderopt = [firstorderopt, optimValues.firstorderopt];
stop = false;
end

```
In [ ]:
```firstorderopt = [];
options = optimoptions(options, 'OutputFcn', @outfun);

The full script is shown below:

```
In [ ]:
```function [xopt, fopt, exitflag, output, firstorderopt] = optimize()
% -------- starting point and bounds ----------
x0 = [0, 0];
ub = [5, 5]; % upper bound, the largest values we will allow x to acheive
lb = [-5, -5]; % lower bound
% ---------------------------------------------
% ------ linear constraints ----------------
A = [];
b = [];
Aeq = [];
beq = [];
% ------------------------------------------
% ---- Objective and Constraints -------------
function [J, c, ceq] = objcon(x)
% set objective/constraints here
% objective
J = (1 - x(1))^2 + 100*(x(2) - x(1)^2)^2;
% inequality constraints
c = zeros(2, 1); % there are 2 of them
c(1) = x(1)^2 + x(2)^2 - 1;
c(2) = x(1) + 3*x(2) - 5;
% equality constraints
ceq = []; % there are none
end
% -------------------------------------------
% ----------- options ----------------------------
options = optimoptions(@fmincon, ...
'Algorithm', 'active-set', ... % choose one of: 'interior-point', 'sqp', 'active-set', 'trust-region-reflective'
'AlwaysHonorConstraints', 'bounds', ... % forces interior point algorithm to always honor bounds
'display', 'iter-detailed', ... % display more information
'MaxIter', 1000, ... % maximum number of iterations
'MaxFunEvals', 10000, ... % maximum number of function calls
'TolCon', 1e-6, ... % convergence tolerance on constraints
'TolFun', 1e-6, ... % convergence tolerance on function value
'FinDiffType', 'forward', ... % if finite differencing, can also use central
'GradObj', 'off', ... % supply gradients of objective
'GradConstr', 'off', ... % supply gradients of constraints
'DerivativeCheck', 'off', ... % on if you want to check your supplied gradients against finite differencing
'Diagnostics', 'on'); % display diagnotic information
% -------------------------------------------
firstorderopt = [];
options = optimoptions(options, 'OutputFcn', @outfun);
function stop = outfun(x, optimValues, state)
firstorderopt = [firstorderopt, optimValues.firstorderopt];
stop = false;
end
% -- NOTE: no need to change anything below) --
% ------- shared variables -----------
xlast = [];
Jlast = []; % last objective
clast = []; % last nonlinear inequality constraint
ceqlast = []; % last nonlinear equality constraint
% --------------------------------------
% ------ separate obj/con --------
function [J] = obj(x)
% check if computation is necessary
if ~isequal(x, xlast)
[Jlast, clast, ceqlast] = objcon(x);
xlast = x;
end
J = Jlast;
end
function [c, ceq] = con(x)
% check if computation is necessary
if ~isequal(x, xlast)
[Jlast, clast, ceqlast] = objcon(x);
xlast = x;
end
% set constraints
c = clast;
ceq = ceqlast;
end
% ------------------------------------------------
% call fmincon
[xopt, fopt, exitflag, output] = fmincon(@obj, x0, A, b, Aeq, beq, lb, ub, @con, options);
end

We now call the script and plot the firstorderopt as a function of iteration on a semilog plot.

```
In [1]:
```[xopt, fopt, exitflag, output, firstorderopt] = optimize();
figure;
n = length(firstorderopt);
semilogy(1:n, firstorderopt);
xlabel('iteration');
ylabel('first order optimality')

```
```