In this problem we want to find the three-dimensional shape with the lowest supersonic wave drag for a fixed length and fixed maximum diameter. This problem has a known solution called the Sears-Haack body (see below). This solution is useful, for example, in supersonic aircraft design. To design an aircraft with low wave drag, it should have an area distribution close to that of the Sears-Haack body (it's more complicated than that, but that's the basic idea).
For this exercise, we assume that the solution is unknown, and use surrogate-based optimization to find the best design we can. We parameterize the body of revolution as shown below, with six radii to optimize (the body has fore-aft symmetry, and the aft half is not shown). Points are clustered towards the nose where the shape changes rapidly. The interior 6 points are bookended by a point at the nose with 0 radius so that the body is closed, and a point with a fixed radius of 5 so that the diameter of the body is constant (all dimensions are unitless in this problem). An Akima spline is fit between the 8 points to define the full shape. The length of the body is 100.
My colleague Dev Rajnarayan has provided the code that will be used to analyze the wave drag of the supersonic body as a function of the six radii. You will not have access to the source code, and the output has some "discretization noise", both of which simulate common real-life scenarios. Think of this code as a computational fluid dynamics (CFD) simulation. CFD is typically very timing consuming to run (the code in this exercise is quick) and unless you have access to the source to add analytic gradients, or gradients are already provided, then optimizing with CFD in the loop is usually too costly. Because of a limited time budget we want to find a good solution with as few function calls as possible. We may also want to explore variations in our design after finding the optimal solution. These requirements make the problem a good candidate for a surrogate model. We will construct a global polynomial surrogate model of the simulation and will optimize the surrogate rather than the actual function.
The following write-up will guide you through the steps of the analysis, but you'll need to fill in some of the details yourself. This guide uses Matlab because the external code is written in Java and it's easier to interface with for this one-off analysis. Everywhere where you need to complete the code is marked with a %TODO.
First, clear everything and load the provided jar files (the external code). The external code is available on our class website if you haven't already downloaded it: http://flow.byu.edu/me575/assignments/
In [11]:
clear; close all; clc; clear java;
% Add jar files to java classpath
javaaddpath({strcat(pwd,'/me575hw6.jar'), strcat(pwd,'/colt.jar')});
wavedrag = @WaveDragAS.compute; % rename function call for convenience
% syntax: drag = wavedrag(x) % where x is a vector of length six
Let's define some reasonable bounds on the design variables:
In [8]:
% Lower bound
lb = [0.05, 0.15, 0.65, 1.75, 3.00, 4.25];
% Upper bound
ub = [0.25, 0.40, 0.85, 2.25, 3.25, 4.75];
We need to sample our function to construct the initial surrogate, and will use Latin Hypercube Sampling (LHS). Matlab has a function called lhsdesign
which can do this for us. We will start with a really small number of function calls (20) in our 6-dimensional space, but you can adjust that number as you like. The lhsdesign call will return numbers between 0 and 1, and so we need to scale those so that the number span lb to ub.
We also need to evaluate our wave drag coefficient function at each of the sample points. Note that LHS is based on random number generation so you will get different sample points every time (unless you set the random number generator seed).
In [4]:
% Construct LHS Sample (use lhsdesign)
M = 20; % number of samples
N = 6; % number of design variables
% TODO
% in a loop, create the actual x using the equation above,
% and evaluate wavedrag(x) at each x, and save the result a vector called f.
% TODO
% f =
We now need to decide what type of surrogate to construct. For this exercise we choose a polynomial model, and more specifically a quadratic model. (To do a better job we should do parameter studies where we vary the order of the model, try other surrogate models, and use cross-validation to select the model, but we will skip all of that to keep things simpler in this exercise.) There are 6 dimensions so our quadratic model will have 28 unknowns that we need to estimate. The constant $c$ provides 1 term, the linear term $l$ has 6 terms, and the quadratic term Q has $n(n+1)/2 = 21$ terms.
$$\text{drag} \approx c + l^T x + x^T Q x$$Note that there are only 28 terms because we don't want to double count terms like $x_1 x_2$ and $x_2 x_1$, which are actually the same. If we combine all the unknowns from $c$, $l$, and $Q$ into one unknown vector $w$, we can rewrite that same equation as:
\begin{align} \hat{f} &= w_1 + w_2 x_1 + w_3 x_2 + \ldots + w_8 x_1^2 + w_9 x_1 x_2 + w_{10} x_1 x_3 + \ldots + w_{28} x_6^2\\ \hat{f} &= w^T x_{exp} \end{align}where $\hat{f}$ is our estimated value for $f$. The expanded x vector $x_{exp}$ is something we are going to need a few times so I've provided a function called expandQuad.m to do that for you.
syntax: x_exp = expandQuad(x)
We can now use our samples to perform a maximum likelihood estimate of $w$. We will form a linear system
\begin{equation} \left[\begin{matrix} f_1 \\ f_2\\ \vdots\\ f_M \end{matrix}\right] = \left[\begin{matrix} [1 & x_1 & x_2 & \ldots & x_1^2 & x_1 x_2 & x_1 x_3 & \ldots & x_6^2]_{\textrm{at sample 1}}\\ [1 & x_1 & x_2 & \ldots & x_1^2 & x_1 x_2 & x_1 x_3 & \ldots & x_6^2]_{\textrm{at sample 2}}\\ \vdots\\ [1 & x_1 & x_2 & \ldots & x_1^2 & x_1 x_2 & x_1 x_3 & \ldots & x_6^2]_{\textrm{at sample M}}\\ \end{matrix} \right] \left[\begin{matrix} w_1 \\ w_2\\ w_3 \\\vdots\\ w_{28} \end{matrix}\right] \end{equation}Which can be expressed as $f = \Phi w$ where $\Phi$ is
\begin{equation} \Phi = \left[\begin{matrix} {x_{exp}}_1\\ {x_{exp}}_2\\ \vdots\\ {x_{exp}}_M\\ \end{matrix}\right] \end{equation}
In [5]:
% Create PHI (in a loop)
PHI = zeros(M, 28);
% TODO
We can now estimate the w terms through a separate optimization process. In this case the optimization is easy because our parametrization is linear in w. The equation $f = \Phi w$ does not have a unique solution, instead we are interested in a least-squares approximate solution (minimizing the sum of the squared errors between prediction and actual function values). We now have $\Phi$ and $f$ so we can use a least squares solution to find $w$.
In [6]:
% Compute w from a least squares solution of Phi w = f
% TODO
With an estimate for $w$, we have created a surrogate model that can estimate the wave drag coefficient at any point (accuracy at any point is another question). The estimate of the drag at any point $x$ using our surrogate quadratic model would simply be:
\begin{equation} \widehat{C_D} = x_{exp}^T w \end{equation}or in code:
CDhat = expandQuad(x)*w;
With our surrogate model constructed we can now perform an optimization using the surrogate model to find a minimum drag design (according to the surrogate).
In [ ]:
% optimization or surrogate model
J = @(x) expandQuad(x)*w;
x0 = [0.100, 0.198, 0.7, 1.800, 3.200, 4.550]; % starting point
[xopt, fopt] = fmincon(J, x0, [], [], [], [], lb, ub);
The resulting point will probably not be particularly accurate. This is because we used an extremely small number of samples (20) to construct our initial surrogate in a 6-dimensional space, so we can't hope for high accuracy everywhere except near the sample points. One strategy to improve this, is to sample more points initially to create a better global model. This would be a good approach if wanted to explore global changes to the designs. However, in this case we don't necessarily care about having an accurate model everywhere, but instead care about having an accurate model near the optimum. We intentionally kept the initial sampling small because we are going to add our "optimum" solution to our sample, recreate the surrogate with the new point added, and then reoptimize using the new surrogate. This should improve our surrogate near the actual optimum, and if we iterate a few times we should get a pretty good solution. (Note: If we are not careful we could add points to our surrogate that are really close to points that our already there, which can cause numerical issues with our least squares solutions---but we won't worry about that in this example.)
Fortunately, we also know exactly how far our surrogate is off by evaluating the actual function at the point in question. Let's run our optimization until the optimal point has a predicted drag coefficient within 5% (arbitrarily chosen) of the actual drag coefficient.
$$ \frac{f(\hat{x}^*) - \hat{f}(\hat{x}^*)}{f(\hat{x}^*)} < 0.05$$where $x^*$ is the actual (unknown) optimal point, $\hat{x}^*$ is our esitmated optimal point, $\hat{f}$ is our surrogate model, and $f$ is the actual model.
We will see that even though the difference between our predicted function value $\hat{f}(\hat{x}^*)$ and the actual function value $f(\hat{x}^*)$ may be off by as much as 5%, the difference between the actual function value $f(\hat{x}^*)$ and the actual optimum $f^* = f(x^*)$ (which is what we actually care about) is smaller. In addition to our error stopping criteria, we will limit the loop to 20 function calls at the most (generally the loop breaks after about 10).
In [ ]:
% starting design
x0 = [0.100, 0.198, 0.7, 1.800, 3.200, 4.550];
for i = 1:20
% perform optimization using surrogate
% TODO
% compute the error in our surrogate prediction and decide whether or not to break
% TODO
% if not, add another row to sample data (add a row to PHI and a row to F)
% be sure to use the actual function value at xopt
% and not not the f from fmincon which is based on the surrogate
% TODO
% re-estimate the surrogate parameters (w)
% TODO
end
We can use the optimal point of the previous solution as the starting point of the next iteration, but evaluating the surrogate is ridiculously cheap (just a vector-vector multiply) so it really doesn’t matter. We now have a pretty good solution to the minimum wave drag body of revolution and it should take around 30 function calls. By comparison, optimizing with the actual function takes around 100 function calls for this same starting point (depending on which algorithm you use), and the result is generally not that much better:
In [ ]:
J = @(x) wavedrag(x);
[xopt, fopt, exitflag, output] = fmincon(J, x0, [], [], [], [], lb, ub);
disp(output.funcCount);
% solution error
CD_sol = wavedrag(xopt);
fprintf('Optimization Error = %3.3g%%\n', (factual - CD_sol)/CD_sol*100); % factual comes from the SBO
The error in using surrogate-based optimization (SBO) versus a full optimization in this problem is generally less than 1% (will vary randomly because of LHS), but requires only about 1/3 as many function calls.
We can also make some comparisons to the theoretical minimum because it is a known solution for this problem. It is important to note that even optimizing with the actual function will not reach the theoretical minimum because of the discretization error (gets within 1.63\%). The theoretical minimum wave drag for a fixed length and diameter is:
\begin{equation} {C_D}_{min} = \left(\frac{\pi d}{l}\right)^2 \end{equation}We can print out our error as compared to the theoretical minimum:
In [ ]:
% theoretical error
d = 10; l = 100;
CD_min = (pi*d/l)^2;
fprintf('Theoretical Error = %3.3g%%\n', (factual - CD_min)/CD_min*100);
Finally, we can plot the difference in shapes between our SBO solution and the theoretical solution. The difference should be extremely minimal.
In [ ]:
% --------------- Plot geometry ------------------------
% SBO solution
x = l*[0.0, 0.005, 0.01, 0.025, 0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.975, 0.99, 0.995, 1.0];
r = [0 xopt 5 fliplr(xopt) 0]; % this must come from your SBO solution
% theoretical optimal solution - Sears Haack
xSH = .01:.01:1;
rSH = d/2*sqrt(sqrt(1-xSH.^2)-xSH.^2.*log((1+sqrt(1-xSH.^2))./xSH));
xSH = l/2*[-fliplr(xSH) xSH] + l/2;
rSH = [fliplr(rSH) rSH];
figure; hold on;
h1 = plot(x,r,'b');
plot(x,-r,'b');
h2 = plot(xSH,rSH,'r--');
plot(xSH,-rSH,'r--');
axis equal;
legend([h1, h2], {'SBO','Sears-Haack'});
% ----------------------------------------------------------