In [28]:
% ignore - internal setup
path('psm', path)
set(0, 'DefaultLineLineWidth', 2)
set(0, 'DefaultAxesFontSize', 14);

Polynomial Surrogate Model

Consider a simple underlying function with Gaussian noise added to simulate experimental or noisy computational data.

$$y = 3x^2 + 2x + 1 + \mathcal{N}(0, \sigma), \text{ where } \sigma = 1$$

We will create the data at 20 evenly spaced points between $-2$ and $2$.


In [12]:
n = 20;
sigma = 1.0;
xdata = linspace(-2, 2, n)';
fdata = 3*xdata.^2 + 2*xdata + 1 + randn(n, 1)*sigma;

figure()
plot(xdata, fdata, 'o')
xlabel('x')
ylabel('f')


This data is our training data and consists of $n$ pairs $(x^{(i)}, f^{(i)})$. Given this data, we want to compute a polynomial fit. In this case we know the underlying function is quadratic, so let's use a quadratic model of the form:

$$\hat{f}(x) = a x^2 + b x + c$$

This is a simple least squares problem.

$$\text{minimize} \sum_i \left(\hat{f}(x^{(i)}) - f^{(i)} \right)^2 $$

Which we can rewrite in matrix form as:

$$\text{minimize} \; || \Psi w - f ||^2 $$

where

$$ \Psi = \begin{bmatrix} \text{---} & \psi(x^{(1)})^T & \text{---} \\ \text{---} & \psi(x^{(2)})^T & \text{---} \\ & \vdots & \\ \text{---} & \psi(x^{(n)})^T & \text{---} \end{bmatrix}, \qquad f = \begin{bmatrix} f^{(1)}\\ f^{(2)}\\ \vdots \\ f^{(n)} \end{bmatrix} $$

In our specific case, $w = [a, b, c]^T$ and $\psi(x) = [x^2, x, 1]^T$. The matrix equation is a linear least squares problem with the standard solution $w = (\Psi^T \Psi)^{-1} \Psi^T f$, which MATLAB's backslash operator computes for us. Notice that this simple form works in any number of dimensions (not just one); in higher dimensions we just add more terms to $\psi$ (see the short sketch after this example).


In [13]:
Psi = zeros(n, 3);
Psi(:, 1) = xdata.^2;
Psi(:, 2) = xdata;
Psi(:, 3) = 1;

w = Psi\fdata


w =

    2.7833
    2.1178
    1.0514

Ideally $w$ would be $[3; 2; 1]$ based on our underlying polynomial, but we won't recover that exactly because of the noise in the data.

We now have a polynomial model $\hat{f} = \Psi w$ that we can evaluate at any point. Let's compute it at many locations along the domain and compare our fit:


In [16]:
nfull = 200;
xfull = linspace(-2, 2, nfull)';
Psifull = zeros(nfull, 3);
Psifull(:, 1) = xfull.^2;
Psifull(:, 2) = xfull;
Psifull(:, 3) = 1;
ffull = Psifull*w;

figure(); hold on;
plot(xdata, fdata, 'o')
plot(xfull, ffull, '--')
xlabel('x')
ylabel('f')
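
As an aside, extending the model to more dimensions only changes how we build $\psi$. Below is a minimal sketch for a quadratic model in two hypothetical input variables, assuming `x1data` and `x2data` are $n \times 1$ vectors of sample coordinates (these names are illustrative, not part of the example above):

In [ ]:
% hypothetical 2D example: x1data and x2data are assumed n-by-1 sample coordinates
% quadratic basis in two variables: psi(x) = [x1^2, x2^2, x1*x2, x1, x2, 1]
Psi2 = [x1data.^2, x2data.^2, x1data.*x2data, x1data, x2data, ones(n, 1)];
w2 = Psi2 \ fdata;   % same least squares solve as in the 1D case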


Cross Validation

Let's consider another simple function with Gaussian noise:

$$f = (6x - 2)^2 \sin(12 x - 4) + \mathcal{N}(0, \sigma), \text{ where } \sigma = 1$$

In [52]:
func = @(x) (6*x-2).^2.*sin(12*x-4) + randn(size(x))*1;


% ---- create training data ---------
ndata = 20;
xdata = linspace(0, 1, ndata)';
fdata = func(xdata);

plot(xdata, fdata, 'o');
xlabel('x');
ylabel('f');


Visually, it's harder to determine what the best order for the polynomial is. Of course, in higher dimensions we cannot visualize the function and so generally won't know beforehand what terms to use in our polynomial model. We will use cross validation to help us choose an appropriate order. Below, we create a few helper functions to create and evaluate a polynomial for any order (in a 1D space):


In [32]:
%load psm/getPsi.m

In [ ]:
function Psi = getPsi(x, order)

    n = length(x);
    Psi = zeros(n, order+1);
    for i = 1:order+1
        Psi(:, i) = x.^(order - i + 1);  % [x^3 x^2 x^1 x^0] for order = 3
    end

end

In [22]:
%load psm/createpoly.m

In [ ]:
function w = createpoly(x, f, order)

    Psi = getPsi(x, order);
    w = Psi \ f;
    
end

In [23]:
%load psm/evalpoly.m

In [ ]:
function fhat = evalpoly(x, w)

    order = length(w) - 1;
    Psi = getPsi(x, order);
    fhat = Psi*w;

end

Let's try different polynomial orders and check to see what the error is in our fit.


In [53]:
ordervec = 1:19;
no = length(ordervec);
error = zeros(no, 1);

for i = 1:no
    
    % build a polynomial model from the training data 
    w = createpoly(xdata, fdata, ordervec(i));
    
    % test the error 
    fhat = evalpoly(xdata, w);
    error(i) = norm(fhat - fdata);
end


% plot error
figure()
plot(ordervec, error, 'o')
xlabel('order of polynomial')
ylabel('error')


> In createpoly (line 4)
Warning: Rank deficient, rank = 18, tol =  1.986027e-14.
> In createpoly (line 4)
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =  2.255898e-17.

This suggests that the higher the order of the polynomial, the better! Of course we know that's not true. Let's look at what the 19th-order polynomial model looks like:


In [54]:
order = 19;
w = createpoly(xdata, fdata, order);

nhat = 200;
xhat = linspace(0, 1, nhat)';
fhat = evalpoly(xhat, w);

figure(); hold on;
plot(xdata, fdata, 'o')
plot(xhat, fhat, '--')
ylim([-10, 20])
xlabel('x')
ylabel('f')


> In createpoly (line 4)
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =  2.255898e-17.

We notice that the error at the points we are trying to fit is very small (that is exactly what the least squares solution minimizes), but the predictive capability of the model is very poor. The problem is that we tested the model with the same points we used to train it, so of course the error was low. What we need instead is to keep a separate training set and a separate validation set with which to test how good the model is. There are many ways to do this; in this example we use k-fold cross validation, holding out one fold (segment) of the data at a time.


In [57]:
div = 10;  % we will divide our data into div segments (folds)
ndata = 20;  % number of data points
arrlength = ndata/div;  % each segment contains this many data points
idxrand = randperm(ndata);  % random permutation of the data indices 1 ... 20
error = zeros(no, 1);  % reset the error accumulator before averaging


% iterate through polynomial orders
for i = 1:no
    
    % iterate through divisions of data for k-holdout
    for j = 1:div
        
        % indices of the data to leave out, taken from the random permutation
        holdout = idxrand(arrlength*(j-1)+1:arrlength*j);
    
        % separate into training set and testing set
        xtrain = xdata;  % copy
        ftrain = fdata;
        xtrain(holdout) = [];
        ftrain(holdout) = [];
        xtest = xdata(holdout);
        ftest = fdata(holdout);
       
        % build a polynomial model from the training data 
        w = createpoly(xtrain, ftrain, ordervec(i));

        % test the error with the validation set
        fhat = evalpoly(xtest, w);
        error(i) = error(i) + norm(fhat - ftest) / div;  % average error across divisions
    end
end
        

% plot error
figure();
plot(ordervec, error, 'o');
xlabel('order of polynomial');
ylabel('error');


> In createpoly (line 4)
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =  2.176209e-16.

In [58]:
% plot error (with restricted range)
figure();
plot(ordervec, error, 'o');
xlabel('order of polynomial');
ylabel('error');
ylim([0, 25]);


Both plots show the same data, but the second restricts the y-axis range because the error blows up quickly at high orders. Notice that now, instead of the error continually decreasing, it decreases for a while and then increases as we run into overfitting. Generally, we like to choose the simplest model that gives reasonable error. The curve is pretty flat near the minimum, and in this case a good point is somewhere around a 4th-order polynomial (and maybe as high as 7th order). Because I have so few points, I'm going to lean toward the smaller end.
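
If we wanted to automate the choice rather than read it off the plot, a simple option (a sketch using the `error` and `ordervec` arrays computed above) is to take the order with the smallest averaged cross-validation error; with noisy data this minimum may land at a higher order than the simpler model I pick below:

In [ ]:
% pick the order with the minimum averaged cross-validation error
[~, imin] = min(error);
bestorder = ordervec(imin)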


In [56]:
order = 4;
w = createpoly(xdata, fdata, order);

nhat = 200;
xhat = linspace(0, 1, nhat)';
fhat = evalpoly(xhat, w);

figure(); hold on;
plot(xdata, fdata, 'o')
plot(xhat, fhat, '--')
ylim([-10, 20])
xlabel('x')
ylabel('f')


We can see that this results in an improved predictive model over the 19th order polynomial.
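
As a rough check (possible here only because we know the noise-free underlying function), we can compare both models against $(6x - 2)^2 \sin(12 x - 4)$ on the dense grid. A sketch, reusing the helper functions and the `xhat` grid from above:

In [ ]:
% noise-free underlying function on the dense grid (known only because we generated the data)
ftrue = (6*xhat - 2).^2 .* sin(12*xhat - 4);

% RMS error of the 4th-order and 19th-order fits against the true function
w4  = createpoly(xdata, fdata, 4);
w19 = createpoly(xdata, fdata, 19);
rms4  = norm(evalpoly(xhat, w4)  - ftrue) / sqrt(nhat)
rms19 = norm(evalpoly(xhat, w19) - ftrue) / sqrt(nhat)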

