In [28]:
% ignore - internal setup
path('psm', path)
set(0, 'DefaultLineLineWidth', 2)
set(0, 'DefaultAxesFontSize', 14);
In [12]:
n = 20;
sigma = 1.0;
xdata = linspace(-2, 2, n)';
fdata = 3*xdata.^2 + 2*xdata + 1 + randn(n, 1)*sigma;
figure()
plot(xdata, fdata, 'o')
xlabel('x')
ylabel('f')
This is our training data and consists of $n$ pairs $(x^{(i)}, f^{(i)})$. Given this data, we want to compute a polynomial fit. In this case, we know the underlying function is quadratic, so let's use a quadratic model of the form:
$$\hat{f}(x) = a x^2 + b x + c$$
This is a simple least squares problem:
$$\text{minimize} \; \sum_i \left(\hat{f}(x^{(i)}) - f^{(i)} \right)^2 $$
which we can rewrite in matrix form as:
$$\text{minimize} \; || \Psi w - f ||^2 $$
where
$$ \Psi = \begin{bmatrix} \text{---} & \psi(x^{(1)})^T & \text{---} \\ \text{---} & \psi(x^{(2)})^T & \text{---} \\ & \vdots & \\ \text{---} & \psi(x^{(n)})^T & \text{---} \end{bmatrix}, \qquad f = \begin{bmatrix} f^{(1)}\\ f^{(2)}\\ \vdots \\ f^{(n)} \end{bmatrix} $$
In our specific case, $w = [a, b, c]^T$ and $\psi(x) = [x^2, x, 1]^T$. The matrix equation can then be solved as a standard linear least squares problem. Notice that this simple form works in any number of dimensions (not just one); in higher dimensions we just add more terms to $\psi$, as illustrated in the sketch below.
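For example, a full quadratic in two inputs would use the basis $\psi(x) = [x_1^2,\, x_2^2,\, x_1 x_2,\, x_1,\, x_2,\, 1]$, and the least squares solve is unchanged. The sketch below is only illustrative; x1data, x2data, and f2data are hypothetical variables, not part of the example above.
In [ ]:
% hypothetical sketch: quadratic model with two inputs x1 and x2
% basis psi(x) = [x1^2, x2^2, x1*x2, x1, x2, 1], so w has six entries
Psi2 = [x1data.^2, x2data.^2, x1data.*x2data, x1data, x2data, ones(size(x1data))];
w2 = Psi2 \ f2data;  % same backslash (least squares) solve as before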
In [13]:
Psi = zeros(n, 3);
Psi(:, 1) = xdata.^2;
Psi(:, 2) = xdata;
Psi(:, 3) = 1;
w = Psi\fdata
Ideally w would be [3; 2; 1] based on our underlying polynomial, but it won't recover that exactly because of the noise in the data.
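As a quick sanity check of that claim (a sketch with new variable names, using many more points and a much smaller noise level of 0.01), the fit should come out very close to [3; 2; 1]:
In [ ]:
% sketch: with many points and very little noise, the fit should nearly recover [3; 2; 1]
ncheck = 200;
xcheck = linspace(-2, 2, ncheck)';
fcheck = 3*xcheck.^2 + 2*xcheck + 1 + randn(ncheck, 1)*0.01;
Psicheck = [xcheck.^2, xcheck, ones(ncheck, 1)];
wcheck = Psicheck \ fcheck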
We now have a polynomial model, $\hat{f} = \Psi w$, that we can evaluate at any point. Let's evaluate it at many locations along the domain and compare our fit:
In [16]:
nfull = 200;
xfull = linspace(-2, 2, nfull)';
Psifull = zeros(nfull, 3);
Psifull(:, 1) = xfull.^2;
Psifull(:, 2) = xfull;
Psifull(:, 3) = 1;
ffull = Psifull*w;
figure(); hold on;
plot(xdata, fdata, 'o')
plot(xfull, ffull, '--')
xlabel('x')
ylabel('f')
In [52]:
func = @(x) (6*x-2).^2.*sin(12*x-4) + randn(size(x))*1;  % underlying 1D test function plus Gaussian noise (sigma = 1)
% ---- create training data ---------
ndata = 20;
xdata = linspace(0, 1, ndata)';
fdata = func(xdata);
figure();
plot(xdata, fdata, 'o');
xlabel('x');
ylabel('f');
Visually, it's harder to determine what the best order for the polynomial is. Of course, in higher dimensions we cannot visualize the function and so generally won't know beforehand what terms to use in our polynomial model. We will use cross validation to help us choose an appropriate order. Below, we create a few helper functions to create and evaluate a polynomial for any order (in a 1D space):
In [32]:
%load psm/getPsi.m
In [ ]:
function Psi = getPsi(x, order)
% build the basis matrix Psi for a 1D polynomial of the given order
n = length(x);
Psi = zeros(n, order+1);
for i = 1:order+1
    Psi(:, i) = x.^(order - i + 1); % columns are [x^3 x^2 x^1 x^0] for order = 3
end
end
In [22]:
%load psm/createpoly.m
In [ ]:
function w = createpoly(x, f, order)
% fit polynomial coefficients w to data (x, f) by linear least squares
Psi = getPsi(x, order);
w = Psi \ f;
end
In [23]:
%load psm/evalpoly.m
In [ ]:
function fhat = evalpoly(x, w)
% evaluate the polynomial defined by coefficients w at the points x
order = length(w) - 1;
Psi = getPsi(x, order);
fhat = Psi*w;
end
Let's try different polynomial orders and check to see what the error is in our fit.
In [53]:
ordervec = 1:19;
no = length(ordervec);
error = zeros(no, 1);
for i = 1:no
    % build a polynomial model from the training data
    w = createpoly(xdata, fdata, ordervec(i));
    % test the error
    fhat = evalpoly(xdata, w);
    error(i) = norm(fhat - fdata);
end
% plot error
figure()
plot(ordervec, error, 'o')
xlabel('order of polynomial')
ylabel('error')
This suggests that the higher the order of the polynomial, the better! Of course we know that's not true. Let's look at what the 19th-order polynomial model looks like:
In [54]:
order = 19;
w = createpoly(xdata, fdata, order);
nhat = 200;
xhat = linspace(0, 1, nhat)';
fhat = evalpoly(xhat, w);
figure(); hold on;
plot(xdata, fdata, 'o')
plot(xhat, fhat, '--')
ylim([-10, 20])
xlabel('x')
ylabel('f')
We notice that the error at the points we are trying to fit is very small (which is exactly what our least squares solution minimizes), but the predictive capability of the model is very poor. The reason for this issue is that we tested our model using the same points we used to train it, so of course the error was low. What we need to do instead is keep a separate set of training data and a separate set of validation data with which to test how good the model is. There are many ways to do this. In this example, we use k-fold cross validation: we split the data into k segments and, in turn, hold each segment out for validation while training on the rest.
In [57]:
div = 10; % we will divide our data into div segments
ndata = 20; % number of data points
arrlength = ndata/div; % each segment should contain this much data
idxrand = randperm(ndata); % random index into data from 1 ... 20
error = zeros(no, 1); % reset the error so we don't accumulate onto the training errors from above
% iterate through polynomial orders
for i = 1:no
    % iterate through divisions of data for the k-fold holdout
    for j = 1:div
        % indices of data to leave out from the random permutation
        holdout = idxrand(arrlength*(j-1)+1:arrlength*j);
        % separate into training set and testing set
        xtrain = xdata; % copy
        ftrain = fdata;
        xtrain(holdout) = [];
        ftrain(holdout) = [];
        xtest = xdata(holdout);
        ftest = fdata(holdout);
        % build a polynomial model from the training data
        w = createpoly(xtrain, ftrain, ordervec(i));
        % test the error with the validation set
        fhat = evalpoly(xtest, w);
        error(i) = error(i) + norm(fhat - ftest) / div; % average error across divisions
    end
end
% plot error
figure();
plot(ordervec, error, 'o');
xlabel('order of polynomial');
ylabel('error');
In [58]:
% plot error (with restricted range)
figure();
plot(ordervec, error, 'o');
xlabel('order of polynomial');
ylabel('error');
ylim([0, 25]);
Both plots show the same data, but the second restricts the y-axis range because the error blows up quickly at high orders. Notice that now, instead of the error continually decreasing, it decreases for a while and then increases as we run into problems with overfitting. Generally, we like to choose the simplest model that gives reasonable error. The curve is fairly flat near the minimum, and in this case a good choice is somewhere around a 4th-order polynomial (and maybe as high as 7th order). Because I have so few points, I'm going to lean toward the smaller end.
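If we wanted to pick the order programmatically rather than by eye, one simple option (just a sketch; the 10% tolerance is an arbitrary choice, not anything principled) is the smallest order whose cross-validation error is within a tolerance of the minimum:
In [ ]:
% sketch: choose the polynomial order from the cross-validation error
[minerr, idx] = min(error);
bestorder = ordervec(idx)                          % order with the smallest error
simplest = ordervec(find(error <= 1.1*minerr, 1))  % simplest order within 10% of the minimum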
In [56]:
order = 4;
w = createpoly(xdata, fdata, order);
nhat = 200;
xhat = linspace(0, 1, nhat)';
fhat = evalpoly(xhat, w);
figure(); hold on;
plot(xdata, fdata, 'o')
plot(xhat, fhat, '--')
ylim([-10, 20])
xlabel('x')
ylabel('f')
We can see that this results in an improved predictive model over the 19th order polynomial.
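Because we happen to know the noise-free underlying function in this example (something we would not have in a real problem), we can also make a rough comparison of the two fits against it on the dense grid; we would expect err4 to come out much smaller than err19:
In [ ]:
% rough check: compare the 4th and 19th order fits against the noise-free function
ftrue = (6*xhat-2).^2.*sin(12*xhat-4);    % underlying function without the added noise
w19 = createpoly(xdata, fdata, 19);
err4 = norm(evalpoly(xhat, w) - ftrue)    % w is the 4th-order fit from above
err19 = norm(evalpoly(xhat, w19) - ftrue)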
In [ ]: