Interpolation vs curve-fitting

Consider a discrete set of data points

$$ (x_i, y_i),\quad i=0,\ldots,N,$$

and suppose that we wish to approximate this data in some sense. The data may be known to be exact (if, say, we wish to approximate a complicated function by a simpler expression), or it may contain errors from measurement or observational techniques, with known or unknown error bars.

Interpolation

Interpolation assumes that these data points are exact (i.e. contain no measurement errors) and are at distinct $x$ locations. It aims to fit a function (or curve), $y=f(x)$, to this data that passes exactly through the $N+1$ discrete points. This means that we have the additional constraint on the $x_i$'s that $$x_0 < x_1 < \ldots < x_N,$$ and that $$y_i=f(x_i),\quad \forall i.$$

In this case the function $f$ is known as the interpolating function, or simply the interpolant.

Curve-fitting

Alternatively, when we have noisy data, or multiple measurement values ($y$) at a given $x$, then we cannot fit a function/curve that passes through all points exactly; instead we have to perform curve-fitting - finding a function that approximates the data in some sense but does not necessarily pass through every point. In this case we no longer have the requirement that $$x_0 < x_1 < \ldots < x_N$$ and can consider the data simply as a cloud of points. This is the most typical case for real world data, which contains variability and noise giving rise to multiple different measurements (i.e. $y$ values) at the same $x$ location.

An example of interpolation would be to simply fit a line between every successive two data points - this is a piecewise-linear (an example of the more general piecewise-polynomial) interpolation.

If instead we were to construct a single straight line ($y=mx+c$, with only two free parameters $m$ and $c$) that, for example, minimised the sum of the squares of the differences to the data, this would be what is known as a least squares approximation to the data using a linear function. With real data, fitting the data to a function in this way has the effect of smoothing complex or noisy data.
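To make the distinction concrete, here is a minimal sketch contrasting the two approaches: a piecewise-linear interpolant evaluated with numpy.interp, and a least squares straight-line fit from numpy.polyfit. The data values here are made up purely for illustration and are not the dataset used later in this notebook.

import numpy
import pylab

# Hypothetical data, invented purely for this sketch.
x_data = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_data = numpy.array([0.1, 0.9, 0.3, 1.1, 0.8])

# A fine grid of x values at which to evaluate both approximations.
xf = numpy.linspace(0.0, 4.0, 200)

# Piecewise-linear interpolation: passes through every data point exactly.
y_interp = numpy.interp(xf, x_data, y_data)

# Least squares straight line (degree 1): smooths the data and need not hit any point.
m, c = numpy.polyfit(x_data, y_data, 1)
y_line = m*xf + c

pylab.plot(x_data, y_data, 'k*', label='data')
pylab.plot(xf, y_interp, 'r', label='p/w linear interpolant')
pylab.plot(xf, y_line, 'b', label='least squares line')
pylab.legend(loc='best')
pylab.show()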

Choice of interpolating function

We have a lot of choice in how we construct the interpolating or curve-fitting function. Considerations include the smoothness of the resulting function (i.e. how many continuous derivatives it has - cf. the piecewise-polynomial case - what does this approximation tell us about the rate of change of the data?), replicating known properties such as positivity or periodicity, the cost of evaluating it, etc.

Some choices include: polynomials, piecewise polynomials, trigonometric series (sums of sines and cosines leading to an approximation similar to Fourier series).

Lagrange polynomial

Lagrange polynomials are a particularly popular choice for constructing an interpolant for a given data set. The Lagrange polynomial is the polynomial of the least degree that passes through each data point in the set. The interpolating polynomial of the least degree is unique.

Given a set of points as defined above, the Lagrange polynomial is defined as the linear combination

$$L(x) = \sum_{i=0}^{N} y_i \ell_i(x).$$

The functions $\ell_i$ are known as the Lagrange basis polynomials defined by the product

$$\ell_i(x) := \prod_{\begin{smallmatrix}0\le m\le N\\ m\neq i\end{smallmatrix}} \frac{x-x_m}{x_i-x_m} = \frac{(x-x_0)}{(x_i-x_0)} \cdots \frac{(x-x_{i-1})}{(x_i-x_{i-1})} \frac{(x-x_{i+1})}{(x_i-x_{i+1})} \cdots \frac{(x-x_N)}{(x_i-x_N)},$$

where $0\le i\le N$.

Notice from the definition the requirement that no two $x_i$ are the same, so that $x_i - x_m \neq 0$ and this expression is always well-defined (i.e. we never get a divide by zero!). The reason pairs $x_i = x_j$ with $y_i\neq y_j$ are not allowed is that no interpolating function $L$ such that $y_i = L(x_i)$ would exist; a function can only take one value for each argument $x_i$. On the other hand, if also $y_i = y_j$, then those two points would actually be one single point.

For all $i\neq j$, $\ell_j(x)$ includes the term $(x-x_i)$ in the numerator, so the whole product will be zero at $x=x_i$:

$$\ell_{j\ne i}(x_i) = \prod_{m\neq j} \frac{x_i-x_m}{x_j-x_m} = \frac{(x_i-x_0)}{(x_j-x_0)} \cdots \frac{(x_i-x_i)}{(x_j-x_i)} \cdots \frac{(x_i-x_N)}{(x_j-x_N)} = 0.$$

On the other hand,

$$\ell_i(x_i) := \prod_{m\neq i} \frac{x_i-x_m}{x_i-x_m} = 1.$$

In other words, all basis polynomials are zero at $x=x_i$, except $\ell_i(x)$, for which it holds that $\ell_i(x_i)=1$, because it lacks the $(x-x_i)$ term.

It follows that $y_i \ell_i(x_i)=y_i$, so at each point $x_i$, $L(x_i)=y_i+0+0+\dots +0=y_i$, showing that $L$ interpolates the data exactly.
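For reference, a direct (and deliberately unoptimised) implementation of these formulas might look like the following sketch; the names lagrange_basis and lagrange_poly are purely illustrative choices, not library functions.

import numpy

def lagrange_basis(xi, i, x):
    # Evaluate the i-th Lagrange basis polynomial, built on the nodes xi, at the points x.
    x = numpy.asarray(x, dtype=float)
    li = numpy.ones_like(x)
    for m in range(len(xi)):
        if m != i:
            li = li * (x - xi[m]) / (xi[i] - xi[m])
    return li

def lagrange_poly(xi, yi, x):
    # Evaluate L(x) = sum_i y_i * l_i(x).
    x = numpy.asarray(x, dtype=float)
    L = numpy.zeros_like(x)
    for i in range(len(xi)):
        L = L + yi[i] * lagrange_basis(xi, i, x)
    return L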

To help illustrate our discussion, let's first create some data in Python and take a look at it.


In [1]:
%pylab inline

# Invent some raw data 
x=numpy.array([0.5,2.0,4.0,5.0,7.0,9.0])
y=numpy.array([0.5,0.4,0.3,0.1,0.9,0.8])

# For clarity we are going to add a small margin to all the plots.
pylab.margins(0.1)

# We want to overlay a plot of the raw data a few times, so let's make this a function.
def plot_raw_data(x,y):
    # Plot the data as black stars
    pylab.plot(x,y,'k*',label='raw data')
    pylab.xlabel('x')
    pylab.ylabel('y')
    pylab.grid(True)

# The simple plot function you used in Introduction to Programming last term
# will show a piecewise-linear approximation:
pylab.plot(x,y,'r',label='p/w linear')

# Overlay raw data
plot_raw_data(x,y)

# Add a legend
pylab.legend(loc='best')

pylab.show()


Populating the interactive namespace from numpy and matplotlib

We can use scipy.interpolate.lagrange from SciPy to generate the Lagrange polynomial for a dataset as shown below.

(Note: SciPy provides a [wide range of interpolators](http://docs.scipy.org/doc/scipy/reference/interpolate.html) with many different properties which we do not have time to go into in this course. When you need to interpolate data for your specific application then you should look up the literature to ensure you are using the best one.)


In [2]:
import scipy.interpolate

# Create the Lagrange polynomial for the given points.
lp=scipy.interpolate.lagrange(x, y)

# Evaluate this function at high resolution so that we get a smooth plot.
xx=numpy.linspace(0.4, 9.1, 100)
pylab.plot(xx, lp(xx), 'b', label='Lagrange')

# Overlay raw data
plot_raw_data(x, y)

# Add a legend
pylab.legend(loc='best')

pylab.show()


Error in Lagrange interpolation

Note that it can be proven that in the case where we are interpolating a known function (e.g. a complex non-polynomial function by a simpler polynomial), the error is proportional to the product of the distances from each of the data points (which makes sense as the error is then zero at those points) and to the $(N+1)$-st derivative of that function evaluated at some location within the bounds of the data. That is, the more complex (sharply varying) the function is, the larger the error can be.
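Explicitly, if the data are samples $y_i = f(x_i)$ of a function $f$ with $N+1$ continuous derivatives, the standard interpolation error result states that

$$f(x) - L(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}\prod_{i=0}^{N}(x - x_i),$$

for some $\xi$ (depending on $x$) lying within the interval spanned by the data points.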

Exercise 1: Approximating a function

Sample the function $y(x)=x^3$ at the points $x=(1,2,3)$.

Write your own Python function to construct the Lagrange polynomials $L_0$, $L_1+L_0$ and $L_2+L_1+L_0$. Plot the resulting polynomials along with the error compared to the original exact function. (Guru tip: Using the pylab function [fill_between](http://matplotlib.org/examples/pylab_examples/fill_between_demo.html) provides a nice way of illustrating the difference between graphs.)

Curve fitting

Curve-fitting in the least squares sense is popular when the dataset contains noise (nearly always the case when dealing with real world data). This is straightforward to do for polynomials of different degrees using numpy.polyfit, as shown below.


In [4]:
# Calculate coefficients of polynomial degree 0 - ie a constant value.
poly_coeffs=numpy.polyfit(x, y, 0)

# Construct a polynomial function which we can use to evaluate for arbitrary x values.
p0 = numpy.poly1d(poly_coeffs)
pylab.plot(xx, p0(xx), 'k', label='Constant')

# Fit a polynomial degree 1 - ie a straight line.
poly_coeffs=numpy.polyfit(x, y, 1)
p1 = numpy.poly1d(poly_coeffs)
pylab.plot(xx, p1(xx), 'b', label='Linear')

# Quadratic
poly_coeffs=numpy.polyfit(x, y, 2)
p2 = numpy.poly1d(poly_coeffs)
pylab.plot(xx, p2(xx), 'r', label='Quadratic')

# Cubic
poly_coeffs=numpy.polyfit(x, y, 3)
p3 = numpy.poly1d(poly_coeffs)
pylab.plot(xx, p3(xx), 'g', label='Cubic')

# Overlay raw data
plot_raw_data(x, y)

# Add a legend
pylab.legend(loc='best')

pylab.show()


Exercise 2: Squared error calculation

As described in the docs (numpy.polyfit), least squares fitting minimises the sum of the squares of the differences between the data provided and the polynomial,

$$E = \sum_{i=0}^{N} (p(x_i) - y_i)^2,$$

where $p(x_i)$ is the value of the polynomial function that has been fit to the data, evaluated at point $x_i$, and $y_i$ is the $i^{\text{th}}$ data value.

Write a Python function that evaluates the squared error, $E$, and use this function to evaluate the error for each of the polynomials calculated above. Tip: try to pass the function *p* in as an argument to your error calculation function; one of the great features of Python is that it is easy to pass functions as arguments.

Why is the square of the difference used?
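If you get stuck, one possible sketch of such an error function is given below; the name sqr_error is just an illustrative choice, and the example usage assumes the poly1d objects p0 to p3 and the arrays x and y from the cells above are still defined (poly1d objects can be evaluated on whole NumPy arrays).

def sqr_error(p, x, y):
    # Sum of the squared differences between the polynomial evaluated at x_i and the data y_i.
    return numpy.sum((p(x) - y)**2)

# Example usage with the polynomials fitted above:
for p in (p0, p1, p2, p3):
    print(sqr_error(p, x, y))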

Exercise 3: Degree of approximation

Extend the example above by fitting and plotting polynomials of increasing degree past cubic. At what degree does the resulting polynomial approximation equate to the Lagrange interpolant?

Why does this make sense?

Hint: think about the number of free parameters in a polynomial, and the amount of data you have.

Extrapolation

It is important to remember that interpolation, by definition, is used to estimate $y$ for values of $x$ within the bounds of the available data (here $[0.5, 9.0]$) with some confidence. Extrapolation, on the other hand, is the process of estimating (e.g. using the interpolating function) $y$ outside the bounds of the available data. Extrapolation requires a great deal of care, however, as it becomes increasingly inaccurate the further out of bounds you go.
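As a quick illustration (a minimal sketch, assuming the Lagrange polynomial lp from the earlier cell is still defined), evaluating the interpolant outside the data range already shows this loss of control:

# At the end points of the data the interpolant reproduces the data values exactly.
print(lp(0.5), lp(9.0))    # should return 0.5 and 0.8
# Outside [0.5, 9.0] nothing constrains the polynomial.
print(lp(-2.0), lp(13.0))  # extrapolated values - expect them to depart rapidly from the data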

Exercise 4: Extrapolation

Recreate the plots in the example above for different degrees of polynomial, setting the x-range from -2.0 to 13.0. What do you notice about extrapolation when you use higher degree polynomials?

Challenge of the day

Exercise 5: Submarine landslide size in the North Atlantic

Open the data file Length-Width.dat, which gives the lengths and widths of submarine landslides in the North Atlantic basin [from Huhnerbach & Masson, 2004, Fig. 7]. Fit a best-fit straight line using polyfit and try to recreate the image below.

Hint: You will need to take the log of the data before fitting a line to it.
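One possible starting point is sketched below. It assumes that Length-Width.dat contains two whitespace-separated columns (length, then width); check this against the actual file before relying on it.

# Load the data, assuming two whitespace-separated columns: length, width.
length, width = numpy.loadtxt('Length-Width.dat', unpack=True)

# Fit a straight line to the data in log-log space.
m, c = numpy.polyfit(numpy.log10(length), numpy.log10(width), 1)

# Plot the raw data and the fitted line on logarithmic axes.
pylab.loglog(length, width, 'k*', label='raw data')
pylab.loglog(length, 10**(m*numpy.log10(length) + c), 'b', label='least squares fit (log space)')
pylab.xlabel('length')
pylab.ylabel('width')
pylab.legend(loc='best')
pylab.show()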

Reference: V. Huhnerbach, D.G. Masson, Landslides in the North Atlantic and its adjacent seas: an analysis of their morphology, setting and behaviour, Marine Geology 213 (2004) 343-362.


In [ ]: