Least squares problems

We sometimes wish to solve problems of the form

$$ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{b} $$

where $\boldsymbol{A}$ is an $m \times n$ matrix with $m > n$. Clearly $\boldsymbol{A}$ is not square, and in general no solution to the problem exists. This is typical of an over-determined problem: we have more equations than unknowns. A classical example is trying to fit a $k$th-order polynomial to $p > k + 1$ data points; the degree of the polynomial is not high enough to construct an interpolating polynomial.

In this notebook we assume that $\boldsymbol{A}$ has full rank, i.e. the columns of $\boldsymbol{A}$ are linearly independent. We will look at the case when $\boldsymbol{A}$ is not full rank later.

Before computing least-squares problems, we start with examples of polynomial interpolation.

Note: This notebook uses interactive widgets to interactively explore various effects. The widget sliders will not work when the notebook is viewed through nbviewer. The snippet below installs the interactive widgets when running the notebook in a Jupyter session.
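A typical install command is shown below (this exact command is an assumption; adjust it to your environment):

# Install ipywidgets if it is not already available
!pip install ipywidgets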

Polynomial interpolation

Polynomial interpolation involves fitting an $n$th-order polynomial to $n + 1$ data points. The polynomial passes through (interpolates) every point.

Interpolating the sine function

We first consider the interpolation of 20 equally spaced points that lie on the sine graph. To do this, we use NumPy to generate 20 points $\{ x_{i} \}$ on the interval $[-\pi, \pi]$, and evaluate $\sin(x)$ at each point such that $\boldsymbol{y} = \{ \sin(x_{i})\}$:


In [1]:
import numpy as np
N = 20
x_p = np.linspace(-np.pi, np.pi, N)
y_p = np.sin(x_p)

We use the variable N to hold the number of points so we can change it if we want to experiment.

We can plot the points:


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('Points on a sine graph')
plt.plot(x_p, y_p,'ro');


With 20 data points, we can interpolate the points with a polynomial $f(x)$ of degree 19:

$$ f(x) = c_{19} x^{19} + c_{18} x^{18} + \ldots + c_{1} x + c_{0}. $$

We can find the polynomial coefficients $c_{i}$ by solving $\boldsymbol{A} \boldsymbol{c} = \boldsymbol{y}$, where $\boldsymbol{A}$ is the Vandermonde matrix:

$$ \boldsymbol{A} = \begin{bmatrix} x_{1}^{19} & x_{1}^{18} & \ldots & x_{1}^{2} & x_{1} & 1 \\ x_{2}^{19} & x_{2}^{18} & \ldots & x_{2}^{2} & x_{2} & 1 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ x_{20}^{19} & x_{20}^{18} & \ldots & x_{20}^{2} & x_{20} & 1 \end{bmatrix} $$

the vector $\boldsymbol{c}$ contains the polynomial coefficients

$$ \boldsymbol{c} = \begin{bmatrix} c_{19} & c_{18} & \ldots & c_{0} \end{bmatrix}^{T} $$

and the vector $\boldsymbol{y}$ contains the points $y_{i}$ that we wish to fit, which in this example are the points on the sine graph.

Note: the ordering in each row of the Vandermonde matrix above is reversed with respect to what you will find in most books. We do this because earlier versions of the NumPy built-in function for generating the Vandermonde matrix use the above ordering. Later versions provide an option to generate the more conventional ordering.
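For example, recent NumPy versions accept an increasing keyword (a sketch, not used in the cells below):

# Columns in increasing power order [1, x, x^2, ...], the textbook convention
A_conventional = np.vander(x_p, N, increasing=True)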

Using the NumPy built-in function to generate the Vandermonde matrix:


In [3]:
A = np.vander(x_p, N)

We can solve the system to find the coefficients:


In [4]:
c = np.linalg.solve(A, y_p)

NumPy has a function poly1d to turn the coefficients into a polynomial object, and it can display a representation of the polynomial:


In [5]:
p = np.poly1d(c)
print(p)


            19            18             17            16
-7.313e-18 x  - 2.96e-20 x  + 2.793e-15 x  + 1.02e-18 x 
              15             14             13            12
 - 7.645e-13 x  - 1.416e-17 x  + 1.606e-10 x  + 1.02e-16 x 
              11             10             9             8
 - 2.505e-08 x  - 4.086e-16 x  + 2.756e-06 x + 8.993e-16 x
              7             6            5            4          3
 - 0.0001984 x - 9.811e-16 x + 0.008333 x + 3.72e-16 x - 0.1667 x + 1 x

To plot the fitted polynomial, we evaluate the polynomial at 200 points:


In [6]:
# Create an array of 200 equally spaced points on [-pi, pi]
x_fit = np.linspace(-np.pi, np.pi, 200) 

# Evaluate the polynomial at the points
y_fit = p(x_fit)

# Plot the interpolating polynomial and the sample points
plt.xlabel('$x$')
plt.ylabel('$f$')
plt.title('Points on a sine graph interpolated by a polynomial')
plot = plt.plot(x_p, y_p, 'ro', x_fit, y_fit,'b-');


We can see from the graph that the polynomial closely resembles the sine function.
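As a quick check of this claim, we might compute the maximum deviation of the interpolant from the true sine values at the plotting points (a sketch; max_err is a name introduced here for illustration):

# Largest pointwise error of the fitted polynomial against sin(x)
max_err = np.max(np.abs(y_fit - np.sin(x_fit)))
print("Maximum interpolation error: {}".format(max_err))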

Interpolating a noisy sine curve

We now repeat the fitting exercise, but add a small amount of noise to the points on the sine curve that we wish to interpolate. We create a function to generate the noisy points so we can easily change the amplitude of the noise.


In [7]:
def y_p_noise(noise):
    return y_p + np.random.uniform(-noise/2.0, noise/2.0, len(y_p))

We can plot the points with noise of amplitude $0.02$:


In [8]:
plt.xlabel('x')
plt.ylabel('$y$')
plt.ylim(-1.1, 1.1)
plt.title('Points on a noisy sine')
y_noise = y_p_noise(0.02)
plt.plot(x_p, y_noise, 'ro');


To the eye, the noise cannot be detected.

We can solve the system to find the coefficients:


In [9]:
c_noise = np.linalg.solve(A, y_noise)

In [10]:
p_noise = np.poly1d(c_noise)

In [11]:
y_fit = p_noise(x_fit)

plt.xlabel('$x$')
plt.ylabel('$f$')
plt.title('Points on a sine graph with noise interpolated by a polynomial')
plt.plot(x_p, y_p, 'ro', x_fit, y_fit,'b-');


The points are clearly interpolated, but the result is now terrible near the boundaries of the interval, with large spikes. The spikes are known as Runge's phenomenon. A similar effect with a Fourier basis at discontinuities is known as the Gibbs phenomenon.

This is a common problem with polynomial fitting. With the exact sine points, we were lucky. With well-chosen, non-uniform interpolation points, such as Chebyshev points, it is possible to improve the interpolation, as sketched below.
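As a sketch, the standard Chebyshev node formula, scaled here to $[-\pi, \pi]$, generates such points (x_cheb is introduced for illustration only and is not used in the cells below):

# Chebyshev points cluster near the ends of the interval,
# which suppresses Runge's phenomenon
k = np.arange(1, N + 1)
x_cheb = np.pi*np.cos((2*k - 1)*np.pi/(2*N))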

To explore the effect of noise, we can create an interactive plot with a slider to vary the noise amplitude.


In [12]:
from ipywidgets import widgets
from ipywidgets import interact

@interact(noise=(0.0, 0.25, 0.005))
def plot_interp_sine(noise=0.005):
    y_noise = y_p_noise(noise)
    c_noise = np.linalg.solve(A, y_noise)
    p_noise = np.poly1d(c_noise)
    y_noise_fit = p_noise(x_fit)

    plt.xlabel('x')
    plt.ylabel('$y$')
    plt.title('Points on a noisy sine (noise amplitude: {})'.format(noise))
    plt.plot(x_p, y_noise, 'ro', x_fit, y_noise_fit,'b-');


Conditioning of the Vandermonde matrix

We have seen by example that the Vandermonde matrix behaves poorly; it is in fact notoriously ill-conditioned. Moreover, if the conditioning of a matrix $\boldsymbol{A}$ is poor, then the conditioning of the normal matrix $\boldsymbol{A}^{T}\boldsymbol{A}$ will be much worse, as we will see later. Computing the condition number:


In [13]:
print("Condition number of the Vandermonde matrix: {}".format(np.linalg.cond(A, 2)))


Condition number of the Vandermonde matrix: 933564155026.0371

We see that the condition number is very large. A system with such a poorly conditioned matrix should not be solved with methods such as LU decomposition (despite what is done in the above examples!).

Least-squares fitting

We will now look at fitting a polynomial of degree $k < n$ to $n + 1$ points on the sine graph. The degree of the polynomial is not high enough to interpolate all the points, so we will compute a best fit in the least-squares sense.

We have seen in lectures that the least-squares solution, which minimises $\Vert \boldsymbol{A} \boldsymbol{c} - \boldsymbol{y} \Vert_{2}$, is found by solving the normal equations

$$ \boldsymbol{A}^{T}\boldsymbol{A} \boldsymbol{c} = \boldsymbol{A}^{T} \boldsymbol{y} $$

If we want to fit a $5$th-order polynomial to 20 data points, $\boldsymbol{A}$ is the $20 \times 6$ matrix:

$$ \boldsymbol{A} = \begin{bmatrix} x_{1}^{5} & x_{1}^{4} & \ldots & x_{1}^{2} & x_{1} & 1 \\ x_{2}^{5} & x_{2}^{4} & \ldots & x_{2}^{2} & x_{2} & 1 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ x_{20}^{5} & x_{20}^{4} & \ldots & x_{20}^{2} & x_{20} & 1 \end{bmatrix} $$

and $\boldsymbol{c}$ contains the $6$ polynomial coefficients

$$ \boldsymbol{c} = \begin{bmatrix} c_{5} & c_{4} & c_{3} & c_{2} & c_{1} & c_{0} \end{bmatrix}^{T} $$

and $\boldsymbol{y}$ contains the 20 points we want to fit.
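As an aside, NumPy can also solve this least-squares problem directly with np.linalg.lstsq, which avoids forming $\boldsymbol{A}^{T}\boldsymbol{A}$ (a sketch; A_direct and c_direct are names introduced here):

# Solve min ||Ac - y|| directly, without forming the normal matrix
A_direct = np.vander(x_p, 6)
c_direct, residuals, rank, s = np.linalg.lstsq(A_direct, y_p, rcond=None)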

Fitting points on the sine graph

Let's try fitting a lower-order polynomial to the 20 data points without noise. We start with a polynomial of degree 5. We first create the Vandermonde matrix:


In [14]:
A = np.vander(x_p, 6)

and then solve $$\boldsymbol{A}^{T}\boldsymbol{A} \boldsymbol{c} = \boldsymbol{A}^{T} \boldsymbol{y}$$ and create a NumPy polynomial from the coefficients:


In [15]:
ATA = (A.T).dot(A)
c_ls = np.linalg.solve(ATA, (A.T).dot(y_p))
p_ls = np.poly1d(c_ls)
print(p_ls)


          5             4          3             2
0.005489 x + 2.716e-16 x - 0.1539 x - 2.606e-15 x + 0.9856 x + 2.934e-15

Plotting the polynomial:


In [16]:
# Evaluate polynomial at some points
y_ls = p_ls(x_fit)

# Plot
plt.xlabel('$x$')
plt.ylabel('$f$')
plt.ylim(-1.1, 1.1)
plt.title('Least-squares fit of 20 points on the sine graph with a $5$th-order polynomial')
plt.plot(x_p, y_p, 'ro', x_fit, y_ls,'b-');


To explore the effect of the polynomial order, we will create an interactive plot with a slider for the polynomial degree.


In [17]:
@interact(order=(0, 19))
def plot(order):

    # Create Vandermonde matrix    
    A = np.vander(x_p, order + 1)
    
    ATA = (A.T).dot(A)
    c_ls = np.linalg.solve(ATA, (A.T).dot(y_p))
    p_ls = np.poly1d(c_ls)
    
    # Evaluate polynomial at some points
    y_ls = p_ls(x_fit)

    # Plot
    plt.xlabel('$x$')
    plt.ylabel('$f$')
    plt.ylim(-1.2, 1.2)
    plt.title('Least-squares fit of 20 points on the sine graph with a polynomial of degree {}'.format(order))
    plt.plot(x_p, y_p, 'ro', x_fit, y_ls,'b-');


The fit appears to be very good. We now experiment with polynomials of other orders. From now on we will use the NumPy function polyfit to shorten the code. Moreover, polyfit uses a different solution algorithm from the one above, namely a singular value decomposition, to solve the same problem with less susceptibility to round-off errors. We write a short function to make it easy to vary the polynomial degree:


In [18]:
def plot(order=10):
    # Compute the coefficients of an nth-order polynomial that fits the data points (x_p, y_p)
    c_ls = np.polyfit(x_p, y_p, order)

    # Create a polynomial object from the coefficients
    p_ls = np.poly1d(c_ls)

    # Evaluate the polynomial at the plotting points
    y_ls = p_ls(x_fit)

    # Plot
    plt.xlabel('$x$')
    plt.ylabel('$f$')
    plt.ylim(-1.1, 1.1)
    plt.title('Least-squares fit of 20 points on the sine graph with a polynomial of degree {}.'.format(order))
    plt.plot(x_p, y_p, 'ro', x_fit, y_ls,'b-');

Starting with degree 3:


In [19]:
plot(3)


The fit is clearly not as good as for the $5$th-order polynomial, but it nonetheless looks quite good. Now for a quadratic polynomial:


In [20]:
plot(2)


Clearly the quadratic fit is very poor.

Creating an interactive plot with a slider for the polynomial degree:


In [21]:
interact(plot, order=(0, 19));


Sine points with noise

Let's now look at a least-squares fit to the sine data with noise. We start by generalising the plot function to include noise:


In [22]:
# Compute least squares fit to sine points with noise
def plot(order=10, noise=0.0):
    # Generate points on sine graph with noise
    y_noise = y_p_noise(noise)
    
    # Compute the coefficients of an nth-order polynomial that fits the data points (x_p, y_noise)
    c_ls = np.polyfit(x_p, y_noise, order)

    # Create a polynomial object from the coefficients
    p_ls = np.poly1d(c_ls)

    # Evaluate the polynomial at the plotting points
    y_ls = p_ls(x_fit)

    # Plot
    plt.xlabel('$x$')
    plt.ylabel('$f$')
    plt.ylim(-1.1, 1.1)
    plt.title('Least-squares fit of 20 points on the sine graph with a polynomial of degree {}.'.format(order))
    plt.plot(x_p, y_noise, 'ro', x_fit, y_ls,'b-');

We start by fitting a polynomial of degree 12:


In [23]:
plot(12, 0.02)


The fit looks very good, and note that there are no large spikes near the ends of the interval.

We now make an interactive plot to explore the interaction between noise and polynomial degree.


In [24]:
interact(plot, order=(0, 19), noise=(0.0, 0.4, 0.005));


Conditioning of the normal matrix

We have seen already that the conditioning of the Vandermonde matrix $\boldsymbol{A}$ is poor. If we consider $\boldsymbol{A}^{T}\boldsymbol{A}$, the conditioning is much worse still: for a full-rank $\boldsymbol{A}$, the 2-norm condition number of $\boldsymbol{A}^{T}\boldsymbol{A}$ is the square of the condition number of $\boldsymbol{A}$.


In [25]:
A = np.vander(x_p, 20)
print("Condition number of A (Vandermonde matrix, 20): {}".format(np.linalg.cond(A)))
print("Condition number of (A.T)A (Vandermonde matrix, 20): {}".format(np.linalg.cond((A.T).dot(A))))


Condition number of A (Vandermonde matrix, 20): 933564155026.0371
Condition number of (A.T)A (Vandermonde matrix, 20): 1.3802264032492922e+23

The enormous condition number indicates why it is not a good idea to form and solve $\boldsymbol{A}^{T}\boldsymbol{A}$ directly. In practice, robust algorithms avoid forming the normal matrix, using a QR factorisation or a singular value decomposition instead.
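A minimal sketch of the QR approach (c_qr is a name introduced here for illustration; np.polyfit, as noted above, uses a singular value decomposition):

# Reduced QR factorisation: A = QR, with Q (20 x 6) having orthonormal
# columns and R (6 x 6) upper triangular
A = np.vander(x_p, 6)
Q, R = np.linalg.qr(A)

# The least-squares solution satisfies R c = Q^T y; cond(R) equals cond(A),
# so the squaring of the condition number is avoided
c_qr = np.linalg.solve(R, (Q.T).dot(y_p))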