This is a Jupyter notebook using Python. You can install Jupyter locally to edit and interact with this notebook.
Interpolation and regression address the problem of approximating functions from their (possibly noisy) values at a finite set of points. There is usually an underlying process from which the observed data are obtained, but that process is too expensive or impractical to evaluate every time a function value is needed.
We would like an inexpensive deterministic surrogate that we can use instead. The most common surrogate functions are polynomials and rational functions (ratios of polynomials) because they are convenient to compute with. Other choices are often made when there is prior knowledge about the behavior of the system.
We start our discussion by building surrogate functions that exactly match the observations at a number of points, either given or specially chosen, using polynomials.
In the Linear Algebra notebook, we discussed Vandermonde matrices which we could use to solve for polynomial coefficients. It is also possible to compute the coefficients explicitly (rather than by solving a linear system).
Suppose we are given function values $y_0, \dotsc, y_m$ at the distinct points $x_0, \dotsc, x_m$ and we would like to build a polynomial of degree $m$ that goes through all these points. This explicit construction is attributed to Lagrange (though he was not first):
$$ p(x) = \sum_{i=0}^m y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j} $$
Each product equals 1 at $x = x_i$ and 0 at every other node $x_j$, so $p(x_i) = y_i$.
In [63]:
%matplotlib notebook
import numpy
from matplotlib import pyplot
def lagrange(x, y):
    def p(t):
        from numpy import prod
        m = len(x) - 1
        w = 0
        for i in range(m):
            w += y[i] * (prod(t - x[:i]) * prod(t - x[i+1:])
                         / (prod(x[i] - x[:i]) * prod(x[i] - x[i+1:])))
        w += y[m] * prod(t - x[:m]) / prod(x[m] - x[:m])
        return w
    return numpy.vectorize(p)
x = numpy.linspace(-2,2,4)
y = numpy.sin(x)
p = lagrange(x, y)
xx = numpy.linspace(-3,3)
pyplot.style.use('ggplot')
pyplot.rcParams['figure.max_open_warning'] = False
pyplot.figure()
pyplot.plot(x, y, '*')
pyplot.plot(xx, p(xx), label='p(x)')
pyplot.plot(xx, numpy.sin(xx), label='sin(x)')
pyplot.legend(loc='upper left')
pyplot.show()
In [2]:
p = numpy.linalg.solve(numpy.vander(x), y)
pyplot.figure()
pyplot.plot(x, y, '*')
pyplot.plot(xx, numpy.vander(xx, 4).dot(p), label='p(x)')
pyplot.plot(xx, numpy.sin(xx), label='sin(x)')
pyplot.legend(loc='upper left')
Out[2]:
In [3]:
def vander_newton(x, abscissa=None):
    if abscissa is None:
        abscissa = x
    n = len(abscissa)
    A = numpy.zeros((len(x), n))
    A[:,0] = 1
    for i in range(1,n):
        A[:,i] = A[:,i-1] * (x - abscissa[i-1])
    return A
A = vander_newton(numpy.linspace(-1,1,5))
print(A)
In [4]:
# First, let's check that it works.
p = numpy.linalg.solve(vander_newton(x), y)
pyplot.figure()
pyplot.plot(x, y, '*')
pyplot.plot(xx, vander_newton(xx, x).dot(p), label='p(x)')
pyplot.plot(xx, numpy.sin(xx), label='sin(x)')
pyplot.legend(loc='upper left')
Out[4]:
In [5]:
def cond(mat, points, interval=(-1,1), nmax=20):
    degree = numpy.arange(2, nmax)
    return degree, numpy.array([numpy.linalg.cond(mat(points(*interval,n))) for n in degree])
pyplot.figure()
pyplot.semilogy(*cond(numpy.vander, numpy.linspace), label='monomial')
pyplot.semilogy(*cond(vander_newton, numpy.linspace), label='newton')
pyplot.legend(loc='upper left')
Out[5]:
In [6]:
pyplot.figure()
pyplot.semilogy(*cond(numpy.vander, numpy.linspace, (10,12)), label='monomial')
pyplot.semilogy(*cond(vander_newton, numpy.linspace, (10,12)), label='newton')
pyplot.legend(loc='upper left')
Out[6]:
We have seen that the monomial and Newton bases are ill-conditioned, but we have a procedure for constructing well-conditioned bases that span the same space.
In [7]:
def vander_q(x, n=None, interval=None, print_basis=False):
    if n is None:
        n = len(x)
    if interval is None:
        a, b = min(x), max(x)
    else:
        a, b = interval
    # Set up integration on the interval [a,b] using the midpoint rule
    w = b - a
    V = numpy.vander(numpy.linspace(a + 0.5*w/100, b - 0.5*w/100, 100), n, increasing=True)
    V *= numpy.sqrt(w/100)
    Q, R = numpy.linalg.qr(V)
    if print_basis:
        print('R', R)
    A = numpy.vander(x, n, increasing=True)
    return numpy.linalg.solve(R.T, A.T).T
p = numpy.linalg.solve(vander_q(x), y)
pyplot.figure()
pyplot.plot(x, y, '*')
pyplot.plot(xx, vander_q(xx, 4, interval=(min(x), max(x))).dot(p), label='p(x)')
pyplot.plot(xx, numpy.sin(xx), label='sin(x)')
pyplot.legend(loc='upper left')
Out[7]:
In [8]:
pyplot.figure()
pyplot.semilogy(*cond(numpy.vander, numpy.linspace, (-1,1)), label='monomial')
pyplot.semilogy(*cond(vander_newton, numpy.linspace, (-1,1)), label='newton')
pyplot.semilogy(*cond(vander_q, numpy.linspace, (-1,1)), label='q')
pyplot.legend(loc='upper left')
Out[8]:
In [9]:
def cosspace(a, b, n=50):
    return (a + b)/2 + (b - a)/2 * (numpy.cos(numpy.linspace(0, numpy.pi, n)))
pyplot.figure()
pyplot.semilogy(*cond(numpy.vander, cosspace, (-1,1)), label='monomial')
pyplot.semilogy(*cond(vander_newton, cosspace, (-1,1)), label='newton')
pyplot.semilogy(*cond(vander_q, cosspace, (-1,1)), label='q')
pyplot.legend(loc='upper left')
Out[9]:
In [10]:
for n in range(2,5):
    vander_q(numpy.linspace(-1,1,n), print_basis=True)
Using cosspace points for interpolation with the monomial or Newton basis does not qualitatively change their ill-conditioning, but combining cosspace points with orthogonal polynomials gives a small condition number.
In [11]:
x = numpy.linspace(-1, 1, 100)
M = numpy.vander(x, 8, increasing=True)
N = vander_newton(x, abscissa=numpy.linspace(-1,1,8))
Q = vander_q(x, 8)
pyplot.figure()
pyplot.plot(x, M)
pyplot.title('Monomials')
pyplot.figure()
pyplot.plot(x, N)
pyplot.title('Newton Polynomials')
pyplot.figure()
pyplot.plot(x, Q)
pyplot.title('Orthogonal Polynomials')
Out[11]:
Constructing these "good" (orthogonal) polynomials using QR factorization is a bit cumbersome, and it depends on a finite-accuracy quadrature (the 100-point midpoint rule used above).
Classical (19th century) mathematics identified the polynomials we are constructing, the Legendre polynomials, as eigenfunctions (resonant modes) of the differential operator $$ \frac{d}{d x} (1 - x^2) \frac{d P_n(x)}{dx} . $$ This theory leads to the recursive definition $$\begin{split} P_0(x) &= 1 \\ P_1(x) &= x \\ (n+1) P_{n+1}(x) &= (2n+1) x P_n(x) - n P_{n-1}(x) . \end{split}$$
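As a quick check (a worked example added here for concreteness), the recurrence reproduces the familiar low-degree Legendre polynomials $$ P_2(x) = \tfrac{1}{2}(3x^2 - 1), \qquad P_3(x) = \tfrac{1}{2}(5x^3 - 3x) . $$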
We can implement this recurrence in code to efficiently evaluate Legendre polynomials.
In [12]:
def vander_legendre(x, n=None):
    if n is None:
        n = len(x)
    P = numpy.ones((len(x), n))
    if n > 1:
        P[:,1] = x
    for k in range(1,n-1):
        P[:,k+1] = ((2*k+1) * x * P[:,k] - k * P[:,k-1]) / (k + 1)
    return P
P = vander_legendre(x, 8)
pyplot.figure()
pyplot.plot(x, P)
pyplot.title('Legendre Polynomials')
Out[12]:
In [13]:
# Legendre polynomials are well-conditioned
pyplot.figure()
pyplot.semilogy(*cond(vander_q, cosspace, (-1,1)), label='q')
pyplot.semilogy(*cond(vander_legendre, cosspace, (-1,1)), label='legendre')
pyplot.legend(loc='upper left')
Out[13]:
Define $$ T_n(x) = \cos (n \arccos(x)) .$$ This turns out to be a polynomial, but it's not obvious why. Recall $$ \cos(a + b) = \cos a \cos b - \sin a \sin b .$$ Let $y = \arccos x$ and check $$ \begin{split} T_{n+1}(x) &= \cos \big((n+1) y\big) = \cos ny \cos y - \sin ny \sin y \\ T_{n-1}(x) &= \cos \big((n-1) y\big) = \cos ny \cos y + \sin ny \sin y . \end{split}$$ Adding these together and using $\cos y = x$ gives $T_{n+1}(x) + T_{n-1}(x) = 2 x T_n(x)$, which rearranges into a similar recurrence: $$\begin{split} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{n+1}(x) &= 2 x T_n(x) - T_{n-1}(x) , \end{split}$$ which we can also implement in code.
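As a quick check (a worked example added here for concreteness), the recurrence gives $$ T_2(x) = 2x^2 - 1, \qquad T_3(x) = 4x^3 - 3x , $$ confirming that each $T_n$ is indeed a polynomial of degree $n$.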
In [14]:
def vander_chebyshev(x, n=None):
    if n is None:
        n = len(x)
    T = numpy.ones((len(x), n))
    if n > 1:
        T[:,1] = x
    for k in range(1,n-1):
        T[:,k+1] = 2 * x * T[:,k] - T[:,k-1]
    return T
T = vander_chebyshev(x, 8)
pyplot.figure()
pyplot.plot(x, T)
pyplot.title('Chebyshev Polynomials')
Out[14]:
In [15]:
# Chebyshev polynomials are also well-conditioned
pyplot.figure()
pyplot.semilogy(*cond(vander_q, cosspace, (-1,1)), label='q')
pyplot.semilogy(*cond(vander_legendre, cosspace, (-1,1)), label='legendre')
pyplot.semilogy(*cond(vander_chebyshev, cosspace, (-1,1)), label='chebyshev')
pyplot.legend(loc='upper left')
Out[15]:
In [16]:
print(cond(vander_chebyshev, cosspace, (-1,1))[1])
This is actually amazing: converting from the values at some special points to the coefficients of some specially crafted polynomials has a constant condition number of about 1.6. As we will find later, Chebyshev interpolation has a number of other remarkable properties.
We've seen before that the accuracy of an interpolating polynomial is often poor near (and beyond) the ends of the interval. We've also found "good" bases for representing the polynomials and found that when "good" points are used, the condition number can be small independent of polynomial order/number of points. When points are poorly spaced, however, the condition number grows rapidly.
In [17]:
print(cond(vander_chebyshev, numpy.linspace, (-1,1))[1])
In [26]:
def chebyshev_interp_and_eval(x, xx):
    """Matrix mapping from values at points x to values
    of Chebyshev interpolating polynomial at points xx"""
    A = vander_chebyshev(x)
    B = vander_chebyshev(xx, len(x))
    return B.dot(numpy.linalg.inv(A))
print(numpy.linalg.cond(chebyshev_interp_and_eval(cosspace(-1,1,20),
numpy.linspace(-1,1,1000))))
In [27]:
print(numpy.linalg.cond(chebyshev_interp_and_eval(cosspace(-1,1,20),
cosspace(-1,1,100))))
In [28]:
print(numpy.linalg.cond(chebyshev_interp_and_eval(numpy.linspace(-1,1,20),
numpy.linspace(-1,1,100))))
In [21]:
print(numpy.linalg.cond(chebyshev_interp_and_eval(cosspace(-1,1,20),
numpy.linspace(-2,2,100))))
Interpolation from x=cosspace points is good (well conditioned), while interpolation from x=linspace points is bad (ill conditioned). The interpolant can be evaluated anywhere inside (-1,1) accurately; it doesn't matter whether we use linspace or cosspace for the evaluation points.
In [22]:
def runge1(x):
    return 1 / (1 + 10*x**2)
x = numpy.linspace(-1,1,20)
xx = numpy.linspace(-1,1,100)
pyplot.figure()
pyplot.plot(x, runge1(x), '*')
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge1(x)))
Out[22]:
In [23]:
x = numpy.linspace(-1,1,21)
pyplot.figure()
pyplot.plot(x, runge1(x), '*')
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge1(x)))
Out[23]:
In [24]:
def runge2(x):
    return numpy.exp(-(4*x)**2)
pyplot.figure()
pyplot.plot(x, runge2(x), '*')
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge2(x)))
Out[24]:
In [25]:
x = cosspace(-1,1,20)
pyplot.figure()
pyplot.plot(x, runge1(x), '*')
pyplot.plot(x, runge2(x), '^')
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge1(x)))
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge2(x)))
Out[25]:
In [35]:
def runge3(x):
    return 1.*(x > 0)
pyplot.figure()
pyplot.plot(x, runge3(x), '*')
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(runge3(x)))
Out[35]:
Does the interp_and_eval procedure above depend on the basis used to represent the polynomials?
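One way to probe this is a hypothetical check (not in the original notebook): build the same operator from the Legendre basis and compare. Since the interpolating polynomial is unique, the two matrices should agree up to rounding error, so the printed difference should be tiny. The name legendre_interp_and_eval is introduced here for illustration.
In [ ]:
def legendre_interp_and_eval(x, xx):
    """Sketch: the same interpolation operator, built from the Legendre basis."""
    A = vander_legendre(x)
    B = vander_legendre(xx, len(x))
    return B.dot(numpy.linalg.inv(A))
x = cosspace(-1, 1, 20)
xx = numpy.linspace(-1, 1, 100)
print(numpy.linalg.norm(chebyshev_interp_and_eval(x, xx) - legendre_interp_and_eval(x, xx)))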
In [103]:
A = chebyshev_interp_and_eval(numpy.linspace(-1,1,20), xx)
print(A.shape, numpy.linalg.cond(A))
U, S, V = numpy.linalg.svd(A, full_matrices=0)
print(U.shape, S.shape, V.shape)
print(S[:4])
pyplot.figure()
pyplot.plot(x, V[:2].T)  # numpy returns V transposed; its rows are the right singular vectors
pyplot.title('Worst input functions')
Out[103]:
In [110]:
pyplot.figure()
pyplot.plot(x, V[5])  # the sixth right singular vector (row of V)
pyplot.plot(xx, chebyshev_interp_and_eval(x, xx).dot(V[5]))
Out[110]:
In [43]:
x = cosspace(-1,1,80)
numpy.linalg.norm(chebyshev_interp_and_eval(x, xx).dot(numpy.abs(x)) - numpy.abs(xx), numpy.inf)
Out[43]:
In [111]:
pyplot.figure()
pyplot.plot(xx, numpy.sin(1/xx))
Out[111]:
Up to this point, we have been primarily concerned with stability. That is, we have been looking for formulations in which small changes to the input do not produce large changes to the output. Intrinsically, this problem "should" have a norm of about 1 (with suitable scaling, or using the max ($\infty$) norm): changing the input function should change the output function by about the same amount.
Now we explore accuracy: the dependence of error on the cost of interpolation. We'll start with piecewise constant interpolation.
In [120]:
def find_nearest(x, xx):
    """For each target (xx), find the index of the nearest source point (x)"""
    i = x.argsort()              # Indices that sort x
    x = x[i]                     # x sorted
    j = numpy.arange(len(i))[i]  # Index into the original x
    loc = x.searchsorted(xx)
    loc -= abs(xx - x[loc-1]) < abs(xx - x.take(loc, mode='wrap'))
    return j[loc]
def piecewise_constant_interp_and_eval(x, xx):
    """Construct matrix for interpolation of data at x, evaluating at xx"""
    nearest = find_nearest(x, xx)
    A = numpy.zeros((len(xx), len(x)))
    A[numpy.arange(len(xx)), nearest] = 1
    return A
x = numpy.linspace(-1,1,15)
xx = numpy.linspace(-1, 1, 100)
pyplot.figure()
pyplot.ylim(-0.2, 1.2)
pyplot.plot(x, runge3(x), '*')
pyplot.plot(xx, piecewise_constant_interp_and_eval(x, xx).dot(runge3(x)), label='interp')
pyplot.plot(xx, runge3(xx), label='exact')
Out[120]:
In [121]:
print(numpy.linalg.cond(piecewise_constant_interp_and_eval(numpy.linspace(-1,1,20),
numpy.linspace(-1,1,100))))
In [126]:
def maxerror(interp_and_eval, f, xspace, interval, npoints):
    error = []
    for n in npoints:
        x = xspace(*interval, n)
        xx = numpy.linspace(*interval, 300)
        A = interp_and_eval(x, xx)
        error.append(numpy.linalg.norm(A.dot(f(x)) - f(xx), numpy.inf))
    return error
npoints = numpy.arange(2, 30)
pyplot.figure()
pyplot.loglog(npoints, maxerror(piecewise_constant_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='piecewise_constant')
pyplot.loglog(npoints, maxerror(chebyshev_interp_and_eval, runge1, cosspace, (-1,1), npoints), label='chebyshev cos')
pyplot.loglog(npoints, maxerror(chebyshev_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='chebyshev lin')
pyplot.loglog(npoints, 1/npoints, label='slope=-1')
pyplot.loglog(npoints, npoints**(-2.), label='slope=-2')
pyplot.legend(loc='lower left')
pyplot.xlabel('Number of points')
pyplot.ylabel('Max error')
Out[126]:
In [190]:
pyplot.figure()
pyplot.semilogy(npoints, maxerror(piecewise_constant_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='piecewise_constant')
pyplot.semilogy(npoints, maxerror(chebyshev_interp_and_eval, runge1, cosspace, (-1,1), npoints), label='chebyshev cos')
pyplot.semilogy(npoints, maxerror(chebyshev_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='chebyshev lin')
pyplot.semilogy(npoints, 1/npoints, label='$n^{-1}$')
pyplot.semilogy(npoints, npoints**(-2.), label='$n^{-2}$')
pyplot.legend(loc='lower left')
pyplot.xlabel('Number of points')
pyplot.ylabel('Max error')
Out[190]:
If we are given an arbitrary distribution of points, interpolation with a single polynomial is not robust. Piecewise constant interpolation is not very accurate and gives a rough function. We could improve the accuracy by using a piecewise linear function (see HW3), but the accuracy is still limited and the function still isn't smooth (there is a "corner" where the slope changes at each data point). Splines are a way to guarantee an arbitrary amount of smoothness. The idea is that given sorted input points $\{x_i\}_{i=0}^n$, we compute an interpolating polynomial $s_i(x)$ on every interval $(x_i, x_{i+1})$.
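The notebook does not include code for the piecewise linear case, so here is a minimal sketch (an added illustration, not part of the original notes); piecewise_linear_interp_and_eval is a hypothetical helper that follows the same matrix convention as piecewise_constant_interp_and_eval above.
In [ ]:
def piecewise_linear_interp_and_eval(x, xx):
    """Sketch: matrix mapping values at points x to values of the
    piecewise linear interpolant at points xx."""
    x = numpy.sort(x)
    A = numpy.zeros((len(xx), len(x)))
    # Index k of the interval [x[k], x[k+1]] containing each evaluation point
    k = numpy.clip(x.searchsorted(xx) - 1, 0, len(x) - 2)
    t = (xx - x[k]) / (x[k+1] - x[k])   # Local coordinate in [0, 1]
    A[numpy.arange(len(xx)), k] = 1 - t
    A[numpy.arange(len(xx)), k+1] = t
    return A
For smooth functions we expect its maximum error to decrease like $n^{-2}$; it can be passed to maxerror above just like the other interp_and_eval constructions.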
Given a function value $y_i$ at each $x_i$, we require $$\begin{split} s_i(x_i) &= y_i \\ s_i(x_{i+1}) &= y_{i+1} \end{split} $$ so that the polynomial interpolates our data. If the polynomial has order greater than 1, we are left with some extra degrees of freedom. To provide a unique solution, we'll need to add conditions.
The conditions above guarantee continuity, but not smoothness. We use our extra degree(s) of freedom to impose smoothness conditions of the form $$\begin{split} s_i'(x_{i+1}) &= s_{i+1}'(x_{i+1}) \\ s_i''(x_{i+1}) &= s_{i+1}''(x_{i+1}) . \end{split}$$ These conditions, which are applied at the interior nodes ($x_i$ for $i=1,\dotsc,n-1$), couple the splines from adjacent intervals and cause the spline approximation to be globally coupled.
The conditions above are still not enough to guarantee a unique spline. Suppose we use quadratic polynomials for each $s_i$. Then with $n$ intervals, we have $n$ degrees of freedom after imposing the interpolation condition. Meanwhile, there are only $n-1$ internal nodes. If we impose continuity of the first derivative, we have $n - (n-1) = 1$ undetermined degree of freedom. We could fix this by imposing a boundary condition, such as requiring the slope at one endpoint (e.g., $s_0'(x_0)$) to equal a known value, but this is not symmetric and is often an unnatural condition.
Suppose we use cubic polynomials. Now we have two degrees of freedom per interval after imposing the interpolation condition. If we impose continuity of the first and second derivatives, we have $2n - 2(n-1) = 2$ remaining degrees of freedom. A common choice here is the "natural spline", $s_0''(x_0) = 0$ and $s_n''(x_n) = 0$. Cubic splines are the most popular spline in this family.
We need to choose a basis for the polynomials $s_i(x)$. We could choose $$ s_i(x) = a_i + b_i x + c_i x^2 + d_i x^3 $$ but this would be very ill-conditioned when the interval $(x_i,x_{i+1})$ is far from zero. A better-conditioned choice is $$ s_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 . $$

The interpolation property gives $$\begin{split} a_i &= y_i \\ a_i + b_i(x_{i+1} - x_i) + c_i(x_{i+1} - x_i)^2 + d_i(x_{i+1} - x_i)^3 &= y_{i+1} \end{split}$$ and continuity of the first and second derivatives gives $$\begin{split} b_i + 2c_i(x_{i+1} - x_i) + 3d_i(x_{i+1}-x_i)^2 &= b_{i+1} \\ 2c_i + 6d_i(x_{i+1} - x_i) &= 2 c_{i+1} . \end{split}$$ After trivially eliminating the $a_i$, this is a block bidiagonal system ($3\times 3$ blocks).

We can reduce this to a scalar tridiagonal system. Define $\delta_i = x_{i+1} - x_i$ and $\Delta_i = y_{i+1} - y_i$. Then eliminate $d_i$ using $$ d_i = \frac{c_{i+1} - c_i}{3\delta_i} $$ and $b_i$ using $$\begin{split} b_i\delta_i &= \Delta_i - c_i\delta_i^2 - \underbrace{\frac{c_{i+1} - c_i}{3\delta_i}}_{d_i} \delta_i^3 \\ b_i &= \frac{\Delta_i}{\delta_i} - \frac{\delta_i}{3}(c_{i+1} + 2c_i) . \end{split}$$ Substituting into the equation for continuity of the first derivative gives $$ \frac{\Delta_i}{\delta_i} - \frac{\delta_i}{3}(c_{i+1} + 2c_i) + 2c_i\delta_i + (c_{i+1} - c_i)\delta_i = \frac{\Delta_{i+1}}{\delta_{i+1}} - \frac{\delta_{i+1}}{3}(c_{i+2} + 2c_{i+1}) $$ which reduces to $$ \delta_i c_i + 2(\delta_i + \delta_{i+1}) c_{i+1} + \delta_{i+1} c_{i+2} = 3\left(\frac{\Delta_{i+1}}{\delta_{i+1}} - \frac{\Delta_i}{\delta_i}\right) . $$

To impose boundary conditions, we add a dummy interval on the right end (the actual value of $x_{n+1}>x_n$ cancels out) so that the equation above is valid for $i = 0,\dotsc,n-2$ and the right boundary condition $s_{n-1}''(x_n) = s_n''(x_n)$ becomes $c_n = 0$. The left boundary condition $s_0''(x_0) = 0$ yields $c_0 = 0$, so we must solve $$\begin{bmatrix} 1 & & & & & & \\ \delta_0 & 2(\delta_0+\delta_1) & \delta_1 & & & & \\ & \delta_1 & 2(\delta_1+\delta_2) & \delta_2 & & & \\ & & \ddots & \ddots & \ddots & & \\ & & & & \delta_{n-2} & 2(\delta_{n-2}+\delta_{n-1}) & \delta_{n-1} \\ & & & & & & 1 \\ \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ \\ c_{n-1} \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 3\left(\frac{\Delta_1}{\delta_1} - \frac{\Delta_0}{\delta_0} \right) \\ \vdots \\ 3\left(\frac{\Delta_{n-1}}{\delta_{n-1}} - \frac{\Delta_{n-2}}{\delta_{n-2}}\right) \\ 0 \end{bmatrix} .$$ After solving this equation for the $c_i$, we recover $b_i$ and $d_i$, and can then evaluate the spline at arbitrary points.
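For concreteness (a small worked example, not part of the original notes): with three points $x_0 < x_1 < x_2$ and equal spacing $\delta_0 = \delta_1 = h$, the system is $$ \begin{bmatrix} 1 & 0 & 0 \\ h & 4h & h \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 3\left(\frac{\Delta_1 - \Delta_0}{h}\right) \\ 0 \end{bmatrix}, $$ so $c_0 = c_2 = 0$ and $c_1 = \frac{3(\Delta_1 - \Delta_0)}{4h^2}$; $b_i$ and $d_i$ then follow from the elimination formulas above. The code below assembles and solves this system for general data.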
In [197]:
def spline_cubic_interp_and_eval(x, xx):
    n = len(x) - 1
    def s_interp(y):
        perm = numpy.argsort(x)
        xs = x[perm]
        ys = y[perm]
        delta = xs[1:] - xs[:-1]
        Delta = ys[1:] - ys[:-1]
        # Assemble tridiagonal system (this can be optimized)
        T = numpy.zeros((n+1, n+1))
        T[0,0] = 1
        for i in range(1,n):
            T[i,i-1:i+2] = [delta[i-1], 2*(delta[i-1]+delta[i]), delta[i]]
        T[-1,-1] = 1
        rhs = numpy.zeros(n+1)
        rhs[1:-1] = 3*(Delta[1:]/delta[1:] - Delta[:-1]/delta[:-1])
        c = numpy.linalg.solve(T, rhs)
        S = numpy.zeros((n, 5))  # Matrix to hold splines as [xs,d,c,b,a]
        S[:,0] = xs[:-1]         # Sorted interval left ends
        S[:,2] = c[:-1]          # From tridiagonal solve
        S[:,4] = ys[:-1]         # Interpolation property
        S[:,1] = (c[1:] - c[:-1])/(3*delta)                # Recover d
        S[:,3] = Delta/delta - delta/3*(2*c[:-1] + c[1:])  # Recover b
        return S
    def s_eval(S, xx):
        left = S[:,0].searchsorted(xx) - 1
        left[left<0] = 0  # Use the leftmost interval even if xx<=x
        f = numpy.zeros_like(xx)
        for i,t in enumerate(xx):
            f[i] = numpy.polyval(S[left[i],1:], xx[i] - S[left[i],0])
        return f
    # Build an explicit matrix for the spline fit evaluated at xx
    A = numpy.zeros((len(xx), len(x)))
    for i,e in enumerate(numpy.eye(len(x), len(x))):
        S = s_interp(e)
        A[:,i] = s_eval(S, xx)
    return A
spline_cubic_interp_and_eval(numpy.linspace(-1,1,3), numpy.linspace(-1,1,6))
Out[197]:
In [184]:
x = numpy.linspace(-1,1,10)
xx = numpy.linspace(-1, 1, 100)
pyplot.figure()
pyplot.ylim(-0.2, 1.2)
pyplot.plot(x, runge3(x), '*')
pyplot.plot(xx, spline_cubic_interp_and_eval(x, xx).dot(runge3(x)), label='spline')
pyplot.plot(xx, runge3(xx), label='exact')
pyplot.legend(loc='upper left')
Out[184]:
In [192]:
# Check the accuracy of cubic splines for smooth functions
npoints = numpy.arange(2,30)
pyplot.figure()
pyplot.loglog(npoints, maxerror(piecewise_constant_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='piecewise_constant')
pyplot.loglog(npoints, maxerror(chebyshev_interp_and_eval, runge1, cosspace, (-1,1), npoints), label='chebyshev cos')
pyplot.loglog(npoints, maxerror(spline_cubic_interp_and_eval, runge1, numpy.linspace, (-1,1), npoints), label='spline_cubic')
pyplot.loglog(npoints, 1/npoints, label='$n^{-1}$')
pyplot.loglog(npoints, npoints**(-2.), label='$n^{-2}$')
pyplot.legend(loc='lower left')
pyplot.xlabel('Number of points')
pyplot.ylabel('Max error')
Out[192]:
In [195]:
# Are splines well conditioned?
print(numpy.linalg.cond(
spline_cubic_interp_and_eval(numpy.linspace(-1,1,20),
numpy.linspace(-1,1,1000))))
Long ago, we solved an over-determined linear system to compute a polynomial approximation of a function. Sometimes we make approximations because we are interested in the polynomial coefficients. That is usually only for very low order (like linear or quadratic fits). Inferring higher coefficients is ill-conditioned (as we saw with Vandermonde matrices) and probably not meaningful. We now know that Chebyshev bases are good for representing high-degree polynomials, but if the points are arbitrarily spaced, how many do we need for the Chebyshev approximation to be well-conditioned?
In [201]:
def chebyshev_regress_eval(x, xx, n):
    V = vander_chebyshev(x, n)
    Q, R = numpy.linalg.qr(V)
    return vander_chebyshev(xx, n).dot(numpy.linalg.inv(R).dot(Q.T))
print(chebyshev_regress_eval(x, xx, 5))
pyplot.figure()
pyplot.plot(xx, chebyshev_regress_eval(x, xx, 5).dot(runge1(x)), label='p(x)')
pyplot.plot(xx, runge1(xx), label='runge1(x)')
pyplot.legend(loc='upper left')
Out[201]:
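The question above (how many arbitrarily spaced points we need before the Chebyshev regression is well conditioned) can be probed with a small experiment. The cell below is a sketch added here (not from the original notebook): it fixes 50 equispaced sample points and watches how the conditioning of chebyshev_regress_eval grows as the polynomial degree n approaches the number of points.
In [ ]:
x = numpy.linspace(-1, 1, 50)
xx = numpy.linspace(-1, 1, 500)
degrees = numpy.arange(2, 20)
conds = [numpy.linalg.cond(chebyshev_regress_eval(x, xx, n)) for n in degrees]
pyplot.figure()
pyplot.semilogy(degrees, conds)
pyplot.xlabel('polynomial degree $n$')
pyplot.ylabel('condition number')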
In [ ]: