Learning Objective: Learn to interpolate 1d and 2d datasets of structured and unstructured points using SciPy.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
We have already seen how to evaluate a Python function at a set of numerical points:
$$ f(x) \rightarrow f_i = f(x_i) $$Here is an array of points:
In [2]:
x = np.linspace(0,4*np.pi,10)
x
Out[2]:
This creates a new array of points that are the values of $\sin(x_i)$ at each point $x_i$:
In [3]:
f = np.sin(x)
f
Out[3]:
In [4]:
plt.plot(x, f, marker='o')
plt.xlabel('x')
plt.ylabel('f(x)');
This plot shows that the points in this numerical array are an approximation to the actual function as they don't have the function's value at all possible points. In this case we know the actual function ($\sin(x)$). What if we only know the value of the function at a limited set of points, and don't know the analytical form of the function itself? This is common when the data points come from a set of measurements.
Interpolation is a numerical technique that enables you to construct an approximation of the actual function from a set of points:
$$ \{x_i,f_i\} \rightarrow f(x) $$It is important to note that unlike curve fitting or regression, interpolation doesn't not allow you to incorporate a statistical model into the approximation. Because of this, interpolation has limitations:
Once you have performed interpolation you can:
Warning:
The different functions in SciPy work with a range of different 1d and 2d arrays. To help you keep all of that straight, I will use lowercase variables for 1d arrays (x
, y
) and uppercase variables (X
,Y
) for 2d arrays.
We begin with a 1d interpolation example with regularly spaced data. The function we will use it interp1d
:
In [5]:
from scipy.interpolate import interp1d
Let's create the numerical data we will use to build our interpolation.
In [6]:
x = np.linspace(0,4*np.pi,10) # only use 10 points to emphasize this is an approx
f = np.sin(x)
To create our approximate function, we call interp1d
as follows, with the numerical data. Options for the kind
argument includes:
linear
: draw a straight line between initial points.nearest
: return the value of the function of the nearest point.slinear
, quadratic
, cubic
: use a spline (particular kinds of piecewise polynomial of a given order.The most common case you will want to use is cubic
spline (try other options):
In [7]:
sin_approx = interp1d(x, f, kind='cubic')
The sin_approx
variabl that interp1d
returns is a callable object that can be used to compute the approximate function at other points. Compute the approximate function on a fine grid:
In [8]:
newx = np.linspace(0,4*np.pi,100)
newf = sin_approx(newx)
Plot the original data points, along with the approximate interpolated values. It is quite amazing to see how the interpolation has done a good job of reconstructing the actual function with relatively few points.
In [9]:
plt.plot(x, f, marker='o', linestyle='', label='original data')
plt.plot(newx, newf, marker='.', label='interpolated');
plt.legend();
plt.xlabel('x')
plt.ylabel('f(x)');
Let's look at the absolute error between the actual function and the approximate interpolated function:
In [10]:
plt.plot(newx, np.abs(np.sin(newx)-sin_approx(newx)))
plt.xlabel('x')
plt.ylabel('Absolute error');
It is also possible to use interp1d
when the x data is not regularly spaced. To show this, let's repeat the above analysis with randomly distributed data in the range $[0,4\pi]$. Everything else is the same.
In [11]:
x = 4*np.pi*np.random.rand(15)
f = np.sin(x)
In [12]:
sin_approx = interp1d(x, f, kind='cubic')
In [13]:
# We have to be careful about not interpolating outside the range
newx = np.linspace(np.min(x), np.max(x),100)
newf = sin_approx(newx)
In [14]:
plt.plot(x, f, marker='o', linestyle='', label='original data')
plt.plot(newx, newf, marker='.', label='interpolated');
plt.legend();
plt.xlabel('x')
plt.ylabel('f(x)');
In [15]:
plt.plot(newx, np.abs(np.sin(newx)-sin_approx(newx)))
plt.xlabel('x')
plt.ylabel('Absolute error');
Notice how the absolute error is larger in the intervals where there are no points.
For the 2d case we want to construct a scalar function of two variables, given
$$ {x_i, y_i, f_i} \rightarrow f(x,y) $$For now, we will assume that the points $\{x_i,y_i\}$ are on a structured grid of points. This case is covered by the interp2d
function:
In [16]:
from scipy.interpolate import interp2d
Here is the actual function we will use the generate our original dataset:
In [17]:
def wave2d(x, y):
return np.sin(2*np.pi*x)*np.sin(3*np.pi*y)
Build 1d arrays to use as the structured grid:
In [18]:
x = np.linspace(0.0, 1.0, 10)
y = np.linspace(0.0, 1.0, 10)
Build 2d arrays to use in computing the function on the grid points:
In [19]:
X, Y = np.meshgrid(x, y)
Z = wave2d(X, Y)
Here is a scatter plot of the points overlayed with the value of the function at those points:
In [20]:
plt.pcolor(X, Y, Z)
plt.colorbar();
plt.scatter(X, Y);
plt.xlim(0,1)
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y');
You can see in this plot that the function is not smooth as we don't have its value on a fine grid.
Now let's compute the interpolated function using interp2d
. Notice how we are passing 2d arrays to this function:
In [21]:
wave2d_approx = interp2d(X, Y, Z, kind='cubic')
Compute the interpolated function on a fine grid:
In [22]:
xnew = np.linspace(0.0, 1.0, 40)
ynew = np.linspace(0.0, 1.0, 40)
Xnew, Ynew = np.meshgrid(xnew, ynew) # We will use these in the scatter plot below
Fnew = wave2d_approx(xnew, ynew) # The interpolating function automatically creates the meshgrid!
In [23]:
Fnew.shape
Out[23]:
Plot the original course grid of points, along with the interpolated function values on a fine grid:
In [24]:
plt.pcolor(xnew, ynew, Fnew);
plt.colorbar();
plt.scatter(X, Y, label='original points')
plt.scatter(Xnew, Ynew, marker='.', color='green', label='interpolated points')
plt.xlim(0,1)
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y');
plt.legend(bbox_to_anchor=(1.2, 1), loc=2, borderaxespad=0.);
Notice how the interpolated values (green points) are now smooth and continuous. The amazing thing is that the interpolation algorithm doesn't know anything about the actual function. It creates this nice approximation using only the original course grid (blue points).
It is also possible to perform interpolation when the original data is not on a regular grid. For this, we will use the griddata
function:
In [25]:
from scipy.interpolate import griddata
There is an important difference between griddata
and the interp1d
/interp2d
:
interp1d
and interp2d
return callable Python objects (functions).griddata
returns the interpolated function evaluated on a finer grid.This means that you have to pass griddata
an array that has the finer grid points to be used. Here is the course unstructured grid we will use:
In [26]:
x = np.random.rand(100)
y = np.random.rand(100)
Notice how we pass these 1d arrays to our function and don't use meshgrid
:
In [27]:
f = wave2d(x, y)
It is clear that our grid is very unstructured:
In [28]:
plt.scatter(x, y);
plt.xlim(0,1)
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y');
To use griddata
we need to compute the final (strcutured) grid we want to compute the interpolated function on:
In [29]:
xnew = np.linspace(x.min(), x.max(), 40)
ynew = np.linspace(y.min(), y.max(), 40)
Xnew, Ynew = np.meshgrid(xnew, ynew)
In [30]:
Xnew.shape, Ynew.shape
Out[30]:
In [31]:
Fnew = griddata((x,y), f, (Xnew, Ynew), method='cubic', fill_value=0.0)
In [32]:
Fnew.shape
Out[32]:
In [33]:
plt.pcolor(Xnew, Ynew, Fnew, label="points")
plt.colorbar()
plt.scatter(x, y, label='original points')
plt.scatter(Xnew, Ynew, marker='.', color='green', label='interpolated points')
plt.xlim(0,1)
plt.ylim(0,1)
plt.xlabel('x')
plt.ylabel('y');
plt.legend(bbox_to_anchor=(1.2, 1), loc=2, borderaxespad=0.);
Notice how the interpolated function is smooth in the interior regions where the original data is defined. However, outside those points, the interpolated function is missing (it returns nan
).