DFInterpolator
Linear interpolation between gridded datapoints lies at the heart of much of what isochrones does. The custom DFInterpolator
object manages this interpolation, implemented to optimize speed and convenience for large grids. A DFInterpolator
is built on top of a pandas multi-indexed dataframe, and while designed with stellar model grids in mind, it can be used with any similarly structured data.
Let's demonstrate with a small example of data on a 2-dimensional grid.
In [1]:
import itertools
import numpy as np
import pandas as pd
x = np.arange(1, 4)
y = np.arange(1, 6)
index = pd.MultiIndex.from_product((x, y), names=['x', 'y'])
df = pd.DataFrame(index=index)
df['sum'] = [x + y for x, y in itertools.product(x, y)]
df['product'] = [x * y for x, y in itertools.product(x, y)]
df['power'] = [x**y for x, y in itertools.product(x, y)]
df
Out[1]:
The DFInterpolator
is initialized with this dataframe and then can interpolate the values of the columns at any location within the grid defined by the multiindex.
In [2]:
from isochrones.interp import DFInterpolator
interp = DFInterpolator(df)
interp([1.4, 2.1])
Out[2]:
Individual columns may also be accessed by name:
In [3]:
interp([2.2, 4.6], ['product'])
Out[3]:
This object is very similar to the linear interpolation objects available in scipy, but it is significantly faster for single interpolation evaluations:
In [4]:
from scipy.interpolate import RegularGridInterpolator
nx, ny = len(x), len(y)
grid = np.reshape(df['sum'].values, (nx, ny))
scipy_interp = RegularGridInterpolator([x, y], grid)
# Values are the same
assert(scipy_interp([1.3, 2.2])==interp([1.3, 2.2], ['sum']))
# Timings are different
%timeit scipy_interp([1.3, 2.2])
%timeit interp([1.3, 2.2])
The DFInterpolator
is about 30x faster than the scipy regular grid interpolation, for a single point. However, for vectorized calculations, scipy is indeed faster:
In [5]:
N = 10000
pts = [1.3 * np.ones(N), 2.2 * np.ones(N)]
%timeit scipy_interp(np.array(pts).T)
%timeit interp(pts, ['sum'])
However, the DFInterpolator
has an additional advantage of being able to manage missing data---that is, the grid doesn't have to be completely filled to construct the interpolator, as it does with scipy:
In [6]:
df_missing = df.drop([(3, 3), (3, 4)])
df_missing
Out[6]:
In [7]:
interp_missing = DFInterpolator(df_missing)
interp_missing([1.3, 2.2])
Out[7]:
However, if the grid cell that the requested point is in is adjacent to one of these missing points, the interpolation will return nans:
In [8]:
interp_missing([2.3, 3])
Out[8]:
In other words, the interpolator can be constructed with an incomplete grid, but it does not fill values for the missing points.