Interpolation constructs new data points from a known set of data points. We will use it to convert units and to provide the output on a regular grid (which may be needed for further processing).
In this exercise we take the output of an azimuthal integration program (SPD), which provides the intensity as a function of the distance to the center, and transform it into diffraction angle $2\theta$. We will evaluate the calculation time of various polynomial interpolation schemes, as well as the "correctness" of the interpolation.
Input data were created by the SPD program and contain a 1D array of data (EDF files are usually 2D datasets), representing the diffraction intensity as a function of the pixel coordinate.
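As a minimal illustration of the idea (with made-up sample points), numpy already provides simple linear interpolation out of the box:

import numpy
xp = numpy.array([0.0, 1.0, 2.0])   # known x positions (hypothetical)
fp = numpy.array([0.0, 10.0, 0.0])  # known values at those positions
numpy.interp(0.5, xp, fp)           # -> 5.0, halfway between 0 and 10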
In [1]:
%pylab inline
In [2]:
import fabio
img = fabio.open("moke-powder.edf")
data = img.data.reshape(-1)
for key, value in img.header.items():
    print("%20s:\t %s" % (key, value))
In [3]:
plot(data)
xlabel("pixels")
Out[3]:
From the EDF header information, one can read the sample distance (SampleDistance: 0.1 m) and the pixel size (PSize_1: 0.0001 m).
This is enough to define each pixel's position in space: pixel $i$ is actually centered at $(i+0.5) \cdot PixelSize$. The diffraction angle is then:
$\tan(2\theta_i) = \frac{(i+0.5) \cdot PixelSize}{SampleDistance}$
Note: it is often advised to use arctan2, whose range spans $-\pi$ to $+\pi$, while arctan is limited to $-\pi/2$ to $+\pi/2$.
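A quick check of the difference (with hypothetical coordinates): arctan only sees the ratio $y/x$ and therefore cannot distinguish quadrants, whereas arctan2 takes $y$ and $x$ separately:

import numpy
numpy.arctan(-1.0 / -1.0)    # 0.785... = pi/4, the wrong quadrant
numpy.arctan2(-1.0, -1.0)    # -2.356... = -3*pi/4, the correct angle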
So one can directly use numpy functions to calculate $2\theta$:
In [4]:
tth = numpy.rad2deg(numpy.arctan2(1e-4*(numpy.arange(data.size)+0.5), 0.1))
plot(tth, data)
xlabel(r"$2\theta$ ($^{o}$)")
ylabel("Intensity")
Out[4]:
These test data are actually triangular peaks centered at integer values of $2\theta$ (in degrees).
The interpolated values will be taken on a much denser grid: 2000 points instead of the 300 initial points. While this oversampling is not especially meaningful from a physical point of view, it helps highlight interpolation artifacts.
All 1D interpolators are available from scipy.interpolate.interp1d. One needs to provide the initial dataset as x and y, the kind of interpolator expected (often the order of the polynomial used), and the fill value for out-of-bounds data.
The returned object is a function one may call on the data set of one's choice.
In [5]:
from scipy.interpolate import interp1d
nearest = interp1d(tth, data, kind="nearest", fill_value=0, bounds_error=False)
linear = interp1d(tth, data, kind="linear", fill_value=0, bounds_error=False)
cubic = interp1d(tth, data, kind="cubic", fill_value=0, bounds_error=False)
quintic = interp1d(tth, data, kind=5, fill_value=0, bounds_error=False)
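As a quick sanity check (the query points below are made up), the returned interpolator is indeed a plain callable, and points outside the input range fall back to fill_value:

import numpy
# 5 degrees is inside the data range; -1 and 90 degrees are not
print(linear(numpy.array([-1.0, 5.0, 90.0])))  # out-of-range entries are 0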
In [6]:
tth_dense = numpy.linspace(0, 17, 2000)
In [7]:
plot(tth_dense, nearest(tth_dense), label="nearest")
plot(tth_dense, linear(tth_dense), label="linear")
plot(tth_dense, cubic(tth_dense), label="cubic")
plot(tth_dense, quintic(tth_dense), label="quintic")
xlabel(r"$2\theta$ ($^{o}$)")
ylabel("Intensity")
legend()
Out[7]:
In [8]:
%timeit nearest(tth_dense)
In [9]:
%timeit linear(tth_dense)
In [10]:
%timeit cubic(tth_dense)
In [11]:
%timeit quintic(tth_dense)
In [12]:
dmin, dmax = 9.8, 10.2
tth_dense = numpy.linspace(dmin, dmax, 500)
#plot(tth_dense, nearest(tth_dense), label="nearest")
mask = (tth <= dmax) & (tth >= dmin)
plot(tth[mask], data[mask], "o", label="data")
plot(tth_dense, linear(tth_dense), label="linear")
plot(tth_dense, cubic(tth_dense), label="cubic")
plot(tth_dense, quintic(tth_dense), label="quintic")
xlabel(r"$2\theta$ ($^{o}$)")
ylabel("Intensity")
legend()
Out[12]:
Our input data had peaks of triangular shape: these peaks are continuous, but their first derivative is not. This discontinuity introduces artifacts when using higher-order interpolators (cubic, quintic, ...).
While cubic (and higher-order) splines produce smoother interpolated curves, they can introduce wobbles, and hence negative intensities. The safest scheme remains linear interpolation, which guarantees conservation of the integrated intensity, at the price of a broadening of the signal.
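A rough numerical check of both claims, using the variables defined above (tth_dense was redefined to the zoom window, so we rebuild a full-range grid): the cubic interpolant dips below zero near the peak bases, while the trapezoidal integral of the linear interpolant stays close to that of the raw data:

dense = numpy.linspace(tth.min(), tth.max(), 2000)
print(cubic(dense).min())    # negative: overshoot below zero
print(linear(dense).min())   # never below the data minimum
print(numpy.trapz(data, tth))             # reference integrated intensity
print(numpy.trapz(linear(dense), dense))  # close to the reference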