Data fitting

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of eperimental data points.

In this exercise we will calculate the sum of the "medipix.edf" image along the vertical and horizontal axis and fit a gaussian function, either using scipy or the module from PyMca.

Initialization of the scientific scripting environment



In [1]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [2]:

    
import fabio
img = fabio.open("medipix.edf").data
imshow(np.log(img))









    



-c:3: RuntimeWarning: divide by zero encountered in log






    Out[2]:





<matplotlib.image.AxesImage at 0x7fe93341ccd0>



In [3]:

    
vert = img.sum(axis=1)
plot(vert)
title("Sum along the horizontal direction")









    Out[3]:





<matplotlib.text.Text at 0x7fe9333559d0>

Definition of the fitting function

The fitting function is a function of the data-point (xdata) and of a sert of parameters.

For a gaussian function defined as: gaussian



In [4]:

    
def gaussian(x, height, center, sigma, offset):
    return height*exp(-(x-center)**2/(2*sigma**2)) + offset

An essential part is the initial set of parameters p0:



In [5]:

    
p0 = [70000, 140, 10,10000]
xdata = numpy.arange(len(img),dtype="float")
ydata = vert.astype("float")
plot(vert, label="Actual data")
plot(gaussian(xdata, *p0), label="Initial guess")
legend()
title("Before the fit:")









    Out[5]:





<matplotlib.text.Text at 0x7fe9332e22d0>

Fitting using scipy optimizers:



In [6]:

    
from scipy.optimize import curve_fit
p,cov = curve_fit(gaussian, xdata, ydata, p0)
print(p)









    



[  6.31031426e+04   1.35955545e+02   2.99885986e+00   1.19186952e+04]



In [7]:

    
%timeit p,cov = curve_fit(gaussian, xdata, ydata, p0)









    



1000 loops, best of 3: 1.05 ms per loop



In [8]:

    
#Display the result of the fit
plot(gaussian(xdata, *p0), label="Initial guess")
plot(gaussian(xdata, *p), label="Fitted curve")
plot(vert,label="Experimental data")
legend()
title("Result of the fit")









    Out[8]:





<matplotlib.text.Text at 0x7fe92f6f93d0>

Fitting using the PyMca library

PyMca changed recently its position so depending if you are using PyMca4 or PyMca5 the import path varies ...



In [9]:

    
try:
    from PyMca5.PyMca import Gefit
except ImportError:
    from PyMca import Gefit

PyMca's Levenberg-Marquardt fitting module needs the function to be layed out the other way around:



In [10]:

    
def gaussian_ge(param, xdata):
    return gaussian(xdata, *param)

p, chi2, err = Gefit.LeastSquaresFit(gaussian_ge, p0, xdata=xdata, ydata=ydata)
print(p)









    



[63103.22036555503, 135.95551675017475, 2.998849240794874, 11918.81798431471]



In [11]:

    
%timeit p, chi2, err = Gefit.LeastSquaresFit(gaussian_ge, p0, xdata=xdata, ydata=ydata)









    



100 loops, best of 3: 3.57 ms per loop



In [12]:

    
#Display the result of the fit
plot(gaussian(xdata, *p0), label="Initial guess")
plot(gaussian(xdata, *p), label="Fitted curve")
plot(vert,label="Experimental data")
legend()
title("Result of the fit")









    Out[12]:





<matplotlib.text.Text at 0x7fe92f699690>

Conclusion

Curve fitting goes alway via the following steps:

get the data_point x_data and y_data
define the fitting function as y_data = function (x_data, param)
chose an initial guess for the set of parameters p0
run the optimizer
check the result.

Note that some optimizer don't fit the function but instead minimize the error_function: error(param) = y_data - function(x_data) It is often necessary to look at the documentation of the fitting function