05 Linear fits to some data

When you have a set of data that you would like to fit with some theoretical curve, you can use the SciPy optimize library to do it. When your data is linear or a subset of the data is linear, you can use a straight line fit. Here is an example of fitting a straight line to a portion of a sine curve.



In [ ]:

    
%pylab inline
import numpy as np
import matplotlib.pyplot as plt
#Import the curve fitter from the scipy optimize package
from scipy.optimize import curve_fit

Create an array of points that represent a sine curve between 0 and 2$\pi$.



In [ ]:

    
#create the data to be plotted
x = np.linspace(0, 2*np.pi, 300)
y = np.sin(x)

Plot the data over the full range as a dashed line and then overlay the section of the data that looks roughly linear, which we will try to fit with a straight line.



In [ ]:

    
#Now plot it
plt.plot(x,y,'b--')
plt.plot(x[110:180], y[110:180]) #subset of points that we will fit
plt.show()

We need to define the function that we will try to fit to this data. In this example, we will use the equation for a straight line, which has two parameters, the slope $m$ and the y-intercept $b$.



In [ ]:

    
#Define the fit function
def func(x, m, b):
    return (m*x + b)

Before we can fit the data we need to make an initial guess at the slope and y-intercept which we can pass to the optimizer. It will start with those values and then keep trying small variations on those values until it minimizes the linear least squared difference between the data points we are trying to fit and points on the line described by those parameters.

Looking at the graph, the top-left of the solid blue curve will probably hit around $y$ = 2 when $x$ = 0 (the y-intercept). The slope is negative (decreasing y for increasing x) in the region we are fitting and it looks like the "rise" in $y$ (really it's a drop) over the "run" in $x$ appears to be about 1. Here's the parameter array we will pass to the optimizer. The order of the parameters has to match the order that they are called in the function we defined (func) so the slope comes first.



In [ ]:

    
# Make initial guess at parameters, slope then y-intercept
p0 = [-1.0, 2.0]

Now call the optimizer. It will return two arrays. The first is the set of optimized parameters and the second is a matrix that shows the covariance between the parameters. Don't worry about the details of the covariance matrix for now.



In [ ]:

    
#Call the curve fitter and have it return the optimized parameters (popt) and covariance matrix (pcov)
popt, pcov = curve_fit(func, x[110:180], y[110:180], p0)

The diagonal elements of the covariance matrix are related to the uncertainties in the optimized fit parameters - they are the square of the uncertainties, actually. Any off-diagonal elements that are non-zero tell you how correlated the parameters are. Values close to zero mean the parameters are totally uncorrelated to one another. Values close to one tell you that the parameters are tightly correlated, meaning that changing the value of one of them makes the value of the other one change by a lot. In the case of a linear fit, changing the slope of the line will change where that line intersects the y-axis, so you would expect a high degree of correlation between the slope and the y-intercept. When you are trying to understand how well a theoretical model matches data and extract parameters with some physical meaning, analyzing the covariance matrix if very important. For now, we just want the best-fit parameters and their uncertainties.



In [ ]:

    
#Compute the parameter uncertainties from the covariance matrix
punc = np.zeros(len(popt))
for i in np.arange(0,len(popt)):
    punc[i] = np.sqrt(pcov[i,i])

#Print the result
print "optimal parameters: ", popt
print "uncertainties of parameters: ", punc

Let's look at how the fit compares to the data by plotting them on top of one another. The fitresult array extends over the full range in x. You can see that a linear fit in the range of interest is pretty good, but it deviates quite significantly from the data (the sine curve) oustide that range.



In [ ]:

    
#plot the fit result with the data
fitresult = func(x,popt[0],popt[1])

plt.plot(x,y,'b--',label="data")
plt.plot(x,fitresult,'g',label="fit")
plt.legend(loc="best")
plt.show()



In [ ]: