An Introduction to Model Fitting with Python

In this notebook, I assume you have used python to some degree to analyze data. I will be using numpy/scipy, the de facto numerical workhorses in python, and matplotlib to visualize the data. We're going to fit a model to some 'fake' data: a linear continuum with a Gaussian line superimposed. The sequel to this notebook will cover model fitting with Markov Chain Monte Carlo (MCMC) techniques. But first, let's make the fake data.

1 Making a Fake Emission Line

The "true" data is some background flux of photons (a continuum from the source or background) that has a linear trend, plus a Gaussian line with some amplitude, width and center. I set these up as variables so it's easy to play around with them and see how things change.


In [ ]:
from numpy import *  # Deal with it
# Start by defining some parameters. Change these if you like!
cont_zp = 500.0   # value at 0
cont_slope = 5.0  # change in continuum per channel 
amplitude = 150.0 # peak of the line
width = 0.5       # Width of the line
center = 5.0      # location of the line

# Next, a grid of wavelength channels (assumed to have no uncertainty)
wave = linspace(0,10,100)
# The 'true' observations
flux = amplitude*exp(-0.5*power(wave-center,2)/width**2) + \
       cont_zp + cont_slope*wave
# The actual observations = true observations + Poisson noise
obs_flux = random.poisson(flux)

So we have the wavelength on the x-axis, which is assumed to have no uncertainty. The measured flux is different from the "true" flux due to Poisson noise. Let's plot the true flux and observed flux to see how things look.


In [ ]:
%matplotlib inline
from matplotlib.pyplot import subplots, plot, step, xlabel, ylabel, show
fig,ax = subplots(1)
ax.plot(wave, flux, 'r-')
ax.step(wave, obs_flux, color='k')
ax.set_xlabel('Wavelength Channel')
ax.set_ylabel('Counts')

2 Fitting with Non-Linear Least Squares

Just to see how you can get a quick fit, let's use the non-linear least-squares routine scipy.optimize.curve_fit. To do this, we must first write a python function that defines the model we are going to fit to the data. The first argument is the x-data; the rest are parameters (their order defines the order of the parameter vector).


In [ ]:
def model(x, cont, slope, amp, center, width):
    # A Gaussian emission line on top of a linear continuum
    return amp*exp(-0.5*power(x-center,2)/width**2) + cont + slope*x

Now we run curve_fit. We pass in the model function, the x and y data, an initial guess for the parameters, and the error in the observations. Since the flux has Poisson noise, we can simply put in $\sigma(y) = \sqrt{y}$, estimated from the observed counts.


In [ ]:
from scipy.optimize import curve_fit
popt,pcov = curve_fit(model, wave, obs_flux, p0=(425.,0.0,80.,4.5,1.0),
                      sigma=sqrt(obs_flux))
print(popt)
err = sqrt(diag(pcov))
print(err)

The popt variable holds the best-fit parameters as a length-5 array and pcov is the 5X5 covariance matrix. The diagonal of the covariance matrix holds the variance of each parameter, so the square root of the diagonal gives the formal errors. Let's plot this least-squares answer and compare it with the "true" model.


In [ ]:
ax.plot(wave, model(wave, *popt), 'b-')  # Note: *popt unpacks the parameter array into separate arguments
fig
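
Before moving on, it's also worth lining up each fitted parameter against the value we used to generate the data. This is just a quick numeric sanity check, reusing popt and err from the fit above:


In [ ]:
# Compare the best-fit parameters with the true (input) values
names = ['cont_zp','cont_slope','amplitude','center','width']
true_pars = [cont_zp, cont_slope, amplitude, center, width]
for name,truth,fit,error in zip(names, true_pars, popt, err):
    print("{:10s} true = {:7.2f}   fit = {:7.2f} +/- {:5.2f}".format(
          name, truth, fit, error))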

Aside from the best-fit values and their uncertainties, it's also a good idea to examine the covariance matrix to see how correlated the parameters are. A quick way to do this is to construct the correlation matrix from the covariance matrix $C[i,j]$ and errors $\sigma[i]$: $$\rho[i,j] = \frac{C[i,j]}{\sigma[i]\sigma[j]}$$ Positive values denote correlation, negative values anti-correlation; $\rho$ ranges from -1 to 1, and a value close to 0 means no significant correlation.


In [ ]:
pcor = pcov/outer(err,err)
for i in range(pcor.shape[0]):
  for j in range(pcor.shape[1]):
    print("{:5.2f} ".format(pcor[i,j]), end='')
  print()

From this correlation matrix, you can probably see that the continuum zero-point (first row/column) is significantly anti-correlated with the continuum slope (second row/column), and the amplitude (third row/column) is anti-correlated with the width (fifth row/column). The center of the line (fourth row/column) is not significantly correlated with any of the other parameters. If you think about it, this makes sense: raising the zero-point must be compensated by lowering the slope to keep the continuum passing through the data, and a wider line needs a smaller amplitude to enclose the same flux.
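
If you'd rather see this as a picture than a wall of numbers, matplotlib's matshow gives a quick heat-map view of the same matrix (a minimal sketch, reusing pcor from the previous cell):


In [ ]:
from matplotlib.pyplot import matshow, colorbar
# With the RdBu colormap and limits of -1,1: red = anti-correlated, blue = correlated
im = matshow(pcor, vmin=-1, vmax=1, cmap='RdBu')
colorbar(im)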

A way to visualize the correlations is to plot equal-probability ellipses in parameter space. There's no automatic way to do this that I'm aware of, so we'll follow this procedure. Briefly, we'll compute the eigenvectors and eigenvalues of the covariance matrix which gives us the major and minor axes of the ellipse. We then need to scale the whole ellipse by a factor that depends on the number of parameters we're fitting (degrees of freedom) and there are lookup tables for that, but I've just supplied the value.
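
If you'd rather compute that scale factor than look it up, scipy's chi-square distribution provides it. Here's a sketch assuming the usual $\Delta\chi^2$ prescription, where the 68.3% region for 5 fitted parameters corresponds to $\Delta\chi^2 \approx 5.9$:


In [ ]:
from scipy.stats import chi2
# Delta chi-square that encloses 68.3% of the probability for 5 fitted parameters
s = chi2.ppf(0.683, df=5)
print(s)   # ~5.9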

Matplotlib does not (yet) have a simple function to plot these ellipses, so we have to use the lower-level API: first create an ellipse artist, then add it to the current axes (which we get with the gca() function).


In [ ]:
from matplotlib.patches import Ellipse
from matplotlib.pyplot import gca,show,xlim,ylim,legend,axvline,axhline
eigval,eigvec = linalg.eigh(pcov[0:2,0:2])
if eigval[0]<eigval[1]:   # make sure eigvals are reverse-sorted
    eigval = eigval[::-1]
    eigvec = eigvec[:,::-1]
# eigvec is 2X2 matrix, each eigenvector is a column. Compute angle of 
# first vector which will be the major axis of the ellipse
theta = 180.0/pi*arctan2(eigvec[1,0],eigvec[0,0])
# The full width and height of the 68% error ellipse is
# 2*sqrt(eigval)*sqrt(s), where s = 5.9 for 5 fitted parameters (see above)
ell_width,ell_height = 2*sqrt(eigval)*sqrt(5.9)  # new names so we don't clobber the line width
plot([popt[0]],[popt[1]], "*", ms=18, label="Best-fit solution")
ell = Ellipse(xy=[popt[0],popt[1]], width=ell_width, height=ell_height, angle=theta,
              fc='None', ec='red')
ax = gca()
ax.add_artist(ell)
# Show the real answer:
axhline(cont_slope, linestyle='--', label="True answer")
axvline(cont_zp, linestyle='--')
xlabel('cont_zp')
ylabel('cont_slope')
# Set some reasonable limits
xlim(popt[0]-4*err[0],popt[0]+4*err[0])
ylim(popt[1]-4*err[1],popt[1]+4*err[1])
legend(numpoints=1)
show()

Does the true value end up inside your error ellipse? Should it? Well, if it really is a 68% error ellipse, then we expect the true answer to land inside it 68% of the time. So if you re-ran this entire notebook 100 times, you'd expect the true value to fall outside the ellipse about 32 times. If you make the ellipse twice as large (2-sigma), it should only fall outside about 5 times, and a 3-sigma error ellipse should only fail about 0.3% of the time. Also, if you kept track of where the best-fit solution falls with respect to the true answer each time, the points should make an elliptical pattern like the one plotted above, but centered on the true value. In the sequel, you'll see how MCMC methods give us this kind of "try it again and again" for free.
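
We can actually run a cheap version of that experiment right now: re-draw the Poisson noise many times, re-fit, and look at where the best-fit continuum parameters land relative to the truth. This is a minimal sketch reusing wave, flux, and model from above; the number of trials (200) is arbitrary:


In [ ]:
ntrials = 200
fits = []
for i in range(ntrials):
    new_obs = random.poisson(flux)         # a fresh realization of the noise
    p,c = curve_fit(model, wave, new_obs, p0=(425.,0.0,80.,4.5,1.0),
                    sigma=sqrt(new_obs))
    fits.append(p[:2])                     # keep cont_zp and cont_slope
fits = array(fits)
fig2,ax2 = subplots(1)
ax2.plot(fits[:,0], fits[:,1], '.', alpha=0.5, label='best fits')
ax2.plot([cont_zp],[cont_slope], 'r*', ms=18, label='True answer')
ax2.set_xlabel('cont_zp')
ax2.set_ylabel('cont_slope')
ax2.legend(numpoints=1)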