One of the most common things in scientific computing is model fitting. Numerical Recipes devotes a number of chapters to this.
We will also need to be able to read data. We will use pickle (-01) and an ascii file reader (-02)
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import math
In [ ]:
!ls -l n6503-*.p
An alternative to execute this shell code, is to open a terminal session within juypter! Go back to the jupyter home screen, and select New->Terminal from the top left pulldown. Then navigate to the notebooks directory, and see what pickle files you have.
In [ ]:
try:
import cPickle as pickle
except:
import pickle
sp = pickle.load(open("n6503-sp.p","rb"))
#
# go query this object before going on...... because we don't remember
#
velz = sp['z']
flux = sp['i']
plt.plot(velz,flux)
plt.xlabel(sp['zunit'])
plt.ylabel(sp['iunit'])
plt.title("Pos: %s %s" % (str(sp['xpos']),str(sp['ypos'])));
In [ ]:
# compute moments of this spectrum to get an idea what the mean and dispersion of this signal is
# recall this method can easily result in zdisp < 0 for noisy data and becomes unreliable if
# the signal to noise is not high
tmp1 = flux*velz
tmp2 = flux*velz*velz
zmean = tmp1.sum()/flux.sum()
zdisp = tmp2.sum()/flux.sum() - zmean*zmean
print("mean,var:",zmean,zdisp)
if zdisp > 0:
sigma = math.sqrt(zdisp)
print("sigma,FWHM:",sigma,sigma*2.355)
Note the conversion factor from $\sigma$ to FWHM is $2\sqrt{2\ln{2}} \approx 2.355$, see also https://en.wikipedia.org/wiki/Full_width_at_half_maximum
In [ ]:
# noisy spectra can easily result in bogus values for mean and dispersion. Let's try something else:
imax = flux.argmax()
print("Max at %d: %g %g" % (imax, velz[imax], flux[imax]))
nn = 3 # pick a few nearest neighbors
flux1 = flux[imax-nn:imax+nn]
velz1 = velz[imax-nn:imax+nn]
tmp1 = flux1*velz1
tmp2 = flux1*velz1*velz1
zmean1 = tmp1.sum()/flux1.sum()
zdisp1 = tmp2.sum()/flux1.sum() - zmean1*zmean1
print("mean,var:",zmean1,zdisp1)
if zdisp1 > 0:
sigma1 = math.sqrt(zdisp1)
print("sigma,FWHM:",sigma1,sigma1*2.355)
Q1: in the previous two cells some common code can be extracted and you can refactor your code.
Code refactoring is the process of restructuring existing computer code —changing the factoring—without changing its external behavior. Refactoring improves nonfunctional attributes of the software.
The scipy module has a large number of optimization and fitting routines that work with numpy arrays. They directly call lower level C routines, generally making fitting a fast process.
To fit an actual gaussian , instead of using moments, we use the curve_fit function in scipy.optimize:
In [ ]:
from scipy.optimize import curve_fit
def gauss(x, *p):
# if len(p) != 3: raise ValueError("Error, found %d, (%s), need 3" % (len(p),str(p)))
A, mu, sigma = p
return A*np.exp(-(x-mu)**2/(2.*sigma**2))
# p0 is the initial guess for the fitting coefficients (A, mu and sigma above, in that order)
# does it matter what the initial conditions are?
p0 = [0.01, 10, 10]
#p0 = [0.004, 24, 6]
#p0 = [0.005, -50, 5]
#p0 = [0.007, 100, 22, 0]
coeff, cm = curve_fit(gauss, velz, flux, p0=p0)
flux_fit = gauss(velz, *coeff)
plt.plot(velz,flux, label='Data')
plt.plot(velz,flux_fit, label='Fit')
plt.legend()
print("Fitted amp :",coeff[0])
print("Fitted mean :",coeff[1])
print("Fitted sigma and FWHM :",coeff[2], coeff[2]*2.355)
print("Covariance Matrix :\n",cm)
# what are now the errors in the fitted values?
print("error amp:",math.sqrt(cm[0][0]))
print("error mean:",math.sqrt(cm[1][1]))
print("error sigma:",math.sqrt(cm[2][2]))
Q1: extend the gauss model to have a baseline that is not 0.
PySpecKit is an extensible spectroscopic analysis toolkit for astronomy. See http://pyspeckit.bitbucket.org/html/sphinx/index.html
Installing this with
pip install pyspeckit
resulted in an error
AttributeError: module 'distutils.config' has no attribute 'ConfigParser'
turns out this was a python2/3 hack that was needed. The current released version of pyspeckit did not handle this. A manual install of the development release solved this, although a update to pip may be needed as well:
pip install --upgrade pip
pip install https://bitbucket.org/pyspeckit/pyspeckit/get/master.tar.gz
The cell below is an adapted version of a gaussian fit case from the pySpecKit manual. By default, this will create some known data with noise. Copy the cell and change it to make it work with our spectrum from n6503. How does it compare to scipy's curve_fit ? Note that in this method initial conditions are generated to help in a robust way of fitting the gauss.
In [ ]:
import pyspeckit
# set up a gauss (amp,center,sigma)
xaxis = np.linspace(-50.0,150.0,100)
amp = 1.0
sigma = 10.0
center = 50.0
synth_data = amp*np.exp(-(xaxis-center)**2/(sigma**2 * 2.))
# Add 10% noise (but fix the random seed)
np.random.seed(123)
stddev = 0.1
noise = np.random.randn(xaxis.size)*stddev
error = stddev*np.ones_like(synth_data)
data = noise+synth_data
# this will give a "blank header" warning, which is fine
sp = pyspeckit.Spectrum(data=data, error=error, xarr=xaxis,
xarrkwargs={'unit':'km/s'},
unit='erg/s/cm^2/AA')
sp.plotter()
# Fit with automatic guesses
sp.specfit(fittype='gaussian')
# Fit with input guesses
# The guesses initialize the fitter
# This approach uses the 0th, 1st, and 2nd moments
amplitude_guess = data.max()
center_guess = (data*xaxis).sum()/data.sum()
width_guess = data.sum() / amplitude_guess / np.sqrt(2*np.pi)
guesses = [amplitude_guess, center_guess, width_guess]
sp.specfit(fittype='gaussian', guesses=guesses)
sp.plotter(errstyle='fill')
sp.specfit.plot_fit()
In [ ]:
np.random.seed(123)
x = np.linspace(0.0,10.0,10)
y = 2.0 + 3.0 * x + np.random.normal(2.0,2.0,len(x))
plt.scatter(x,y)
In [ ]:
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
print(slope,intercept,r_value,p_value,std_err)
or using the more general curve_fit from scipy
In [ ]:
from scipy.optimize import curve_fit
def line(x, A, B):
return A + B*x
coef,cm = curve_fit(line, x, y)
print(coef,cm)
#
# Q: how would you now plot the data and overplot the (model) line
See http://python4astronomers.github.io/fitting/spectrum.html for the full example.
In [ ]:
from astropy.extern.six.moves.urllib import request
url = 'http://python4astronomers.github.com/_downloads/3c273.fits'
open('3c273.fits', 'wb').write(request.urlopen(url).read())
In [ ]:
!ls -l 3c273.fits
In [ ]:
from astropy.io import fits
dat = fits.open('3c273.fits')[1].data
wlen = dat.field('WAVELENGTH')
flux = dat.field('FLUX')
In [ ]:
plt.plot(wlen,flux);
Q1: what would the approach be to fit this spectrum?