In [9]:
import os
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook 
%config InlineBackend.figure_format = 'retina'

#%matplotlib qt
#%gui qt

dataDir = "/Users/rhl/TeX/Talks/DSFP/2018-01/Exercises/Detectors"

Let's start by looking at some extracted spectra

Read the data (and don't worry about the details of how this cell works!).

  • The reference lines (a dict arcLines indexed by fiberId containing numpy arrays):
    • pixelPos Measured centroid of an arc line (pixels)
    • pixelPosErr Error in pixelPos
    • refWavelength Nominal wavelength (from NIST)
    • modelFitWavelength Wavelength corresponding to pixelPos, based on instrument model
    • status bitwise OR of flags (see statusFlags)
  • A dict statusFlags giving the meaning of the status bits
  • wavelength0 and npPerPix A couple of scalers giving an approximate wavelength solution

  • The arc spectra (a dict spectra indexed by fiberId containing numpy arrays):

    • flux The flux level of each pixel
    • fluxVar The error in flux
    • modelFitLambda The wavelength at each pixel according to my instrument model
  • An array pixels giving the pixel values associated with the wavelengths, fluxes, and fluxVars arrays

In [11]:
dataDump = np.load(os.path.join(dataDir, "spectro.npz"))

packed      = dataDump["packed"]
statusFlags = dataDump["statusFlags"].reshape(1)[0]
fiberIds    = dataDump["fiberIds"]
dataFields  = dataDump["dataFields"]
wavelength0 = dataDump["wavelength0"]
nmPerPix    = dataDump["nmPerPix"]
pixels      = dataDump["pixels"]
fluxes      = dataDump["fluxes"]
fluxVars    = dataDump["fluxVars"]
modelFitLambdas = dataDump["modelFitLambdas"]

arcLines = {}
spectra = {}
for i, fiberId in enumerate(fiberIds):
    arcLines[fiberId] = {}
    spectra[fiberId] = {}

    for j, n in enumerate(dataFields):
        arcLines[fiberId][n] = packed[i][j]
    arcLines[fiberId]['status'] = arcLines[fiberId]['status'].astype('int')

    spectra[fiberId]["modelFitLambda"] = modelFitLambdas[i]
    spectra[fiberId]["flux"] = fluxes[i]
    spectra[fiberId]["fluxVar"] = fluxVars[i]
            
del packed; del dataFields
del modelFitLambdas; del fluxes; del fluxVars

Plot the spectrum. Do the lines from the different fibres line up?


In [12]:
plt.clf()
for fiberId in fiberIds:

# ...


  File "<ipython-input-12-875dbc7df782>", line 4
    # ...
         ^
SyntaxError: unexpected EOF while parsing

I think that the answer is, "No". That's why we need a wavelength solution!

Fortunately I have measured the positions of the brighter arclines (the dict arcLines). Choose a fibre and plot the measured lines on top of the spectrum. You should find that I did a good job with the centroiding.


In [ ]:

We have good pixel centroids and we know the true wavelengths, so we can measure our wavelength solution.

Let's concentrate on just one fiber for now; choose a fibre, any fibre.

Plot some of the fitted lines. A good place to start would be the pixel position (pixelPos) and the reference wavelength (refWavelength) for a fibre, then use wavelength0 and nmPerPix to construct an approximate (linear) wavelength solution and look at the residuals from the linear fit.


In [13]:
fiberId = 5
assert fiberId in arcLines, "Unknown fiberId: %d" % fiberId
pixelPos = arcLines[fiberId]['pixelPos']
refWavelength = arcLines[fiberId]['refWavelength']

wavelength = wavelength0 + nmPerPix*pixelPos

#...

Take a look at the statusFlags and the values of status from your fibre. You probably want to ignore some of the data. The FIT bit is good; the SATURATED, INTERPOLATED, and CR bits are bad (and the other bits aren't set in this dataset).

Remake your residual plot with the bad lines removed; is that better?


In [7]:
fiberId = 5
assert fiberId in arcLines, "Unknown fiberId: %d" % fiberId
        
pixelPos = arcLines[fiberId]['pixelPos']
refWavelength = arcLines[fiberId]['refWavelength']
status = arcLines[fiberId]["status"]

wavelength = wavelength0 + nmPerPix*pixelPos

good = np.logical_and.reduce([(status & statusFlags["FIT"]) != 0,
                               status & (statusFlags["INTERPOLATED"]) == 0])

plt.plot(pixelPos[good], (refWavelength - wavelength)[good], 'o', label=str(fiberId))

plt.legend()
plt.xlabel("pixels")
plt.ylabel(r"$\lambda_{ref} - \lambda_{linear}$");


Here's the plot for all the fibres:


In [8]:
plt.clf()
for fiberId, fiberData in arcLines.items():
    pixelPos = fiberData["pixelPos"]
    pixelPosErr = fiberData["pixelPosErr"]
    refWavelength = fiberData["refWavelength"]
    modelFitWavelength = fiberData["modelFitWavelength"]
    status = fiberData["status"]
    
    wavelength = wavelength0 + nmPerPix*(pixelPos)

    good = np.logical_and.reduce([(status & statusFlags["FIT"]) != 0,
                                   status & (statusFlags["INTERPOLATED"]) == 0])
    
    plt.plot(pixelPos[good], (refWavelength - wavelength)[good], 'o', label=str(fiberId))
    
plt.xlabel("pixels")
plt.ylabel(r"$\lambda_{ref} - \lambda_{linear}$");
plt.legend(loc=9, bbox_to_anchor=(1.0, 1.1));


Fit a curve to those residuals (it's generally better to fit to residuals, especially when you have a more informative model than our current linear one).

Use Chebyshev polynomials; I recommend using np.polynomial.chebyshev.Chebyshev.fit. You should set the domain of the fit so that it'll be usable over all the CCD's rows.

Experiment with a range of order of fitter, and look at the rms error in the wavelength solution. You can look at $\chi^2/\nu$ too, if you like, but I think you'll find that the centroiding errors are wrong.

You probably want to look at the fit (to the residuals!) and at the residuals from the fit.


In [9]:
import numpy.polynomial.chebyshev
myFiberId = 315

pixelPosErrorFloor = 1 # 1e-4

plt.clf()
plotResiduals = True
useLinearApproximation = False

fitOrder = 5 if useLinearApproximation else 2

for fiberId in arcLines:
    if False and fiberId != myFiberId:
        continue

    fiberData = arcLines[fiberId]
    #
    # Unpack data
    #
    pixelPos = fiberData["pixelPos"]
    pixelPosErr = fiberData["pixelPosErr"]
    refWavelength = fiberData["refWavelength"]
    modelFitWavelength = fiberData["modelFitWavelength"]
    status = fiberData["status"]
    
    good = np.logical_and.reduce([(status & statusFlags["FIT"]) != 0,
                                   status & (statusFlags["SATURATED"]) == 0])
    #
    # OK, on to work
    #
    linearWavelength = wavelength0 + nmPerPix*(pixelPos)
    if useLinearApproximation:
        wavelength = linearWavelength
    else:
        wavelength = modelFitWavelength
    
    nominalPixelPos = (refWavelength - wavelength0)/nmPerPix
    fitWavelengthErr = pixelPosErr*nmPerPix

    x = nominalPixelPos[good]
    y = (refWavelength - wavelength)[good]
    yerr = np.hypot(fitWavelengthErr, pixelPosErrorFloor*nmPerPix)[good]
    
    used = np.ones_like(good[good])

    wavelengthCorr = np.polynomial.chebyshev.Chebyshev.fit(
        x[used], y[used], fitOrder, domain=[pixels[0], pixels[-1]], w=1/yerr[used])
    arcLines[fiberId]['wavelengthCorr'] = wavelengthCorr
    yfit = wavelengthCorr(x)

    if plotResiduals:
        plt.plot(x, y - yfit, 'o', label=str(fiberId))
        plt.axhline(0, ls=':', color='black')
    else:
        ax = plt.plot(x, y, 'o', label=str(fiberId))[0]
        plt.plot(pixels, wavelengthCorr(pixels), color=ax.get_color())
    
    print("%-3d rms = %.2f mpix" % 
          (fiberId, 1e3*np.sqrt(np.sum((y - yfit)**2)/(len(y) - fitOrder))))
    
plt.legend(loc=9, bbox_to_anchor=(1.0, 1.1));


5   rms = 4.14 mpix
67  rms = 5.04 mpix
194 rms = 3.63 mpix
257 rms = 7.05 mpix
315 rms = 5.82 mpix
337 rms = 5.89 mpix
393 rms = 3.93 mpix
455 rms = 4.64 mpix
582 rms = 4.72 mpix
644 rms = 5.38 mpix

Use your wavelength solution to plot the arc spectra against wavelength (you started out by plotting against pixels). Do the Ne lines for the different fibres agree now?


In [13]:
plt.clf()
for fiberId in fiberIds:
    xlab = r"$\lambda$ (nm)"
    if useLinearApproximation:
        x = wavelength0 + nmPerPix*(pixels)
    else:
        x = spectra[fiberId]["modelFitLambda"]

    x += arcLines[fiberId]['wavelengthCorr'](pixels)

    plt.plot(x, spectra[fiberId]["flux"], label=str(fiberId))
    
plt.legend()
plt.xlabel(xlab)

if True:
    plt.xlim(625, 675)
    plt.ylim(-10000, 250000)


Now repeat the preceeding exercise using the model of the spectrograph (i.e. spectra[fiberId]["modelFitLambda"] not your linear approximation). What order of polynomial is needed now?

Zoom in on a single strong (but not saturated) line and compare the solution derived from the linear wavelength model to that from the model based on knowledge of the instrument.


Is that rms error honest, or are we overfitting? Modify your code to hold back some number of arclines from the fit and measure the rms only of those ones.


If this was all too easy:

I was nice and gave you clean (but real) data. In the real world you'd probably want to do an n-sigma clip on the residuals and iterate. Implement this.


In [ ]: