Ben Kamphaus, PhD
FITS data can be a pain to work with, especially when it's data outside your particular specialty (my background is remote sensing with sensors pointed at Earth, not other planets). This notebook shows some of the head-to-desk, brute-force data wrangling involved in figuring out what's going on.
In [1]:
# our routine imports
import pyfits
import numpy as np
from matplotlib import pyplot as plt
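As an aside: `pyfits` has since been merged into `astropy`, so if `pyfits` itself isn't installed, the same calls should work through `astropy.io.fits`. A small sketch (assuming `astropy` is available; the example file is a throwaway created in place, not one of the LEISA files):

```python
# pyfits was folded into astropy; astropy.io.fits is a near drop-in
# replacement for the pyfits calls used in this notebook.
from astropy.io import fits as pyfits  # alias so the rest of the code reads the same
import numpy as np

# build a tiny throwaway FITS file to show the same getdata/header round trip
hdu = pyfits.PrimaryHDU(data=np.arange(12, dtype=np.float32).reshape(3, 4))
hdu.writeto('example.fit', overwrite=True)

data, hdr = pyfits.getdata('example.fit', 0, header=True)
print(data.shape)      # (3, 4)
print(hdr['BITPIX'])   # -32, the FITS code for 32-bit floats
```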
In [2]:
!ls data/20070303_003520/
In [7]:
fits_file = 'data/20070303_003520/lsb_0035209319_0x53c_sci_1.fit'
pyfits.info(fits_file)
Ok, this time we've got an image cube.
In [10]:
data, hdr = pyfits.getdata(fits_file, 0, header=True)
hdr
Out[10]:
In [11]:
np.shape(data)
Out[11]:
In [48]:
%matplotlib inline
plt.plot(np.ravel(data[:,100,100]))
plt.ylim((0,2e13))
plt.show()
In [45]:
plt.plot(np.ravel(data[100,:,100]))
plt.show()
In [46]:
plt.plot(np.ravel(data[100,100,:]))
Out[46]:
The official spec for the LEISA instrument says that pixels are stored BIL (band interleaved by line), with (X, Y, Z) dimensions of 256, 256, N, where X is wavelength, Y is the X-offset, and Z indexes the Nth frame. Since FITS files are created/written with IDL, which follows Fortran conventions (column major instead of row major), and numpy is (without changing args/using keywords) row major, I think the correct interpretation of this is (frame, X-offset, wavelength).
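The column-major/row-major flip above is easy to demonstrate with a toy cube (toy shapes here, not the real 256x256xN data): the same buffer of bytes read with the opposite convention reverses the axis order, so an element at (x, y, z) in the Fortran view is the element at (z, y, x) in the C view.

```python
import numpy as np

# a toy cube in C (row-major) order, shaped (Z, Y, X)
cube = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

# reinterpret the identical bytes as Fortran (column-major) with reversed shape
fortran_view = np.frombuffer(cube.tobytes(), dtype=np.float32).reshape(4, 3, 2, order='F')

# element (x, y, z) in the Fortran view is element (z, y, x) in the C view
assert fortran_view[1, 2, 0] == cube[0, 2, 1]
print(fortran_view.shape)  # (4, 3, 2)
```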
This looks consistent with calibrated radiance-ish data -- in that it's shaped in a black-body-ish way (except for the extra radiance towards the higher wavelengths).
In [49]:
wls, wl_hdr = pyfits.getdata(fits_file, 1, header=True)
np.shape(wls)
Out[49]:
In [61]:
# the first band of this is supposed to be center wavelengths, though I'm not sure about the ordering
# this does look like this direction (slice of index 1) encompasses a range of wavelengths.
wls[0, :, 0]
Out[61]:
In [62]:
# and this one looks like slight variation in real wavelength center across pixels that
# should roughly be in the same wavelength range.
wls[0, 0, :]
Out[62]:
In [54]:
wl_hdr
Out[54]:
FITS data can be so strange. You might think "-32" would mean a signed 32-bit integer, but by FITS convention a negative BITPIX means IEEE floating point: -32 is a 32-bit float and -64 a 64-bit double, while positive values are integers.
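The BITPIX convention is small enough to capture in a lookup table, which makes header values like -32 self-documenting (a sketch; the helper name is mine, not part of pyfits):

```python
# FITS BITPIX convention: positive codes are integers, negative codes
# are IEEE floats. Map the header value to a numpy-style dtype string.
BITPIX_DTYPES = {
    8: 'uint8',
    16: 'int16',
    32: 'int32',
    64: 'int64',
    -32: 'float32',
    -64: 'float64',
}

def bitpix_to_dtype(bitpix):
    return BITPIX_DTYPES[bitpix]

print(bitpix_to_dtype(-32))  # float32
```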
In [78]:
# ok, let's see if we can plot a spectrum. Ordering will probably be sort of strange
# based on what we've seen.
plt.plot(np.ravel(wls[0,:,100]), np.ravel(data[100,0,:]))
Out[78]:
Ok, this is an ok candidate for a spectrum. There's something of a glitchy looking line of odd (false?) contiguity across wavelengths in the above plot, maybe correctable with a sort, but this looks reasonable as a spectrum prior to the application of gains and offsets. The spec shows that the FITS file also contains a dataset of gains and offsets, as well as a map of bad pixels that can be used as a mask. Let's look at applying these.
Note: I could still be completely wrong at this point.
In [73]:
gain_offset, go_hdr = pyfits.getdata(fits_file, 4, header=True)
(go_hdr, np.shape(gain_offset))
Out[73]:
The general strategy to apply gains and offsets is this basic equation:
rad_cal = gain * DN + offset
It's fairly trivial to adapt this to Python:
In [79]:
def cal(DN, gain, offset):
    return gain*DN + offset
In [80]:
# with operator overloading for ndarrays in numpy, we can apply this across the arrays.
cal_data = cal(data[0,:,:], gain_offset[0,:,:], gain_offset[1,:,:])
In [92]:
# really, though, we want to apply this to each frame. this is the kind of thing that's trivial
# to express w/a list of frames, harder w/an ND array.
cal_all_data = np.zeros_like(data)
(np.shape(cal_all_data), np.shape(data))
Out[92]:
In [114]:
# Populate cal_all_data image cube. We could probably replace this w/a vectorized operation,
# but this solves it and I don't have a perf requirement to optimize this -- yet.
for i in range(np.shape(cal_all_data)[0]):
    cal_all_data[i] = cal(data[i,:,:], gain_offset[1,:,:], gain_offset[0,:,:])
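The frame-by-frame loop can probably be replaced with a single broadcast: numpy will stretch the (256, 256) gain and offset planes across the leading frame axis of the (N, 256, 256) cube. A sketch with small random arrays standing in for the real data:

```python
import numpy as np

# toy stand-ins shaped like the real data: a 5-frame "cube" and
# a 2-plane gain/offset array (plane 1 = gain, plane 0 = offset,
# matching the ordering used in the loop above)
rng = np.random.default_rng(0)
data = rng.random((5, 8, 8))
gain_offset = rng.random((2, 8, 8))

def cal(DN, gain, offset):
    return gain*DN + offset

# per-frame loop, as above
looped = np.zeros_like(data)
for i in range(data.shape[0]):
    looped[i] = cal(data[i], gain_offset[1], gain_offset[0])

# broadcasting: the (8, 8) planes stretch across the leading frame axis
vectorized = cal(data, gain_offset[1], gain_offset[0])

assert np.allclose(looped, vectorized)
```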
In [124]:
# ok this result looks really smooth, but doesn't have the features you'd expect
# for a spectrum (fine absorption and emissivity features)
plt.plot(cal_all_data[0,250,:])
plt.show()
In [126]:
plt.plot(wls[0,:,250], cal_all_data[0,250,:])
Out[126]:
I can't really determine anything from this. I want the wavelengths in order. This is where numpy.argsort comes in -- it returns indices we can use to sort multiple arrays.
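The trick is that argsort gives back a permutation rather than sorted values, so the same index array keeps paired arrays aligned (a toy sketch with made-up wavelength/radiance pairs):

```python
import numpy as np

# argsort returns the permutation that sorts one array; applying the
# same permutation to a second array keeps the pairs aligned.
wavelengths = np.array([2.1, 1.3, 1.8, 1.1])
radiances = np.array([10.0, 20.0, 15.0, 25.0])

order = np.argsort(wavelengths)
print(wavelengths[order])  # [1.1 1.3 1.8 2.1]
print(radiances[order])    # [25. 20. 15. 10.]
```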
In [130]:
sort_i = np.argsort(np.ravel(wls[0,:,250]))
plt.plot(wls[0,sort_i,250], cal_all_data[0,250,sort_i])
Out[130]:
In retrospect, the outcome of the sort should have been fairly obvious.
In [135]:
# ok, so my earlier assumption was wrong. Shouldn't draw conclusions from uncalibrated data!
# this matching calibration looks ok, but I'm not really sure what I'm looking at (other than
# that it looks like an inverted sigmoid).
plt.plot(wls[0,sort_i,250], cal_all_data[0,sort_i,5])
plt.show()
In [137]:
plt.plot(wls[0,sort_i,250], cal_all_data[100,sort_i,5])
plt.show()
In [146]:
plt.imshow(cal_all_data[:,50,:], cmap='bone')
plt.title("Image at ~wavelength " + str(np.mean(wls[0,50,:])))
plt.show()
I think this may be the image portion, but I'm pretty out of my depth here and I think I need a new strategy. This is the point where I think it's worth calling it quits and looking to match against a dataset with a published image, or something else where I can verify a specific outcome, then return to more "play" with the data.