https://fits.gsfc.nasa.gov/fits_primer.html
The Flexible Image Transport System (FITS) is a data format used within astronomy for transporting, analyzing and archiving scientific data files. It is designed to store data sets consisting of multidimensional arrays and two-dimensional tables.
In [1]:
%matplotlib inline
In [2]:
import os
import glob
import random
import h5py
import astropy.io.fits
import numpy as np
import matplotlib.pyplot as plt
In [3]:
# find the normalized spectra in data_path directory
# add all filenames to the list fits_paths
FITS_DIR = 'data/ondrejov/'
fits_paths = glob.glob(FITS_DIR + '*.fits')
len(fits_paths)
Out[3]:
In [4]:
# pick random fits
random_fits = random.choice(fits_paths)
random_fits
Out[4]:
A FITS file is comprised of segments called Header/Data Units (HDUs). The first HDU is called the 'Primary HDU'. The primary data array can contain a 1 to 999 dimensional array of numbers. A typical primary array could contain a 1-dimensional spectrum, a 2-dimensional image or a 3-dimensional data cube.
Any number of additional HDUs may follow the primary array. These HDUs are referred to as 'extensions'. There are three types of standard extensions currently defined:
XTENSION = 'IMAGE'
XTENSION = 'TABLE'
XTENSION = 'BINTABLE'
In [5]:
# open file with astropy
hdulist = astropy.io.fits.open(random_fits)
# display info about the HDUs
hdulist.info()
Every HDU consists of an ASCII formatted 'Header Unit' and a 'Data Unit'.
Each header unit contains a sequence of fixed-length 80-character keyword records which have the form:
KEYNAME = value / comment string
Non-printing ASCII characters such as tabs, carriage returns and line feeds are not allowed anywhere in the header unit.
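A minimal sketch of such a record, built with astropy (the OBSERVER keyword and its value are made-up examples):

```python
import astropy.io.fits

header = astropy.io.fits.Header()
# assigning a (value, comment) tuple creates a KEYNAME = value / comment record
header['OBSERVER'] = ('E. Hubble', 'observer name')

card = header.cards['OBSERVER']
print(str(card))
print(len(str(card)))  # every record is padded to exactly 80 characters
```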
In [6]:
hdulist[0].header
Out[6]:
In [7]:
hdulist[1].header
Out[7]:
Note that the data unit is not required. The image pixels in a primary array or an image extension may have one of 5 supported data types: 8-bit unsigned integers, 16-bit integers, 32-bit integers, 32-bit single precision floating point values or 64-bit double precision floating point values.
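The pixel type is recorded in the BITPIX header keyword; a quick sketch with astropy (the array here is arbitrary):

```python
import numpy as np
import astropy.io.fits

# BITPIX encodes the pixel type: 8, 16, 32 for integers,
# -32 and -64 for single and double precision floats
hdu = astropy.io.fits.PrimaryHDU(data=np.zeros((2, 2), dtype=np.float32))
print(hdu.header['BITPIX'])
```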
The other 2 standard extensions, ASCII tables and binary tables, contain tabular information organized into rows and columns. Binary tables are more compact and are faster to read and write than ASCII tables.
All the entries within a column of a table have the same data type. The allowed data formats for an ASCII table column are: integer, single and double precision floating point value, character string. Binary tables also support logical, bit and complex data formats.
In [8]:
data = hdulist[1].data
data
Out[8]:
In [9]:
flux = data.field('flux').astype(np.float64, order='C', copy=True)
wave = data.field('spectral').astype(np.float64, order='C', copy=True)
flux.shape, wave.shape
Out[9]:
In [10]:
plt.plot(wave, flux)
plt.ylabel('flux')
plt.xlabel('wavelength')
plt.grid(True)
In [11]:
def parse_fits_id(path):
    '''Spectrum id is the filename without directory and extension.'''
    return os.path.splitext(os.path.split(path)[-1])[0]
# http://astropy.readthedocs.io/en/latest/io/fits/appendix/faq.html#i-m-opening-many-fits-files-in-a-loop-and-getting-oserror-too-many-open-files
def parse_fits(filename):
    '''Parse normalized spectrum from fits file.'''
    try:
        with astropy.io.fits.open(filename, mmap=False) as hdulist:
            data = hdulist[1].data
            header = hdulist[1].header
            wave = data['spectral'].astype(np.float64, order='C', copy=True)
            flux = data['flux'].astype(np.float64, order='C', copy=True)
            date = header['DATE']
            ra = header['RA']
            dec = header['DEC']
    except IOError as e:
        print(e, filename)
        return None, None
    return parse_fits_id(filename), {
        'wave': wave,
        'flux': flux,
        'date': date,
        'ra': ra,
        'dec': dec,
    }
parse_fits(random_fits)
Out[11]:
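For illustration, parse_fits_id keeps only the base filename (the path below is a made-up example):

```python
import os

def parse_fits_id(path):
    '''Spectrum id is the filename without directory and extension.'''
    return os.path.splitext(os.path.split(path)[-1])[0]

print(parse_fits_id('data/ondrejov/some-spectrum.fits'))
```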
In [12]:
H_ALPHA = 6562.8

def in_range(start, stop, val=H_ALPHA):
    '''Check if val is in range [start, stop].'''
    return start <= val <= stop

def in_wavelen(wavelens, val=H_ALPHA):
    '''Check if val is somewhere in-between wavelens
    array start and end.'''
    return wavelens is not None and in_range(wavelens[0], wavelens[-1], val)
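For example, with two synthetic wavelength grids (arbitrary ranges, just to show the behaviour of in_range):

```python
import numpy as np

H_ALPHA = 6562.8

def in_range(start, stop, val=H_ALPHA):
    '''Check if val is in range [start, stop].'''
    return start <= val <= stop

covering = np.linspace(6000, 7000, 100)  # grid that spans H-alpha
missing = np.linspace(4000, 5000, 100)   # grid that does not
print(in_range(covering[0], covering[-1]))
print(in_range(missing[0], missing[-1]))
```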
In [ ]:
# the data structure is a dict where a key is a spectrum id
# and a value is a dict of wavelen and flux
spectra = {
    fits_id: data_dict
    for fits_id, data_dict in map(parse_fits, fits_paths)
    if data_dict is not None and in_wavelen(data_dict['wave'], H_ALPHA)
}
len(spectra)
In [ ]:
with h5py.File('data/data.hdf5', 'w') as f:
    for ident, data in spectra.items():
        if ident is None or data is None:
            continue
        wave = data['wave']
        flux = data['flux']
        group = 'spectra/' + ident
        dset = f.create_dataset(group, (2, wave.shape[0]), dtype=wave.dtype)
        dset[0, :] = wave
        dset[1, :] = flux
        dset.attrs['date'] = data['date']
        dset.attrs['dec'] = data['dec']
        dset.attrs['ra'] = data['ra']
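To verify the layout, a sketch that writes one dummy spectrum in the same 2 x N shape and reads it back (the file path and values are made up):

```python
import os
import tempfile
import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'demo.hdf5')

# write a single dummy spectrum: row 0 wavelengths, row 1 fluxes
with h5py.File(path, 'w') as f:
    dset = f.create_dataset('spectra/demo-id', (2, 5), dtype=np.float64)
    dset[0, :] = np.linspace(6500, 6600, 5)
    dset[1, :] = np.ones(5)
    dset.attrs['date'] = '2017-01-01'

# read it back
with h5py.File(path, 'r') as f:
    dset = f['spectra/demo-id']
    wave, flux = dset[0, :], dset[1, :]
    date = dset.attrs['date']

print(wave.shape, flux.shape, date)
```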