The goal of this notebook is to demonstrate how to read in and manipulate DESI spectra using simulated spectra created as part of a DESI Data Challenge.
If you identify any errors or have requests for additional functionality please create a new issue on https://github.com/desihub/desispec/issues or send a note to desi-data@desi.lbl.gov.
We'll assume that you've gone through the installation instructions at:
https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop/JuneMeeting
as far as Try one of our tutorials.
We'll also assume that you've grabbed the DESI data team's example spectral files and put them in the directory
$DESI_ROOT/spectro/redux/dc17a2
If you don't have these yet, a thumb drive is being passed around, but they're also at NERSC in the directory
/scratch2/scratchdirs/sjbailey/desi/dc17a/spectro/redux/dc17a2
If you've checked out desispec
from git, this jupyter notebook lives in desispec/doc/nb
and you can execute it as jupyter notebook Intro_to_DESI_spectra.ipynb
Now, let's import all the modules we'll need:
In [1]:
import os
import numpy as np
import healpy as hp
from glob import glob
import fitsio
from collections import defaultdict
from desitarget import desi_mask
import matplotlib.pyplot as plt
%pylab inline
If any of these imports fail, you should go back through the installation instructions as far as Try one of our tutorials.
In [2]:
%set_env DESI_SPECTRO_REDUX=/Users/dummy/desi/spectro/redux/
%set_env SPECPROD=dc17a2
In [3]:
import os
def check_env():
for env in ('DESI_SPECTRO_REDUX', 'SPECPROD'):
if env in os.environ:
print('{} environment set to {}'.format(env, os.getenv(env)))
else:
print('Required environment variable {} not set!'.format(env))
In [4]:
check_env()
In [5]:
basedir = os.path.join(os.getenv("DESI_SPECTRO_REDUX"),os.getenv("SPECPROD"),"spectra-64")
subdir = os.listdir(basedir)
print(basedir)
print(subdir)
In [6]:
basedir = os.path.join(basedir,subdir[0])
subdir = os.listdir(basedir)
pixnums = np.array([int(pixnum) for pixnum in subdir])
print(basedir)
print(subdir)
In [7]:
basedir = os.path.join(basedir,subdir[0])
subdir = os.listdir(basedir)
print(basedir)
print(subdir)
The directory structure is:
$DESI_SPECTRO_REDUX/$SPECTRO/spectra-{nside}/{group}/{pix}/*-{nside}-{pix}.fits`
where group = nside//100
. For example for nside=64
and pixel=16879
:
$DESI_SPECTRO_REDUX/$SPECTRO/spectra-64/168/16879/spectra-64-16879.fits
$DESI_SPECTRO_REDUX/$SPECTRO/spectra-64/168/16879/zbest-64-16879.fits
where the first file contains the spectra and the second file contains information on the best-fit redshifts from the redrock code.
In this directory-and-file structure, nside
is the HEALPix resolution and pixel
is the pixel number at that resolution. If you aren't familiar with HEALPix, it is an equal-area splitting of the sphere, where the sphere is initially divided into 12 equal-area pixels, and then each of those pixels is divided into 4 new equal-area pixels as nside
increases (a quad tree). Schematically, here's how nside
corresponds to pixel area in degrees:
In [8]:
sphere_area = 4*180.*180./np.pi
hpx_area = sphere_area
for i in range(10):
nside = 2**i
if i == 0:
hpx_area/=12.
else:
hpx_area/=4.
print(nside,hpx_area)
The nside
at which the example spectra are grouped therefore corresponds to ~0.84 sq. deg. Note that I could have checked this more easily (but less pedagogically) using the useful python HEALPix library:
In [9]:
hp.nside2pixarea(64,degrees=True)
Out[9]:
The spectra are stored in this fashion so that they are grouped (roughly) contiguously on the sky, with a reasonable number of spectra in each directory. It's easy to derive the approximate RA/Dec near each pixel number (note that we sneakily stored the pixel numbers as pixnums
when we were examining the directory structure):
In [10]:
ras, decs = hp.pix2ang(64, pixnums, nest=True, lonlat=True)
Note that the DESI Data Model will always use the NESTED scheme for HEALPix.
In [11]:
zipper = zip(pixnums,ras,decs)
[print("Pixel(nside=64): {} RA: {} DEC: {}".format(pix,ra,dec)) for pix,ra,dec in zipper]
Out[11]:
What about the Data Model for the spectra themselves? Let's take a look (remember that we built a directory containing a spectrum as basedir
above):
In [12]:
specfilename = glob(os.path.join(basedir,'spectra*fits'))[0]
DM = fitsio.FITS(specfilename)
DM
Out[12]:
The first extension FIBERMAP
stores the mapping of the imaging information used to target and place a fiber on the source:
In [13]:
fm = fitsio.read(specfilename,1)
fm.dtype.descr
Out[13]:
TARGETID
is the unique mapping from target information to a fiber. So, if you wanted to look up full imaging information for a spectrum, you can map back to target files using TARGETID
.
Just out of interest, are the RAs and Decs of these objects in the expected HEALPix pixel?
In [14]:
pixnums = hp.ang2pix(64, fm["RA_TARGET"], fm["DEC_TARGET"], nest=True, lonlat=True)
print(np.min(pixnums),np.max(pixnums))
print(specfilename)
In [15]:
plt.plot(fm["RA_TARGET"],fm["DEC_TARGET"],'b.')
Out[15]:
In [16]:
DM
Out[16]:
The remaining extensions store the wavelength, flux, inverse variance on the flux, mask and resolution matrix for the B, R and Z arms of the spectrograph. Let's determine the wavelength coverage of each spectrograph:
In [17]:
bwave = fitsio.read(specfilename,"B_WAVELENGTH")
rwave = fitsio.read(specfilename,"R_WAVELENGTH")
zwave = fitsio.read(specfilename,"Z_WAVELENGTH")
print("B coverage: {:.1f} to {:.1f} Angstroms".format(np.min(bwave),np.max(bwave)))
print("R coverage: {:.1f} to {:.1f} Angstroms".format(np.min(rwave),np.max(rwave)))
print("Z coverage: {:.1f} to {:.1f} Angstroms".format(np.min(zwave),np.max(zwave)))
Now that we understand the Data Model, let's plot some spectra. To start, let's use the file we've already been manipulating and read in the flux to go with the wavelengths we already have.
In [18]:
wave = np.hstack([bwave,rwave,zwave])
In [19]:
bflux = fitsio.read(specfilename,"B_FLUX")
rflux = fitsio.read(specfilename,"R_FLUX")
zflux = fitsio.read(specfilename,"Z_FLUX")
flux = np.hstack([bflux,rflux,zflux])
Note that the wavelength arrays are 1-D (every spectrum in the spectral file is mapped to the same binning in wavelength) but the flux array (and flux_ivar, mask etc. arrays) are 2-D, because they contain multiple spectra:
In [20]:
print(wave.shape)
print(flux.shape)
Let's plot the zeroth spectrum in this file (i.e. in this HEALPix grouping):
In [21]:
spectrum = 0
plt.plot(wave,flux[spectrum])
Out[21]:
Let's plot it again, but color-coding for the arms of the spectrograph:
In [22]:
plt.plot(bwave,bflux[spectrum],color='b')
plt.plot(rwave,rflux[spectrum],color='r')
plt.plot(zwave,zflux[spectrum],color='m')
Out[22]:
What about if we only want to plot spectra of certain target classes? The targeting information is stored in the DESI_TARGET
, BGS_TARGET
and MWS_TARGET
entries of the fibermap array:
In [23]:
fm.dtype.descr
Out[23]:
and which target corresponds to which targeting bit is stored in the desitarget mask (we imported this near the beginning of the notebook.
In [24]:
desi_mask
Out[24]:
Let's find the indexes of all standard F-stars in the spectral file:
In [25]:
stds = np.where(fm["DESI_TARGET"] & desi_mask["STD_FSTAR"])[0]
print(stds)
Where were these located on the original plate-fiber mapping?
In [26]:
plt.plot(fm["RA_TARGET"],fm["DEC_TARGET"],'b,')
plt.plot(fm["RA_TARGET"][stds],fm["DEC_TARGET"][stds],'kx')
Out[26]:
It might seem strange that there are 10 standard stars, but only 4 crosses in the plot. This is because the density of standard stars is somewhat low in this data challenge, and so some standard stars are observed multiple times. These stars will have the same TARGETID
, e.g.:
In [27]:
zipper = zip(stds,fm["TARGETID"][stds],fm["RA_TARGET"][stds],fm["DEC_TARGET"][stds])
[print("INDEX: {} TARGETID: {} RA: {} DEC: {}".format(i,tid,ra,dec)) for i,tid,ra,dec in zipper]
Out[27]:
This will not be the case for the real survey (or for subsequent iterations of the Data Challenge)!
Let's take a look at the spectra of these standard stars:
In [28]:
for i in range(len(stds)):
plt.plot(wave,flux[stds[i]],'g')
plt.show()
These seem realistic. Let's zoom in on some of the Balmer series for the zeroth standard:
In [29]:
Balmer = [4102,4341,4861,6563]
halfwindow = 50
for i in range(len(Balmer)):
plt.axis([Balmer[i]-halfwindow,Balmer[i]+halfwindow,0,np.max(flux[stds[0]])])
plt.plot(wave,flux[stds[0]])
plt.show()
The directory from which we took these spectra, also contains information on the best-fit redshifts for the spectra from the redrock code. Let's read that file and examine its contents:
In [30]:
zfilename = glob(os.path.join(basedir,'zbest*fits'))[0]
zs = fitsio.read(zfilename)
zs.dtype.descr
Out[30]:
Note that there are a different number of entries in the redshift and spectral file, meaning that there isn't a row-by-row mapping between spectra and redshifts. Ultimately, this will be fixed so that there is a row-by-row mapping between these files.
In [31]:
print(zs.shape)
print(bflux.shape)
But, the TARGETID
(which is intended to be unique) is in this file, too, allowing sources to be uniquely mapped from targeting, to spectra, to redshift. Let's extract all sources that were targeted as quasars using the fibermap information from the spectral file, and plot the first 20:
In [32]:
qsos = np.where(fm["DESI_TARGET"] & desi_mask["QSO"])[0]
for i in range(20):
plt.plot(wave,flux[qsos[i]],'b')
plt.show()
Let's match these quasar targets to the redshift file on TARGETID
to extract their best-fit redshifts from redrock
:
In [33]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
dd[item].append(index)
zqsos = [index for item in fm[qsos]["TARGETID"] for index in dd[item] if item in dd]
That might be hard to follow at first glance, but all I did was use some "standard" python syntax to match the indices in zs
(the ordering of objects in the redrock
redshift file) to those for quasars in fm
(the ordering of quasars in the fibermap file), on the unique TARGETID
, such that the indices stored in qsos
for fm
point to the corresponding indices in zqsos
for zs
. This might help illustrate the result:
In [34]:
zs[zqsos]["TARGETID"][0:7], fm[qsos]["TARGETID"][0:7]
Out[34]:
Let's see what best-fit template redrock
assigned to each quasar. This information is stored in the SPECTYPE
column.
In [35]:
zs[zqsos]["SPECTYPE"]
Out[35]:
Here, we've (partially) encountered a second minor bug in the Data Challenge: redrock
is currently not doing a fantastic job of fitting quasars. It correctly identified some, though. And, of course, not everything targeted as a quasar will turn out to actually be a quasar (there are some contaminants).
If we'd instead performed this same check for the standard stars, everything would have looked more reasonable:
In [36]:
dd = defaultdict(list)
for index, item in enumerate(zs["TARGETID"]):
dd[item].append(index)
zstds = [index for item in fm[stds]["TARGETID"] for index in dd[item] if item in dd]
For stars, we can also display the type of star that redrock
fit (this is stored in the SUBTYPE
column):
In [37]:
zipper = zip(zs[zstds]["SUBTYPE"],zs[zstds]["SPECTYPE"])
[ print("{}-{}".format(sub.decode('utf-8'),spec.decode('utf-8'))) for sub,spec in zipper ]
Out[37]:
(here the conversion to utf-8
is simply for display purposes because the strings in SUBTYPE
and SPECTYPE
are stored as bytes instead of unicode).
OK, back to our quasars. Let's plot the quasar targets that are identified as quasars , but add a label for the SPECTYPE
and the redshift fit by redrock
. I'll also over-plot some (approximate) typical quasar emission lines at the redrock redshift (if those lines would fall in the DESI wavelength coverage):
In [38]:
qsoid = np.where(zs[zqsos]["SPECTYPE"] == b'QSO ')[0]
#ADM bug...to be uncommented later
#qsoid = np.where(zs[zqsos]["SPECTYPE"] == b'QSO')[0]
qsolines = np.array([1216,1546,1906,2800,4853,4960,5008])
for i in range(len(qsoid)):
spectype = zs[zqsos[qsoid[i]]]["SPECTYPE"].decode('utf-8')
z = zs[zqsos[qsoid[i]]]["Z"]
plt.plot(wave,flux[qsos[qsoid[i]]],'b')
plt.suptitle("{}, z={:.3f}".format(spectype,z))
for line in qsolines:
if ((1+z)*line > np.min(wave)) & ((1+z)*line < np.max(wave)):
plt.plot([(1+z)*line,(1+z)*line],[np.min(flux[qsos[qsoid[i]]]),np.max(flux[qsos[qsoid[i]]])],'yo')
plt.show()
Not bad at all!
Note that, for illustrative purposes, we discussed the Data Model in detail and read in the required files individually from that Data Model. But, the DESI data team has also developed standalone functions in desispec.io
to facilitate reading in the plethora of information in the spectral files. For example:
In [39]:
from desispec import io
In [40]:
specobj = io.read_spectra(specfilename)
The wavelengths and flux in each band are then available as dictionaries in the wave
and flux
attributes:
In [41]:
specobj.wave
Out[41]:
In [42]:
specobj.flux
Out[42]:
So, to plot the "zeroth" spectrum:
In [43]:
spectrum = 0
plt.plot(specobj.wave["b"],specobj.flux["b"][spectrum],color='b')
plt.plot(specobj.wave["r"],specobj.flux["r"][spectrum],color='r')
plt.plot(specobj.wave["z"],specobj.flux["z"][spectrum],color='m')
Out[43]:
which should look very similar to one of the first plots we made earlier in the tutorial.
The fibermap information is available as a table in the fibermap
attribute:
In [44]:
specobj.fibermap
Out[44]:
and the TARGETID
for each source is available via its own method:
In [45]:
specobj.target_ids()
Out[45]:
All of the information that could be read in from the different extensions of the spectral file can be retrieved from the specobj
object. Here's what's available:
In [47]:
dir(specobj)
Out[47]:
In [ ]: