Documentation for OSMOSreduce software

OSMOSreduce reduces multi-slit spectroscopic observations from OSMOS on the 2.4m Hiltner Telescope and estimates redshifts.

Installation

OSMOSreduce can be downloaded from Dan Gifford's GitHub page at https://github.com/giffordw/OSMOSreduce. There are no steps to "install" the software in the traditional sense. Instead, the user needs to organize their observations in a pre-defined directory structure inside the OSMOSreduce folder. This will be covered later.

Requirements

OSMOSreduce depends on several Python standard-library modules, common scientific/data modules, and a few other modules. The non-standard modules referenced in this guide include pandas (used for the slit table), Matplotlib (used for the calibration GUIs), and a Python interface to ds9, in addition to the usual scientific stack.

One easy way to install all the relevant scientific/data modules is to use the Anaconda Python distribution from Continuum Analytics (http://continuum.io/downloads).
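For example, with Anaconda in place, the modules referenced in this guide could be installed like so (the exact package set is an assumption inferred from the modules mentioned later; pyds9 stands in for whichever ds9 binding your setup uses):

$ conda install numpy scipy matplotlib pandas
$ pip install pyds9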

Observations

When at the telescope, there are several observations one typically takes to assist in the data reduction.

  • comps: These are taken of an arc lamp (Argon/Xenon/Helium/Neon) and are used to wavelength calibrate the spectra.
  • flats: Taken either at twilight (sky flats) or of an illuminated screen. Taken with the mask + disperser in and used to divide out chip sensitivity variations.
  • science image(s): It is wise to take several science exposures. Our observations of low redshift galaxies require 2 x 30min integrations.
  • offset sky: For sky subtraction, it is common to use the non-galaxy light from each slit as the background to subtract from the science images. Currently, OSMOSreduce does not have this capability. Instead, we have been taking an "offset sky" image, where we shift the mask by an arbitrary amount off-target and expose for roughly 1/2 the science integration time. We can then easily line up the masks and do the sky subtraction. This is inefficient at the telescope, and later versions of the code will use the non-galaxy light directly from the slits.

Directory Structure

To run OSMOSreduce, the observations and support files need to be arranged in a defined directory structure. Starting at the top level, here is what the OSMOSreduce directory should look like:

In here are all the necessary code and calibration files for reduction. The main program is OSMOSreduce.py, and the cluster we will reduce as an example is C4_0199.

The most important instructions for the user concern how to set up the cluster directory. Let's look inside the cluster directory and see how it needs to be structured.

When at the telescope, the observer will take arcs/comps of the various lamps for wavelength calibration, flat field images, and science images. For this initial reduction testing, we also took images called offset_skies, where we offset the telescope and took an additional exposure with empty sky through the slits rather than taking the sky around each galaxy. That capability will eventually be built into the code, but it is not yet available. The alignment_data directory is not needed for reduction.

The other two important files in this directory are the mosaic_id.fits file and the osmos.id.oms file. The mosaic file is the image of the cluster used to build the mask. This will help in identifying which slits lie over galaxies, alignment stars, or empty sky. It does not need to be a 'mosaic', but the name of the file must have the format 'mosaic_anythingyouwant.fits'. The osmos file is what was output by the mask making software and was submitted to machine the masks. It can be called anything, but must have the extension .oms.

Now let's take a look inside each of these observed data directories:

Each directory has the fits images taken at the telescope arranged inside it. The files are named with the format: Some_id.xxxx.fits where the x's represent a 4 digit number. It is crucial that the files are named in this way, and that is the default output from the instrument itself. The specific 4 digit number after the name does not matter, so long as it is a 4 digit number.
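As a quick sanity check before running the reduction, you can verify the naming convention with a short snippet like this (the science/ subdirectory name here is illustrative):

import glob
import re

# Files must look like Some_id.xxxx.fits, with exactly 4 digits before .fits
# (the initially reduced 'name.xxxxb.fits' files produced later will not match)
pattern = re.compile(r'^.+\.\d{4}\.fits$')

for path in glob.glob('C4_0199/science/*.fits'):
    name = path.split('/')[-1]
    if not pattern.match(name):
        print('Unexpected file name:', name)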

Running OSMOSreduce

OSMOSreduce.py is run from the OSMOSreduce directory where the code lives. To run the code, enter the following at the command line:

$ python OSMOSreduce.py C4_0199

If you want to run the code in IPython, you can instead use:

In [ ]: run OSMOSreduce.py C4_0199

So now we get into the code itself. The very first thing that happens after the module imports is some basic housekeeping: reading the cluster ID input, defining some constants, and reading in the mosaic file.

Then, the code checks whether any reduced files exist. Paul has kindly provided some initial reduction code written in Python called proc4k.py, which I include in the OSMOSreduce package. This code performs the overscan subtraction and removes the relative gain differences for a single R4K image or a list of R4K images. When run, it outputs new .fits files with the naming scheme 'name.xxxxb.fits'. The code therefore looks for these files, and if they do not exist in the relevant directories, the files that are present are initially reduced with this module. Once reduced, the code reads in the reduced fits files.
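A minimal sketch of that check, assuming a directory layout like the example cluster's (the arcs/ path is illustrative, and the way proc4k is invoked is left as a comment rather than guessed at):

import glob

# Already-reduced frames carry a 'b' before .fits ('name.xxxxb.fits')
reduced = glob.glob('C4_0199/arcs/*b.fits')
if not reduced:
    raw = [f for f in glob.glob('C4_0199/arcs/*.fits')
           if not f.endswith('b.fits')]
    # hand `raw` to proc4k.py here for the overscan/gain reduction;
    # see that module itself for its exact calling convention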

Then, the code parses the .oms file for relevant information on the slits, such as position in the x/y direction on the chip, RA/DEC on the sky, and slit width/length. All of these are saved into a pandas DataFrame which forms the basis of our sample. Once we have all the positions, the code uses the get_photoz.py module (which calls sqlcl.py) to query the SDSS database for information on the objects in the slits. The object information queried is: ['#objID','SpecObjID','ra','dec','umag','gmag','rmag','imag','zmag','spec_z','photo_z','extra']
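A sketch of what that table might look like (the geometry column names here are illustrative, not the code's own; the SDSS columns are the ones listed above):

import pandas as pd

# One row per slit: geometry parsed from the .oms file, SDSS info added later
slits = pd.DataFrame({
    'slit_x': [1024.5], 'slit_y': [310.0],   # chip position (pixels)
    'ra': [184.33], 'dec': [29.50],          # sky position (degrees)
    'width': [1.2], 'length': [10.0],        # slit dimensions
})
slits['spec_z'] = None   # populated by the SDSS query
print(slits)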

It is important to know that SDSS allows only 60 queries per minute. This is likely more than the number of slits on your mask, but if the code errors and needs to be rerun, it is possible to see an error from maxing out the query limit. This is solved by simply waiting a minute and running again.
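If you hit the limit, a simple wait-and-retry wrapper illustrates the workaround (this is not how the code itself handles it; `run_query` is a placeholder for whatever function issues the query, e.g. the get_photoz.py machinery):

import time

def query_with_retry(run_query, slit, max_tries=3):
    """Retry an SDSS query after a one-minute pause if it fails."""
    for attempt in range(max_tries):
        try:
            return run_query(slit)
        except Exception:
            # Most likely the 60-queries-per-minute limit; wait it out
            time.sleep(60)
    raise RuntimeError('SDSS query kept failing after retries')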

Once the query has been run, the program will load up a ds9 window with a split view. On the command line, it will ask the user if the images have loaded (y/n). Sometimes ds9 can take a bit to load up, so wait until it does before hitting (y).

Common Error: You must close all open ds9 instances before running OSMOSreduce.py. The code attempts to connect with ds9, but has trouble doing so with multiple windows open.
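If you want to check what the code sees before launching, the pyds9 package (assuming that is the ds9 binding in use; the original code may use a different interface) can list open windows:

import pyds9

# List the XPA names of any running ds9 windows; there should be at most one
print(pyds9.ds9_targets())
d = pyds9.DS9()   # connects to the one open window, or launches a new ds9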

Labeling Slit Objects

After hitting (y) when the images have loaded, ds9 will zoom in on the first slit on your mask and overlay its position on the mosaic image you have in the cluster directory. In the terminal window, it will ask you if this slit is over a galaxy (g), reference star (r), or empty sky (s). These labels are used to decide which slits will be reduced later. Once you hit an appropriate key, the program will automatically move to the next object until you have run through all slits.

Enclosing Dispersed Spectra

Once all slits have been labeled, the ds9 window will switch to a raw image of a 'comp' with all the dispersed light on the chip. Like before, the program will automatically zoom in on the first dispersed spectrum from a labeled galaxy and draw a long rectangular box approximately over the spectrum. The user's job is to click+drag (or click and use arrow keys) to move this box so that it encloses as much of the light from the slit as possible. The user can also make the box wider to account for 'bending' spectra with the draggable points on the corners of the green rectangle. The edges of the rectangle are off the chip area, so moving the view slightly off the chip should reveal the edge. In the terminal window, it will ask whether this is a good or bad spectrum (y/n). For the time being, it doesn't matter what you label it as, so label everything as (y) until this feature becomes useful. Once the question is answered (you need to hit 'Enter' this time, not just a key press), the program will move to the next dispersed galaxy slit to repeat the process.

Image reductions

The first time through the reduction process for a cluster, the science images need to be properly cleaned, flat-field corrected, and sky subtracted.

Each of the arcs, flats, offset skies, and science images is cleaned with a median filter to remove cosmic ray strikes. I first compute the median of each pixel's surrounding 3x3 (9-pixel) neighborhood and save this to an image copy. I then look for all pixels in the original image that deviate from the median value by more than 8 sigma and replace only those pixels with the median value. This step is somewhat computationally expensive and may take ~30 min on a typical laptop. (Good time to go grab some tea.)
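A minimal sketch of this style of cleaning with SciPy, assuming a global sigma estimated from the residual image (the package's actual implementation may differ in its details):

import numpy as np
from scipy.ndimage import median_filter

def clean_cosmic_rays(image, nsigma=8.0):
    """Replace pixels deviating > nsigma from a 3x3 median with that median."""
    med = median_filter(image, size=3)       # 3x3 running-median copy
    resid = image - med
    sigma = np.std(resid)                    # deviation scale (an assumption)
    mask = np.abs(resid) > nsigma * sigma    # cosmic-ray candidates
    cleaned = image.copy()
    cleaned[mask] = med[mask]                # swap in the median value
    return cleaned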

Once the images are all cleaned, they are combined: the arcs and science frames are summed, while the flats are median-combined.

Then it's time to sky subtract. Because our offset sky exposure time does not equal our science exposure time, we must compute the scaling factor to multiply the offset sky spectra by to best remove the sky from our science images. This is achieved by trying a variety of scaling constants between n-1 and n+1, where n is the number of science exposures taken. The code then calculates the total residual from (science - offset * scaling factor) for the empty-sky slits in the images (labeled in the step above). The value of the scaling factor that minimizes the abs(residual) is chosen, and the final science image is created by subtracting this sky and dividing by the cleaned flat-field image.
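In outline, the search looks something like this (a sketch, not the exact code; `sky_slits` stands for the pixel regions labeled as empty sky earlier):

import numpy as np

def best_sky_scale(science, offset_sky, sky_slits, n_exposures):
    """Pick the offset-sky scale factor minimizing the empty-sky residual."""
    scales = np.linspace(n_exposures - 1, n_exposures + 1, 201)
    residuals = []
    for s in scales:
        # total residual over the empty-sky slits for this scale factor
        total = sum((science[region] - s * offset_sky[region]).sum()
                    for region in sky_slits)
        residuals.append(abs(total))
    return scales[int(np.argmin(residuals))]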

Wavelength Calibration

Wavelength calibration for multi-slit spectrographs can be tricky because slits are not only vertically separated, but are also offset randomly in the horizontal chip direction. This means that each slit needs to be calibrated independently to find the correct pixel -> wavelength mapping.

As mentioned earlier, a series of 'arcs' of one or more lamps are taken while at the telescope. Also, hopefully you did not skip the step of listing which lamps were used at the beginning of OSMOSreduce.py. If you did, quit() and edit the file. You can quickly skip all the completed steps before this point to return to the wavelength calibration stage.

The first step is to match the arc spectrum with the corresponding lines expected from the lamps. The program approaches this in two stages. The first is a rough guess made by "shifting" and "stretching" the arc spectrum (blue) to match the reference lines (red). This is done in a Matplotlib GUI; an example is shown below:

So what are these sliders doing? The way we have chosen to map pixel value -> wavelength is through a high-order (5th) polynomial whose independent variable is centered on the slit's x-pixel value. A low-order example would be: A(x - slit_x)^2 + B(x - slit_x) + C. Obviously, it would be pointless to expect the user to optimize more than a few of the polynomial coefficients at a time, so in this first stage you estimate the 3 lowest-order coefficients, labeled the "shift", "stretch", and "quad" terms. By dragging the sliders below the plot, you are simply raising or lowering the strength of each term.
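Concretely, the full 5th-order mapping is evaluated along these lines (the coefficient handling here is illustrative, not the code's own):

import numpy as np

def pixel_to_wavelength(x, slit_x, coeffs):
    """wavelength = sum_i coeffs[i] * (x - slit_x)**i, up to 5th order.

    The 'shift', 'stretch', and 'quad' sliders adjust the three
    lowest-order coefficients.
    """
    dx = np.asarray(x, dtype=float) - slit_x
    return sum(c * dx**i for i, c in enumerate(coeffs))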

There are 5 sliders you can use: Shift, Stretch, Fine Shift, Fine Stretch, and Fine Quad. It should be fairly obvious which terms each affects, and by playing with them you can get a feel for how they affect the spectrum as a whole.

A good tip is to start with the Shift/Fine Shift sliders to roughly match the spectrum with its reference lines, and then use the Fine Quad term to change the overall shape. The pre-estimated stretch turns out to be a good approximation and usually doesn't need to be adjusted.

Once the spectral emission lines have been closely matched with their reference lines, the plot can be closed. Another GUI appears with a zoomed in view of the lowest wavelength reference lines and the spectrum around them. See below:

There are several things to point out in this view. The first is the plot itself. There is the arc spectrum (blue), reference lines (red), a thick red line covering the top half of one reference line, a cyan line extending up halfway from the bottom of the plot, and one orange circle sitting on top of the arc spectrum. In this step of the reduction, the user will be walked through each reference line and asked whether it is correctly matched with a spectral peak or not. The thick red line represents which reference line the user is comparing with at the moment. The cyan line and orange circle show the initial guess of what peak corresponds to that reference line based on the step before. If the guess is incorrect as it is for the first line above, the user can click near the correct peak on the spectrum and the orange circle will move to that peak. See below:

The cyan line will remain at the original guess, so the user's focus is on the orange circle. Once the circle is selecting the correct peak, the user can click the "Replace (m)" button on the GUI or simply hit the "m" key on their keyboard. Doing so will automatically advance to the next reference line.

If the user took their time on the previous step of sliding the spectrum to match the reference lines, many of the reference lines and their correct peaks will already be matched up. See below:

In this case, the "Next (n)" button can be clicked or the "n" key pressed. Like before, the program will automatically advance to the next line.

Of course, there is no need to use every reference line and corresponding peak to calibrate the spectrum. Certainly, the more lines/peaks identified and used, the more accurate and robust the solution will be, but there are several reasons not to use all the lines. The first is that it is time consuming. We tested this program on multi-slit observations of galaxy clusters which had ~25-30 slits per mask. Using some fraction of carefully spaced reference lines saves a lot of time. In addition, depending on the arc exposure time, some lines can be hard to see and match. Rather than risk mis-matching lines, the user can "delete" them using the "Delete (j)" button or by pressing "j" on the keyboard. This does not permanently delete the line from the list; it simply tells the program not to use the line in the calibration of that particular slit.

Important Note: The "b" key on the keyboard can be hit at any time to 'go back' a step in the line/peak matching step. However **A DELETION CANNOT BE UNDONE** so be sure you don't want to use a line before removing it from that slit's calibration.

Once the end of the spectrum is reached, there may be additional reference lines that exist at higher wavelengths. To disregard those and fit a solution to calibrate the slit with the lines matched, simply hit the "Finish" button on the GUI. See below:

The program will automatically move onto the next slit for calibration. This will continue until all slits with a galaxy in them are wavelength calibrated.

Redshift Estimation

Currently, the program is set up to identify redshifts of low-redshift galaxies using their Ca II H and K lines, found primarily in early-type galaxies. These absorption features have rest wavelengths around 3950 angstroms. When you get to this stage, a question is posed at the command line for the user:

Your sample contains X SDSS galaxies with spectra. Would you like to use a redshift prior that is the median of these galaxies (s)? Would you like to specify your own prior for each galaxy (q)? Press (p) to use the sdss photo_z as a prior. Press (z) to not use any prior: z

To be explicit, if you have observed some SDSS galaxies in your sample, and you believe the other observed galaxies are clustered in redshift space around them, using the median of their redshifts as a redshift prior is a good first guess. The prior is Gaussian with a 1-sigma width in redshift of 0.06. This prior is necessary to help identify the local peaks in the likelihood space generated by the cross-correlation between the template and real spectra. Alternatively, the prior can be centered on the photo_z value for each galaxy, or the spectrum can be shown to the user first and the prior set per galaxy based on where the user believes the H&K lines are located. The last option is best when not estimating redshifts of clustered galaxies.
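Schematically, the prior weights the cross-correlation values like this (variable names here are illustrative, not the code's own):

import numpy as np

def apply_redshift_prior(z_grid, correlation, z_prior, sigma=0.06):
    """Weight cross-correlation values by a Gaussian redshift prior.

    sigma=0.06 is the 1-sigma width quoted above; z_prior is the median
    SDSS redshift, the photo_z, or the user's own guess.
    """
    prior = np.exp(-0.5 * ((z_grid - z_prior) / sigma) ** 2)
    return z_grid[np.argmax(correlation * prior)]  # best-fit redshift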

SDSS (or photo_z), median redshift, or no prior

If either the SDSS spectra or photo_z prior is chosen, the first wavelength calibrated galaxy spectrum will be displayed:

In addition to the spectrum, there are several red, blue, and orange vertical lines. These are reference positions for notable lines in galaxies.

  • OII 3727/9.0A (blue)
  • K 3934.7A (red)
  • H 3969.6A (red)
  • Hdelta 4102.9A (orange)
  • G 4305.6A (orange)
  • Hbeta 4862.7A (orange)
  • OIII 4960.3A (blue)
  • OIII 5008.2A (blue)
  • Mg 5176.7A (orange)

The lines will not appear at their rest wavelengths; instead they are redshifted by the estimated redshift of the galaxy. In the example above, this is the correct guess, as the red H&K lines match well with the clear absorption lines between 4200-4300A. If the lines do not match the absorption features, the user needs to correct the difference by right-clicking on the figure where the H&K lines appear to be located. Doing so closes the plot and applies a new prior at the user-specified location. If the lines are over their respective features, simply change the radio button to "clear" and hit "Accept & Close"; no right-clicking is necessary.
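For reference, the overlaid positions follow lambda_obs = lambda_rest * (1 + z); for example, with the rest wavelengths listed above:

# Rest wavelengths (angstroms) from the list above
rest_lines = {'K': 3934.7, 'H': 3969.6, 'Hdelta': 4102.9, 'G': 4305.6}

z = 0.08  # illustrative redshift guess
observed = {name: wl * (1.0 + z) for name, wl in rest_lines.items()}
# K lands near 4249 A and H near 4287 A, consistent with absorption
# features between 4200-4300 A as in the example described above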

If, as in this case, the spectral features are very clear, the radio button on the right may be changed to "clear". This is simply for the user's benefit later on and is recorded as a column in the output table of results. If all is well, the "Accept & Close" button may be hit, and the next galaxy spectrum will appear with the initial redshift guess applied to the vertical reference lines.

Below the spectrum there is a plot that shows the correlation values between the observed and template spectrum for different redshift guesses. If you see a clear spike somewhere (as we do above), that is an indication of a robust solution. The dashed line is the redshift guess.

Now, let's say that after the initial guess, the lines do not match up with the spectrum and the user attempts to correct with a right click. However, after the right click, the lines still do not line up with the correct features. The user has 2 options.

  1. This most often happens when there is no clear solution in redshift with the given template and spectrum. If this is the case, it is best to simply leave the radio button checked "Unclear" and hit "Accept & Close" to move onto the next spectrum. This galaxy can then be ignored in the follow-up analysis as not having high enough signal-to-noise.
  2. If the lines are clear, once the user has right clicked and a new guess has appeared, the user has the ability to left click and drag the spectrum to match the lines. Do so only as a last resort, as this results in fairly coarse redshift estimation, but it is an option if need be.

Once everything is lined up, hit the "Accept & Close" button and move onto the next galaxy.

Individual prior

If the user chooses to visually set a prior, they will initially be shown the entire galaxy spectrum. They can then zoom into whatever portion of the spectrum they wish to search for the H & K lines. The following statement is shown on the command line:

Take a look at the plotted galaxy spectrum and note, approximately, at what wavelength do the H and K lines exist? Then close the plot and enter that wavelength in angstroms.

Once the plot is closed, the user can enter the center wavelength around the H & K lines on the command line and that is taken as the "prior" location.
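Schematically, the entered wavelength maps to a redshift prior via the ~3950 A rest wavelength quoted earlier (a sketch of the conversion, not the code's exact form):

def wavelength_to_prior(entered_wavelength, rest_hk=3950.0):
    """Convert the user's H&K wavelength estimate to a redshift prior."""
    return entered_wavelength / rest_hk - 1.0

# e.g. entering 4250 A centers the prior near z ~ 0.076
print(wavelength_to_prior(4250.0))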

Wrap-up