DR8b

The goal of this notebook is to document how we ran DR8b, including calibration files, with the updated data model and using the burst buffer on Cori.

John Moustakas
Siena College
2019 March 10

Many thanks to Dustin Lang, Adam Myers, Eddie Schlafly, David Schlegel, Martin Landriau, and Stephen Bailey.

A. Preliminaries

A1. Define $LEGACY_SURVEY_DIR.

First, choose a new, empty top-level directory on project and create it:

export LEGACY_SURVEY_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b
mkdir -p $LEGACY_SURVEY_DIR
cd $LEGACY_SURVEY_DIR

Note: the same $LEGACY_SURVEY_DIR environment variable will need to be defined within the scripts below so that they are self-contained.

Hereafter I will assume that all scripts are launched from the $LEGACY_SURVEY_DIR directory.

Next, create soft links to the CP-reduced imaging data and the WISE background models:

mkdir -p images calib/wise
ln -s /global/project/projectdirs/cosmo/staging/90prime images/90prime
ln -s /global/project/projectdirs/cosmo/staging/mosaic images/mosaic
ln -s /global/project/projectdirs/cosmo/staging/decam images/decam
ln -s /project/projectdirs/cosmo/work/wise/unwise_catalog/dr1/mod calib/wise/modelsky

And finally grab a copy of the survey-bricks file, which we will need below:

cp /global/project/projectdirs/cosmo/work/legacysurvey/dr7/survey-bricks.fits.gz .

A2. Access (or create) the burst buffer.

Initially, Dustin Lang created a 40TB burst-buffer reservation called "DR8". You can think of this as a mounted external drive, where large files can be written and read without the significant overhead associated with $SCRATCH, although eventually the results will be copied onto project, as described below.

To access this file system you have to create a configuration file (just once):

echo "#DW persistentdw name=DR8" > bb.conf

The files in the burst buffer can only be accessed from a Cori compute node (e.g., via an interactive session), not from a login node. Let's grab an interactive node (just once) and make a dedicated subdirectory to keep our outputs tidy:

salloc -N 1 -C haswell -q interactive -t 00:10:00 --bbf=bb.conf
mkdir -p $DW_PERSISTENT_STRIPED_DR8/dr8b

Note that the $DW_PERSISTENT_STRIPED_DR8 environment variable must always be used when referring to this file system, as the absolute path it points to differs from user to user.
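
As a quick sanity check from within the interactive session, you can confirm that the variable resolves and that the directory exists:

echo $DW_PERSISTENT_STRIPED_DR8
ls -l $DW_PERSISTENT_STRIPED_DR8/dr8b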

For the record, a new reservation can be made (if necessary) by submitting the following SLURM script to the queue:

#! /bin/bash
#SBATCH -q debug
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:05:00
#BB create_persistent name=BBNAME capacity=50000GB access_mode=striped type=scratch

where BBNAME is the desired name of the reservation.
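
For example, if the script above is saved as create-bb.sh (a hypothetical filename), it can be submitted and monitored with:

sbatch create-bb.sh
squeue -u $USER   # the reservation is created when this short job runs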

A3. Set up software dependencies.

ToDo: Add Docker instructions (link to Adam's NB).
ToDo: Add instructions for compiling tractor and astrometry.net, if needed (separate notebook).
Note: In DR8b we had to rely on local installations of qdo, tractor, and astrometry.net (on edison), but here we document the ideal setup.

Most of the code dependencies (like tractor and astrometry.net) are in the desiconda imaging stack, which we will source in our setup script below, but we also depend on recent (ideally tagged) versions of legacyzpts and legacypipe:

cd $LEGACY_SURVEY_DIR
mkdir -p code ; cd code
git clone git@github.com:legacysurvey/legacyzpts.git ; cd legacyzpts
git checkout tags/dr8.0 ; cd ..
git clone git@github.com:legacysurvey/legacypipe.git
cd ..

Next, make a local copy (for convenience of editing, temporarily changing paths, etc.) of the bash script we use to set up all the necessary dependencies and myriad environment variables:

cp code/legacypipe/doc/dr8/dr8-env.sh .

Be sure to update this script with the appropriate $LEGACY_SURVEY_DIR path. You will also need to add the following lines (and get the appropriate database password) in order to complete the qdo setup; for the desiconda user these lines live in the .bashrc.ext file:

export QDO_BACKEND=postgres
export QDO_BATCH_PROFILE=cori
export QDO_DB_HOST=nerscdb03.nersc.gov
export QDO_DB_NAME=desirun
export QDO_DB_USER=desirun_admin
export QDO_DB_PASS=ask_someone_on_the_imaging_team
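
After updating and sourcing dr8-env.sh, a quick way to verify that the qdo database connection works is to list the existing queues:

source dr8-env.sh
qdo list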

A4. Create the input image lists.

For DR8 Eddie Schlafly and David Schlegel inspected all the DECam, 90prime, and mosaic imaging data on-disk and created three FITS files which should be used to define the input data:

  • dr8-ondisk-decam.fits
  • dr8-ondisk-90prime.fits
  • dr8-ondisk-mosaic.fits

We use these tables (cutting on QKEEP==True) to create our input image lists. To keep everything tidy, create a new $LEGACY_SURVEY_DIR/image-lists subdirectory, copy the dr8-ondisk-*.fits files there, source the dr8-env.sh file, and then run the following snippet of code in an ipython session.

import os
import fitsio

# For each camera, read the corresponding "ondisk" table, keep the rows
# with QKEEP==True, and write a plain-text list of image filenames
# (prefixed with the camera name to match the images/ directory structure).
for camera in ('decam', 'mosaic', '90prime'):
    data = fitsio.read('dr8-ondisk-{}.fits'.format(camera), upper=True)
    with open('image-list-{}.txt'.format(camera), 'w') as flist:
        for imfile in data[data['QKEEP']]['FILENAME']:
            flist.write('{}\n'.format(os.path.join(camera, imfile.decode('utf-8').strip())))

The resulting output files are:

  • image-list-decam.txt (121123 images in DR8, 5515 in DR8b)
  • image-list-90prime.txt (34206 images in DR8, 1073 in DR8b)
  • image-list-mosaic.txt (61049 images in DR8, 1374 in DR8b)
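
As a sanity check, the number of lines in each output file should match these counts:

wc -l image-list-*.txt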

B. Generate the calibration files.

B1. Load and launch qdo tasks (lists of images).

We use qdo to manage the myriad tasks (one task = one image) and the shell script dr8-zpts.sh to produce the calibration files we need. First, load the tasks into the database with

qdo load dr8b-calibs-decam ./image-list-decam.txt
qdo load dr8b-calibs-90prime ./image-list-90prime.txt
qdo load dr8b-calibs-mosaic ./image-list-mosaic.txt

Next, utilize the various queues to get everything done. For example, for DECam it's not crazy to request 256 MPI tasks with 8 cores per task (256 x 8 = 2048 cores, or 64 Haswell nodes with 32 physical cores each). Using the debug and regular queues for 30 and 180 minutes, respectively (and the burst buffer), would look like

qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 --walltime=00:30:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
  --batchqueue debug --keep_env --batchopts "--bbf=bb.conf"
qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 --walltime=03:00:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
  --batchqueue regular --keep_env --batchopts "--bbf=bb.conf"

The game, of course, is to balance throughput and wait time, although in general the debug queues work quite well, even on the DECam images (with ~60 CCDs each).

Alternatively, one could use the shared queue with

qdo launch dr8b-calibs-decam 1 --cores_per_worker 8 --walltime=04:00:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
  --batchqueue shared --keep_env --batchopts "--bbf=bb.conf -a 0-99"

which may also work well in production.

Note that for the 90prime and mosaic cameras (which only have 4 CCDs) a more typical request would be

qdo launch dr8b-calibs-mosaic 512 --cores_per_worker 4 --walltime=00:30:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
  --batchqueue debug --keep_env --batchopts "--bbf=bb.conf"

For the record, dr8-zpts.sh will write files out with the following directory structure (all relative to $DW_PERSISTENT_STRIPED_DR8/dr8b):

zpts
  90prime
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  decam
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  mosaic
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
calib
  90prime
    psfex
    psfex-merged/?????/90prime-????????.fits
    se
    splinesky
    splinesky-merged/?????/90prime-????????.fits
  decam
    psfex
    psfex-merged/?????/decam-????????.fits
    se
    splinesky
    splinesky-merged/?????/decam-????????.fits
  mosaic
    psfex
    psfex-merged/?????/mosaic-????????.fits
    se
    splinesky
    splinesky-merged/?????/mosaic-????????.fits

The only files we care about, however, are those in the zpts, splinesky-merged, and psfex-merged directories; the files in the SExtractor (se), psfex, and splinesky directories are intermediate and will be deleted in the future.
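
Since the burst-buffer reservation is finite (40TB), it can be useful to keep an eye on how much space these outputs consume; from an interactive node, for example:

du -sh $DW_PERSISTENT_STRIPED_DR8/dr8b/zpts $DW_PERSISTENT_STRIPED_DR8/dr8b/calib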

B2. Some useful qdo commands.

The following qdo commands may be useful:

qdo list                                      # list all the queues
qdo do dr8b-calibs-decam --script dr8-zpts.sh # run the script interactively
qdo status dr8b-calibs-decam                  # check on current status
qdo retry dr8b-calibs-decam                   # re-load failed jobs (presumably after debugging the code)
qdo recover dr8b-calibs-decam --dead          # re-load jobs that hung because the queue timed out

B3. Rsync everything to project.

Once all the calibrations are done, the necessary outputs should be copied to project:

cd $DW_PERSISTENT_STRIPED_DR8/dr8b
rsync -auv zpts $LEGACY_SURVEY_DIR >> rsync-zpts.log 2>&1 &
rsync -auvR calib/*/*-merged $LEGACY_SURVEY_DIR >> rsync-calib.log 2>&1 &
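
Before cleaning anything off the burst buffer, it is worth spot-checking that the copies are complete, e.g., by comparing file counts on the two file systems:

find $DW_PERSISTENT_STRIPED_DR8/dr8b/zpts -type f | wc -l
find $LEGACY_SURVEY_DIR/zpts -type f | wc -l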

B4. Build the merged survey-ccds and annotated-ccds files (and their KD-tree cousins).

  • ToDo: Document a slurm script for doing this.
  • ToDo: Before merging, validate the input image list and output catalogs. For DR8b there's a small script misc/validate-calibs.py, but it won't scale well.

Next, we need to merge all the individual zeropoint files to generate the all-powerful survey-ccds and annotated-ccds files, which we can accomplish with dr8-merge-zpts.sh:

$LEGACYPIPE_DIR/doc/dr8/dr8-merge-zpts.sh

This script builds a simple ASCII file list of the individual zeropoint tables (ignoring files with the "debug" suffix) and passes it to legacyzpts/legacy_zeropoints_merge.py and, subsequently, legacypipe/create_kdtrees.py to create the following files:

survey-ccds-dr8b-decam-nocuts.fits
survey-ccds-dr8b-90prime-mosaic-nocuts.fits
survey-ccds-dr8b-decam-nocuts.kd.fits
survey-ccds-dr8b-90prime-mosaic-nocuts.kd.fits
annotated-ccds-dr8b-decam-nocuts.fits
annotated-ccds-dr8b-90prime-mosaic-nocuts.fits
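
For reference, the ASCII file list the script builds can be approximated with a find command along these lines (a sketch only; the actual logic lives in dr8-merge-zpts.sh and may differ):

find zpts/decam -name '*-survey.fits' ! -name '*debug*' > zpt-list-decam.txt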

B5. Update ccd_cuts and apply depth cuts (optional).

  • ToDo: Document this step (see notebook/notes by Adam and Dustin).

B6. Generate QA of the input CCDs tables.

  • ToDo: Generate a standard set of plots showing the coverage of the CCDs files, scatter in zeropoints, comparisons to previous DRs, etc.

C. Run the legacypipe pipeline!

C1. Set up the runbrick-decam and runbrick-90prime-mosaic directories.

The pipeline has to be run separately for the DECam and 90Prime+Mosaic datasets, so we need to create and set up different dedicated directories. For example, for DECam, do:

mkdir -p runbrick-decam
cd runbrick-decam
ln -s ../bb.conf bb.conf
ln -s ../calib calib
ln -s ../code code
ln -s ../images images
ln -s ../survey-bricks.fits.gz survey-bricks.fits.gz
ln -s ../survey-ccds-dr8b-decam-nocuts.kd.fits survey-ccds-dr8b-decam-nocuts.kd.fits

Next, we need a shell script that sets up runbrick specifically for this input directory. Fortunately, all we have to do is copy our generic script that sets up our code and dependencies

cp $LEGACYPIPE_DIR/doc/dr8/dr8-env.sh ./dr8-env-decam.sh

but then change the LEGACY_SURVEY_DIR environment variable to

export LEGACY_SURVEY_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-decam

And that's it! Setting up the runbrick-90prime-mosaic directory is analogous.
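
For completeness, and assuming the same naming conventions, the analogous setup would be:

mkdir -p runbrick-90prime-mosaic
cd runbrick-90prime-mosaic
ln -s ../bb.conf bb.conf
ln -s ../calib calib
ln -s ../code code
ln -s ../images images
ln -s ../survey-bricks.fits.gz survey-bricks.fits.gz
ln -s ../survey-ccds-dr8b-90prime-mosaic-nocuts.kd.fits survey-ccds-dr8b-90prime-mosaic-nocuts.kd.fits
cp $LEGACYPIPE_DIR/doc/dr8/dr8-env.sh ./dr8-env-90prime-mosaic.sh
# then edit LEGACY_SURVEY_DIR in dr8-env-90prime-mosaic.sh to point to
# /global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-90prime-mosaic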

C2. Load the set of qdo tasks (lists of bricks).

  • ToDo: These instructions aren't quite right because the various regions use different survey-ccds files.

With our I/O directory set up and our survey-ccds table in hand, we are ready to run legacypipe. First, create a list of bricks to load into qdo using legacypipe/queue-calibs.py. In DR8a and DR8b we focused on a set of test regions; the unique set of bricks in each region was determined separately by Adam Myers, but can be rederived with, e.g.,

cd /global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-decam
source dr8-env-decam.sh
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-s82 > bricks-test-s82
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-sgc > bricks-test-hsc-sgc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-ngc > bricks-test-hsc-ngc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-edr > bricks-test-edr

To keep the top-level directory tidy, these files should be copied to a new directory, dr8b/brick-lists.

Next, create a qdo queue for each test region (so that completing a particular region can be prioritized) with

qdo load dr8b-test-s82-decam ./bricks-test-s82
qdo load dr8b-test-hsc-sgc-decam ./bricks-test-hsc-sgc
qdo load dr8b-test-hsc-ngc-decam ./bricks-test-hsc-ngc
qdo load dr8b-test-edr-decam ./bricks-test-edr
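
With the queues loaded, their collective status can be checked with, e.g.:

for region in s82 hsc-sgc hsc-ngc edr; do
  qdo status dr8b-test-${region}-decam
done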

C3. Launch runbrick.

  • ToDo: Add more info here.

A debug-queue launch, mirroring the calibration launches in B1, looks like:
qdo launch dr8b-test-hsc-ngc-decam 256 --cores_per_worker 8 --walltime=00:30:00 \
  --script ./dr8-runbrick-decam.sh --batchqueue debug --keep_env --batchopts "--bbf=bb.conf"
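
By analogy with the calibration launches in B1, a longer regular-queue submission would look like (a sketch, with the same caveats about balancing throughput and wait time):

qdo launch dr8b-test-hsc-ngc-decam 256 --cores_per_worker 8 --walltime=03:00:00 \
  --script ./dr8-runbrick-decam.sh --batchqueue regular --keep_env --batchopts "--bbf=bb.conf"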

C4. Rsync the output catalogs back to project.

  • ToDo: Add more info here.

As with the calibration outputs in B3, copy the results from the burst buffer back to project:

cd $DW_PERSISTENT_STRIPED_DR8/dr8b/runbrick-decam
rsync -auv tractor* $LEGACY_SURVEY_DIR >> rsync-tractor.log 2>&1 &
rsync -auv coadd $LEGACY_SURVEY_DIR >> rsync-coadd.log 2>&1 &
rsync -auv metrics $LEGACY_SURVEY_DIR >> rsync-metrics.log 2>&1 &
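
A rough way to gauge progress is to count the completed Tractor catalogs (assuming the standard tractor/<AAA>/tractor-<brick>.fits output layout):

find tractor -name 'tractor-*.fits' | wc -l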

D. Update the viewer

  • ToDo: Add more info here.
  • Edit /global/project/projectdirs/cosmo/webapp/viewer-dev/load-layer.py and then run it (takes a long time...)
  • Then "touch wsgi.py" and reload legacysurvey.org.