dr8b

The goal of this notebook is to document how we ran dr8b, including calibration files, with the updated data model and using the burst buffer + Docker image on Cori.

John Moustakas
Siena College
2019 March

Many thanks to Dustin Lang, Adam Myers, Eddie Schlafly, David Schlegel, Martin Landriau, and Stephen Bailey.

A. Preliminaries

A1. Define $LEGACY_SURVEY_DIR.

First, choose a new, empty top-level working directory on project and create it:

mkdir -p /global/project/projectdirs/cosmo/work/legacysurvey/dr8b
cd /global/project/projectdirs/cosmo/work/legacysurvey/dr8b
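
The scripts below assume this directory is also exported as $LEGACY_SURVEY_DIR (the dr8-*.sh scripts described in Sections A3 and B set it internally). If you need it in an interactive session, a one-line sketch is:

export LEGACY_SURVEY_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b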

Hereafter I will assume that all scripts are launched from this working directory, unless otherwise specified (starting in Section C).

Next, create soft links to the CP-reduced imaging data and the WISE background models:

mkdir -p images calib/wise
ln -s /global/project/projectdirs/cosmo/staging/90prime images/90prime
ln -s /global/project/projectdirs/cosmo/staging/mosaic images/mosaic
ln -s /global/project/projectdirs/cosmo/staging/decam images/decam
ln -s /global/project/projectdirs/cosmo/work/wise/unwise_catalog/dr1/mod calib/wise/modelsky

And finally grab a copy of the DR-independent survey-bricks file, which we will need below:

cp /global/project/projectdirs/cosmo/work/legacysurvey/dr7/survey-bricks.fits.gz .

A2. Access (or create) the burst buffer.

Initially, Dustin Lang created a 40TB burst-buffer reservation called "DR8". You can think of this as a mounted external drive, where large files can be written and read without the significant overhead associated with $SCRATCH, although eventually the results will be copied onto project, as described below.

To access this file system you first have to create a configuration file (a one-time step):

echo "#DW persistentdw name=DR8" > bb.conf

The files in the burst buffer can be accessed only from an interactive Cori node. Let's do this (just once) and make a dedicated subdirectory to keep our outputs tidy:

salloc -N 1 -C haswell -q interactive -t 00:10:00 --bbf=bb.conf
mkdir -p $DW_PERSISTENT_STRIPED_DR8/dr8b

Note that the $DW_PERSISTENT_STRIPED_DR8 environment variable must always be used, as every user will have a different absolute path.
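
To inspect the state of persistent reservations from a login node, one option (a hedged example, assuming SLURM's DataWarp burst-buffer support on Cori) is:

scontrol show burst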

For the record, a new reservation can be made (if necessary) by submitting the following SLURM script to the queue:

#! /bin/bash
#SBATCH -q debug
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:05:00
#BB create_persistent name=BBNAME capacity=50000GB access_mode=striped type=scratch

where BBNAME is the desired name of the reservation.
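
Save this script to a file (e.g., the hypothetical name create-bb.slurm) and submit it with:

sbatch create-bb.slurm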

A3. Set up software dependencies.

ToDo: Add instructions for compiling tractor and astrometry.net, if needed (separate notebook).
Note: In dr8b we had to rely on local installations of tractor and astrometry.net (especially on edison), but here we document the ideal setup.

First, make sure your environment is clean! In other words, ensure that you don't explicitly load any modules or source any code in your initialization files (e.g., $HOME/.bash_profile.ext or $HOME/.bashrc.ext).

Most of the code dependencies we need are in the Docker container described below, including tractor and astrometry.net, but we usually depend on a recent (hopefully tagged) version of legacyzpts, legacypipe, and qdo. To keep everything tidy, check out the code into a dedicated directory, but note that some of the shell scripts below assume this directory!

mkdir -p code ; cd code
git clone git@github.com:legacysurvey/legacyzpts.git
git clone git@github.com:legacysurvey/legacypipe.git
git clone https://bitbucket.org/berkeleylab/qdo.git

Be sure to check out the appropriate branches of these software packages as necessary.
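
For example, to work from a tag rather than master (a sketch; DR8_TAG is a placeholder, not an actual release name):

cd legacypipe
git fetch --tags
git checkout DR8_TAG  # placeholder; substitute the desired tag or branch
cd ..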

Next, pull the latest Docker container

shifterimg pull docker:legacysurvey/legacypipe:latest

which one can enter (if desired) with the command

shifter --image docker:legacysurvey/legacypipe:latest bash

Although we use the Docker container for production, for a lot of preparatory work and debugging it's often convenient to load all the software dependencies and myriad environment variables on the command line, outside of the Docker container. We provide a simple shell script for this purpose in legacypipe, which we recommend copying to the local working directory so it can be conveniently edited (e.g., to temporarily change paths):

cp code/legacypipe/doc/dr8/dr8-env.sh ./
source dr8-env.sh

This shell script has a single configurable environment variable, $CODE_DIR, which points to the locally checked-out code and should be updated as necessary.
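
For orientation, the relevant line looks something like the following sketch (the path is an assumption based on the layout in Section A1; the script shipped with legacypipe is authoritative):

export CODE_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b/code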

Finally, you will also need to add the following lines (and get the appropriate database password) in order to complete the qdo setup (for the desiconda user these lines are in the .bashrc.ext file):

export QDO_BACKEND=postgres
export QDO_BATCH_PROFILE=cori
export QDO_DB_HOST=nerscdb03.nersc.gov
export QDO_DB_NAME=desirun
export QDO_DB_USER=desirun_admin
export QDO_DB_PASS=XXXXXXXX
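
A quick way to verify that the database connection is configured correctly is to list the existing queues, which should return without error:

qdo list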

A4. Create the input image lists.

  • ToDo: Update to the final flat-files.

For DR8 Eddie Schlafly and David Schlegel inspected all the DECam, 90prime, and mosaic imaging data on-disk and created three FITS files which should be used to define the input data:

  • dr8-ondisk-decam.fits
  • dr8-ondisk-90prime.fits
  • dr8-ondisk-mosaic.fits

We use these tables (cutting on QKEEP==True) to create our input image lists. To keep everything tidy, create a new image-lists subdirectory, copy (ToDo: specify location) the dr8-ondisk-*.fits files there, source the dr8-env.sh file, and then run the following snippet of code in an ipython session.

import os
import fitsio

for camera in ('decam', 'mosaic', '90prime'):
    # Read the on-disk inspection table with upper-case column names.
    data = fitsio.read('dr8-ondisk-{}.fits'.format(camera), upper=True)
    # Keep only the images flagged as good (QKEEP==True), writing one
    # relative path (camera/filename) per line.
    with open('image-list-{}.txt'.format(camera), 'w') as flist:
        for imfile in data[data['QKEEP']]['FILENAME']:
            flist.write('{}\n'.format(os.path.join(camera, imfile.decode('utf-8').strip())))

The resulting output files are:

  • image-list-decam.txt (121123 images in DR8, 5515 in DR8b)
  • image-list-90prime.txt (34206 images in DR8, 1073 in DR8b)
  • image-list-mosaic.txt (61049 images in DR8, 1374 in DR8b)
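
As a quick sanity check, the line counts of the output files should match the numbers above:

wc -l image-list-decam.txt image-list-90prime.txt image-list-mosaic.txt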

B. Generate the calibration files.

B1. Load and launch qdo tasks (lists of images).

We use qdo to manage the myriad tasks (one task = one image), and the shifter script dr8-zpts-shifter.sh to produce the calibration files we need. First, load the tasks into the database with

qdo load dr8b-calibs-decam ./image-list-decam.txt
qdo load dr8b-calibs-90prime ./image-list-90prime.txt
qdo load dr8b-calibs-mosaic ./image-list-mosaic.txt

Next, copy the relevant scripts into the working directory:

cp $LEGACYPIPE_DIR/doc/dr8/dr8-env-shifter.sh ./
cp $LEGACYPIPE_DIR/doc/dr8/dr8-zpts-shifter.sh ./

The script dr8-env-shifter.sh does not need to be modified, but in dr8-zpts-shifter.sh be sure to check that the dr, LEGACY_SURVEY_DIR, and CODE_DIR variables are correct and up to date.

Then, use various queues to get everything done. For example, for DECam it's not crazy to request 256 MPI tasks with 8 cores per task (256 x 8 = 2048 cores, or 64 Cori Haswell nodes at 32 physical cores each). Using the debug and regular queues for 30 and 120 minutes, respectively (and the burst buffer), the launches would look like

QDO_BATCH_PROFILE=cori-shifter qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 \
  --walltime=00:30:00 --batchqueue=debug --keep_env --script ./dr8-zpts-shifter.sh \
  --batchopts "--image=docker:legacysurvey/legacypipe:latest --bbf=bb.conf"

QDO_BATCH_PROFILE=cori-shifter qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 \
  --walltime=02:00:00 --batchqueue=regular --keep_env --script ./dr8-zpts-shifter.sh \
  --batchopts "--image=docker:legacysurvey/legacypipe:latest --bbf=bb.conf"

The game, of course, is to balance throughput and wait time, although in general the debug queues work quite well, even on the DECam images (with ~60 CCDs each).

Alternatively, one could use the shared queue with

QDO_BATCH_PROFILE=cori-shifter qdo launch dr8b-calibs-decam 1 --cores_per_worker 8 \
  --walltime=04:00:00 --batchqueue=shared --keep_env --script ./dr8-zpts-shifter.sh \
  --batchopts "--image=docker:legacysurvey/legacypipe:latest -a 0-99 --bbf=bb.conf"

which may also work well in production (here the -a 0-99 batch option submits a 100-element SLURM job array, each element running a single 8-core worker).

Note that for the 90prime and mosaic cameras (which only have 4 CCDs) a more typical request would be

QDO_BATCH_PROFILE=cori-shifter qdo launch dr8b-calibs-mosaic 512 --cores_per_worker 4 \
  --walltime=00:30:00 --batchqueue=debug --keep_env --script ./dr8-zpts-shifter.sh \
  --batchopts "--image=docker:legacysurvey/legacypipe:latest --bbf=bb.conf"

For the record, files will be written out with the following directory structure (all relative to $DW_PERSISTENT_STRIPED_DR8/dr8b):

zpts
  90prime
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  decam
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  mosaic
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
calib
  90prime
    psfex
    psfex-merged/?????/90prime-????????.fits
    se
    splinesky
    splinesky-merged/?????/90prime-????????.fits
  decam
    psfex
    psfex-merged/?????/decam-????????.fits
    se
    splinesky
    splinesky-merged/?????/decam-????????.fits
  mosaic
    psfex
    psfex-merged/?????/mosaic-????????.fits
    se
    splinesky
    splinesky-merged/?????/mosaic-????????.fits

The only files we care about, however, are the files in the zpts, splinesky-merged, and psfex-merged directories; the files in the SExtractor (se), psfex, and splinesky directories are intermediate products and will be deleted in the future.

B2. Some useful qdo commands.

The following qdo commands may be useful:

qdo list                                      # list all queues
qdo status dr8b-calibs-decam                  # check on current status
qdo retry dr8b-calibs-decam                   # re-queue failed tasks (presumably after debugging the code)
qdo recover dr8b-calibs-decam --dead          # re-queue tasks that hung because the job timed out

B3. Rsync everything to project.

Once all calibrations are done the necessary outputs should be copied to project:

cd $DW_PERSISTENT_STRIPED_DR8/dr8b
rsync -auv zpts $LEGACY_SURVEY_DIR >> rsync-zpts.log 2>&1 &
rsync -auvR calib/*/*-merged $LEGACY_SURVEY_DIR >> rsync-calib.log 2>&1 &
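
Because these commands run in the background, their progress can be monitored by following the log files:

tail -f rsync-zpts.log rsync-calib.log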

B4. Build the merged survey-ccds and annotated-ccds files (and their KD-tree cousins).

  • ToDo: Document a slurm script for doing this.
  • ToDo: Before merging, validate the input image list and output catalogs. For DR8b there's a small script misc/validate-calibs.py, but it won't scale well.

Next, we need to merge all the individual zeropoint files to generate the all-powerful survey-ccds and annotated-ccds files, which we can accomplish with dr8-merge-zpts.sh:

$LEGACYPIPE_DIR/doc/dr8/dr8-merge-zpts.sh

This script builds a simple ASCII file list of the individual zeropoint tables (ignoring files with the "debug" suffix) and passes them to legacyzpts/legacy_zeropoints_merge.py and, subsequently, legacypipe/create_kdtrees.py, to create the following files:

survey-ccds-dr8b-decam-nocuts.fits
survey-ccds-dr8b-decam-nocuts.kd.fits
survey-ccds-dr8b-90prime-mosaic-nocuts.fits
survey-ccds-dr8b-90prime-mosaic-nocuts.kd.fits
annotated-ccds-dr8b-decam-nocuts.fits
annotated-ccds-dr8b-decam-nocuts.kd.fits
annotated-ccds-dr8b-90prime-mosaic-nocuts.fits
annotated-ccds-dr8b-90prime-mosaic-nocuts.kd.fits
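
For reference, the file-list step can be sketched as follows (the file pattern and output name are illustrative assumptions; the actual dr8-merge-zpts.sh script is authoritative):

# Build an ASCII list of the per-image zeropoint tables for one camera,
# skipping any files with the "debug" suffix.
find $LEGACY_SURVEY_DIR/zpts/decam -name '*-annotated.fits' ! -name '*debug*' > zpts-annotated-decam.txt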

B5. Update ccd_cuts and apply depth cuts (optional).

  • ToDo: Document this step (see notebook/notes by Adam and Dustin).

B6. Generate QA of the input CCDs tables.

  • ToDo: Generate a standard set of plots showing the coverage of the CCDs files, scatter in zeropoints, comparisons to previous DRs, etc.

C. Run the legacypipe pipeline!

C1. Set up the runbrick-decam and runbrick-90prime-mosaic directories.

The pipeline has to be run separately for the DECam and 90Prime+Mosaic datasets, so we need to create and set up different dedicated directories. For example, for DECam, do:

mkdir -p runbrick-decam
cd runbrick-decam
ln -s ../bb.conf bb.conf
ln -s ../calib calib
ln -s ../code code
ln -s ../images images
ln -s ../survey-bricks.fits.gz survey-bricks.fits.gz
ln -s ../survey-ccds-dr8b-decam-nocuts.kd.fits survey-ccds-dr8b-decam-nocuts.kd.fits

C2. Set up the shifter script.

Next, we need two shell scripts which set up runbrick specifically for this input directory and camera:

cp $LEGACYPIPE_DIR/doc/dr8/dr8-env-shifter.sh ./
cp $LEGACYPIPE_DIR/doc/dr8/dr8-runbrick-shifter-decam.sh ./

However, be sure to verify that the dr, camera, release, LEGACY_SURVEY_DIR, and CODE_DIR environment variables are all correct and up to date.

And that's it! Setting up the runbrick-90prime-mosaic directory is analogous.

C3. Load the set of qdo tasks (lists of bricks).

  • ToDo: These instructions aren't quite right because the various regions use different survey-ccd files.

With our I/O directory set up and our survey-ccds table in hand, we are ready to run legacypipe. First, create a list of bricks to load into qdo using legacypipe/queue-calibs.py. In dr8a and dr8b we focused on a set of test regions. The unique set of bricks in these regions was separately determined by Adam Myers, but can be rederived with, e.g.,

cd /global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-decam
source dr8-env-decam.sh
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-s82 > bricks-test-s82
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-sgc > bricks-test-hsc-sgc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-ngc > bricks-test-hsc-ngc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-edr > bricks-test-edr

To keep the top-level directory tidy, these files should be copied to a new directory, dr8b/brick-lists.

Next, create a qdo queue for each test region (so that completing a particular region can be prioritized) with

qdo load dr8b-test-s82-decam ./bricks-test-s82
qdo load dr8b-test-hsc-sgc-decam ./bricks-test-hsc-sgc
qdo load dr8b-test-hsc-ngc-decam ./bricks-test-hsc-ngc
qdo load dr8b-test-edr-decam ./bricks-test-edr

C4. Launch runbrick.

  • ToDo: Add more info here.

For example, a debug-queue launch for the hsc-ngc test region looks like:

QDO_BATCH_PROFILE=cori-shifter qdo launch dr8b-test-hsc-ngc-decam 256 --cores_per_worker 8 \
  --walltime=00:30:00 --batchqueue=debug --keep_env --script ./dr8-runbrick-shifter-decam.sh \
  --batchopts "--image=docker:legacysurvey/legacypipe:latest --bbf=bb.conf"
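
The status of the launched jobs can be monitored with the usual qdo and SLURM commands, e.g.:

qdo status dr8b-test-hsc-ngc-decam
squeue -u $USER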

C5. Rsync the output catalogs back to project.

  • ToDo: Add more info here.

For example:

cd $DW_PERSISTENT_STRIPED_DR8/dr8b/runbrick-decam
rsync -auv tractor* $LEGACY_SURVEY_DIR >> rsync-tractor.log 2>&1 &
rsync -auv coadd $LEGACY_SURVEY_DIR >> rsync-coadd.log 2>&1 &
rsync -auv metrics $LEGACY_SURVEY_DIR >> rsync-metrics.log 2>&1 &

D. Update the viewer

  • ToDo: Add more info here.
  • Edit /global/project/projectdirs/cosmo/webapp/viewer-dev/load-layer.py and then run it (takes a long time...)
  • Then run "touch wsgi.py" and reload legacysurvey.org.