The goal of this notebook is to document how we ran DR8b, including calibration files, with the updated data model and using the burst buffer on Cori.
John Moustakas
Siena College
2019 March 10
Many thanks to Dustin Lang, Adam Myers, Eddie Schlafly, David Schlegel, Martin Landriau, and Stephen Bailey.
First, choose a new, empty top-level directory on project and create it:
export LEGACY_SURVEY_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b
mkdir -p $LEGACY_SURVEY_DIR
cd $LEGACY_SURVEY_DIR
Note: the same $LEGACY_SURVEY_DIR environment variable will need to be defined within the scripts below so that they are self-contained.
Hereafter I will assume that all scripts are launched from the $LEGACY_SURVEY_DIR directory.
Next, create soft links to the CP-reduced imaging data and the WISE background models:
mkdir -p images calib/wise
ln -s /global/project/projectdirs/cosmo/staging/90prime images/90prime
ln -s /global/project/projectdirs/cosmo/staging/mosaic images/mosaic
ln -s /global/project/projectdirs/cosmo/staging/decam images/decam
ln -s /project/projectdirs/cosmo/work/wise/unwise_catalog/dr1/mod calib/wise/modelsky
And finally grab a copy of the survey-bricks file, which we will need below:
cp /global/project/projectdirs/cosmo/work/legacysurvey/dr7/survey-bricks.fits.gz .
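Before moving on, it can be worth confirming that the links and the bricks file actually resolve. The following is a minimal sketch, not part of the DR8b scripts; the function name missing_inputs is hypothetical, and the list of expected entries simply mirrors the commands above.

```python
import os

# Entries created by the mkdir/ln/cp commands above, relative to
# $LEGACY_SURVEY_DIR.
EXPECTED = (
    "images/90prime",
    "images/mosaic",
    "images/decam",
    "calib/wise/modelsky",
    "survey-bricks.fits.gz",
)


def missing_inputs(top_dir, expected=EXPECTED):
    """Return the expected entries that do not resolve to an existing path.

    os.path.exists follows symlinks, so a dangling link counts as missing.
    """
    return [rel for rel in expected
            if not os.path.exists(os.path.join(top_dir, rel))]
```

Calling missing_inputs(os.environ["LEGACY_SURVEY_DIR"]) should return an empty list once the setup above is complete.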
Initially, Dustin Lang created a 40TB burst-buffer reservation called "DR8". You can think of this as a mounted external drive, where large files can be written and read without the significant overhead associated with $SCRATCH, although eventually the results will be copied onto project, as described below.
To access this file system for the first time you have to create a configuration file (just once).
echo "#DW persistentdw name=DR8" > bb.conf
The files in the burst buffer can be accessed only from an interactive Cori node. Let's do this (just once) and make a dedicated subdirectory to keep our outputs tidy:
salloc -N 1 -C haswell -q interactive -t 00:10:00 --bbf=bb.conf
mkdir -p $DW_PERSISTENT_STRIPED_DR8/dr8b
Note that the $DW_PERSISTENT_STRIPED_DR8 environment variable must always be used, as every user will have a different absolute path.
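Python code that writes to the burst buffer can resolve the path the same way instead of hard-coding it. This is a minimal sketch under the assumption that the variable is only defined inside a job started with the burst-buffer configuration; the function name burst_buffer_dir is hypothetical.

```python
import os


def burst_buffer_dir(reservation="DR8", subdir="dr8b", environ=None):
    """Return the per-user output directory on the burst buffer.

    Looks up DW_PERSISTENT_STRIPED_<reservation> and fails loudly if it
    is not defined, rather than falling back to a guessed path.
    """
    environ = os.environ if environ is None else environ
    var = "DW_PERSISTENT_STRIPED_{}".format(reservation)
    root = environ.get(var)
    if not root:
        raise RuntimeError(
            "{} is not set; was the job started with --bbf=bb.conf?".format(var))
    return os.path.join(root, subdir)
```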
For the record, a new reservation can be made (if necessary) by submitting the following SLURM script to the queue:
#! /bin/bash
#SBATCH -q debug
#SBATCH -N 1
#SBATCH -C haswell
#SBATCH -t 00:05:00
#BB create_persistent name=BBNAME capacity=50000GB access_mode=striped type=scratch
where BBNAME is the desired name of the reservation.
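If several reservations need to be created, the script above can be generated from a template. The sketch below only reproduces the directives shown above with the name and capacity filled in; the function name reservation_script and its default arguments are hypothetical.

```python
def reservation_script(name, capacity_gb, queue="debug", walltime="00:05:00"):
    """Return the text of a SLURM script that creates a persistent reservation."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -q {}".format(queue),
        "#SBATCH -N 1",
        "#SBATCH -C haswell",
        "#SBATCH -t {}".format(walltime),
        # Same burst-buffer directive as in the script above.
        "#BB create_persistent name={} capacity={}GB "
        "access_mode=striped type=scratch".format(name, capacity_gb),
    ]
    return "\n".join(lines) + "\n"
```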
ToDo: Add Docker instructions (link to Adam's NB).
ToDo: Add instructions for compiling tractor and astrometry.net, if needed (separate notebook).
Note: In DR8b we had to rely on local installations of qdo, tractor, and astrometry.net (on edison), but here we document the ideal setup.
Most of the code dependencies are in the desiconda imaging stack (like tractor and astrometry.net), which we will source in our setup script, below, but we usually depend on a recent (hopefully tagged) version of legacyzpts and legacypipe.
cd $LEGACY_SURVEY_DIR
mkdir -p code ; cd code
git clone git@github.com:legacysurvey/legacyzpts.git ; cd legacyzpts
git checkout tags/dr8.0 ; cd ..
git clone git@github.com:legacysurvey/legacypipe.git
cd ..
Next, make a local copy (for convenience of editing, temporarily changing paths, etc.) of the bash script we use to set up all the necessary dependencies and myriad environment variables:
cp code/legacypipe/doc/dr8/dr8-env.sh .
Be sure to update this script with the appropriate $LEGACY_SURVEY_DIR path, and you will also need to add the following lines (and get the appropriate database password) in order to complete the qdo setup (for the desiconda user these lines are in the .bashrc.ext file):
export QDO_BACKEND=postgres
export QDO_BATCH_PROFILE=cori
export QDO_DB_HOST=nerscdb03.nersc.gov
export QDO_DB_NAME=desirun
export QDO_DB_USER=desirun_admin
export QDO_DB_PASS=ask_someone_on_the_imaging_team
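A quick way to catch an incomplete qdo setup before launching anything is to check that all six variables are defined. This is a minimal sketch; the function name missing_qdo_vars is hypothetical, and it only checks that the variables are non-empty, not that the values are valid.

```python
import os

# The six variables exported above.
REQUIRED_QDO_VARS = (
    "QDO_BACKEND",
    "QDO_BATCH_PROFILE",
    "QDO_DB_HOST",
    "QDO_DB_NAME",
    "QDO_DB_USER",
    "QDO_DB_PASS",
)


def missing_qdo_vars(environ=None):
    """Return the names of required qdo variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_QDO_VARS if not environ.get(name)]
```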
For DR8 Eddie Schlafly and David Schlegel inspected all the DECam, 90prime, and mosaic imaging data on-disk and created three FITS files which should be used to define the input data:
dr8-ondisk-decam.fits
dr8-ondisk-mosaic.fits
dr8-ondisk-90prime.fits
We use these tables (cutting on QKEEP==True) to create our input image lists. To keep everything tidy, create a new $LEGACY_SURVEY_DIR/image-lists subdirectory, copy the dr8-ondisk-*.fits files there, source the dr8-env.sh file, and then run the following snippet of code in an ipython session.
import os
import fitsio
for camera in ('decam', 'mosaic', '90prime'):
    # Read the on-disk table for this camera (column names upper-cased).
    data = fitsio.read('dr8-ondisk-{}.fits'.format(camera), upper=True)
    with open('image-list-{}.txt'.format(camera), 'w') as flist:
        # Keep only the images flagged QKEEP and write one path per line.
        for imfile in data[data['QKEEP']]['FILENAME']:
            flist.write('{}\n'.format(os.path.join(camera, imfile.decode('utf-8').strip())))
The resulting output files are:
image-list-decam.txt
image-list-mosaic.txt
image-list-90prime.txt
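The selection logic itself can be exercised without the real tables. The sketch below uses plain (filename, qkeep) pairs as a stand-in for the rows of the on-disk tables; the function name image_list_lines is hypothetical and this is not the code that was run for DR8b.

```python
import os


def image_list_lines(camera, rows):
    """Yield one image path per row that passes the QKEEP cut.

    rows is an iterable of (filename, qkeep) pairs. The filename may be
    bytes (as returned for string columns of a FITS table) or str, and
    may carry trailing padding that needs to be stripped.
    """
    for filename, qkeep in rows:
        if not qkeep:
            continue
        if isinstance(filename, bytes):
            filename = filename.decode("utf-8")
        # Prefix with the camera so the path is relative to images/.
        yield os.path.join(camera, filename.strip())
```

Each yielded string corresponds to one line of an image list, and hence to one qdo task.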
We use qdo to manage the myriad tasks (one task = one image), and the shell script dr8-zpts.sh to produce the calibration files we need. First, load the tasks into the database with
qdo load dr8b-calibs-decam ./image-list-decam.txt
qdo load dr8b-calibs-90prime ./image-list-90prime.txt
qdo load dr8b-calibs-mosaic ./image-list-mosaic.txt
Finally, use the various queues to get everything done. For example, for DECam it's not crazy to request 256 MPI tasks with 8 cores per task (equivalent to 256 × 8 / 32 = 64 nodes on cori, which has 32 cores per Haswell node). Using the debug and regular queues for 30 and 180 minutes, respectively (and the burst buffer), would look like
qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 --walltime=00:30:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
--batchqueue debug --keep_env --batchopts "--bbf=bb.conf"
qdo launch dr8b-calibs-decam 256 --cores_per_worker 8 --walltime=03:00:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
--batchqueue regular --keep_env --batchopts "--bbf=bb.conf"
The game, of course, is to balance throughput and wait time, although in general the debug queues work quite well, even on the DECam images (with ~60 CCDs each).
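The node count implied by a given request is the number of workers times the cores per worker, divided by the cores per node and rounded up. A minimal sketch, assuming 32 physical cores per Cori Haswell node (the function name nodes_needed is hypothetical):

```python
def nodes_needed(workers, cores_per_worker, cores_per_node=32):
    """Return the number of whole nodes needed for the request."""
    total_cores = workers * cores_per_worker
    # Ceiling division without floating point.
    return -(-total_cores // cores_per_node)
```

With these assumptions, nodes_needed(256, 8) and nodes_needed(512, 4) both give 64 nodes.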
Alternatively, one could use the shared queue with
qdo launch dr8b-calibs-decam 1 --cores_per_worker 8 --walltime=04:00:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
--batchqueue shared --keep_env --batchopts "--bbf=bb.conf -a 0-99"
which may also work well in production.
Note that for the 90prime and mosaic cameras (which only have 4 CCDs) a more typical request would be
qdo launch dr8b-calibs-mosaic 512 --cores_per_worker 4 --walltime=00:30:00 --script $LEGACYPIPE_DIR/doc/dr8/dr8-zpts.sh \
--batchqueue debug --keep_env --batchopts "--bbf=bb.conf"
For the record, dr8-zpts.sh will write files out with the following directory structure (all relative to $DW_PERSISTENT_STRIPED_DR8/dr8b):
zpts
  90prime
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  decam
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
  mosaic
    CP*/[image-file]-annotated.fits
    CP*/[image-file]-photom.fits
    CP*/[image-file]-survey.fits
calib
  90prime
    psfex
    psfex-merged/?????/90prime-????????.fits
    se
    splinesky
    splinesky-merged/?????/90prime-????????.fits
  decam
    psfex
    psfex-merged/?????/decam-????????.fits
    se
    splinesky
    splinesky-merged/?????/decam-????????.fits
  mosaic
    psfex
    psfex-merged/?????/mosaic-????????.fits
    se
    splinesky
    splinesky-merged/?????/mosaic-????????.fits
The only files we care about, however, are all the files in the zpts, splinesky-merged, and psfex-merged directories; the files in the SExtractor (se), psfex, and splinesky directories are intermediate and will be deleted in the future.
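The distinction between the products we keep and the other files can be written as a small path test, which may be handy when deciding what to copy or delete. This is a sketch based only on the directory layout listed above; the function name is_final_product is hypothetical, and paths are taken relative to the dr8b directory on the burst buffer.

```python
from pathlib import PurePosixPath

# Per-camera calib subdirectories whose contents we keep.
MERGED_DIRS = {"psfex-merged", "splinesky-merged"}


def is_final_product(relpath):
    """Return True if relpath lies in zpts or in a *-merged calib directory."""
    parts = PurePosixPath(relpath).parts
    if not parts:
        return False
    if parts[0] == "zpts":
        return True
    # Expected layout: calib/<camera>/<kind>/...
    return len(parts) > 3 and parts[0] == "calib" and parts[2] in MERGED_DIRS
```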
The following qdo commands may be useful:
qdo list
qdo do dr8b-calibs-decam --script dr8-zpts.sh # run the script interactively
qdo status dr8b-calibs-decam # check on current status
qdo retry dr8b-calibs-decam # re-load failed jobs (presumably after debugging the code)
qdo recover dr8b-calibs-decam --dead # re-load jobs that hung because the queue timed out
Next, we need to merge all the individual zeropoint files to generate the all-powerful survey-ccds and annotated-ccds files, which we can accomplish with dr8-merge-zpts.sh:
$LEGACYPIPE_DIR/doc/dr8/dr8-merge-zpts.sh
This script builds a simple ASCII file list of the individual zeropoint tables (ignoring files with the "debug" suffix) and passes them to legacyzpts/legacy_zeropoints_merge.py and, subsequently, legacypipe/create_kdtrees.py, to create the following files:
annotated-ccds-dr8b-decam-nocuts.fits
annotated-ccds-dr8b-90prime-mosaic-nocuts.fits
survey-ccds-dr8b-decam-nocuts.kd.fits
survey-ccds-dr8b-90prime-mosaic-nocuts.kd.fits
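The file-list step can be illustrated as follows. This is a sketch of the idea only, not the contents of dr8-merge-zpts.sh: it assumes that the tables to merge are the ones ending in -survey.fits and that debug files can be recognized by the string "debug" in the file name; both conventions, and the function name zeropoint_file_list, are assumptions.

```python
import os


def zeropoint_file_list(root, suffix="-survey.fits"):
    """Return the sorted paths under root ending in suffix, skipping debug files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumed convention: debug outputs have "debug" in the name.
            if name.endswith(suffix) and "debug" not in name:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Writing the returned paths to a text file, one per line, gives the kind of ASCII list described above.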
The pipeline has to be run separately for the DECam and 90Prime+Mosaic datasets, so we need to create and set up different dedicated directories. For example, for DECam, do:
mkdir -p runbrick-decam
cd runbrick-decam
ln -s ../bb.conf bb.conf
ln -s ../calib calib
ln -s ../code code
ln -s ../images images
ln -s ../survey-bricks.fits.gz survey-bricks.fits.gz
ln -s ../survey-ccds-dr8b-decam-nocuts.kd.fits survey-ccds-dr8b-decam-nocuts.kd.fits
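Since the same setup is needed for each dataset, the steps above can be wrapped in a small helper. This is a minimal sketch that mirrors the mkdir and ln -s commands with relative link targets; the function name setup_run_dir is hypothetical.

```python
import os

# Entries shared by every run directory, linked from the top-level directory.
SHARED = ("bb.conf", "calib", "code", "images", "survey-bricks.fits.gz")


def setup_run_dir(top_dir, run_name, ccds_file, shared=SHARED):
    """Create top_dir/run_name and link the shared inputs into it."""
    run_dir = os.path.join(top_dir, run_name)
    os.makedirs(run_dir, exist_ok=True)
    for name in tuple(shared) + (ccds_file,):
        link = os.path.join(run_dir, name)
        # lexists is also true for dangling links, so reruns are harmless.
        if not os.path.lexists(link):
            # Relative target, equivalent to `ln -s ../name name`.
            os.symlink(os.path.join("..", name), link)
    return run_dir
```

For DECam this would be called as setup_run_dir(os.environ["LEGACY_SURVEY_DIR"], "runbrick-decam", "survey-ccds-dr8b-decam-nocuts.kd.fits").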
Next, we need a shell script which sets up runbrick specifically for this input directory. Fortunately, all we have to do is copy our generic shell script which sets up our code and dependencies
cp $LEGACYPIPE_DIR/doc/dr8/dr8-env.sh ./dr8-env-decam.sh
but then change the LEGACY_SURVEY_DIR environment variable to
export LEGACY_SURVEY_DIR=/global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-decam
And that's it! Setting up the runbrick-90prime-mosaic directory is analogous.
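The edit to the environment script can also be made programmatically, which avoids typos when setting up the second run directory. The sketch below assumes the script contains a line of the form "export LEGACY_SURVEY_DIR=..."; the function name set_legacy_survey_dir is hypothetical.

```python
import re

# Matches the whole export line, keeping the "export LEGACY_SURVEY_DIR=" prefix.
_EXPORT_LINE = re.compile(r"^([ \t]*export[ \t]+LEGACY_SURVEY_DIR=).*$", re.MULTILINE)


def set_legacy_survey_dir(script_text, new_dir):
    """Return script_text with LEGACY_SURVEY_DIR exported as new_dir."""
    # A function replacement avoids backslash handling in new_dir.
    new_text, count = _EXPORT_LINE.subn(lambda m: m.group(1) + new_dir, script_text)
    if count == 0:
        raise ValueError("no 'export LEGACY_SURVEY_DIR=' line found")
    return new_text
```

The function works on the text of the script, so the caller reads dr8-env.sh, applies it, and writes the result to dr8-env-decam.sh.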
With our I/O directory set up and our survey-ccds table in hand, we are ready to run legacypipe. First, create a list of bricks to load into qdo using legacypipe/queue-calibs.py. In DR8a and DR8b we focused on a set of test regions. The unique set of bricks in these regions was separately determined by Adam Myers, but can be rederived with, e.g.,
cd /global/project/projectdirs/cosmo/work/legacysurvey/dr8b/runbrick-decam
source dr8-env-decam.sh
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-s82 > bricks-test-s82
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-sgc > bricks-test-hsc-sgc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-hsc-ngc > bricks-test-hsc-ngc
python $LEGACYPIPE_DIR/py/legacypipe/queue-calibs.py --region dr8-test-edr > bricks-test-edr
To keep the top-level directory tidy, these files should be copied to a new directory, dr8b/brick-lists.
Next, create a qdo queue for each test region (so that completing a particular region can be prioritized) with
qdo load dr8b-test-s82-decam ./bricks-test-s82
qdo load dr8b-test-hsc-sgc-decam ./bricks-test-hsc-sgc
qdo load dr8b-test-hsc-ngc-decam ./bricks-test-hsc-ngc
qdo load dr8b-test-edr-decam ./bricks-test-edr
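The queue names above follow a regular pattern (release, region, camera), so the load commands can be generated rather than typed. This is a minimal sketch that only builds the argument lists; the function name qdo_load_commands is hypothetical, and using the same naming pattern for other cameras is an assumption.

```python
# Test regions used in DR8a and DR8b.
REGIONS = ("s82", "hsc-sgc", "hsc-ngc", "edr")


def qdo_load_commands(camera, regions=REGIONS, release="dr8b"):
    """Return one `qdo load` argument list per test region."""
    return [
        ["qdo", "load",
         "{}-test-{}-{}".format(release, region, camera),
         "./bricks-test-{}".format(region)]
        for region in regions
    ]
```

Each returned list can be passed to subprocess.run from the directory that holds the brick lists.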
cd $DW_PERSISTENT_STRIPED_DR8/dr8b/runbrick-decam
rsync -auv tractor* $LEGACY_SURVEY_DIR >> rsync-tractor.log 2>&1 &
rsync -auv coadd $LEGACY_SURVEY_DIR >> rsync-coadd.log 2>&1 &
rsync -auv metrics $LEGACY_SURVEY_DIR >> rsync-metrics.log 2>&1 &
/global/project/projectdirs/cosmo/webapp/viewer-dev/load-layer.py
and then run it (takes a long time...)