This Notebook describes how to produce Target Selection files for a Data Release, focusing on Data Release 7 (DR7) of the Legacy Surveys. In addition, the final section contains tips and tricks for working with the code.
First, ensure that your environment matches a standard DESI environment. For example:
module unload desimodules
source /project/projectdirs/desi/software/desi_environment.sh 18.7
desitarget relies on desiutil and desimodel, so you may also need to set up a wider DESI environment, as detailed at:
https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop/JuneMeeting
It may also be useful to set up some additional environment variables that are used in some versions of the desitarget code (you could also place these in your .bash_profile.ext file):
export DESIMODEL=$HOME/git/desimodel
export DUST_DIR=/project/projectdirs/desi/software/edison/dust/v0_1/maps
export GAIA_DIR=/project/projectdirs/desi/target/gaia_dr2
Here, I've set DESIMODEL to a reasonable location. For a more detailed description of checking out the desimodel data files from svn, see:
https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/Laptop/JuneMeeting#Datafilesfordesimodel
Here is an example of running an entire DR7 release using file paths at NERSC and a tagged version of the desitarget code (0.22.0 for this example).
First set up the needed directories and a version number:
export LSDIR='/global/project/projectdirs/cosmo/data/legacysurvey/dr7'
export TARGDIR='/project/projectdirs/desi/target/catalogs/dr7.1/0.22.0'
export WEBDIR='/project/projectdirs/desi/www/users/adamyers'
export DR='7.1'
export VERSION='0.22.0'
Now swap the currently loaded version of desitarget for the tagged version:
module swap desitarget/$VERSION
Make the directory for output targets, if it doesn't exist:
mkdir $TARGDIR
From this point forward, we will be running production-level code at NERSC. You may want to familiarize yourself with how to run batch jobs at NERSC. A straightforward way to run the code is to request interactive nodes. For example, to reserve one interactive node for 2 hours and 5 minutes:
salloc -N 1 -C haswell -t 02:05:00 --qos interactive -L SCRATCH,project
Now, to produce the file of targets:
select_targets $LSDIR/sweep/$DR/ $TARGDIR/targets-dr$DR-$VERSION.fits
(this takes a little over an hour on 1 Cori node, running with the default 32 processes per node).
You will now have produced an output targets file in $TARGDIR/targets-dr$DR-$VERSION.fits.
You should remember to change permissions on the file if other people will use it, e.g.:
chgrp desi $TARGDIR/targets-dr$DR-$VERSION.fits ; chmod 660 $TARGDIR/targets-dr$DR-$VERSION.fits
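If you'd like a quick sanity check on the output, you can read the file back at the Python prompt. Here's a minimal sketch; it assumes the TARGETS extension name used for target files elsewhere in this Notebook:
import os, fitsio
# Read the targets back and count them; "TARGETS" is the extension
# name used for target files throughout this Notebook.
fn = os.path.join(os.getenv("TARGDIR"), "targets-dr7.1-0.22.0.fits")
targs = fitsio.read(fn, "TARGETS")
print("{} targets in {}".format(len(targs), fn))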
In addition to the select_targets command line code, there are select_cmx_targets and select_sv_targets versions that can be used for commissioning and Survey Validation (SV).
desitarget already parallelizes sweeps files across processors on a single node, but it can further parallelize target selection by HEALPixel across multiple nodes. In addition, this form of parallelization can be used to produce smaller target files without having to write a single monolithic output file of all targets. The bundlefiles option can be used to produce a slurm script, and some general guidance, for parallelizing target selection across nodes using HEALPixels.
First, as in the previous section, set up the needed directories and a version number. For example:
export LSDIR='/global/project/projectdirs/cosmo/data/legacysurvey/dr7'
export DR='7.1'
export VERSION='0.22.0'
Then swap in the tagged version of desitarget:
module swap desitarget/$VERSION
Now, pass select_targets the bundlefiles option and an nside corresponding to the HEALPixels across which you want to parallelize the code. The bundlefiles option specifies the number of input sweeps files to parallelize across (i.e. for bundlefiles=100, the input sweep files will be bundled across HEALPixels so that approximately 100 files touch each pixel). The optimal nside is likely to correspond to the area covered by a single sweep file (i.e. nside=8 for about 50 square degrees). Some trial and error as to the value of bundlefiles might be necessary given the available resources, however. For instance:
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/blat.fits --bundlefiles 100 --nside 2
will produce the following output:
INFO:cuts.py:2306:select_targets: Running on the main survey
#######################################################
Numbers of bricks or files in each set of HEALPixels:
['17: 42', '18: 40', '21: 18', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['19: 38', '27: 38', '30: 24', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['4: 36', '8: 37', '28: 1', '35: 26', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['12: 25', '25: 36', '26: 36', '33: 3', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['0: 23', '16: 25', '22: 19', '23: 10', '31: 23', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['2: 13', '5: 14', '9: 15', '10: 15', '14: 2', '20: 6', '24: 16', '45: 2', '47: 17', 'Total: 100', 'Estimated time to run in hours (for 32 processors per node): 0.28h']
['6: 13', '13: 13', '29: 12', '34: 5', '39: 11', '43: 12', 'Total: 66', 'Estimated time to run in hours (for 32 processors per node): 0.20h']
#######################################################
Possible salloc command if you want to run on the interactive queue:
salloc -N 7 -C haswell -t 01:00:00 --qos interactive -L SCRATCH,project
#######################################################
Example shell script for slurm:
#!/bin/bash -l
#SBATCH -q regular
#SBATCH -N 7
#SBATCH -t 01:00:00
#SBATCH -L SCRATCH,project
#SBATCH -C haswell
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-17,18,21.fits --numproc 32 --nside 2 --healpixels 17,18,21 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-19,27,30.fits --numproc 32 --nside 2 --healpixels 19,27,30 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-4,8,28,35.fits --numproc 32 --nside 2 --healpixels 4,8,28,35 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-12,25,26,33.fits --numproc 32 --nside 2 --healpixels 12,25,26,33 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-0,16,22,23,31.fits --numproc 32 --nside 2 --healpixels 0,16,22,23,31 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-2,5,9,10,14,20,24,45,47.fits --numproc 32 --nside 2 --healpixels 2,5,9,10,14,20,24,45,47 &
srun -N 1 select_targets /global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1 $CSCRATCH/targets-dr7-hp-6,13,29,34,39,43.fits --numproc 32 --nside 2 --healpixels 6,13,29,34,39,43 &
wait
#gather_targets '$CSCRATCH/targets-dr7-hp-17,18,21.fits;$CSCRATCH/targets-dr7-hp-19,27,30.fits;$CSCRATCH/targets-dr7-hp-4,8,28,35.fits;$CSCRATCH/targets-dr7-hp-12,25,26,33.fits;$CSCRATCH/targets-dr7-hp-0,16,22,23,31.fits;$CSCRATCH/targets-dr7-hp-2,5,9,10,14,20,24,45,47.fits;$CSCRATCH/targets-dr7-hp-6,13,29,34,39,43.fits' $CSCRATCH/targets-dr7.fits targets
The top half of this output tells you the approximate time it will take to process the inputs, and gives an salloc command for requesting nodes. The second half provides an example slurm script to run on the requested nodes. So, to use this information, you would place everything from the #!/bin/bash -l line downward in a file called, say, slurm.script. You would then execute salloc -N 7 -C haswell -t 01:00:00 --qos interactive -L SCRATCH,project at the command line, wait to be allocated nodes, and then run slurm.script on your allocated nodes.
A monolithic output file of all of the targets can be produced by removing the comment character (#) before the gather_targets command in the last line of the slurm script. However, this should not prove necessary, as desitarget has tools for working with the HEALPixel-split parallelized output (see the next section on useful commands for reading and producing subsets of targets).
Parallelized across nodes in this manner, it takes of order 20 minutes to run targeting (for nside=2), as compared to about 1.5 hours for a basic run of select_targets.
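As an aside, when choosing an nside for bundling, it can help to check the HEALPixel area for a few candidate values. Here's a quick sketch at the Python prompt, assuming healpy is available (as it is in a standard DESI environment):
import healpy as hp
# Area of a single HEALPixel in square degrees for a few values of nside
# (nside=8 corresponds to ~54 square degrees, roughly one sweep file).
for nside in [2, 4, 8, 16]:
    print(nside, "{:.1f} sq. deg. per pixel".format(hp.nside2pixarea(nside, degrees=True)))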
Note that the information in this section also applies to the select_sv_targets command line code that runs target selection for SV.
desitarget has tools for producing subsets of targets limited by Right Ascension (RA) and Declination (Dec), or by HEALPixel. In addition, desitarget has wrappers to read subsets of the HEALPixel-partitioned targets produced from a parallelized run of select_targets, as described in the previous section.
As a general reminder, note that desihub code always uses the NESTED HEALPixel scheme.
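For instance, to find the HEALPixel that contains a given location, remember to pass nest=True. A minimal sketch using healpy (the RA/Dec values here are arbitrary):
import healpy as hp
# lonlat=True interprets the inputs as RA/Dec in degrees;
# nest=True selects the NESTED scheme used throughout desihub.
ra, dec = 121., 33.
pixnum = hp.ang2pix(16, ra, dec, nest=True, lonlat=True)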
To demonstrate this functionality, let's first produce a monolithic set of targets, as detailed in the Basic production of target files at the command line section. You can use any version of the code after 0.27.1 for this example (provided you use a consistent version throughout this section), and we'll store the outputs on scratch disk. The commands you'll need are as follows (in brief; see the previous sections for more detail):
export LSDIR='/global/project/projectdirs/cosmo/data/legacysurvey/dr7'
export DR='7.1'
salloc -N 1 -C haswell -t 02:00:00 --qos interactive -L SCRATCH,project
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/master-targets.fits
Let's also produce a HEALPixel-split version of the files, as detailed in the Producing target files in parallel section. To do this, run select_targets with the bundlefiles option, for example:
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/blat.fits --bundlefiles 100 --nside 2
and execute the produced slurm script on a set of interactive nodes. Finally, for the purposes of the rest of this tutorial, move the output files (targets-dr7-hp-0,16,22,23,31.fits, targets-dr7-hp-17,18,21.fits, etc.) to a directory called $CSCRATCH/targsplit.
desitarget has several useful wrappers to read the output of the parallelized or "bundled" code. Options include reading in a subset of HEALPixels, a "box" in RA/Dec, or a cap (a "circle" on the sky) specified by RA/Dec/radius in degrees. Let's enter the Python prompt and try each of these options:
In [1]:
# Set up the directory of HEALPixel-split files we created in the previous subsection.
import os
hpdirname = os.path.join(os.getenv("SCRATCH"), "targsplit")
# The various pieces of code for reading in targets.
from desitarget.io import read_targets_in_hp, read_targets_in_box, read_targets_in_cap
# Read targets by HEALPixel.
nside, pixlist = 16, [2303, 2557, 3063]
hptargs = read_targets_in_hp(hpdirname, nside, pixlist)
# Read targets in an RAmin, RAmax, Decmin, Decmax "rectangle".
radecbox = [120,122,32,34]
boxtargs = read_targets_in_box(hpdirname, radecbox)
# Read targets in an RA, Dec, radius (degrees) "circle".
radecrad = [121,33,0.5]
captargs = read_targets_in_cap(hpdirname, radecrad)
The default option for extracting targets is to return all columns in a target file, but it's also possible to return just a subset of columns by passing them as a list:
In [2]:
columns = ["DESI_TARGET", "TARGETID"]
hptargs = read_targets_in_hp(hpdirname, nside, pixlist, columns=columns)
boxtargs = read_targets_in_box(hpdirname, radecbox, columns=columns)
captargs = read_targets_in_cap(hpdirname, radecrad, columns=columns)
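Because we retrieved the DESI_TARGET column, we can also decode target classes from the returned bit-values. For example, here's a minimal sketch that counts ELG targets using the desi_mask bitmask from desitarget.targetmask:
from desitarget.targetmask import desi_mask
# Objects with the ELG bit set in DESI_TARGET are ELG targets.
is_elg = (hptargs["DESI_TARGET"] & desi_mask["ELG"]) != 0
print("{} ELG targets in HEALPixels {}".format(is_elg.sum(), pixlist))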
It is relatively straightforward to check that these commands worked, using the master file of targets that we created in the previous subsection. It is, of course, much quicker to read in subsets of targets rather than the entire master file!
In [3]:
import fitsio
mastername = os.path.join(os.getenv("SCRATCH"), "master-targets.fits")
mastertargs = fitsio.read(mastername, "TARGETS")
In [4]:
# Note that the geomask module has several useful functions
# for testing whether objects are in a cap, box, pixel etc.
from desitarget.geomask import is_in_cap
ii = is_in_cap(mastertargs, radecrad)
# Did we return the exact same set of TARGETIDs?
capset = set(captargs["TARGETID"])
masterset = set(mastertargs[ii]["TARGETID"])
# Are the sets identical?
capset == masterset
Out[4]: True
It's certainly true that reading in a subset of targets from previously processed HEALPixel-split files is quicker than reading in all of the targets. But, in certain situations, it may be quicker still to only process that subset of targets in the first place.
desitarget can process target files in only certain regions of the sky, to completely circumvent the need to produce very large target files. As in the previous section, options include running in a subset of HEALPixels, a "box" in RA/Dec, or a cap (a "circle" on the sky) specified by RA/Dec/radius in degrees. Although many of the commands described below are probably sufficiently fast to run on a login node, let's grab an interactive node for this section of the tutorial:
salloc -N 1 -C haswell -t 00:20:00 --qos interactive -L SCRATCH,project
As before, we'll set up some environment variables that point to the location of Legacy Surveys imaging Data Release files at NERSC:
export LSDIR='/global/project/projectdirs/cosmo/data/legacysurvey/dr7'
export DR='7.1'
First, let's try the HEALPixels option. We'll produce just the targets in HEALPixel 16 for nside=2:
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/targets-hp-16.fits --nside 2 --healpixels 16
Note that this only takes about 4 minutes to run (on the default 32 processors), as compared to an hour or more when running the full set of targets.
There are similar options for an (RAmin, RAmax, Decmin, Decmax) "box":
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/targets-radecbox-120-122-32-34.fits --radecbox 120,122,32,34
and for a cap (a "circle" on the sky) specified by RA/Dec/radius in degrees:
select_targets $LSDIR/sweep/$DR/ $CSCRATCH/targets-radecrad-121-33-0.5.fits --radecrad 121,33,0.5
These last two commands should run even faster (in 1-2 minutes) as they are querying much smaller areas than for the HEALPixel example.
Note that all of the above options will work with the SV version of the code (select_sv_targets) in addition to the main survey version of the code (select_targets).
We can check that these commands returned the expected outcomes using the same method as we used in the previous subsection to verify reading in subsets of targets. For example, returning to the Python prompt:
In [5]:
import os, fitsio
from desitarget.geomask import is_in_box
mastername = os.path.join(os.getenv("SCRATCH"), "master-targets.fits")
mastertargs = fitsio.read(mastername, "TARGETS")
radecbox = [120, 122, 32, 34]
ii = is_in_box(mastertargs, radecbox)
masterset = set(mastertargs[ii]["TARGETID"])
boxname = os.path.join(os.getenv("SCRATCH"), "targets-radecbox-120-122-32-34.fits")
boxtargs = fitsio.read(boxname, "TARGETS")
boxset = set(boxtargs["TARGETID"])
# Did we return the exact same set of TARGETIDs?
boxset == masterset
Out[5]: True
In addition to the basic targets file, you may want some of the other desitarget products.
To produce a random catalog:
select_randoms $LSDIR $CSCRATCH/blat.fits --bundlebricks 20000
Then, try a few different numbers in place of 20000 until you produce a script that is balanced across nodes. Once you have a balanced script, follow the onscreen instructions and move the output file to $TARGDIR/randoms-dr$DR-$VERSION.fits. This takes about 1.5-2.5 hours to run on about 10 Cori nodes with 32 processes per node.
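The output is a FITS catalog of random points in the imaging footprint. As a quick check at the Python prompt (a sketch that assumes the standard RA and DEC columns):
import os, fitsio
# Read just the coordinate columns rather than the (large) full catalog.
fn = os.path.join(os.getenv("TARGDIR"), "randoms-dr7.1-0.22.0.fits")
rands = fitsio.read(fn, ext=1, columns=["RA", "DEC"])
print("{} random points".format(len(rands)))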
To produce a HEALPixel map of weights from a random catalog:
make_imaging_weight_map $TARGDIR/randoms-dr$DR-$VERSION.fits $TARGDIR/targets-dr$DR-$VERSION.fits $TARGDIR/pixweight-dr$DR-$VERSION.fits
This takes about 20 minutes to run on one Cori node with 1 process per node.
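The pixweight file is itself a FITS file, so its structure is easy to inspect. A sketch (the exact contents depend on the desitarget version, so we simply print whatever columns are present):
import os, fitsio
# Read the HEALPixel-level weight map and list any columns it contains.
fn = os.path.join(os.getenv("TARGDIR"), "pixweight-dr7.1-0.22.0.fits")
pixweight = fitsio.read(fn, ext=1)
print(pixweight.dtype.names)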
To use a HEALPixel weight file, a random catalog, and a target catalog to produce a set of QA webpages (e.g., see the DR7 QA pages):
run_target_qa $TARGDIR/targets-dr$DR-$VERSION.fits $WEBDIR/desitargetQA-dr$DR-$VERSION -w $TARGDIR/pixweight-dr$DR-$VERSION.fits
This takes about 1.5 hours on one Cori node with 1 process per node.
To select sky targets for a data release:
select_skies $LSDIR $CSCRATCH/blat.fits --bundlebricks 10000
Then, try a few different numbers in place of 10000 until you produce a script that is balanced across nodes. Once you have a balanced script, follow the onscreen instructions and move the output file to $TARGDIR/skies-dr$DR-$VERSION.fits. This takes about 3-4 hours to run on about 15 Cori nodes with 32 processes per node.
Finally, to produce a set of GFA targets:
select_gfas $LSDIR/sweep/$DR $TARGDIR/gfas-dr$DR-$VERSION.fits
This takes about 15 minutes to run on one Cori node with 32 processes per node.
Again, remember to change permissions on the file if other people will use it, e.g.:
chgrp desi *fits ; chmod 660 *fits
All of the desitarget command line codes have a -h flag that will describe how to use that code, e.g.:
select_targets -h
Target selection can be restricted to a subset of target classes by passing the tcnames flag. For example, to select only quasar targets:
select_targets $LSDIR/sweep/$DR/ $TARGDIR/targets-dr$DR-$VERSION.fits --tcnames QSO
Here, the tcnames flag should be interpreted as "target class names." Multiple classes can be passed as a comma-separated list, e.g.:
select_targets $LSDIR/sweep/$DR/ $TARGDIR/targets-dr$DR-$VERSION.fits --tcnames LRG,ELG
Versions of select_targets also exist for commissioning (cmx) and Survey Validation (SV). An example of running the commissioning version of the code is:
select_cmx_targets $LSDIR/sweep/$DR/ $TARGDIR/cmx-targets-dr$DR-$VERSION.fits
The commissioning code runs very fast, so it has no options beyond the input directory and output file.
An example of running the SV version of the code is:
select_sv_targets $LSDIR/sweep/$DR/ $TARGDIR/sv2-targets-dr$DR-$VERSION.fits --iteration 2
The SV code has many of the same options as select_targets, but has the additional flag --iteration to denote which version of SV is being run (e.g., the first iteration of SV corresponds to --iteration 1).
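Note that SV targets are encoded with iteration-specific bitmasks. As a sketch, the bit names for the first iteration of SV can be listed as follows (assuming the sv1 module layout):
from desitarget.sv1.sv1_targetmask import desi_mask as sv1_desi_mask
# The SV bit names broadly mirror those of the main survey.
print(sv1_desi_mask.names())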
The same target class names passed to tcnames in the previous section can be used to produce web pages for a subset of targets. For example:
run_target_qa $TARGDIR/targets-dr$DR-$VERSION.fits $WEBDIR/desitargetQA-dr$DR-$VERSION -w $TARGDIR/pixweight-dr$DR-$VERSION.fits --bitnames LRG,ELG
(note that eventually --bitnames may be renamed --tcnames for consistency).
The QA code can also be run without a HEALPixel map of weights by dropping the -w flag, e.g.:
run_target_qa $TARGDIR/targets-dr$DR-$VERSION.fits $WEBDIR/desitargetQA-dr$DR-$VERSION
Without the map, the code will produce the same output but with equal weighting for all areas of the imaging footprint.
The plots themselves can be skipped by passing the --noplots flag:
run_target_qa $TARGDIR/targets-dr$DR-$VERSION.fits $WEBDIR/desitargetQA-dr$DR-$VERSION --noplots
run_target_qa produces plots of systematics across the imaging footprint by default. It can take some time to produce these plots, but they can be turned off with the --nosystematics flag:
run_target_qa $TARGDIR/targets-dr$DR-$VERSION.fits $WEBDIR/desitargetQA-dr$DR-$VERSION -w $TARGDIR/pixweight-dr$DR-$VERSION.fits --nosystematics
The main module that performs the selection work for desitarget is called cuts. It has many useful functions for target selection "under the hood" that can be called directly from Python. For example, the command line call:
export LSDIR='/global/project/projectdirs/cosmo/data/legacysurvey/dr7'
export TARGDIR='/project/projectdirs/desi/target/catalogs/dr7.1/0.22.0'
export DR='7.1'
export VERSION='0.22.0'
select_targets $LSDIR/sweep/$DR/ $TARGDIR/targets-dr$DR-$VERSION.fits --tcnames LRG,ELG
is equivalent to the following set of commands at the Python prompt:
from desitarget.cuts import select_targets
from desitarget import io
# ADM inputs
nside = io.desitarget_nside()
src = '/global/project/projectdirs/cosmo/data/legacysurvey/dr7/sweep/7.1'
dest = '/project/projectdirs/desi/target/catalogs/dr7.1/0.22.0/targets-dr7.1-0.22.0.fits'
infiles = io.list_sweepfiles(src)
# ADM note that the 'main', here, is to distinguish from 'cmx' and 'svX' (with X=1,2,3 etc.)
# ADM for commissioning and SV. The code would default to 'main', anyway, without this keyword.
targets = select_targets(infiles, tcnames=["LRG", "ELG"], survey='main')
# ADM write the targets to file.
io.write_targets(dest, targets, indir=src, survey="main", nside=nside)
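As a quick check on this Python-based run, you could read the output back and count the LRG and ELG targets that were selected. A minimal sketch, reusing dest from above:
import fitsio
from desitarget.targetmask import desi_mask
# Read the targets back and count objects with the LRG or ELG bit set.
targs = fitsio.read(dest, "TARGETS")
is_lrg = (targs["DESI_TARGET"] & desi_mask["LRG"]) != 0
is_elg = (targs["DESI_TARGET"] & desi_mask["ELG"]) != 0
print("{} LRGs, {} ELGs, {} total targets".format(is_lrg.sum(), is_elg.sum(), len(targs)))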