This notebook serves as a tutorial for the ugali
pipeline analysis. Its purpose is to document the steps taken to perform an end-to-end likelihood analysis on a raw data set. The ugali pipeline is composed of the following steps:
- run_01.0_download_data.py - This step is used for downloading data from the database
- run_02.0_preprocess.py - This step preprocesses the input data into the structure ugali expects
- run_03.0_likelihood.py - This step runs the likelihood grid search over the data set
- run_04.0_peak_finder.py - Takes the output likelihood grid and finds peaks in 3D
- run_05.0_followup.py - This step runs MCMC parameter fitting
- run_06.0_simulate.py - Runs and analyzes simulations
In [ ]:
%matplotlib inline
import os
import numpy as np
import healpy as hp
import fitsio
In [ ]:
# Create the working directory and chdir into it
!mkdir -p work
os.chdir('./work')
In [ ]:
# Download the test data
!wget https://github.com/DarkEnergySurvey/ugali/releases/download/v1.7.0/ugali-test-data.tar.gz
!tar -xzf ugali-test-data.tar.gz && rm -f ugali-test-data.tar.gz
For easier access, we'll create symlinks to some of the ugali
components that we'll be using.
In [ ]:
# Links to package components
!rm -f ugali && ln -s ../ugali
!rm -f pipeline && ln -s ugali/pipeline
# Link to configuration and source model files
!rm -f config.yaml && ln -s ../tests/config.yaml
!rm -f srcmdl.yaml && ln -s ../tests/srcmdl.yaml
Let's take a look at what we've got
In [ ]:
!ls -lh *
The components of our working directory are now:
- healpix - this directory contains the catalog data divided into files by healpix pixel (and labeled as such)
- mask - mask of the magnitude limit in "sparse" healpix representation corresponding to each catalog pixel
- pipeline - pipeline scripts
- config.yaml - pipeline configuration file
- srcmdl.yaml - source model file

We can examine some of these constituents to get a better feel for the files we are working with.
In [ ]:
# Investigate the catalog
data = fitsio.read('healpix/catalog_hpx0687.fits')
print(data.dtype.names)
print(data[:3])
# Displaying the catalog as a healpix map
nside=4096
c = np.zeros(hp.nside2npix(nside))
pix,cts = np.unique(data['PIX4096'],return_counts=True)
c[pix] = cts
hp.cartview(c,lonra=[53.5,55],latra=[-55,-53.5])
In [ ]:
# Investigate the magnitude limit mask
mask = fitsio.read('mask/maglim_g_hpx0687.fits')
print(mask.dtype.names)
print(mask[:3])
# Display the magnitude limit mask as a healpix map
m = np.zeros(hp.nside2npix(nside))
m[mask['PIXEL']] = mask['MAGLIM']
hp.cartview(m,lonra=[53.5,55],latra=[-55,-53.5])
The configuration file is the key to the ugali pipeline analysis. We are going to use config.yaml. This file contains the paths to the catalog and mask files that we will use, as well as other configuration parameters for the various pipeline steps. In subsequent steps we will refer to specific configuration parameters in this file.
In [ ]:
!head -n 25 config.yaml
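Since the configuration is plain YAML, it can also be loaded directly in Python. The following is a minimal sketch (assuming the PyYAML package is installed) for pulling out values that we will refer to later:
In [ ]:
import yaml

# Load the pipeline configuration file
with open('config.yaml') as f:
    config = yaml.safe_load(f)

# A couple of the parameters referenced later in this tutorial
print(config['catalog']['dirname'])
print(config['coords']['nside_catalog'])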
While there is a pipeline step for downloading some data sets (e.g., from SDSS and DES), this is mostly vestigial code that has been left as an example for the user. In almost all cases the user will provide their own data set from some arbitrary source. The requirements on the input data are kept fairly minimal, and can be found in the config['catalog'] section:
catalog:
dirname: ./healpix # directory of catalog files
basename: "catalog_hpx%04i.fits" # catalog file basename format
objid_field : COADD_OBJECT_ID # unique object identifier
lon_field : RA # name of longitude field
lat_field : DEC # name of latitude field
# Color always defined as mag_1 - mag_2
mag_1_band : &band1 g # name of mag_1 filter band
mag_1_field : WAVG_MAG_PSF_G # name of mag_1 field
mag_err_1_field : WAVG_MAGERR_PSF_G # name of magerr_1 field
mag_2_band : &band2 r # name of mag_2 filter band
mag_2_field : WAVG_MAG_PSF_R # name of mag_2 field
mag_err_2_field : WAVG_MAGERR_PSF_R # name of magerr_2 field
# True = band 1 is detection band; False = band 2 is detection band
band_1_detection : &detection True # detection occurs in band_1?
mc_source_id_field: MC_SOURCE_ID # field for simulated objects
# Additional selection parameters to get a sample of stars
selection : "(np.abs(self.data['WAVG_SPREAD_MODEL_I']) < 0.003)"
- objid_field - This is some unique identifier for objects in the catalog
- lon_field, lat_field - The longitude and latitude of each object in the catalog. This is usually RA and DEC with coordsys = CEL; however, Galactic coordinates are also supported (though less well tested).
- mag_[1,2]_band, mag_[1,2]_field, mag_err_[1,2]_field - These are the columns corresponding to the magnitude and magnitude error of each object in the catalog. Magnitudes are assumed to be extinction corrected.
- selection - This can be used to specify any additional selection (e.g., star-galaxy classification, quality cuts) that needs to be applied to the catalog before analyzing.
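As a quick illustration of how these fields map onto the catalog columns, the sketch below reads one of the test catalog files and extracts the configured columns. This is only a sketch (assuming PyYAML is available); the pipeline handles this mapping internally.
In [ ]:
import yaml
import fitsio

# Read the catalog section of the configuration
cat_cfg = yaml.safe_load(open('config.yaml'))['catalog']

# Load one pixelized catalog file and pull out the configured columns
data = fitsio.read('healpix/catalog_hpx0687.fits')
lon = data[cat_cfg['lon_field']]      # RA
lat = data[cat_cfg['lat_field']]      # DEC
mag_1 = data[cat_cfg['mag_1_field']]  # WAVG_MAG_PSF_G
mag_2 = data[cat_cfg['mag_2_field']]  # WAVG_MAG_PSF_R

# Color is always defined as mag_1 - mag_2
color = mag_1 - mag_2
print(lon[:3], lat[:3], color[:3])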
In order to run ugali, the data needs to be preprocessed into a prescribed directory structure. This step involves assembling the data products into the file structure that ugali assumes, and it makes use of the run_02.0_preprocess.py script. You can run one step at a time or multiple steps at once using the --run <XXX> option. For example:
python pipeline/run_02.0_preprocess.py config.yaml -r pixelize

will run just pixelize, while

python pipeline/run_02.0_preprocess.py config.yaml -r pixelize -r split

will run both pixelize and split in order.
This is a general feature of the ugali
pipeline.
Input data can be organized in any arbitrary structure; however, ugali
expects the data to be spatially sorted into healpix pixels and stored in files with appropriate names. The input catalog data files are specified in config.yaml
in config['data']['dirname']:
data:
dirname: /home/s1/kadrlica/projects/bliss/dsphs/v2/raw
The code will glob for all files in dirname ending in *.fits.
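This file discovery is conceptually equivalent to the small sketch below; the dirname is simply the example path from the configuration excerpt above.
In [ ]:
import os
import glob

# Example path taken from config['data']['dirname']
dirname = '/home/s1/kadrlica/projects/bliss/dsphs/v2/raw'
filenames = sorted(glob.glob(os.path.join(dirname, '*.fits')))
print(len(filenames), 'input files found')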
The output resolution of the pixelized catalog files is defined in config.yaml:
coords:
nside_catalog : 32 # Size of patches for catalog binning
The output pixelized files will be placed in config['catalog']['dirname'], conforming to config['catalog']['basename']:
catalog:
dirname: /home/s1/kadrlica/projects/bliss/dsphs/v2/healpix
basename: "cat_hpx_%05i.fits"
In this test case the input data is already pixelized, so this is a trivial operation.
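For intuition, the pixel assignment performed by this step can be sketched as follows (illustrative only; this is not the pipeline code itself):
In [ ]:
import healpy as hp

# Assign example coordinates (deg) to a catalog pixel at nside_catalog = 32
nside_catalog = 32
ra, dec = 54.0, -54.0
pix = hp.ang2pix(nside_catalog, ra, dec, lonlat=True)

# The object would land in the file named by config['catalog']['basename']
print("cat_hpx_%05i.fits" % pix)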
In [ ]:
!python pipeline/run_02.0_preprocess.py config.yaml -r pixelize
Ugali relies on some knowledge of the footprint of the data being analyzed. This is used to calculate the model-predicted stellar counts and to exclude regions that are not covered by a specific data set.
The mask takes as input the footprint of the data set, specified in:
data:
footprint: ./maps/footprint_n4096_equ.fits.gz
The magnitude limit mask can be derived from the footprint in several different ways:
- Using a pre-defined magnitude limit (see ugali/utils/constants.py). This is run with the -r simple option to run_02.0_preprocess.py.
- Calculating the magnitude limit mask with the -r maglim option.

The output of this step will be placed in:
mask:
dirname : ./mask
basename_1 : "maglim_g_hpx%04i.fits"
basename_2 : "maglim_r_hpx%04i.fits"
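As an illustration of the "simple" approach, the sketch below fills footprint pixels with a single constant magnitude limit and stores them in the same sparse (PIXEL, MAGLIM) format that we inspected earlier. The constant value and output filename here are made up for illustration and are not what the pipeline uses.
In [ ]:
import numpy as np
import healpy as hp
import fitsio

# Footprint map from config['data']['footprint']; the limit below is a placeholder
footprint = hp.read_map('./maps/footprint_n4096_equ.fits.gz')
maglim_value = 23.0

# Keep only pixels inside the footprint and assign the constant limit
pixels = np.where(footprint > 0)[0]
out = np.rec.fromarrays([pixels, np.full(len(pixels), maglim_value)],
                        names=['PIXEL', 'MAGLIM'])
fitsio.write('maglim_sketch.fits', out, clobber=True)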
In [ ]:
!python pipeline/run_02.0_preprocess.py config.yaml -r simple
In [ ]:
!python pipeline/run_03.0_likelihood.py --help
The command that we will run is
python pipeline/run_03.0_likelihood.py config.yaml -r scan -q local
Breaking this down:
- python pipeline/run_03.0_likelihood.py - the script itself
- config.yaml - the first positional argument of all the pipeline scripts is the configuration file
- -r scan - the component of the pipeline we want to run; in this case, the grid scan
- -q local - we are going to execute this component locally

Now to run the script...
In [ ]:
!python pipeline/run_03.0_likelihood.py config.yaml -r scan -q local
It will take a while for the script to run. The result will be a set of output files in the scan
directory.
In [ ]:
!ls -lh scan
The next step is to merge the output likelihood files.
In [ ]:
!python pipeline/run_03.0_likelihood.py config.yaml -r merge -q local
In [ ]:
# Make sure we are in the working directory
import os
os.chdir('../work')
After the likelihood scan is run, we are left with a merged healpix file containing the likelihood evaluated at each healpix coordinate and distance modulus. We can search this likelihood cube for overdensities in the likelihood corresponding to satellite candidates.
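The search itself is performed by run_04.0_peak_finder.py below, but the basic idea of locating local maxima in a 3D likelihood cube can be sketched with scipy. The array, grid, and threshold here are toy values for illustration only.
In [ ]:
import numpy as np
from scipy import ndimage

# Toy likelihood cube on a (distance modulus, lon, lat) grid
ts = np.random.rand(5, 60, 60)

# Candidate peaks: local maxima in 3D that exceed a detection threshold
local_max = ndimage.maximum_filter(ts, size=3)
threshold = 0.999  # made-up value for this toy example
peaks = np.argwhere((ts == local_max) & (ts > threshold))
print(peaks[:5])  # (modulus, lon, lat) indices of candidate peaks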
In [ ]:
!python pipeline/run_04.0_peak_finder.py config.yaml
After identifying satellite candidates, we would like to fit their observed parameters (location, distance, size, luminosity, etc.). This is implemented by calculating the posterior probability with MCMC sampling. The first step is to create a "source model" file specifying the parameters of the satellite model. An example source model is provided in srcmdl.yaml.
In [ ]:
!python pipeline/run_05.0_followup.py config.yaml -t targets.txt -q local -r mcmc