Updated LRG Templates

ToDo: Update to DR5

The goal of this notebook is to use observations of a large number of luminous red galaxies (LRGs) with DECaLS/DR3 grzW1W2 photometry and ancillary spectroscopic redshifts to generate a set of spectroscopic templates suitable for spectroscopic simulations, targeting investigations, and other DESI-related projects.

The parent sample consists of 2,511,248 LRGs selected from the official DECaLS/DR3.1 target catalog targets-dr3.1-0.14.0.fits using the selection criterion

DESI_TARGET & desi_mask.LRG != 0

Subsequently, Rongpu Zhou (Pittsburgh) assembled spectroscopic redshifts for the sample from SDSS/Main, SDSS/BOSS, AGES, DEEP2, and VIPERS, supplemented with COSMOS medium-band photometric redshifts, yielding precise redshifts for 62,177 objects (2.4% of the parent sample). Using these data, Zhou used a random forest technique (described elsewhere) to compute photometric redshifts for the complete sample, producing the catalog dr3.1-0.14.0-lrg-rf-photoz-0.2.fits, which we use as input into the generation of the spectral templates.

Specifically, we generate templates using the following procedure:

  1. We focus exclusively on the subset of 62,177 LRGs with spectroscopic redshifts since we find that the training sample is fairly representative of the full sample in the four-dimensional space of g-r, r-z, z-band magnitude, and redshift. (Future versions of these templates may utilize the random forest photometric redshifts.)
  2. We use the Bayesian spectral energy distribution (SED) modeling code iSEDfit and the CKC14z simple stellar population models of Conroy et al. to model the observed SEDs of the parent sample of galaxies. The outputs of this code are the best-fitting (maximum likelihood) spectroscopic template and the marginalized posterior distributions on a large number of quantities of interest, stored in the file legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01.fits. In addition, we compute K-corrected absolute magnitudes in the DECaLS g-, r-, and z-band bandpasses and store them in the file legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01_kcorr.z0.0.fits. The code that generates these files is desi_legacysurvey_lrg_isedfit.
  3. Next, the notebook below documents how we select a "representative" subset of these galaxies to serve as our LRG template set.
  4. Finally, we generate the templates themselves using the IDL script build_lrg_basis_templates.

In [1]:
import os
import numpy as np

from astropy.table import Table, hstack
from scipy.spatial import cKDTree as KDTree

In [2]:
import matplotlib.pyplot as plt
from corner import corner

In [3]:
%matplotlib inline

Establish the random and pathnames.

In [4]:
seed = 123
rand = np.random.RandomState(seed)

In [5]:
isedfit_version = 'v1.0'
templates_version = 'v2.0'

In [6]:
isedfit_dir = os.path.join(os.getenv('DESI_ROOT'), 'spectro', 'templates', 'lrg_templates', 'isedfit', isedfit_version)

In [7]:
metadata_outfile = os.path.join(isedfit_dir, 'lrg-templates-isedfit-{}.fits'.format(isedfit_version))

Read the iSEDfit and absolute magnitude catalogs.

In [8]:
def read_isedfit():
    """Read the iSEDfit fitting results."""
    isedfile = os.path.join(isedfit_dir, 'legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01.fits.gz')
    kcorrfile = os.path.join(isedfit_dir, 'legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01_kcorr.z0.0.fits.gz')

    print('Reading {}'.format(isedfile))
    ised = Table.read(isedfile)
    print('Reading {}'.format(kcorrfile))
    kcorr = Table.read(kcorrfile)
    snrmin = 3.0
    chi2min = 10
    keep = np.where(
        (ised['CHI2'] < chi2min) * 
        (np.sum(ised['MAGGIES'] * np.sqrt(ised['IVARMAGGIES']) > snrmin, axis=1) == 5)
    print('Read {} galaxies with chi2 < {} and S/N > {} in all 5 photometric bands.'.format(
        len(keep), chi2min, snrmin))
    cat = dict()
    cat['redshift'] = kcorr['Z'][keep].data
    cat['Mstar'] = ised['MSTAR_50'][keep].data
    cat['Mg'] = kcorr['ABSMAG'][keep, 0].data
    cat['Mr'] = kcorr['ABSMAG'][keep, 1].data
    cat['Mz'] = kcorr['ABSMAG'][keep, 2].data
    cat['gr'] = cat['Mg'] - cat['Mr']
    cat['rz'] = cat['Mr'] - cat['Mz']

    with np.errstate(invalid='ignore'):
        cat['grobs'] = -2.5 * np.log10( ised['MAGGIES'][keep, 0].data / ised['MAGGIES'][keep, 1].data )
        cat['rzobs'] = -2.5 * np.log10( ised['MAGGIES'][keep, 1].data / ised['MAGGIES'][keep, 2].data )
        cat['zW1obs'] = -2.5 * np.log10( ised['MAGGIES'][keep, 2].data / ised['MAGGIES'][keep, 3].data )
    #mm = - 2.5 * np.log10(ised['MAGGIES'][keep, 0].data)
    #_ = plt.hist(mm, bins=100)
    kcorr.remove_columns(['Z', 'ISEDFIT_ID', 'MAGGIES', 'IVARMAGGIES'])
    out = hstack([ised[keep], kcorr[keep]])
    return cat, out

In [9]:
cat, ised = read_isedfit()

Reading /Users/ioannis/work/desi/spectro/templates/lrg_templates/isedfit/v1.0/legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01.fits.gz
Reading /Users/ioannis/work/desi/spectro/templates/lrg_templates/isedfit/v1.0/legacysurvey_lrg_ckc14z_kroupa01_charlot_sfhgrid01_kcorr.z0.0.fits.gz
Read 48760 galaxies with chi2 < 10 and S/N > 3.0 in all 5 photometric bands.

In [10]:

dict_keys(['Mz', 'redshift', 'rzobs', 'Mg', 'Mr', 'zW1obs', 'gr', 'Mstar', 'rz', 'grobs'])

In [11]:

<Table length=2>
07.10186257679-3.841642608270.4933051.7658e-09 .. 2.19342e-082.10202e+20 .. 1.63934e+171.95907e-09 .. 2.72933e-0803005113.40343.79652e+112.49584e+0911.35728.433798.250550.09161570.007751650.2084180.506265-1.052.1077-3.42061-3.42061-15.0-15.0-1.0-1.0-1.00-1.0-1.0-1.0-1.011.33047.997436.936510.1240430.008730570.3142160.357754-1.0-3.42019-3.42019-15.0-13.4455-1.0-1.0-1.011.3197.492196.911420.2906270.00894760.366890.349315-1.0-2.85242-3.17479-10.7666-10.4481-1.0-1.0-1.00.0592420.8562110.864430.1995980.002190140.3238970.09187060.00.7350660.4933062.934093.001250. .. 0.0714089-22.1503 .. -23.29988778.52 .. 9104.59-22.1392 .. -23.3014decam_g.par .. decam_z.par
17.23270630501-3.739003233830.5961241.21947e-09 .. 3.73271e-083.41204e+20 .. 1.61952e+171.38542e-09 .. 4.38815e-0801746114.812937.38799e+113.82096e+0911.65276.852196.734490.05884570.01442480.1744840.521158-1.052.1897-3.13147-3.13147-15.0-15.0-1.0-1.0-1.00-1.0-1.0-1.0-1.011.64976.919776.728350.1009160.01442480.4067310.254227-1.0-3.12706-3.13482-15.0-15.0-1.0-1.0-1.011.65237.065516.693760.1858810.01410040.3726660.320061-1.0-2.96507-3.12227-12.9086-12.6098-1.0-1.0-1.00.02305530.3532240.3654860.1137960.001430680.1888520.07775560.00.2490840.0244882.430042.541530. .. 0.191121-22.7973 .. -24.05729283.86 .. 18576.3-22.7938 .. -24.0487decam_g.par .. decam_z.par

Visualize the sample in the rest-frame and the observed frame.

In [12]:
zlim, Mrlim, grlim, rzlim = (0.2, 1.1), (-21, -25), (0.2, 1.2), (0.2, 0.8)

In [13]:
def qa_rest_corner():
    fig = corner(np.vstack((cat['redshift'], cat['Mr'], cat['gr'], cat['rz'])).T, 
                 labels=('Redshift', r'$M_{0.0r}$', r'$^{0.0}(g - r)$', r'$^{0.0}(r - z)$'),
                 range=(zlim, Mrlim, grlim, rzlim))

Corner plot showing the inter-relationship of various rest-frame quantities.

In [14]:

In [15]:
def qa_rest():
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    ax1.hexbin(cat['redshift'], cat['Mr'], mincnt=1, bins='log', 
    ax2.hexbin(cat['redshift'], cat['gr'], mincnt=1, bins='log', 
    ax2.set_ylabel(r'$^{0.0}(g - r)$')
    ax3.hexbin(cat['rz'], cat['gr'], mincnt=1, bins='log', 
    ax3.set_xlabel(r'$^{0.0}(r - z)$')
    ax3.set_ylabel(r'$^{0.0}(g - r)$')
    hb = ax4.hexbin(cat['Mr'], cat['gr'], mincnt=1, bins='log', 
    ax4.set_ylabel(r'$^{0.0}(g - r)$')
    cax = fig.add_axes([0.88, 0.15, 0.02, 0.7])
    fig.colorbar(hb, cax=cax, label=r'$\log_{10}$ (Number of Galaxies)')

    for aa in (ax1, ax2, ax3, ax4):
    plt.subplots_adjust(wspace=0.2, right=0.85)

Another version of the previous plot but focusing on particular relationships.

In [16]:

Now let's look at observed color-color space.

In [17]:
grobslim, rzobslim = (0.5, 3), (0.8, 2.2)

In [18]:
def qa_obs():
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    ax1.hexbin(cat['rzobs'], cat['grobs'], mincnt=1, bins='log', 
    ax1.set_xlabel(r'$(r - z)_{\rm obs}$')
    ax1.set_ylabel(r'$(g - r)_{\rm obs}$')
    hb = ax2.hexbin(cat['rzobs'], cat['zW1obs'], mincnt=1, bins='log', 
    cb = plt.colorbar(hb)
    cb.set_label(r'$\log_{10}$ (Number of Galaxies)')

    ax2.set_ylabel(r'$(r - z)_{\rm obs}$')
    ax2.set_xlabel(r'$(z - W1)_{\rm obs}$')
    for aa in (ax1, ax2):

In [19]:

Group similar templates.

Here's the black magic. The question is: what minimum set templates can be used to describe this population of LRGs. This problem is NP hard, although there are methods for solving it (see, e.g., the SetCoverPy algorithm). Here we opt for a much simpler procedure.

Based on the preceding plots, we have opted to use the rest-frame g-r and r-z color together with the r-band absolute magnitude to identify "like" galaxies. Specifically, we simply choose a random subset of NPERBIN galaxies in each 3D bin (containing at least MINCOUNT objects) of rest-frame color-color-magnitude space. This procedure has the benefit of being relatively simple and yielding diversity of templates.

In addition, the NPERBIN and MINCOUNT parameters (as well as the size of the bins) can be tuned to yield the desired number of total templates. Here, we opt for approximately ~1100 templates.

In [20]:
def group_templates(X_in, lim_in, delta_in):
    """Use scipy's binned_statistic_dd method to assigned galaxies to 
    fixed input bins.    
    from scipy.stats import binned_statistic_dd
    nbins = [np.ceil(np.ptp(lim) / delta).astype('int') for lim, delta in zip(lim_in, delta_in)]

    X = np.vstack( X_in ).T

    stat, edges, num = binned_statistic_dd(X, _, statistic='count', bins=nbins, range=lim_in)
    index = [np.where(num == ii)[0] for ii in np.unique(num)]
    count = np.array([len(aa) for aa in index])
    return index, count

The unused code in the next cell demonstrates how, e.g., redshift can be used in the template selection.

In [21]:
if False:
    X_in = (cat['redshift'], cat['gr'], cat['rz'])
    lim_in = (zlim, grlim, rzlim)
    delta_in = (0.05, 0.05, 0.05)
    X_in = (cat['Mr'], cat['gr'], cat['rz'])
    lim_in = (np.flip(Mrlim, 0), grlim, rzlim)
    delta_in = (0.1, 0.05, 0.05)

In [22]:
%time index, count = group_templates(X_in, lim_in, delta_in)

CPU times: user 201 ms, sys: 10.2 ms, total: 212 ms
Wall time: 232 ms

In [23]:
print('Identified {} color-color bins with at least one galaxy:'.format(len(index)))
for mnmx in ((1, 5), (5, 10), (10, 30), (30, 100), (100, count.max()+1)):
    cnt = np.sum( (count >= mnmx[0]) * (count < mnmx[1]) ).astype('int')
    print('  {:03} bins with {} - {}'.format(cnt, mnmx[0], mnmx[1]))

Identified 924 color-color bins with at least one galaxy:
  457 bins with 1 - 5
  111 bins with 5 - 10
  141 bins with 10 - 30
  110 bins with 30 - 100
  105 bins with 100 - 1194

In [24]:
def choose_templates(mincount=3, nperbin=2):
    """Return the indices of the chosen templates and the responsibility, 
    or the number of galaxies in that template's bin.
    if nperbin > mincount:
        print('NPERBIN should be less than or equal to MINCOUNT.')
    iarch = list()
    norm = list()
    good = np.where(count >= mincount)[0]
    for gg in good:
        iarch.append(rand.choice(index[gg], nperbin, replace=False))
    iarch = np.hstack(iarch)
    resp = count[good]
    print('Chose {} templates to represent {} galaxies.'.format(len(iarch), np.sum(resp)))
    return iarch, resp

In [25]:
iarch, resp = choose_templates()

Chose 1142 templates to represent 48294 galaxies.

In [26]:
def _markers():
    size = 50 * (1+(resp - resp.min()) / resp.ptp())
    shade = (cat['gr'][iarch] - cat['gr'][iarch].min()) / cat['gr'][iarch].ptp()
    col = plt.cm.coolwarm(shade)
    return size, col

In [27]:
def qa_templates():
    """Generate a color-color plot with the symbol size scaled by the responsibility.
    size, col = _markers()

    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

    ax1.scatter(cat['rzobs'], cat['grobs'], s=10, c='lightgray', alpha=0.5, edgecolor='k')
    ax1.scatter(cat['rzobs'][iarch], cat['grobs'][iarch], c=col, marker='o', s=size, 
                alpha=0.9, edgecolor='k')
    ax1.set_xlabel(r'$(r - z)_{\rm obs}$')
    ax1.set_ylabel(r'$(g - r)_{\rm obs}$')

    ax2.scatter(cat['zW1obs'], cat['rzobs'], s=10, c='lightgray', alpha=0.5, edgecolor='k')
    ax2.scatter(cat['zW1obs'][iarch], cat['rzobs'][iarch], c=col, marker='o', s=size, 
                alpha=0.9, edgecolor='k')
    ax2.set_ylabel(r'$(r - z)_{\rm obs}$')
    ax2.set_xlabel(r'$(z - W1)_{\rm obs}$')

    ax3.scatter(cat['rz'], cat['gr'], s=10, c='lightgray', alpha=0.5, edgecolor='k')
    ax3.scatter(cat['rz'][iarch], cat['gr'][iarch], c=col, marker='o', s=size, 
                alpha=0.9, edgecolor='k')
    ax3.set_xlabel(r'$^{0.0}(r - z)$')
    ax3.set_ylabel(r'$^{0.0}(g - r)$')
    ax4.scatter(cat['Mr'], cat['gr'], s=10, c='lightgray', alpha=0.5, edgecolor='k')
    ax4.scatter(cat['Mr'][iarch], cat['gr'][iarch], c=col, marker='o', s=size, 
                alpha=0.9, edgecolor='k')
    ax4.set_ylabel(r'$^{0.0}(g - r)$')
    for aa in (ax1, ax2, ax3, ax4):

Symbol size is proportional to "responsibility" (nperbin) and color is proportional to rest-frame g-r color.

In [28]: