Sorting Objects in Instance Catalogs

Bryce Kalmbach

This notebook provides a series of commands that take a Twinkles Phosim Instance Catalog and creates different pandas dataframes for different types of objects in the catalog. It first separates the full sets of objects in the Instance Catalogs before picking out the sprinkled strongly lensed systems for further analysis. The complete object dataframes contain:

  • Stars: All stars in the Instance Catalog
  • Galaxies: All bulge and disk components of galaxies in the Instance Catalog
  • AGN: All AGN in the Instance Catalog
  • SNe: The supernovae that are present in the Instance Catalog

Then there are sprinkled strongly lensed systems dataframes containing:

  • Sprinkled AGN galaxies: The images of the lensed AGNs
  • Lens Galaxies: These are the foreground galaxies in the lens system.
  • (Not Default) Sprinkled AGN Host galaxies: While these were turned off in Run 1 of Twinkles the original motivation for this notebook was to find these objects in a catalog to help development of lensed hosts at the DESC 2017 SLAC Collaboration Meeting Hack Day.

Requirements

If you already have an instance catalog from Twinkles on hand all you need now are:

  • Pandas
  • Numpy

In [1]:
import pandas as pd
import numpy as np

Parsing the Instance Catalog

Here we run through the instance catalog and store which rows belong to which class of object. This is necessary since the catalog objects do not all have the same number of properties so we cannot just import them all and then sort within a dataframe.


In [2]:
filename = 'twinkles_phosim_input_230.txt'

In [3]:
i = 0
not_star_rows = []
not_galaxy_rows = []
not_agn_rows = []
not_sne_rows = []
with open(filename, 'r') as f:
    for line in f:
        new_str = line.split(' ')
        #Skip through the header
        if len(new_str) < 4:
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
            i+=1
            continue
        if new_str[5].startswith('starSED'):
            #star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('galaxySED'):
            #galaxy_rows.append(i)
            not_star_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('agnSED'):
            #agn_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('spectra_files'):
            #sne_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
        i += 1

Populating Dataframes

Now we load the dataframes for the overall sets of objects.


In [4]:
df_star = pd.read_csv(filename, delimiter=' ', header=None,
                      names = ['prefix', 'uniqueId', 'raPhosim', 'decPhoSim', 
                               'phoSimMagNorm', 'sedFilepath', 'redshift', 
                               'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                               'spatialmodel', 'internalExtinctionModel',
                               'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                      skiprows=not_star_rows)

In [5]:
df_star[:3]


Out[5]:
prefix uniqueId raPhosim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa raOffset decOffset spatialmodel internalExtinctionModel galacticExtinctionModel galacticAv galacticRv
0 object 16285577220 52.904543 -27.188795 25.270918 starSED/wDs/bergeron_5000_90.dat_5100.gz 0 0 0 0 0 0 point none CCM 0.024576 3.1
1 object 16288599044 52.965934 -27.207791 24.626965 starSED/wDs/bergeron_4500_80.dat_4700.gz 0 0 0 0 0 0 point none CCM 0.023116 3.1
2 object 16291377156 52.780645 -27.258553 22.887791 starSED/wDs/bergeron_5500_80.dat_5700.gz 0 0 0 0 0 0 point none CCM 0.034913 3.1

In [6]:
df_galaxy = pd.read_csv(filename, delimiter=' ', header=None, 
                        names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim', 
                              'phoSimMagNorm', 'sedFilepath',
                              'redshift', 'shear1', 'shear2', 'kappa', 
                              'raOffset', 'decOffset', 'spatialmodel', 
                              'majorAxis', 'minorAxis', 'positionAngle', 'sindex',
                              'internalExtinctionModel', 'internalAv', 'internalRv',
                              'galacticExtinctionModel', 'galacticAv', 'galacticRv'], 
                        skiprows=not_galaxy_rows)

In [7]:
df_galaxy[:3]


Out[7]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa ... majorAxis minorAxis positionAngle sindex internalExtinctionModel internalAv internalRv galacticExtinctionModel galacticAv galacticRv
0 object 140314 53.012914 -27.540978 17.560391 galaxySED/Inst.32E09.02Z.spec.gz 0.104306 0 0 0 ... 0.000016 0.000014 2.348059 4 CCM 0.5 3.1 CCM 0.027387 3.1
1 object 145434 53.071411 -27.316559 15.659390 galaxySED/Inst.25E09.02Z.spec.gz 0.079068 0 0 0 ... 0.000024 0.000024 3.151106 4 CCM 0.6 3.1 CCM 0.021410 3.1
2 object 148506 53.173385 -27.282062 16.810310 galaxySED/Burst.32E09.02Z.spec.gz 0.079037 0 0 0 ... 0.000008 0.000007 4.081670 4 CCM 0.6 3.1 CCM 0.018425 3.1

3 rows × 23 columns


In [8]:
df_agn = pd.read_csv(filename, delimiter=' ', header=None, 
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim', 
                            'phoSimMagNorm', 'sedFilepath', 'redshift', 
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows = not_agn_rows)

In [9]:
df_agn[:3]


Out[9]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa raOffset decOffset spatialmodel internalExtinctionModel galacticExtinctionModel galacticAv galacticRv
0 object 3560476 52.900385 -27.306074 22.419379 agnSED/agn.spec.gz 0.203887 0 0 0 0 0 point none CCM 0.027625 3.1
1 object 9830428 53.104649 -27.402024 22.365648 agnSED/agn.spec.gz 0.153405 0 0 0 0 0 point none CCM 0.023402 3.1
2 object 14275612 52.768229 -27.536270 25.258118 agnSED/agn.spec.gz 0.303009 0 0 0 0 0 point none CCM 0.030570 3.1

In [10]:
df_sne = pd.read_csv(filename, delimiter=' ', header=None, 
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                            'phoSimMagNorm', 'shorterFileNames', 'redshift',
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows = not_sne_rows)

In [11]:
df_sne[:3]


Out[11]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm shorterFileNames redshift shear1 shear2 kappa raOffset decOffset spatialmodel internalExtinctionModel galacticExtinctionModel galacticAv galacticRv
0 object 49307690 52.979646 -27.228909 25.950281 spectra_files/specFile_48152_59580.1409_i.dat 0.5446 0 0 0 0 0 point none CCM 0.023483 3.1
1 object 58179626 52.729770 -27.305787 24.247262 spectra_files/specFile_56816_59580.1409_i.dat 0.9345 0 0 0 0 0 point none CCM 0.033048 3.1
2 object 68080682 53.196934 -27.564026 22.229618 spectra_files/specFile_66485_59580.1409_i.dat 0.3143 0 0 0 0 0 point none CCM 0.027329 3.1

Sort out sprinkled Strong Lensing Systems

Now we will pick out the pieces of strongly lensed systems that were sprinkled into the instance catalogs for the Twinkles project.

Lensed AGN

We start with the Lensed AGN. In Twinkles Instance Catalogs the lensed AGN have larger uniqueIds than normal since we added information about the systems into the uniqueIds. We use this to find them in the AGN dataframe.


In [12]:
sprinkled_agn = df_agn[df_agn['uniqueId'] > 20000000000]

Below we see a pair of lensed images from a double.


In [13]:
sprinkled_agn[:2]


Out[13]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa raOffset decOffset spatialmodel internalExtinctionModel galacticExtinctionModel galacticAv galacticRv
338 object 213934178332 53.054049 -27.703151 21.105901 agnSED/agn.spec.gz 0.48 0 0 0 0 0 point none CCM 0.029784 3.1
339 object 213934179356 53.053503 -27.702761 22.614289 agnSED/agn.spec.gz 0.48 0 0 0 0 0 point none CCM 0.029791 3.1

Now we will extract the extra information we have stored in the uniqueId. This information is the Twinkles System number in our custom OM10 catalog in the data directory in Twinkles and the Twinkles Image Number which identifies which image in that particular system refers to that line in the catalog.


In [14]:
# This step undoes the step in CatSim that gives each component of a galaxy a different offset
twinkles_nums = []
for agn_id in sprinkled_agn['uniqueId']:
    twinkles_ids = np.right_shift(agn_id-28, 10)
    twinkles_nums.append(twinkles_ids)

In [15]:
#This parses the information added in the last 4 digits of the unshifted ID
twinkles_system_num = []
twinkles_img_num = []
for lens_system in twinkles_nums:
    lens_system = str(lens_system)
    twinkles_id = lens_system[-4:]
    twinkles_id = np.int(twinkles_id)
    twinkles_base = np.int(np.floor(twinkles_id/4))
    twinkles_img = twinkles_id % 4
    twinkles_system_num.append(twinkles_base)
    twinkles_img_num.append(twinkles_img)

We once again look at the two images we showed earlier. We see that they are image 0 and image 1 from Twinkles System 24.


In [16]:
print twinkles_system_num[:2], twinkles_img_num[:2]


[24, 24] [0, 1]

We now add this information into our sprinkled AGN dataframe and reset the indices.


In [17]:
sprinkled_agn = sprinkled_agn.reset_index(drop=True)
sprinkled_agn['twinkles_system'] = twinkles_system_num
sprinkled_agn['twinkles_img_num'] = twinkles_img_num

In [18]:
sprinkled_agn.iloc[:2, [1, 2, 3, -2, -1]]


Out[18]:
uniqueId raPhoSim decPhoSim twinkles_system twinkles_img_num
0 213934178332 53.054049 -27.703151 24 0
1 213934179356 53.053503 -27.702761 24 1

The last step is to now add a column with the lens galaxy uniqueId for each system so that we can cross-reference between the lensed AGN and the lens galaxy dataframe we will create next. We start by finding the uniqueIds for the lens galaxies.


In [19]:
#The lens galaxy ids do not have the extra 4 digits at the end so we remove them
#and then do the shift back to the `uniqueID`.
lens_gal_ids = np.left_shift((np.array(twinkles_nums))/10000, 10) + 26

In [20]:
sprinkled_agn['lens_galaxy_uID'] = lens_gal_ids

We now see that the same system has the same lens galaxy uniqueId as we expect.


In [21]:
sprinkled_agn.iloc[:2, [1, 2, 3, -3, -2, -1]]


Out[21]:
uniqueId raPhoSim decPhoSim twinkles_system twinkles_img_num lens_galaxy_uID
0 213934178332 53.054049 -27.703151 24 0 21393434
1 213934179356 53.053503 -27.702761 24 1 21393434

Lens Galaxies

Now we will create a dataframe with the Lens Galaxies.


In [22]:
lens_gal_locs = []
for idx in lens_gal_ids:
    lens_gal_locs.append(np.where(df_galaxy['uniqueId'] == idx)[0])

lens_gals = df_galaxy.iloc[np.unique(lens_gal_locs)]
lens_gals = lens_gals.reset_index(drop=True)

We now have the lens galaxies in their own dataframe that can be joined on the lensed AGN dataframe by the uniqueId.


In [23]:
lens_gals[:1]


Out[23]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa ... majorAxis minorAxis positionAngle sindex internalExtinctionModel internalAv internalRv galacticExtinctionModel galacticAv galacticRv
0 object 21393434 53.053595 -27.702812 17.830105 galaxySED/Inst.25E09.02Z.spec.gz 0.184 0 0 0 ... 0.000008 0.000006 0.992165 4 CCM 0.2 3.1 CCM 0.02979 3.1

1 rows × 23 columns

And we can check how many systems there are by checking the length of this dataframe.


In [24]:
len(lens_gals)


Out[24]:
198

Showing that we 198 systems in the Twinkles field!

Lensed AGN Host Galaxies (Not in Twinkles 1 catalogs)

In Twinkles 1 catalogs we do not have host galaxies around our lensed AGN, but in the future we will want to be able to include this. We experimented with this at the 2017 DESC SLAC Collaboration Meeting Hack Day since Nan Li, Matt Wiesner and others are working adding lensed hosts into images.

Therefore, I have included the capacity to find the host galaxies here for future use.

To start we once again cut based on the uniqueId which will be larger than a normal galaxy.


In [25]:
host_gals = df_galaxy[df_galaxy['uniqueId'] > 178465239310]

In [26]:
host_gals = df_galaxy[df_galaxy['uniqueId'] > 170000000000]

In [27]:
host_gals[:2]


Out[27]:
prefix uniqueId raPhoSim decPhoSim phoSimMagNorm sedFilepath redshift shear1 shear2 kappa ... majorAxis minorAxis positionAngle sindex internalExtinctionModel internalAv internalRv galacticExtinctionModel galacticAv galacticRv
8028 object 213934178330 53.054049 -27.703151 20.079109 galaxySED/Inst.50E09.04Z.spec.gz 0.48 0 0 0 ... 0.000005 0.000005 5.347467 4 CCM 0.2 3.1 CCM 0.029784 3.1
8029 object 213934179354 53.053503 -27.702761 20.079109 galaxySED/Inst.50E09.04Z.spec.gz 0.48 0 0 0 ... 0.000005 0.000005 5.347467 4 CCM 0.2 3.1 CCM 0.029791 3.1

2 rows × 23 columns

Then like the lensed AGN we add in the info from the longer Ids and the lens galaxy info along with resetting the index.


In [28]:
twinkles_gal_nums = []
for gal_id in host_gals['uniqueId']:
    twinkles_ids = np.right_shift(gal_id-26, 10)
    twinkles_gal_nums.append(twinkles_ids)

In [29]:
host_twinkles_system_num = []
host_twinkles_img_num = []
for host_gal in twinkles_gal_nums:
    host_gal = str(host_gal)
    host_twinkles_id = host_gal[-4:]
    host_twinkles_id = np.int(host_twinkles_id)
    host_twinkles_base = np.int(np.floor(host_twinkles_id/4))
    host_twinkles_img = host_twinkles_id % 4
    host_twinkles_system_num.append(host_twinkles_base)
    host_twinkles_img_num.append(host_twinkles_img)

In [30]:
host_lens_gal_ids = np.left_shift((np.array(twinkles_gal_nums))/10000, 10) + 26

In [31]:
host_gals = host_gals.reset_index(drop=True)
host_gals['twinkles_system'] = host_twinkles_system_num
host_gals['twinkles_img_num'] = host_twinkles_img_num
host_gals['lens_galaxy_uID'] = host_lens_gal_ids

In [32]:
host_gals.iloc[:2, [1, 2, 3, -3, -2, -1]]


Out[32]:
uniqueId raPhoSim decPhoSim twinkles_system twinkles_img_num lens_galaxy_uID
0 213934178330 53.054049 -27.703151 24 0 21393434
1 213934179354 53.053503 -27.702761 24 1 21393434

Notice that there are different numbers of sprinkled AGN and host galaxy entries.


In [33]:
len(sprinkled_agn), len(host_gals)


Out[33]:
(468, 844)

This is because some host galaxies have both bulge and disk components, but not all do. The example we have been using does have both components and thus we have four entries for the doubly lensed system in the host galaxy dataframe.


In [34]:
host_gals[host_gals['lens_galaxy_uID'] == 21393434].iloc[:, [1, 2, 3, -3, -2, -1]]


Out[34]:
uniqueId raPhoSim decPhoSim twinkles_system twinkles_img_num lens_galaxy_uID
0 213934178330 53.054049 -27.703151 24 0 21393434
1 213934179354 53.053503 -27.702761 24 1 21393434
70 213934178331 53.054049 -27.703151 24 0 21393434
71 213934179355 53.053503 -27.702761 24 1 21393434

Final Thoughts

The main point of being able to break up the instance catalogs like this is for validation and future development. Being able to find the sprinkled input for Twinkles images helps us validate what appears in our output catalogs. Storing this input in pandas dataframes makes it easy to find and compare against the output catalogs that are accessed using tools in the DESC Monitor. In addition, this is a useful tool for future development like the creation of lensed images for the AGN host galaxies that we hope to add in the next iteration of Twinkles.