Sorting Objects in Instance Catalogs

Bryce Kalmbach

This notebook provides a series of commands that take a Twinkles Phosim Instance Catalog and creates different pandas dataframes for different types of objects in the catalog. It first separates the full sets of objects in the Instance Catalogs before picking out the sprinkled strongly lensed systems for further analysis. The complete object dataframes contain:

Stars: All stars in the Instance Catalog
Galaxies: All bulge and disk components of galaxies in the Instance Catalog
AGN: All AGN in the Instance Catalog
SNe: The supernovae that are present in the Instance Catalog

Then there are sprinkled strongly lensed systems dataframes containing:

Sprinkled AGN galaxies: The images of the lensed AGNs
Lens Galaxies: These are the foreground galaxies in the lens system.
(Not Default) Sprinkled AGN Host galaxies: While these were turned off in Run 1 of Twinkles the original motivation for this notebook was to find these objects in a catalog to help development of lensed hosts at the DESC 2017 SLAC Collaboration Meeting Hack Day.

Requirements

If you already have an instance catalog from Twinkles on hand all you need now are:

Pandas
Numpy



In [1]:

    
import pandas as pd
import numpy as np

Parsing the Instance Catalog

Here we run through the instance catalog and store which rows belong to which class of object. This is necessary since the catalog objects do not all have the same number of properties so we cannot just import them all and then sort within a dataframe.



In [2]:

    
filename = 'twinkles_phosim_input_230.txt'



In [3]:

    
i = 0
not_star_rows = []
not_galaxy_rows = []
not_agn_rows = []
not_sne_rows = []
with open(filename, 'r') as f:
    for line in f:
        new_str = line.split(' ')
        #Skip through the header
        if len(new_str) < 4:
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
            i+=1
            continue
        if new_str[5].startswith('starSED'):
            #star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('galaxySED'):
            #galaxy_rows.append(i)
            not_star_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('agnSED'):
            #agn_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('spectra_files'):
            #sne_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
        i += 1

Populating Dataframes

Now we load the dataframes for the overall sets of objects.



In [4]:

    
df_star = pd.read_csv(filename, delimiter=' ', header=None,
                      names = ['prefix', 'uniqueId', 'raPhosim', 'decPhoSim', 
                               'phoSimMagNorm', 'sedFilepath', 'redshift', 
                               'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                               'spatialmodel', 'internalExtinctionModel',
                               'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                      skiprows=not_star_rows)



In [5]:

    
df_star[:3]









    Out[5]:






  
    
      
      prefix
      uniqueId
      raPhosim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      raOffset
      decOffset
      spatialmodel
      internalExtinctionModel
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      0
      object
      16285577220
      52.904543
      -27.188795
      25.270918
      starSED/wDs/bergeron_5000_90.dat_5100.gz
      0
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.024576
      3.1
    
    
      1
      object
      16288599044
      52.965934
      -27.207791
      24.626965
      starSED/wDs/bergeron_4500_80.dat_4700.gz
      0
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.023116
      3.1
    
    
      2
      object
      16291377156
      52.780645
      -27.258553
      22.887791
      starSED/wDs/bergeron_5500_80.dat_5700.gz
      0
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.034913
      3.1



In [6]:

    
df_galaxy = pd.read_csv(filename, delimiter=' ', header=None, 
                        names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim', 
                              'phoSimMagNorm', 'sedFilepath',
                              'redshift', 'shear1', 'shear2', 'kappa', 
                              'raOffset', 'decOffset', 'spatialmodel', 
                              'majorAxis', 'minorAxis', 'positionAngle', 'sindex',
                              'internalExtinctionModel', 'internalAv', 'internalRv',
                              'galacticExtinctionModel', 'galacticAv', 'galacticRv'], 
                        skiprows=not_galaxy_rows)



In [7]:

    
df_galaxy[:3]









    Out[7]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      ...
      majorAxis
      minorAxis
      positionAngle
      sindex
      internalExtinctionModel
      internalAv
      internalRv
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      0
      object
      140314
      53.012914
      -27.540978
      17.560391
      galaxySED/Inst.32E09.02Z.spec.gz
      0.104306
      0
      0
      0
      ...
      0.000016
      0.000014
      2.348059
      4
      CCM
      0.5
      3.1
      CCM
      0.027387
      3.1
    
    
      1
      object
      145434
      53.071411
      -27.316559
      15.659390
      galaxySED/Inst.25E09.02Z.spec.gz
      0.079068
      0
      0
      0
      ...
      0.000024
      0.000024
      3.151106
      4
      CCM
      0.6
      3.1
      CCM
      0.021410
      3.1
    
    
      2
      object
      148506
      53.173385
      -27.282062
      16.810310
      galaxySED/Burst.32E09.02Z.spec.gz
      0.079037
      0
      0
      0
      ...
      0.000008
      0.000007
      4.081670
      4
      CCM
      0.6
      3.1
      CCM
      0.018425
      3.1
    
  

3 rows × 23 columns



In [8]:

    
df_agn = pd.read_csv(filename, delimiter=' ', header=None, 
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim', 
                            'phoSimMagNorm', 'sedFilepath', 'redshift', 
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows = not_agn_rows)



In [9]:

    
df_agn[:3]









    Out[9]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      raOffset
      decOffset
      spatialmodel
      internalExtinctionModel
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      0
      object
      3560476
      52.900385
      -27.306074
      22.419379
      agnSED/agn.spec.gz
      0.203887
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.027625
      3.1
    
    
      1
      object
      9830428
      53.104649
      -27.402024
      22.365648
      agnSED/agn.spec.gz
      0.153405
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.023402
      3.1
    
    
      2
      object
      14275612
      52.768229
      -27.536270
      25.258118
      agnSED/agn.spec.gz
      0.303009
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.030570
      3.1



In [10]:

    
df_sne = pd.read_csv(filename, delimiter=' ', header=None, 
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                            'phoSimMagNorm', 'shorterFileNames', 'redshift',
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows = not_sne_rows)



In [11]:

    
df_sne[:3]









    Out[11]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      shorterFileNames
      redshift
      shear1
      shear2
      kappa
      raOffset
      decOffset
      spatialmodel
      internalExtinctionModel
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      0
      object
      49307690
      52.979646
      -27.228909
      25.950281
      spectra_files/specFile_48152_59580.1409_i.dat
      0.5446
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.023483
      3.1
    
    
      1
      object
      58179626
      52.729770
      -27.305787
      24.247262
      spectra_files/specFile_56816_59580.1409_i.dat
      0.9345
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.033048
      3.1
    
    
      2
      object
      68080682
      53.196934
      -27.564026
      22.229618
      spectra_files/specFile_66485_59580.1409_i.dat
      0.3143
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.027329
      3.1

Sort out sprinkled Strong Lensing Systems

Now we will pick out the pieces of strongly lensed systems that were sprinkled into the instance catalogs for the Twinkles project.

Lensed AGN

We start with the Lensed AGN. In Twinkles Instance Catalogs the lensed AGN have larger uniqueIds than normal since we added information about the systems into the uniqueIds. We use this to find them in the AGN dataframe.



In [12]:

    
sprinkled_agn = df_agn[df_agn['uniqueId'] > 20000000000]

Below we see a pair of lensed images from a double.



In [13]:

    
sprinkled_agn[:2]









    Out[13]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      raOffset
      decOffset
      spatialmodel
      internalExtinctionModel
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      338
      object
      213934178332
      53.054049
      -27.703151
      21.105901
      agnSED/agn.spec.gz
      0.48
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.029784
      3.1
    
    
      339
      object
      213934179356
      53.053503
      -27.702761
      22.614289
      agnSED/agn.spec.gz
      0.48
      0
      0
      0
      0
      0
      point
      none
      CCM
      0.029791
      3.1

Now we will extract the extra information we have stored in the uniqueId. This information is the Twinkles System number in our custom OM10 catalog in the data directory in Twinkles and the Twinkles Image Number which identifies which image in that particular system refers to that line in the catalog.



In [14]:

    
# This step undoes the step in CatSim that gives each component of a galaxy a different offset
twinkles_nums = []
for agn_id in sprinkled_agn['uniqueId']:
    twinkles_ids = np.right_shift(agn_id-28, 10)
    twinkles_nums.append(twinkles_ids)



In [15]:

    
#This parses the information added in the last 4 digits of the unshifted ID
twinkles_system_num = []
twinkles_img_num = []
for lens_system in twinkles_nums:
    lens_system = str(lens_system)
    twinkles_id = lens_system[-4:]
    twinkles_id = np.int(twinkles_id)
    twinkles_base = np.int(np.floor(twinkles_id/4))
    twinkles_img = twinkles_id % 4
    twinkles_system_num.append(twinkles_base)
    twinkles_img_num.append(twinkles_img)

We once again look at the two images we showed earlier. We see that they are image 0 and image 1 from Twinkles System 24.



In [16]:

    
print twinkles_system_num[:2], twinkles_img_num[:2]









    



[24, 24] [0, 1]

We now add this information into our sprinkled AGN dataframe and reset the indices.



In [17]:

    
sprinkled_agn = sprinkled_agn.reset_index(drop=True)
sprinkled_agn['twinkles_system'] = twinkles_system_num
sprinkled_agn['twinkles_img_num'] = twinkles_img_num



In [18]:

    
sprinkled_agn.iloc[:2, [1, 2, 3, -2, -1]]









    Out[18]:






  
    
      
      uniqueId
      raPhoSim
      decPhoSim
      twinkles_system
      twinkles_img_num
    
  
  
    
      0
      213934178332
      53.054049
      -27.703151
      24
      0
    
    
      1
      213934179356
      53.053503
      -27.702761
      24
      1

The last step is to now add a column with the lens galaxy uniqueId for each system so that we can cross-reference between the lensed AGN and the lens galaxy dataframe we will create next. We start by finding the uniqueIds for the lens galaxies.



In [19]:

    
#The lens galaxy ids do not have the extra 4 digits at the end so we remove them
#and then do the shift back to the `uniqueID`.
lens_gal_ids = np.left_shift((np.array(twinkles_nums))/10000, 10) + 26



In [20]:

    
sprinkled_agn['lens_galaxy_uID'] = lens_gal_ids

We now see that the same system has the same lens galaxy uniqueId as we expect.



In [21]:

    
sprinkled_agn.iloc[:2, [1, 2, 3, -3, -2, -1]]









    Out[21]:






  
    
      
      uniqueId
      raPhoSim
      decPhoSim
      twinkles_system
      twinkles_img_num
      lens_galaxy_uID
    
  
  
    
      0
      213934178332
      53.054049
      -27.703151
      24
      0
      21393434
    
    
      1
      213934179356
      53.053503
      -27.702761
      24
      1
      21393434

Lens Galaxies

Now we will create a dataframe with the Lens Galaxies.



In [22]:

    
lens_gal_locs = []
for idx in lens_gal_ids:
    lens_gal_locs.append(np.where(df_galaxy['uniqueId'] == idx)[0])

lens_gals = df_galaxy.iloc[np.unique(lens_gal_locs)]
lens_gals = lens_gals.reset_index(drop=True)

We now have the lens galaxies in their own dataframe that can be joined on the lensed AGN dataframe by the uniqueId.



In [23]:

    
lens_gals[:1]









    Out[23]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      ...
      majorAxis
      minorAxis
      positionAngle
      sindex
      internalExtinctionModel
      internalAv
      internalRv
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      0
      object
      21393434
      53.053595
      -27.702812
      17.830105
      galaxySED/Inst.25E09.02Z.spec.gz
      0.184
      0
      0
      0
      ...
      0.000008
      0.000006
      0.992165
      4
      CCM
      0.2
      3.1
      CCM
      0.02979
      3.1
    
  

1 rows × 23 columns

And we can check how many systems there are by checking the length of this dataframe.



In [24]:

    
len(lens_gals)









    Out[24]:





198

Showing that we 198 systems in the Twinkles field!

Lensed AGN Host Galaxies (Not in Twinkles 1 catalogs)

In Twinkles 1 catalogs we do not have host galaxies around our lensed AGN, but in the future we will want to be able to include this. We experimented with this at the 2017 DESC SLAC Collaboration Meeting Hack Day since Nan Li, Matt Wiesner and others are working adding lensed hosts into images.

Therefore, I have included the capacity to find the host galaxies here for future use.

To start we once again cut based on the uniqueId which will be larger than a normal galaxy.



In [25]:

    
host_gals = df_galaxy[df_galaxy['uniqueId'] > 178465239310]



In [26]:

    
host_gals = df_galaxy[df_galaxy['uniqueId'] > 170000000000]



In [27]:

    
host_gals[:2]









    Out[27]:






  
    
      
      prefix
      uniqueId
      raPhoSim
      decPhoSim
      phoSimMagNorm
      sedFilepath
      redshift
      shear1
      shear2
      kappa
      ...
      majorAxis
      minorAxis
      positionAngle
      sindex
      internalExtinctionModel
      internalAv
      internalRv
      galacticExtinctionModel
      galacticAv
      galacticRv
    
  
  
    
      8028
      object
      213934178330
      53.054049
      -27.703151
      20.079109
      galaxySED/Inst.50E09.04Z.spec.gz
      0.48
      0
      0
      0
      ...
      0.000005
      0.000005
      5.347467
      4
      CCM
      0.2
      3.1
      CCM
      0.029784
      3.1
    
    
      8029
      object
      213934179354
      53.053503
      -27.702761
      20.079109
      galaxySED/Inst.50E09.04Z.spec.gz
      0.48
      0
      0
      0
      ...
      0.000005
      0.000005
      5.347467
      4
      CCM
      0.2
      3.1
      CCM
      0.029791
      3.1
    
  

2 rows × 23 columns

Then like the lensed AGN we add in the info from the longer Ids and the lens galaxy info along with resetting the index.



In [28]:

    
twinkles_gal_nums = []
for gal_id in host_gals['uniqueId']:
    twinkles_ids = np.right_shift(gal_id-26, 10)
    twinkles_gal_nums.append(twinkles_ids)



In [29]:

    
host_twinkles_system_num = []
host_twinkles_img_num = []
for host_gal in twinkles_gal_nums:
    host_gal = str(host_gal)
    host_twinkles_id = host_gal[-4:]
    host_twinkles_id = np.int(host_twinkles_id)
    host_twinkles_base = np.int(np.floor(host_twinkles_id/4))
    host_twinkles_img = host_twinkles_id % 4
    host_twinkles_system_num.append(host_twinkles_base)
    host_twinkles_img_num.append(host_twinkles_img)



In [30]:

    
host_lens_gal_ids = np.left_shift((np.array(twinkles_gal_nums))/10000, 10) + 26



In [31]:

    
host_gals = host_gals.reset_index(drop=True)
host_gals['twinkles_system'] = host_twinkles_system_num
host_gals['twinkles_img_num'] = host_twinkles_img_num
host_gals['lens_galaxy_uID'] = host_lens_gal_ids



In [32]:

    
host_gals.iloc[:2, [1, 2, 3, -3, -2, -1]]









    Out[32]:






  
    
      
      uniqueId
      raPhoSim
      decPhoSim
      twinkles_system
      twinkles_img_num
      lens_galaxy_uID
    
  
  
    
      0
      213934178330
      53.054049
      -27.703151
      24
      0
      21393434
    
    
      1
      213934179354
      53.053503
      -27.702761
      24
      1
      21393434

Notice that there are different numbers of sprinkled AGN and host galaxy entries.



In [33]:

    
len(sprinkled_agn), len(host_gals)









    Out[33]:





(468, 844)

This is because some host galaxies have both bulge and disk components, but not all do. The example we have been using does have both components and thus we have four entries for the doubly lensed system in the host galaxy dataframe.



In [34]:

    
host_gals[host_gals['lens_galaxy_uID'] == 21393434].iloc[:, [1, 2, 3, -3, -2, -1]]









    Out[34]:






  
    
      
      uniqueId
      raPhoSim
      decPhoSim
      twinkles_system
      twinkles_img_num
      lens_galaxy_uID
    
  
  
    
      0
      213934178330
      53.054049
      -27.703151
      24
      0
      21393434
    
    
      1
      213934179354
      53.053503
      -27.702761
      24
      1
      21393434
    
    
      70
      213934178331
      53.054049
      -27.703151
      24
      0
      21393434
    
    
      71
      213934179355
      53.053503
      -27.702761
      24
      1
      21393434

Final Thoughts

The main point of being able to break up the instance catalogs like this is for validation and future development. Being able to find the sprinkled input for Twinkles images helps us validate what appears in our output catalogs. Storing this input in pandas dataframes makes it easy to find and compare against the output catalogs that are accessed using tools in the DESC Monitor. In addition, this is a useful tool for future development like the creation of lensed images for the AGN host galaxies that we hope to add in the next iteration of Twinkles.

	prefix	uniqueId	raPhosim	decPhoSim	phoSimMagNorm	sedFilepath	spatialmodel	internalExtinctionModel	galacticExtinctionModel	galacticAv	galacticRv
0	object	16285577220	52.904543	-27.188795	25.270918	starSED/wDs/bergeron_5000_90.dat_5100.gz	point	none	CCM	0.024576	3.1
1	object	16288599044	52.965934	-27.207791	24.626965	starSED/wDs/bergeron_4500_80.dat_4700.gz	point	none	CCM	0.023116	3.1
2	object	16291377156	52.780645	-27.258553	22.887791	starSED/wDs/bergeron_5500_80.dat_5700.gz	point	none	CCM	0.034913	3.1

	prefix	uniqueId	raPhoSim	decPhoSim	phoSimMagNorm	sedFilepath	redshift	...	majorAxis	minorAxis	positionAngle	sindex	internalExtinctionModel	internalAv	internalRv	galacticExtinctionModel	galacticAv	galacticRv
0	object	140314	53.012914	-27.540978	17.560391	galaxySED/Inst.32E09.02Z.spec.gz	0.104306	...	0.000016	0.000014	2.348059	4	CCM	0.5	3.1	CCM	0.027387	3.1
1	object	145434	53.071411	-27.316559	15.659390	galaxySED/Inst.25E09.02Z.spec.gz	0.079068	...	0.000024	0.000024	3.151106	4	CCM	0.6	3.1	CCM	0.021410	3.1
2	object	148506	53.173385	-27.282062	16.810310	galaxySED/Burst.32E09.02Z.spec.gz	0.079037	...	0.000008	0.000007	4.081670	4	CCM	0.6	3.1	CCM	0.018425	3.1

	prefix	uniqueId	raPhoSim	decPhoSim	phoSimMagNorm	sedFilepath	redshift	spatialmodel	internalExtinctionModel	galacticExtinctionModel	galacticAv	galacticRv
0	object	3560476	52.900385	-27.306074	22.419379	agnSED/agn.spec.gz	0.203887	point	none	CCM	0.027625	3.1
1	object	9830428	53.104649	-27.402024	22.365648	agnSED/agn.spec.gz	0.153405	point	none	CCM	0.023402	3.1
2	object	14275612	52.768229	-27.536270	25.258118	agnSED/agn.spec.gz	0.303009	point	none	CCM	0.030570	3.1

	prefix	uniqueId	raPhoSim	decPhoSim	phoSimMagNorm	shorterFileNames	redshift	spatialmodel	internalExtinctionModel	galacticExtinctionModel	galacticAv	galacticRv
0	object	49307690	52.979646	-27.228909	25.950281	spectra_files/specFile_48152_59580.1409_i.dat	0.5446	point	none	CCM	0.023483	3.1
1	object	58179626	52.729770	-27.305787	24.247262	spectra_files/specFile_56816_59580.1409_i.dat	0.9345	point	none	CCM	0.033048	3.1
2	object	68080682	53.196934	-27.564026	22.229618	spectra_files/specFile_66485_59580.1409_i.dat	0.3143	point	none	CCM	0.027329	3.1

	prefix	uniqueId	raPhoSim	decPhoSim	phoSimMagNorm	sedFilepath	redshift	shear1	shear2	kappa	raOffset	decOffset	spatialmodel	internalExtinctionModel	galacticExtinctionModel	galacticAv	galacticRv
338	object	213934178332	53.054049	-27.703151	21.105901	agnSED/agn.spec.gz	0.48	0	0	0	0	0	point	none	CCM	0.029784	3.1
339	object	213934179356	53.053503	-27.702761	22.614289	agnSED/agn.spec.gz	0.48	0	0	0	0	0	point	none	CCM	0.029791	3.1

	prefix	uniqueId	raPhoSim	decPhoSim	phoSimMagNorm	sedFilepath	redshift	shear1	shear2	kappa	...	majorAxis	minorAxis	positionAngle	sindex	internalExtinctionModel	internalAv	internalRv	galacticExtinctionModel	galacticAv	galacticRv
8028	object	213934178330	53.054049	-27.703151	20.079109	galaxySED/Inst.50E09.04Z.spec.gz	0.48	0	0	0	...	0.000005	0.000005	5.347467	4	CCM	0.2	3.1	CCM	0.029784	3.1
8029	object	213934179354	53.053503	-27.702761	20.079109	galaxySED/Inst.50E09.04Z.spec.gz	0.48	0	0	0	...	0.000005	0.000005	5.347467	4	CCM	0.2	3.1	CCM	0.029791	3.1