Bryce Kalmbach
This notebook provides a series of commands that take a Twinkles PhoSim instance catalog and create separate pandas dataframes for the different types of objects in the catalog. It first separates the full set of objects in the instance catalog by type and then picks out the sprinkled strongly lensed systems for further analysis. The complete object dataframes contain stars, galaxies, AGN, and supernovae (SNe).
The sprinkled strongly lensed system dataframes contain the sprinkled AGN, the lens galaxies, and (for future use) the host galaxies of the lensed AGN.
If you already have an instance catalog from Twinkles on hand all you need now are:
In [1]:
import pandas as pd
import numpy as np
In [2]:
filename = 'twinkles_phosim_input_230.txt'
In [3]:
i = 0
not_star_rows = []
not_galaxy_rows = []
not_agn_rows = []
not_sne_rows = []
with open(filename, 'r') as f:
    for line in f:
        new_str = line.split(' ')
        #Skip through the header
        if len(new_str) < 4:
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
            i+=1
            continue
        if new_str[5].startswith('starSED'):
            #star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('galaxySED'):
            #galaxy_rows.append(i)
            not_star_rows.append(i)
            not_agn_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('agnSED'):
            #agn_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_sne_rows.append(i)
        elif new_str[5].startswith('spectra_files'):
            #sne_rows.append(i)
            not_star_rows.append(i)
            not_galaxy_rows.append(i)
            not_agn_rows.append(i)
        i += 1
In [4]:
df_star = pd.read_csv(filename, delimiter=' ', header=None,
                      names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                             'phoSimMagNorm', 'sedFilepath', 'redshift',
                             'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                             'spatialmodel', 'internalExtinctionModel',
                             'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                      skiprows=not_star_rows)
In [5]:
df_star[:3]
Out[5]:
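The row-skipping approach used above can be tried on a small in-memory catalog. This is only a sketch: the catalog text, the shortened column list, and the helper `rows_to_skip` are made up for illustration and are not part of Twinkles. Unlike the loop above, the helper skips every line that does not match the requested SED prefix, including lines of an unrecognized type.

```python
import io
import pandas as pd

# Toy stand-in for an instance catalog: two short header lines followed by
# object lines whose sixth field (index 5) is the SED path.
toy_catalog = "\n".join([
    "Opsim_obshistid 230",
    "Opsim_filter 2",
    "object 1001 53.01 -27.40 22.1 starSED/toy_star.gz 0",
    "object 1002 53.02 -27.41 23.5 galaxySED/toy_gal.gz 0.5",
    "object 1003 53.03 -27.42 21.7 agnSED/toy_agn.gz 1.2",
    "object 1004 53.04 -27.43 20.9 starSED/toy_star2.gz 0",
]) + "\n"

def rows_to_skip(text, sed_prefix):
    """Hypothetical helper: line numbers that are NOT objects of the given type."""
    skip = []
    for i, line in enumerate(text.splitlines()):
        fields = line.split(' ')
        # Header lines have fewer than 4 fields; object lines carry the SED at index 5.
        if len(fields) < 4 or not fields[5].startswith(sed_prefix):
            skip.append(i)
    return skip

# Shortened column list for the toy catalog (the real catalog has more columns).
toy_columns = ['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
               'phoSimMagNorm', 'sedFilepath', 'redshift']

star_skip = rows_to_skip(toy_catalog, 'starSED')
toy_stars = pd.read_csv(io.StringIO(toy_catalog), delimiter=' ', header=None,
                        names=toy_columns, skiprows=star_skip)
print(toy_stars[['uniqueId', 'sedFilepath']])
```

Passing a list of line numbers to `skiprows` lets each object type be read with its own column names, which matters because galaxy lines carry more fields than star or AGN lines.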
In [6]:
df_galaxy = pd.read_csv(filename, delimiter=' ', header=None,
                        names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                               'phoSimMagNorm', 'sedFilepath',
                               'redshift', 'shear1', 'shear2', 'kappa',
                               'raOffset', 'decOffset', 'spatialmodel',
                               'majorAxis', 'minorAxis', 'positionAngle', 'sindex',
                               'internalExtinctionModel', 'internalAv', 'internalRv',
                               'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                        skiprows=not_galaxy_rows)
In [7]:
df_galaxy[:3]
Out[7]:
In [8]:
df_agn = pd.read_csv(filename, delimiter=' ', header=None,
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                            'phoSimMagNorm', 'sedFilepath', 'redshift',
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows=not_agn_rows)
In [9]:
df_agn[:3]
Out[9]:
In [10]:
df_sne = pd.read_csv(filename, delimiter=' ', header=None,
                     names=['prefix', 'uniqueId', 'raPhoSim', 'decPhoSim',
                            'phoSimMagNorm', 'shorterFileNames', 'redshift',
                            'shear1', 'shear2', 'kappa', 'raOffset', 'decOffset',
                            'spatialmodel', 'internalExtinctionModel',
                            'galacticExtinctionModel', 'galacticAv', 'galacticRv'],
                     skiprows=not_sne_rows)
In [11]:
df_sne[:3]
Out[11]:
In [12]:
sprinkled_agn = df_agn[df_agn['uniqueId'] > 20000000000]
Below we see a pair of lensed images from a double.
In [13]:
sprinkled_agn[:2]
Out[13]:
Now we will extract the extra information we have stored in the uniqueId. This information is the Twinkles System number in our custom OM10 catalog in the data directory in Twinkles and the Twinkles Image Number, which identifies the image in that particular system that the catalog line refers to.
In [14]:
# This step undoes the step in CatSim that gives each component of a galaxy a different offset
twinkles_nums = []
for agn_id in sprinkled_agn['uniqueId']:
    twinkles_ids = np.right_shift(agn_id-28, 10)
    twinkles_nums.append(twinkles_ids)
In [15]:
#This parses the information added in the last 4 digits of the unshifted ID
twinkles_system_num = []
twinkles_img_num = []
for lens_system in twinkles_nums:
    lens_system = str(lens_system)
    twinkles_id = lens_system[-4:]
    # np.int was removed from recent NumPy releases; the builtin int does the same job
    twinkles_id = int(twinkles_id)
    twinkles_base = twinkles_id // 4
    twinkles_img = twinkles_id % 4
    twinkles_system_num.append(twinkles_base)
    twinkles_img_num.append(twinkles_img)
We once again look at the two images we showed earlier. We see that they are image 0 and image 1 from Twinkles System 24.
In [16]:
print(twinkles_system_num[:2], twinkles_img_num[:2])
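The decoding performed in the two cells above can be condensed into a single function. The sketch below follows the same arithmetic (subtract the offset of 28, shift right by 10 bits, read the last four decimal digits, then split them into system and image number). The function name and the example IDs are hypothetical: the IDs are built by hand for illustration, with an arbitrary leading part, and are not read from a catalog.

```python
def decode_sprinkled_agn_id(unique_id, component_offset=28, shift_bits=10):
    """Return (twinkles_system, twinkles_img_num) for a sprinkled AGN uniqueId."""
    # Undo the per-component offset and the bit shift applied to the ID.
    unshifted = (unique_id - component_offset) >> shift_bits
    # The last four decimal digits hold 4 * system + image number.
    twinkles_id = unshifted % 10000
    return twinkles_id // 4, twinkles_id % 4

# Hand-built IDs: an arbitrary leading part (20892) followed by the four
# digits 0096 and 0097, then shifted and offset the way the decoder expects.
example_ids = [((20892 * 10000 + last4) << 10) + 28 for last4 in (96, 97)]
decoded = [decode_sprinkled_agn_id(uid) for uid in example_ids]
print(decoded)  # [(24, 0), (24, 1)]
```

Taking the remainder modulo 10000 gives the same result as slicing the last four characters of the string representation, without the round trip through `str`.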
We now add this information into our sprinkled AGN dataframe and reset the indices.
In [17]:
sprinkled_agn = sprinkled_agn.reset_index(drop=True)
sprinkled_agn['twinkles_system'] = twinkles_system_num
sprinkled_agn['twinkles_img_num'] = twinkles_img_num
In [18]:
sprinkled_agn.iloc[:2, [1, 2, 3, -2, -1]]
Out[18]:
The last step is to add a column with the lens galaxy uniqueId for each system so that we can cross-reference between the lensed AGN and the lens galaxy dataframe we will create next. We start by finding the uniqueIds for the lens galaxies.
In [19]:
#The lens galaxy ids do not have the extra 4 digits at the end so we remove them
#and then do the shift back to the `uniqueID`.
#Integer division (//) keeps the array integer-typed, which np.left_shift requires.
lens_gal_ids = np.left_shift(np.array(twinkles_nums) // 10000, 10) + 26
In [20]:
sprinkled_agn['lens_galaxy_uID'] = lens_gal_ids
We now see that both images of the same system have the same lens galaxy uniqueId, as we expect.
In [21]:
sprinkled_agn.iloc[:2, [1, 2, 3, -3, -2, -1]]
Out[21]:
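As a rough check of the lens galaxy ID arithmetic, the whole chain can be run on a NumPy array at once. This is a sketch with hand-built example IDs (not catalog values); the variable names are illustrative. Note that under Python 3 the `/` operator produces floats, which `np.left_shift` rejects, so the sketch uses integer division.

```python
import numpy as np

# Two hand-built sprinkled AGN IDs that differ only in the image number.
agn_ids = np.array([213934178332, 213934179356], dtype=np.int64)

# Undo the AGN component offset (28) and the 10-bit shift.
unshifted = np.right_shift(agn_ids - 28, 10)

# Drop the last four digits, shift back, and add the galaxy offset (26).
lens_ids = np.left_shift(unshifted // 10000, 10) + 26
print(lens_ids)
```

Both entries map to the same lens galaxy ID because they share every digit except the last four.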
Now we will create a dataframe with the Lens Galaxies.
In [22]:
lens_gal_locs = []
for idx in lens_gal_ids:
    lens_gal_locs.append(np.where(df_galaxy['uniqueId'] == idx)[0])
lens_gals = df_galaxy.iloc[np.unique(lens_gal_locs)]
lens_gals = lens_gals.reset_index(drop=True)
We now have the lens galaxies in their own dataframe, which can be joined to the lensed AGN dataframe on the lens galaxy uniqueId.
In [23]:
lens_gals[:1]
Out[23]:
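One possible way to perform that join is `DataFrame.merge`, matching the `lens_galaxy_uID` column of the AGN dataframe against the `uniqueId` column of the lens galaxy dataframe. The sketch below uses two tiny made-up dataframes (the IDs are hand-built and the redshift is invented) rather than the real `sprinkled_agn` and `lens_gals`.

```python
import pandas as pd

# Toy stand-ins for sprinkled_agn and lens_gals with only a few columns.
toy_agn = pd.DataFrame({
    'uniqueId': [213934178332, 213934179356],
    'twinkles_system': [24, 24],
    'twinkles_img_num': [0, 1],
    'lens_galaxy_uID': [21393434, 21393434],
})
toy_lens_gals = pd.DataFrame({
    'uniqueId': [21393434],
    'redshift': [0.5],  # made-up value
})

# Each lensed image picks up the columns of its lens galaxy; the suffixes
# keep the two uniqueId columns apart.
joined = toy_agn.merge(toy_lens_gals, left_on='lens_galaxy_uID',
                       right_on='uniqueId', suffixes=('_agn', '_lens'))
print(joined[['uniqueId_agn', 'twinkles_img_num', 'uniqueId_lens', 'redshift']])
```

The default inner join drops any lensed image whose lens galaxy is missing from the galaxy dataframe, so comparing `len(joined)` with `len(toy_agn)` is a quick consistency check.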
And we can check how many systems there are by checking the length of this dataframe.
In [24]:
len(lens_gals)
Out[24]:
This shows that we have 198 systems in the Twinkles field!
In Twinkles 1 catalogs we do not have host galaxies around our lensed AGN, but in the future we will want to be able to include them. We experimented with this at the 2017 DESC SLAC Collaboration Meeting Hack Day, since Nan Li, Matt Wiesner and others are working on adding lensed hosts into images.
Therefore, I have included the capacity to find the host galaxies here for future use.
To start, we once again cut based on the uniqueId, which will be larger than that of a normal galaxy.
In [25]:
host_gals = df_galaxy[df_galaxy['uniqueId'] > 178465239310]
In [26]:
host_gals = df_galaxy[df_galaxy['uniqueId'] > 170000000000]
In [27]:
host_gals[:2]
Out[27]:
Then, as with the lensed AGN, we add in the information from the longer IDs and the lens galaxy information, and reset the index.
In [28]:
twinkles_gal_nums = []
for gal_id in host_gals['uniqueId']:
    twinkles_ids = np.right_shift(gal_id-26, 10)
    twinkles_gal_nums.append(twinkles_ids)
In [29]:
host_twinkles_system_num = []
host_twinkles_img_num = []
for host_gal in twinkles_gal_nums:
    host_gal = str(host_gal)
    host_twinkles_id = host_gal[-4:]
    # np.int was removed from recent NumPy releases; the builtin int does the same job
    host_twinkles_id = int(host_twinkles_id)
    host_twinkles_base = host_twinkles_id // 4
    host_twinkles_img = host_twinkles_id % 4
    host_twinkles_system_num.append(host_twinkles_base)
    host_twinkles_img_num.append(host_twinkles_img)
In [30]:
host_lens_gal_ids = np.left_shift(np.array(twinkles_gal_nums) // 10000, 10) + 26
In [31]:
host_gals = host_gals.reset_index(drop=True)
host_gals['twinkles_system'] = host_twinkles_system_num
host_gals['twinkles_img_num'] = host_twinkles_img_num
host_gals['lens_galaxy_uID'] = host_lens_gal_ids
In [32]:
host_gals.iloc[:2, [1, 2, 3, -3, -2, -1]]
Out[32]:
Notice that there are different numbers of sprinkled AGN and host galaxy entries.
In [33]:
len(sprinkled_agn), len(host_gals)
Out[33]:
This is because some host galaxies have both bulge and disk components, but not all do. The example we have been using does have both components and thus we have four entries for the doubly lensed system in the host galaxy dataframe.
In [34]:
host_gals[host_gals['lens_galaxy_uID'] == 21393434].iloc[:, [1, 2, 3, -3, -2, -1]]
Out[34]:
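A quick way to see which systems have both bulge and disk entries is to count rows per system and per image. The sketch below does this on a made-up dataframe that mimics the situation described above: system 24 has two components per image, while a second, invented system 30 has only one.

```python
import pandas as pd

# Toy stand-in for host_gals: one row per host galaxy component per image.
toy_hosts = pd.DataFrame({
    'twinkles_system': [24, 24, 24, 24, 30, 30],
    'twinkles_img_num': [0, 0, 1, 1, 0, 1],
})

# Rows per (system, image): 2 means both bulge and disk are present.
components_per_image = toy_hosts.groupby(
    ['twinkles_system', 'twinkles_img_num']).size()

# Rows per system: a double with both components has four entries.
entries_per_system = toy_hosts.groupby('twinkles_system').size()
print(entries_per_system)
```

Applied to the real `host_gals` dataframe, the same `groupby` calls would explain the difference between `len(sprinkled_agn)` and `len(host_gals)` system by system.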
The main point of being able to break up the instance catalogs like this is for validation and future development. Being able to find the sprinkled input for Twinkles images helps us validate what appears in our output catalogs. Storing this input in pandas dataframes makes it easy to find and compare against the output catalogs that are accessed using tools in the DESC Monitor. In addition, this is a useful tool for future development like the creation of lensed images for the AGN host galaxies that we hope to add in the next iteration of Twinkles.