This is a notebook to explore OpSim outputs in different ways, mostly useful for supernova analysis. We will look at the OpSim output called minion_1016. Other relevant OpSim outputs are listed on a confluence page. All of the newer outputs are in the format of sqlite databases (zipped up to save space), while older OpSim versions are in ASCII files. The database table which I think has all of the information we will need is the Summary table. The quantities in the columns of this table are described [here](https://confluence.lsstcorp.org/display/SIM/Summary+Table+Column+Descriptions). The OpSim outputs are official products that the LSST project provides; this notebook demonstrates one way of exploring them (which is by no means official or project supplied).
Note: The Summary table is only about a GB in size and has on the order of a million rows. For newish laptops this is a small table and can easily be read into memory. For more memory-constrained systems, or computations that themselves require a lot of memory, this may be a bad thing to do.

Gotcha: The column obsHistID is a unique identifier of OpSim observations. Very often you might end up with a table with multiple rows sharing the same obsHistID but with other columns (like propID) having different values. These are not different observations, and an SN light curve corresponding to these observations should include only one of them.
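A quick way to check for and drop such repeats with pandas is to test the index for duplicates. This is a sketch on a toy frame; only the column names obsHistID and propID are taken from the Summary table, the values are made up:

```python
import pandas as pd

# Toy stand-in for the Summary table: obsHistID 102 appears under two
# different proposals (propID), but it is a single observation.
summary = pd.DataFrame({
    'obsHistID': [101, 102, 102, 103],
    'propID':    [54, 54, 56, 54],
    'expMJD':    [59580.1, 59580.2, 59580.2, 59580.3],
}).set_index('obsHistID')

# How many rows repeat an obsHistID already seen?
print(summary.index.duplicated().sum())  # 1

# Keep only one row per observation
unique_obs = summary[~summary.index.duplicated(keep='first')]
print(len(unique_obs))  # 3
```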
The LSST observations cover roughly half of the sky (~20000 sq degrees, or ~2π steradians). This is covered by overlapping pointings of the telescope, each covering ~10 sq degrees. These pointings are currently thought of as being located on a grid, along with dithers. The grid locations of the pointings are assigned a unique integer called fieldID, which is associated with its location (fieldRA, fieldDec). The actual locations of the pointings (including dithers) are in (ditheredRA, ditheredDec). There are a number of columns that hold similar quantities, and the exact definition of each quantity is provided in the description. For most purposes, filtSkyBrightness and FWHMeff, rather than the other similar-looking quantities, are probably recommended. propID (also described there) refers to the proposal under which the observation was taken. For basic SN purposes, we will want the proposals Wide Fast Deep (WFD) and Deep Drilling Fields (DDF). The propIDs for these proposals differ from one OpSim output to another. To find out what they are, you can use OpSim utils as in this notebook, or look at the PROPOSAL table (5 lines) like this, and pick the integers corresponding to DDcosmology1.conf and Universal ...
SELECT * FROM PROPOSAL;
362|../conf/survey/GalacticPlaneProp.conf|WL|27692624|enigma|1189
363|../conf/survey/SouthCelestialPole-18.conf|WL|27692368|enigma|1189
364|../conf/survey/Universal-18-0824B.conf|WLTSS|27692112|enigma|1189
365|../conf/survey/NorthEclipticSpur-18c.conf|WLTSS|27692240|enigma|1189
366|../conf/survey/DDcosmology1.conf|WLTSS|29065424|enigma|1189
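The same lookup can be scripted with the standard sqlite3 module. This is a sketch on an in-memory database seeded with two of the rows above (the column names propID and propConf are assumed from the dump; the real PROPOSAL table has more columns); point the connection at your OpSim sqlite file instead:

```python
import sqlite3

# In-memory stand-in for an OpSim database's PROPOSAL table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE PROPOSAL (propID INT, propConf TEXT, propName TEXT)')
conn.executemany('INSERT INTO PROPOSAL VALUES (?, ?, ?)', [
    (364, '../conf/survey/Universal-18-0824B.conf', 'WLTSS'),
    (366, '../conf/survey/DDcosmology1.conf', 'WLTSS'),
])

# Pick out the propIDs for the WFD (Universal) and DDF (DDcosmology) proposals
rows = conn.execute(
    "SELECT propID, propConf FROM PROPOSAL "
    "WHERE propConf LIKE '%Universal%' OR propConf LIKE '%DDcosmology%'"
).fetchall()
print(sorted(r[0] for r in rows))  # [364, 366]
```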
In [1]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
In [2]:
# Required packages: sqlalchemy, pandas (both are part of the anaconda distribution, or can be installed with a python installer)
# One step requires the LSST stack, and can be skipped for the particular OpSim database used here
import opsimsummary as oss
import opsimsummary.summarize_opsim as so
from sqlalchemy import create_engine
import pandas as pd
print(so.__file__)
In [3]:
# This step requires the LSST SIMS package MAF. The main goal of this step is to set DD and WFD to the integer keys
# that label an observation as Deep Drilling or Wide Fast Deep.
# If all you care about is the database used in this example, you can skip this step by uncommenting the next cell
# and commenting out this one. But there is no guarantee that the numbers in the cell below will work
# on other versions of opsim database outputs
from lsst.sims.maf import db
from lsst.sims.maf.utils import opsimUtils
In [4]:
# DD = 56
# WFD = 54
Here we will use the OpSim output minion_1016. I have downloaded this database, unzipped it, and use the variable dbname to point to its location.
In [5]:
# Change dbname to point at your own location of the opsim output
dbname = '/Users/rbiswas/data/LSST/OpSimData/minion_1016_sqlite.db'
opsdb = db.OpsimDatabase(dbname)
propID, propTags = opsdb.fetchPropInfo()
DD = propTags['DD'][0]
WFD = propTags['WFD'][0]
In [6]:
print("The propID for the Deep Drilling Field {0:2d}".format(DD))
print("The propID for the Wide Fast Deep Field {0:2d}".format(WFD))
Here we will read the opsim database into a pandas.DataFrame
In [7]:
engine = create_engine('sqlite:///' + dbname)
The opsim database is a large file (approx 4.0 GB), but it is still possible to read it into memory on new computers. You usually only need the Summary table, which is about 900 MB. If you are only interested in the Deep Drilling Fields, you can use read_sql_query to select only the information pertaining to Deep Drilling observations; this has a memory footprint of about 40 MB. Obviously, you can reduce this further by narrowing down the columns to only those of interest. For the entire Summary table, this step takes a few minutes on my computer.
If you are going to do the read-from-disk step very often, you can further reduce the time by storing the output on disk as an HDF5 file and reading that into memory.
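The cache-on-disk pattern looks like the following. The helper name load_cached is hypothetical; it is sketched with to_pickle/read_pickle to stay dependency-free, but to_hdf/read_hdf (which require the PyTables package) can be swapped in for the HDF5 version:

```python
import os
import pandas as pd

def load_cached(cache_path, compute):
    """Return the DataFrame from cache_path if it exists; otherwise call
    compute() (e.g. a slow pd.read_sql_query), cache the result, return it."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)   # fast path: read the cache
    df = compute()                          # slow path: runs only once
    df.to_pickle(cache_path)                # or df.to_hdf(cache_path, 'table')
    return df
```

Usage would then be something like `load_cached('summary.pkl', lambda: pd.read_sql_query(query, engine))`; the first call pays the SQL cost, later calls read the cache.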
We will look at three different summaries of OpSim runs.
In [8]:
# Load to a dataframe
# Summary = pd.read_hdf('storage.h5', 'table') # This loads the table from a hdf file which I store as intermediate result, this is extremely quick
# Summary = pd.read_sql_table('Summary', engine, index_col='obsHistID') # loads all of the summary table
# EnigmaDeep = pd.read_sql_query('SELECT * FROM SUMMARY WHERE PROPID is 366', engine) # loads only the DDF
If we knew the proposal IDs ahead of time, we could have done this more quickly using the following query:
In [9]:
OpSim_combined = pd.read_sql_query('SELECT * FROM SUMMARY WHERE PROPID in ({0}, {1})'.format(DD, WFD), engine, index_col='obsHistID')
In [10]:
OpSim_combined.head()
Out[10]:
This could have duplicates, unlike the OpSim Deep case, because there are now two proposal IDs, both of which may correspond to the same observation. We can check whether this is indeed the case by:
In [11]:
len(OpSim_combined) == len(OpSim_combined.index.unique())
Out[11]:
Dropping the duplicate pointings can be done in the following way: reset_index() makes 'obsHistID' an ordinary column rather than the index, and drop_duplicates drops rows that share a value of that column, keeping the first.
In [12]:
# Note: drop_duplicates(inplace=True) on the temporary returned by reset_index() would be lost;
# assign the result back and restore the index instead.
OpSim_combined = OpSim_combined.reset_index().drop_duplicates(subset='obsHistID').set_index('obsHistID')
In [13]:
OpSim_combined.head()
Out[13]:
In [14]:
len(OpSim_combined) == len(OpSim_combined.index.unique())
Out[14]:
In [15]:
OpSim_Deep = pd.read_sql_query('SELECT * FROM SUMMARY WHERE PROPID is ' + str(DD), engine, index_col='obsHistID')
We can also sub-select this from the all-encompassing Summary table. This can be done in two ways:
In [16]:
OpSim_combined.propID.unique()
Out[16]:
In [17]:
OpSim_Deep.propID.unique()
Out[17]:
In [18]:
# Get fieldID closest to ra, dec
fieldIDFromRADec = oss.fieldID(OpSim_combined, np.radians(190.), np.radians(-83.0))
print(fieldIDFromRADec)
In [19]:
OpSimDeepSummary = so.SummaryOpsim(OpSim_Deep)
OpSimCombinedSummary = so.SummaryOpsim(OpSim_combined)
In [20]:
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111, projection='mollweide');
fig = OpSimDeepSummary.showFields(ax=fig.axes[0], marker='o', s=40)
In [21]:
DDF_fieldIDs = OpSim_Deep.fieldID.unique()
DDF_fieldIDs
Out[21]:
In [22]:
grouped = OpSim_Deep[['night', 'fieldID']].groupby(['night'])
In [23]:
fig, ax = plt.subplots()
grouped.agg({'fieldID': lambda x: x.unique().size}).hist(ax=ax)
Out[23]:
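The groupby/agg step above can be illustrated on a toy table (made-up nights and fieldIDs): for each night, count how many distinct fields were visited. `nunique` is the idiomatic equivalent of the `lambda x: x.unique().size` used above:

```python
import pandas as pd

# Toy visit table: several visits per night, some to the same field
visits = pd.DataFrame({
    'night':   [1, 1, 1, 2, 2, 3],
    'fieldID': [744, 744, 1427, 744, 2786, 1427],
})

# Number of distinct fields visited on each night
fields_per_night = visits.groupby('night')['fieldID'].nunique()
print(fields_per_night.tolist())  # [2, 2, 1]
```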
In [24]:
fig.savefig('DeepOnly_uniqueDDFFields.png')
What about if we count WFD visits to these fields ?
In [25]:
OpSimCombined_DDF_Fields = OpSim_combined.query('fieldID in @DDF_fieldIDs')
In [26]:
len(OpSimCombined_DDF_Fields)
Out[26]:
In [27]:
OpSimCombined_DDF_Fields.fieldID.unique()
Out[27]:
In [28]:
combinedGrouped = OpSimCombined_DDF_Fields.groupby('night')
In [29]:
fig, ax = plt.subplots()
combinedGrouped.agg({'fieldID': lambda x: x.unique().size}).hist(ax=ax)
Out[29]:
In [30]:
fig.savefig('Combined_DDFvisits.png')
In [31]:
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111, projection='mollweide');
opsimFieldSummary = so.SummaryOpsim(OpSim_combined.query('fieldID==347'))
fig = opsimFieldSummary.showFields(ax=fig.axes[0], marker='o', s=40)
In [32]:
OpSimCombinedSummary.showFields(ax=ax, marker='o', color='r', s=8)
Out[32]:
In [33]:
fieldIDFromRADec = oss.fieldID(OpSim_Deep, np.radians(53.), np.radians(-28.))
print(fieldIDFromRADec)
In [34]:
fieldIDFromRADec = oss.fieldID(OpSim_Deep, np.radians(0.), np.radians(-45.))
print(fieldIDFromRADec)
In [35]:
fieldIDFromRADec = oss.fieldID(OpSim_combined, np.radians(53.), np.radians(-28.))
print(fieldIDFromRADec)
In [36]:
fieldIDFromRADec = oss.fieldID(OpSim_combined, np.radians(85.7), np.radians(-14.4))
print(fieldIDFromRADec)
In [57]:
fieldList = OpSimCombinedSummary.fieldIds
In [38]:
OpSim_combined.columns
Out[38]:
In [39]:
DDF_fieldIDs
Out[39]:
In [40]:
selectedFields = OpSim_combined.query('fieldID in @DDF_fieldIDs')[['expMJD', 'propID', 'filter', 'fieldRA', 'fieldDec', 'fieldID', 'fiveSigmaDepth']]#.head()
In [41]:
selectedFields.head()
Out[41]:
In [42]:
selectedFields.to_hdf('DDF_Fields_Info.hdf', 'summaryInfo')
In [43]:
# CHECK
read_table = pd.read_hdf('DDF_Fields_Info.hdf', 'summaryInfo')
In [44]:
read_table.head()
Out[44]:
In [45]:
from pandas.util.testing import assert_frame_equal
In [46]:
assert_frame_equal(read_table, selectedFields)
We can visualize the cadence during the first season using the cadence plot for a particular field. The following plot shows how many visits we have in each filter on a given night:
In [47]:
firstSeasonDeep = OpSimDeepSummary.cadence_plot(fieldID=1427, observedOnly=False, sql_query='night < 366')
In [48]:
firstSeasonCombined = OpSimCombinedSummary.cadence_plot(fieldID=1427, observedOnly=False, sql_query='night < 366')
In [80]:
firstSeasonCombined[0].savefig('minion_1427.pdf')
In [49]:
firstSeason = OpSimDeepSummary.cadence_plot(fieldID=744, observedOnly=False, sql_query='night < 732',
nightMin=0, nightMax=732)
In [58]:
tenCadence = OpSimCombinedSummary.cadence_plot(fieldID=fieldList[2000], observedOnly=False, sql_query='night < 3500', nightMax=3500)
Suppose we have a supernova that peaks around a particular MJD (59640 in the example below), and we want to see what observations happened around it:
In [51]:
SN = OpSimDeepSummary.cadence_plot(summarydf=OpSimDeepSummary.df, fieldID=1427, #racol='fieldRA', deccol='fieldDec',
observedOnly=False, mjd_center=59640., mjd_range=[-30., 50.])
# ax = plt.gca()
# ax.axvline(49540, color='r', lw=2.)
# ax.xaxis.get_major_formatter().set_useOffset(False)
In [52]:
SN[0].savefig('SN_observation.pdf')
In [41]:
SN_matrix[SN_matrix > 0.5] = 1
In [52]:
SN_matrix.sum().sum()
Out[52]:
In [55]:
len(SN_matrix.sum(axis=1).dropna())
Out[55]:
In [56]:
nightlySN_matrix = SN_matrix.copy(deep=True)
In [57]:
nightlySN_matrix[SN_matrix > 0.5] =1
In [64]:
nightlySN_matrix.sum(axis=1).dropna().sum()
Out[64]:
In [69]:
nightlySN_matrix.sum(axis=1).dropna().size
Out[69]: