The python likelihood tools are a powerful set of analysis tools that expand upon the command line tools provided with the Fermi Science Tools package. Not only can you perform all of the same likelihood analyses with the python tools that you can with the standard command line tools, but you can also directly access all of the model parameters, and you can more easily script a standard analysis like light curve generation. A few features are also built into the python tools that are not available from the command line, such as the calculation of upper limits.
There are many user contributed packages built upon the python backbone of the Science Tools, and we are going to highlight and use a few of them in this tutorial, such as likeSED, make3FGLxml, and the LATAnalysisScripts.
This sample analysis is based on the PG 1553+113 analysis performed by the LAT team and described in Abdo, A. A. et al. 2010, ApJ, 708, 1310. At certain points we will refer to this article as well as the Cicerone. After you complete this tutorial you should be able to reproduce all of the data analysis performed in this publication, including generating a spectrum (individual bins and a butterfly plot) and producing a light curve with the python tools. This tutorial assumes you have the most recent Science Tools installed. We will also make significant use of python, so you might want to familiarize yourself with it first (there's a beginner's guide at http://wiki.python.org/moin/BeginnersGuide). This tutorial also assumes that you've gone through the non-python based unbinned likelihood thread. It should take approximately 8 hours to complete (depending on your computer's speed) if you do everything, but there are some steps you can skip along the way that shave off about 4 hours.
Note: This tutorial is generated from a jupyter notebook which you can download and run yourself (the preferred method). You can also run the individual commands listed on this page. If you do that, be aware that some commands must be executed in an ipython/jupyter environment.
For this thread the original data were extracted from the LAT data server with the following selections (these selections are similar to those in the paper):
We've provided direct links to the event files as well as the spacecraft data file if you don't want to take the time to use the download server. For more information on how to download LAT data please see the Extract LAT Data tutorial.
Make a working directory and then download all of the files into that directory.
In [3]:
!mkdir working
In [4]:
import urllib
In [6]:
url_base = "https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/pyLikelihood/"
datafiles = ["L1504241622054B65347F25_PH00.fits",
"L1504241622054B65347F25_PH01.fits",
"L1504241622054B65347F25_SC00.fits",]
In [10]:
for datafile in datafiles:
    urllib.urlretrieve(url_base+datafile, "working/"+datafile)
In [11]:
ls working/
Now, you'll first need to make a file list with the names of your input event files:
In [13]:
ls -1 working/*PH*.fits > working/PG1553_events.list
In [15]:
mv working/L1504241622054B65347F25_SC00.fits working/PG1553_SC.fits
In the following analysis we've assumed that you've named your list of data files PG1553_events.list and the spacecraft file PG1553_SC.fits.
In [16]:
ls working/
You could follow the unbinned likelihood tutorial to perform your event selections using gtselect, gtmktime, etc. directly from the command line, and then use pyLikelihood later. But we're going to go ahead and use python from the start. The gt_apps module provides methods to call these tools from within python. This'll get us used to using python.
So, let's jump into python:
Ok, we want to run gtselect inside python, but first we need to import the gt_apps module to gain access to it.
In [14]:
import gt_apps as my_apps
Now, you can see what objects are part of the gt_apps module by executing:
In [18]:
help(my_apps)
This brings up the help documentation for that module (type 'q' to exit the pager). The python object for gtselect is called filter, and we first need to set all of its options. This is very similar to calling gtselect from the command line and entering all of the variables interactively. It might not seem that convenient to do it this way, but it's really nice once you start building up scripts (see Building a Light Curve) and reading options back. For example, towards the end of this thread we'll want to generate a light curve, and we'll have to run the likelihood analysis for each datapoint. It'll be much easier to do all of this within python and change tmin and tmax in an iterative fashion. Note that these python objects are just wrappers for the standalone tools, so if you want any information on their options, see the corresponding documentation for the standalone tool.
In [19]:
my_apps.filter['evclass'] = 128
my_apps.filter['evtype'] = 3
my_apps.filter['ra'] = 238.929
my_apps.filter['dec'] = 11.1901
my_apps.filter['rad'] = 10
my_apps.filter['emin'] = 100
my_apps.filter['emax'] = 300000
my_apps.filter['zmax'] = 90
my_apps.filter['tmin'] = 239557417
my_apps.filter['tmax'] = 256970880
my_apps.filter['infile'] = '@working/PG1553_events.list'
my_apps.filter['outfile'] = 'working/PG1553_filtered.fits'
Once this is done, run gtselect:
In [20]:
my_apps.filter.run()
Note that you can see exactly what gtselect will do if you run it by typing:
In [21]:
my_apps.filter.command()
Out[21]:
You can read back or change any of the inputs by accessing the filter['OPTION'] options directly.
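For example (a small illustration, not a cell from the original notebook), you can read options back after setting them; a light-curve script would simply reset tmin and tmax in a loop like this and rerun the tool for each time bin:

In [ ]:
# Read a few options back from the gtselect wrapper; a light-curve script
# would reset filter['tmin'] / filter['tmax'] in a loop and call
# my_apps.filter.run() again for each time bin.
print(my_apps.filter['rad'])
print(my_apps.filter['tmin'])
print(my_apps.filter['tmax'])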
Next, you need to run gtmktime. This is accessed within python via the maketime object:
In [25]:
my_apps.maketime['scfile'] = 'working/PG1553_SC.fits'
my_apps.maketime['filter'] = '(DATA_QUAL>0)&&(LAT_CONFIG==1)'
my_apps.maketime['roicut'] = 'no'
my_apps.maketime['evfile'] = 'working/PG1553_filtered.fits'
my_apps.maketime['outfile'] = 'working/PG1553_filtered_gti.fits'
my_apps.maketime.run()
We're using the most conservative and most commonly used cuts described in detail in the Cicerone.
At this point you could make a counts map of the events we just selected using gtbin (it's called evtbin within python), and we won't discourage you, but we're going to go ahead and create a livetime cube and exposure map. These might take a few minutes to complete, so if you want to create a counts map and have a look at it, get these processes going and open another terminal to work on your counts map (see the likelihood tutorial for an example of running gtbin to produce a counts map).
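If you do want that quick counts map while you wait, gtbin's python wrapper can be driven the same way as filter. Here is a minimal sketch (the output file name, pixel size, and map dimensions are illustrative choices, not values from the original thread):

In [ ]:
# Bin the filtered events into a simple counts map (CMAP) centered on PG 1553+113.
my_apps.evtbin['evfile'] = 'working/PG1553_filtered_gti.fits'
my_apps.evtbin['scfile'] = 'working/PG1553_SC.fits'
my_apps.evtbin['outfile'] = 'working/PG1553_cmap.fits'
my_apps.evtbin['algorithm'] = 'CMAP'
my_apps.evtbin['nxpix'] = 160
my_apps.evtbin['nypix'] = 160
my_apps.evtbin['binsz'] = 0.125
my_apps.evtbin['coordsys'] = 'CEL'
my_apps.evtbin['xref'] = 238.929
my_apps.evtbin['yref'] = 11.1901
my_apps.evtbin['axisrot'] = 0
my_apps.evtbin['proj'] = 'AIT'
my_apps.evtbin.run()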
The livetime cube calculation will take approximately 15 - 30 minutes to complete, so if you want to just download the precomputed PG1553_ltCube file from us you can skip this step.
In [26]:
my_apps.expCube['evfile'] = 'working/PG1553_filtered_gti.fits'
my_apps.expCube['scfile'] = 'working/PG1553_SC.fits'
my_apps.expCube['outfile'] = 'working/PG1553_ltCube.fits'
my_apps.expCube['zmax'] = 90
my_apps.expCube['dcostheta'] = 0.025
my_apps.expCube['binsz'] = 1
my_apps.expCube.run()
While you're waiting, you might have noticed that not all of the command line science tools have an equivalent object in gt_apps. This is easy to fix. Say you want to use gtltcubesun from within python. Just make it a GtApp:
In [27]:
from GtApp import GtApp
expCubeSun = GtApp('gtltcubesun','Likelihood')
expCubeSun.command()
Out[27]:
In [28]:
my_apps.expMap['evfile'] = 'working/PG1553_filtered_gti.fits'
my_apps.expMap['scfile'] = 'working/PG1553_SC.fits'
my_apps.expMap['expcube'] = 'working/PG1553_ltCube.fits'
my_apps.expMap['outfile'] = 'working/PG1553_expMap.fits'
my_apps.expMap['irfs'] = 'CALDB'
my_apps.expMap['srcrad'] = 20
my_apps.expMap['nlong'] = 120
my_apps.expMap['nlat'] = 120
my_apps.expMap['nenergies'] = 37
my_apps.expMap.run()
We need to create an XML file with all of the sources of interest within the Region of Interest (ROI) of PG 1553+113 so we can correctly model the background. For more information on the format of the model file and how to create one, see the likelihood analysis tutorial. We'll use the user contributed tool make3FGLxml.py
to create a model file based on the 4-year LAT catalog (3FGL). You'll need to download the FITS version of the catalog from http://fermi.gsfc.nasa.gov/ssc/data/access/lat/4yr_catalog/ and get the make3FGLxml.py tool from the user contributed software page, then put them both in your working directory. Also make sure you have the most recent galactic diffuse and isotropic model files, which can be found at http://fermi.gsfc.nasa.gov/ssc/data/access/lat/BackgroundModels.html. In the following we assume that the galactic diffuse and isotropic files are in your Science Tools install and we just make local symbolic links.
In [29]:
urllib.urlretrieve('http://fermi.gsfc.nasa.gov/ssc/data/analysis/user/make3FGLxml.py','make3FGLxml.py')
Out[29]:
In [31]:
urllib.urlretrieve('https://fermi.gsfc.nasa.gov/ssc/data/access/lat/4yr_catalog/gll_psc_v16.fit',
'working/gll_psc_v16.fit')
Out[31]:
In [46]:
!ln -s $FERMI_DIR/refdata/fermi/galdiffuse/iso_P8R2_SOURCE_V6_v06.txt working/iso_P8R2_SOURCE_V6_v06.txt
In [49]:
!ln -s $FERMI_DIR/refdata/fermi/galdiffuse/gll_iem_v06.fits working/gll_iem_v06.fits
In [53]:
ls working
Now that we have all of the files we need, you can generate your model file:
In [1]:
from make3FGLxml import *
In [17]:
mymodel = srcList('working/gll_psc_v16.fit','working/PG1553_filtered_gti.fits','working/PG1553_model.xml')
In [18]:
mymodel.makeModel('working/gll_iem_v06.fits',
'working/gll_iem_v06',
'working/iso_P8R2_SOURCE_V6_v06.txt',
'working/iso_P8R2_SOURCE_V6_v06')
For more information on the make3FGLxml.py module, see the [usage notes](/ssc/data/analysis/user/readme_make3FGLxml.txt).
You should now have a file called 'PG1553_model.xml'. Open it up with your favorite text editor and take a look at it. It should look like PG1553_model.xml. There should be seven sources within 4 degrees of the ROI center, nine sources between 4 and 8 degrees, and eight sources between 8 and 12 degrees (4 of which are beyond 10 degrees and frozen). In all, there are 38 sources beyond 10 degrees. The script designates these as outside of the ROI (which is 10 degrees) and instructs us to leave all of the variables for these sources fixed. We'll agree with this (these sources are outside of our ROI but could still affect our fit since photons from these sources could fall within our ROI due to the LAT PSF). At the bottom of the file, the two diffuse sources are listed (the galactic diffuse and extragalactic isotropic).
Notice that we've deviated a bit from the published paper here. In the paper, the LAT team only included two sources: one from the 0FGL catalog and another, non-catalog source. This is because the later LAT catalogs had not been released at the time. However, these 3FGL sources are still in the data we've downloaded and should be modeled.
Back to looking at our XML model file, notice that all of the sources have been set up with various spectral models (see the Cicerone for details on the different spectral models), and the module we ran filled in the spectral values from the 3FGL catalog. Also notice that PG 1553+113 is listed in the model file as 3FGL J1555.7+1111 with all of the parameters filled in for us. It's actually offset from the center of our ROI by only 0.008 degrees. How nice! The only problem is that the 3FGL spectral shape (a LogParabola fit to multiple years of data) will cause us some issues for the short time range we are analyzing here. Therefore we want to change the model for 3FGL J1555.7+1111 to a simple power law for the purposes of this analysis thread. You'll have to modify the relevant python scripts on your own to match whatever source model may be relevant for your data.
In the xml file, change the entry for 3FGL J1555.7+1111 from
<source ROI_Center_Distance="0.008" name="3FGL J1555.7+1111" type="PointSource">
<spectrum apply_edisp="false" type="LogParabola">
<!-- Source is 0.00802140915217 degrees away from ROI center -->
<parameter free="1" max="1e4" min="1e-4" name="norm" scale="1e-12" value="4.82531809648"/>
<parameter free="1" max="5.0" min="0.0" name="alpha" scale="1.0" value="1.60433"/>
<parameter free="1" max="10.0" min="0.0" name="beta" scale="1.0" value="0.0388407"/>
<parameter free="0" max="5e5" min="30" name="Eb" scale="1.0" value="1491.38"/>
</spectrum>
<spatialModel type="SkyDirFunction">
<parameter free="0" max="360.0" min="-360.0" name="RA" scale="1.0" value="238.936"/>
<parameter free="0" max="90" min="-90" name="DEC" scale="1.0" value="11.1939"/>
</spatialModel>
</source>
to
<source ROI_Center_Distance="0.008" name="3FGL J1555.7+1111" type="PointSource">
<spectrum apply_edisp="false" type="PowerLaw">
<!-- Source is 0.00802140915217 degrees away from ROI center -->
<parameter free="1" max="1e4" min="1e-4" name="Prefactor" scale="1e-11" value="1.0"/>
<parameter free="1" max="10.0" min="0.0" name="Index" scale="-1.0" value="2.5"/>
<parameter free="0" max="5e5" min="30" name="Scale" scale="1.0" value="500"/>
</spectrum>
<spatialModel type="SkyDirFunction">
<parameter free="0" max="360.0" min="-360.0" name="RA" scale="1.0" value="238.936"/>
<parameter free="0" max="90" min="-90" name="DEC" scale="1.0" value="11.1939"/>
</spatialModel>
</source>
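If you'd rather script this edit than do it by hand, here is a minimal sketch using only the Python standard library. This is not part of the original thread; the parameter values simply mirror the hand-edited block above, and the name match is stripped of whitespace to sidestep the trailing-space quirk noted later in this thread:

In [ ]:
import xml.dom.minidom as minidom

# Parse the model, find the PG 1553+113 entry, and swap its spectrum element.
doc = minidom.parse('working/PG1553_model.xml')
for src in doc.getElementsByTagName('source'):
    if src.getAttribute('name').strip() == '3FGL J1555.7+1111':
        old_spec = src.getElementsByTagName('spectrum')[0]
        new_spec = doc.createElement('spectrum')
        new_spec.setAttribute('apply_edisp', 'false')
        new_spec.setAttribute('type', 'PowerLaw')
        # (name, free, max, min, scale, value) for the PowerLaw parameters above
        for name, free, pmax, pmin, scale, value in [
                ('Prefactor', '1', '1e4', '1e-4', '1e-11', '1.0'),
                ('Index',     '1', '10.0', '0.0', '-1.0', '2.5'),
                ('Scale',     '0', '5e5', '30', '1.0', '500')]:
            par = doc.createElement('parameter')
            par.setAttribute('free', free)
            par.setAttribute('max', pmax)
            par.setAttribute('min', pmin)
            par.setAttribute('name', name)
            par.setAttribute('scale', scale)
            par.setAttribute('value', value)
            new_spec.appendChild(par)
        src.replaceChild(new_spec, old_spec)

# Write the modified model back out (the new element is not pretty-printed).
open('working/PG1553_model.xml', 'w').write(doc.toxml())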
The diffuse source responses tell the likelihood fitter what the expected contribution would be for each diffuse source, given the livetime associated with each event. The source model XML file must contain all of the diffuse sources to be fit. The gtdiffrsp tool will add one column to the event data file for each diffuse source. The diffuse response depends on the instrument response function (IRF), which must be in agreement with the selection of events, i.e. the event class and event type we are using in our analysis. Since we are using SOURCE class, CALDB should use the P8R2_SOURCE_V6 IRF for this tool.
If the diffuse responses are not precomputed using gtdiffrsp, then the gtlike tool will compute them at runtime (during the next step). However, as this step is very computationally intensive (often taking ~hours to complete), and it is very likely you will need to run gtlike more than once, it is probably wise to precompute these quantities.
In [19]:
my_apps.diffResps['evfile'] = 'working/PG1553_filtered_gti.fits'
my_apps.diffResps['scfile'] = 'working/PG1553_SC.fits'
my_apps.diffResps['srcmdl'] = 'working/PG1553_model.xml'
my_apps.diffResps['irfs'] = 'CALDB'
my_apps.diffResps.run()
It's time to actually run the likelihood analysis now. First, you need to import the pyLikelihood module and then the UnbinnedAnalysis functions (there's also a binned analysis module that you can import to do binned likelihood analysis which behaves almost exactly the same as the unbinned analysis module). For more details on the pyLikelihood module, check out the pyLikelihood Usage Notes.
In [20]:
import pyLikelihood
from UnbinnedAnalysis import *
obs = UnbinnedObs('working/PG1553_filtered_gti.fits',
'working/PG1553_SC.fits',
expMap='working/PG1553_expMap.fits',
expCube='working/PG1553_ltCube.fits',
irfs='CALDB')
like = UnbinnedAnalysis(obs,'working/PG1553_model.xml',optimizer='NewMinuit')
By now you'll have two objects: obs, an UnbinnedObs object, and like, an UnbinnedAnalysis object. You can view these objects' attributes and set them from the command line in various ways. For example:
In [21]:
print obs
In [22]:
print like
or you can get directly at the object's attributes and methods by:
In [23]:
dir(like)
Out[23]:
or get even more details by executing:
In [24]:
help(like)
There are a lot of attributes, and here you start to see the power of pyLikelihood: once the fit is done you'll be able to access any of these attributes directly within python and use them in your own scripts. For example, you can see that the like object has a 'tol' attribute, which we can read back to see its current value and then set to whatever we want.
In [25]:
like.tol
Out[25]:
In [26]:
like.tolType
Out[26]:
The tolType can be '0' for relative or '1' for absolute.
In [27]:
like.tol = 0.0001
Now we're ready to do the actual fit. This next step will take approximately 10 minutes to complete. We're doing something a bit fancy here: we retrieve the minimizer object (calling it likeobj) from the logLike object so that we can access it later. We pass this object to the fit routine so that it knows which fitting object to use. We're also telling the code to calculate the covariance matrix so we can get at the errors.
In [28]:
likeobj = pyLike.NewMinuit(like.logLike)
like.fit(verbosity=0,covar=True,optObject=likeobj)
Out[28]:
The number that is printed out here is the -log(Likelihood) of the total fit to the data. You can print the results of the fit by accessing like.model. You can also access the fit for a particular source by doing the following (the source name must match that in the XML model file).
Note that there is a bug in the XML file that puts a trailing space in the source name, so you need to include that space when accessing the source:
In [52]:
like.model['3FGL J1555.7+1111 ']
Out[52]:
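There are a few quantities you will typically want right after the fit (this is a sketch, not a cell from the original notebook): the NewMinuit return code tells you whether the fit fully converged (0 means converged), and the Ts, flux, and fluxError methods work for any source in the model (note the trailing space in the name, per the bug mentioned above).

In [ ]:
print(likeobj.getRetCode())                            # 0 means NewMinuit fully converged
print(like.Ts('3FGL J1555.7+1111 '))                   # test statistic for the source
print(like.flux('3FGL J1555.7+1111 ', emin=100))       # integral photon flux above 100 MeV
print(like.fluxError('3FGL J1555.7+1111 ', emin=100))  # 1-sigma error on that flux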
You can plot the results of the fit by executing the plot command. The results are shown below:
In [59]:
like.setPlotter(plotter='python')
In [60]:
%matplotlib inline
In [ ]:
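# A minimal sketch of the "plot command" referred to above: like.plot() draws
# the counts and model components for the current fit using the python
# plotter selected in the cells above.
like.plot()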