Likelihood Analysis with fermipy

The python likelihood tools are a powerful set of analysis tools that expand upon the command-line tools provided with the Fermi Science Tools package. Not only can you perform all of the same likelihood analyses with the python tools that you can with the standard command-line tools, but you can also directly access all of the model parameters and more easily script a standard analysis. The python tools also include a few features that are not available from the command line, such as the calculation of upper limits.

There are many user contributed packages built upon the python backbone of the Science Tools and this thread will highlight the use of the fermipy package.

This sample analysis is based on the PG 1553+113 analysis performed by the LAT team and described in Abdo, A. A. et al. 2010, ApJ, 708, 1310, and closely follows the Likelihood Analysis with Python thread. This tutorial assumes you have the most recent ScienceTools installed, with fermipy installed on top of it. For instructions on installing fermipy and the Fermi ScienceTools you should consult the fermipy Installation Instructions. We will also make significant use of python, so you might want to familiarize yourself with the language, including matplotlib and other libraries (there's a beginner's guide at http://wiki.python.org/moin/BeginnersGuide). This tutorial also assumes that you've gone through the non-python-based Unbinned Likelihood Tutorial.

Get the Data

For this thread the original data were extracted from the LAT data server with the following selections (these selections are similar to those in the paper):

  • Search Center (RA,Dec) = (238.929,11.1901)
  • Radius = 30 degrees
  • Start Time (MET) = 239557417 seconds (2008-08-04T15:43:37)
  • Stop Time (MET) = 256970880 seconds (2009-02-22T04:48:00)
  • Minimum Energy = 100 MeV
  • Maximum Energy = 300000 MeV

Once you execute the query you can download the data and put it in your working directory. You'll need to change the names of the files to match the filenames that the data server gave you. Alternatively, you can use the following tarball, which already contains the downloaded files as well as all of the ancillary files that will be generated for the analysis. In this example the output of the analysis will go into a subdirectory called pg1553.


In [1]:
import os
if os.path.isfile('../data/pg1553.tar.gz'):
    !tar xzf ../data/pg1553.tar.gz
else:
    !curl -OL https://raw.githubusercontent.com/fermiPy/fermipy-extras/master/data/pg1553.tar.gz
    !tar xzf pg1553.tar.gz

If you would like to download the raw data files used in this analysis (FT1 and FT2) you can do so by executing the following cell.


In [2]:
if os.path.isdir('../data'):
    os.environ['DATADIR'] = '../data'
else:
    !mkdir -p pg1553/data/ft1
    !mkdir -p pg1553/data/ft2
    os.environ['DATADIR'] = 'pg1553/data'
    !curl -o pg1553/data/ft1/L1504241622054B65347F25_PH00.fits -OL https://raw.githubusercontent.com/fermiPy/fermipy-extras/master/data/ft1/L1504241622054B65347F25_PH00.fits
    !curl -o pg1553/data/ft1/L1504241622054B65347F25_PH01.fits -OL https://raw.githubusercontent.com/fermiPy/fermipy-extras/master/data/ft1/L1504241622054B65347F25_PH01.fits

    
#!curl -o pg1553/L1504241622054B65347F25_PH00.fits http://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/pyLikelihood/L1504241622054B65347F25_PH00.fits
#!curl -o pg1553/L1504241622054B65347F25_PH01.fits http://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/pyLikelihood/L1504241622054B65347F25_PH01.fits
#!curl -o pg1553/L1504241622054B65347F25_SC00.fits http://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/pyLikelihood/L1504241622054B65347F25_SC00.fits

Make a file list

You'll then need to make a file list containing the names of your input event files. You can either create one with a text editor or do the following from the command line.


In [3]:
!ls -1 $DATADIR/ft1/*PH*.fits > pg1553/ft1.txt
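If you prefer to stay in python, the same file list can be built with glob. This is a sketch equivalent to the shell command above, assuming the same DATADIR layout:

```python
import glob
import os

# Collect the FT1 photon files and write one path per line,
# mirroring the `ls -1 $DATADIR/ft1/*PH*.fits > pg1553/ft1.txt` shell command.
datadir = os.environ.get('DATADIR', 'pg1553/data')
ft1_files = sorted(glob.glob(os.path.join(datadir, 'ft1', '*PH*.fits')))

if not os.path.isdir('pg1553'):
    os.makedirs('pg1553')
with open('pg1553/ft1.txt', 'w') as f:
    f.write('\n'.join(ft1_files) + '\n')
```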

Make a config file

fermipy bases its analysis on a configuration file (in YAML format). We're going to use a very simple config file for a standard analysis. There are many more options available, which you can set in the config file or modify later within the analysis chain.

Make a config file named 'config.yaml' like the following. For more details on the config file see config.html. You will probably need to customize this a bit, since your files might not be in the same place or have the same names. The galactic and isotropic diffuse model files will need to be located on your system (they are included in the Science Tools or can be downloaded from the FSSC). In the following example we set the path to these files with the environment variable FERMI_DIFFUSE_DIR. If FERMI_DIFFUSE_DIR is not defined, fermipy will look for these files within the FSSC Science Tools distribution.

data:
  evfile : PG1553.lst
  scfile : L1511031432171D651E7F63_SC00.fits

binning:
  roiwidth   : 10.0
  binsz      : 0.1
  binsperdec : 8

selection:
  emin    : 100
  emax    : 300000
  zmax    : 90
  evclass : 128
  evtype  : 3
  target  : '3FGL J1555.7+1111'

gtlike:
  edisp : True
  irfs : 'P8R2_SOURCE_V6'
  edisp_disable : ['isodiff','galdiff']

model:
  src_roiwidth : 15.0
  galdiff  : '$FERMI_DIFFUSE_DIR/gll_iem_v06.fits'
  isodiff  : '$FERMI_DIFFUSE_DIR/iso_P8R2_SOURCE_V6_v06.txt'
  catalogs : ['3FGL']
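Before running anything, it can be worth checking that your file parses as valid YAML and contains the sections fermipy expects. A quick sanity check with PyYAML (the YAML is inlined here for illustration; in practice you would open 'pg1553/config.yaml'):

```python
import yaml

# Parse a fragment of the config above and confirm the expected
# structure; the section and key names follow the example config.
config_text = """
selection:
  emin    : 100
  emax    : 300000
  target  : '3FGL J1555.7+1111'
"""
config = yaml.safe_load(config_text)
assert 'selection' in config
print(config['selection']['target'])  # 3FGL J1555.7+1111
```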

Start the analysis

Next, you create an analysis script and run the setup steps, which include performing the data selections and generating exposure maps. This will take a little while.

This is where the magic happens. fermipy will load the point-source model, create your XML file for you, apply all the appropriate cuts and binnings, and just go. All of this is configurable from python or from the config file, and if you need to rerun things, fermipy is smart enough not to regenerate files it doesn't need to.

Load up some useful modules


In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

Import the GTAnalysis module from fermipy

You start by importing the module and then creating an instance of the analysis object from our config file. When instantiating the analysis object we can override any options defined in the configuration file by passing keyword arguments to the object constructor. Here we explicitly set the verbosity parameter to 3 (INFO), which suppresses DEBUG output. When we create the object, it prints out information about all of the parameters that were used; you can see there are many more options than the ones we chose.


In [5]:
from fermipy.gtanalysis import GTAnalysis
gta = GTAnalysis('pg1553/config.yaml',logging={'verbosity': 3})
matplotlib.interactive(True)


2017-04-02 18:56:53 INFO    GTAnalysis.__init__(): 
--------------------------------------------------------------------------------
fermipy version 0.14.0+3.g8cc8.dirty 
ScienceTools version ScienceTools-11-05-02
WARNING: UnitsWarning: 'photon/cm**2/MeV/s' contains multiple slashes, which is discouraged by the FITS standard [astropy.units.format.generic]
WARNING: UnitsWarning: 'photon/cm**2/s' contains multiple slashes, which is discouraged by the FITS standard [astropy.units.format.generic]
WARNING: UnitsWarning: The unit 'erg' has been deprecated in the FITS standard. Suggested: cm2 g s-2. [astropy.units.format.utils]
WARNING: UnitsWarning: 'erg/cm**2/s' contains multiple slashes, which is discouraged by the FITS standard [astropy.units.format.generic]

The setup routine

This gets everything ready for the likelihood analysis, including instantiating the pyLikelihood object. Note that fermipy will skip generating any ancillary files that already exist in the working directory. In the sample tarball these files have already been produced in order to speed up this stage of the analysis. If you want to see the behavior of fermipy when running from an empty working directory, you can delete one or more of these files before running setup.


In [6]:
gta.setup()


2017-04-02 18:56:57 INFO    GTAnalysis.setup(): Running setup.
2017-04-02 18:56:57 INFO    GTBinnedAnalysis.setup(): Running setup for component 00
2017-04-02 18:56:57 INFO    GTBinnedAnalysis._select_data(): Skipping data selection.
2017-04-02 18:56:57 INFO    GTBinnedAnalysis._create_ltcube(): Skipping LT Cube.
2017-04-02 18:56:58 INFO    GTBinnedAnalysis._create_expcube(): Skipping gtexpcube.
WARNING: FITSFixedWarning: RADECSYS= 'FK5 ' 
the RADECSYS keyword is deprecated, use RADESYSa. [astropy.wcs.wcs]
2017-04-02 18:56:58 INFO    GTBinnedAnalysis._create_srcmaps(): Skipping gtsrcmaps.
2017-04-02 18:56:58 INFO    GTBinnedAnalysis.setup(): Finished setup for component 00
2017-04-02 18:56:58 INFO    GTBinnedAnalysis._create_binned_analysis(): Creating BinnedAnalysis for component 00.
2017-04-02 18:57:17 INFO    GTAnalysis.setup(): Initializing source properties
2017-04-02 18:57:27 INFO    GTAnalysis.setup(): Finished setup.

Before proceeding with the analysis we'll have a quick look at the files that are produced by the setup function.


In [7]:
ls pg1553/*fits


pg1553/3fgl_j1555.7+1111_sed.fits  pg1553/ft1_00.fits
pg1553/bexpmap_00.fits             pg1553/L1504241622054B65347F25_PH00.fits
pg1553/bexpmap_roi_00.fits         pg1553/L1504241622054B65347F25_PH01.fits
pg1553/ccube_00.fits               pg1553/L1504241622054B65347F25_SC00.fits
pg1553/ccube.fits                  pg1553/ltcube_00.fits
pg1553/fit0.fits                   pg1553/srcmap_00.fits
pg1553/fit1.fits

Here is a brief explanation of the contents of each file and its role in the analysis:

  • ft1_00.fits: Event list. This is generated by running gtselect and gtmktime on our input file list.
  • bexpmap_00.fits: All-sky binned exposure map. This map is interpolated to create an exposure model when generating the srcmap file.
  • bexpmap_roi_00.fits: Binned exposure map for the ROI. This file is only provided for visualization purposes in order to have an exposure map with the same binning as the data and model maps.
  • ccube_00.fits: Counts cube for the ROI.
  • ltcube_00.fits: Livetime cube. This contains a map of the livetime for this observation over the whole sky as a function of incidence angle.
  • srcmap_00.fits: Source map cube. This file contains maps for each of the components in the ROI after convolution with exposure and the PSF. Note that energy dispersion is applied at run-time.

Note that all of the files have a numerical suffix '00'. This is the analysis component index. In a multi-component analysis there would be instances of all of the above files for each analysis component. The files with no component index are co-added maps that are provided for visualization purposes.

To see an example of one of these files, we can open and plot the counts cube. This is a 3D cube containing the distribution of events as a function of energy and two spatial coordinates. In the example below we sum over the energy dimension of the cube to make a 2D sky image.


In [8]:
import astropy.io.fits as pyfits

h = pyfits.open('pg1553/ccube.fits')
h.info()
counts = h[0].data
counts.shape
plt.figure()
plt.imshow(np.sum(counts,axis=0),interpolation='nearest',origin='lower')


Filename: pg1553/ccube.fits
No.    Name         Type      Cards   Dimensions   Format
  0  PRIMARY     PrimaryHDU      25   (100, 100, 28)   float64   
Out[8]:
<matplotlib.image.AxesImage at 0x7f6c5c52d550>

We can now inspect the state of the ROI prior to the fit with the print_roi() method.


In [9]:
gta.print_roi()


2017-04-02 18:57:27 INFO    GTAnalysis.print_roi(): 
name                SpatialModel   SpectrumType     offset        ts       npred
--------------------------------------------------------------------------------
3FGL J1555.7+1111   PointSource    LogParabola       0.000       nan       923.3
3FGL J1553.5+1256   PointSource    LogParabola       1.833       nan       655.6
3FGL J1603.7+1106   PointSource    PowerLaw          1.976       nan       122.7
3FGL J1552.1+0852   PointSource    PowerLaw          2.482       nan        90.2
3FGL J1608.6+1029   PointSource    PowerLaw          3.252       nan       531.6
3FGL J1541.8+1105   PointSource    PowerLaw          3.411       nan       155.5
3FGL J1546.0+0818   PointSource    PowerLaw          3.753       nan        38.6
3FGL J1548.4+1455   PointSource    PowerLaw          4.134       nan       187.8
3FGL J1541.6+1414   PointSource    PowerLaw          4.596       nan        34.4
3FGL J1611.9+1404   PointSource    PowerLaw          4.888       nan        81.3
3FGL J1540.8+1449   PointSource    PowerLaw          5.146       nan        73.9
3FGL J1607.0+1551   PointSource    PowerLaw          5.417       nan       307.3
isodiff             ConstantValue  FileFunction      -----       nan      4965.9
galdiff             MapCubeFunctio PowerLaw          -----       nan      8297.6

Additional details about an individual source can be retrieved by printing the corresponding source object. Here we use the bracket operator to return the properties of PG1553.


In [10]:
print(gta.roi['3FGL J1555.7+1111'])


Name           : 3FGL J1555.7+1111
Associations   : ['3FGL J1555.7+1111', 'PG 1553+113', '1FHL J1555.7+1111', '2FGL J1555.7+1111']
RA/DEC         :    238.936/    11.194
GLON/GLAT      :     21.918/    43.960
TS             : nan
Npred          : 923.27
Flux           : 5.324e-08 +/-      nan
EnergyFlux     : 0.0001344 +/-      nan
SpatialModel   : PointSource
SpectrumType   : LogParabola
Spectral Parameters
norm           :  4.825e-12 +/-        nan
alpha          :      1.604 +/-        nan
beta           :    0.03884 +/-        nan
Eb             :       1491 +/-        nan

Do the likelihood fitting

Now that all of the ancillary files have been generated, we can move on to the actual fitting. The first thing you should do is free some of the sources, since all of the sources are initially fixed. We'll just free the sources in the central region.


In [11]:
# Free Normalization of all Sources within 3 deg of ROI center
gta.free_sources(distance=3.0,pars='norm')

# Free all parameters of isotropic and galactic diffuse components
gta.free_source('galdiff')
gta.free_source('isodiff')


2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for 3FGL J1555.7+1111     : ['norm']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for 3FGL J1553.5+1256     : ['norm']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for 3FGL J1603.7+1106     : ['Prefactor']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for 3FGL J1552.1+0852     : ['Prefactor']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for isodiff               : ['Normalization']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for galdiff               : ['Prefactor']
2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for galdiff               : ['Index']

In this simple analysis we are leaving the spectral shapes of the other sources fixed, but we're going to free the spectral shape of the source we care about.


In [12]:
gta.free_source('3FGL J1555.7+1111')


2017-04-02 18:57:28 INFO    GTAnalysis.free_source(): Freeing parameters for 3FGL J1555.7+1111     : ['alpha', 'beta']

Now, actually do the fit. The software does its best to get the fit to converge by running the fit several times.


In [13]:
fit_results = gta.fit()


2017-04-02 18:57:28 INFO    GTAnalysis.fit(): Starting fit.
/opt/conda/lib/python2.7/site-packages/scipy/interpolate/fitpack2.py:224: UserWarning: 
The maximal number of iterations maxit (set to 20 by the program)
allowed for finding a smoothing spline with fp=s has been reached: s
too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
  warnings.warn(message)
2017-04-02 18:57:31 INFO    GTAnalysis.fit(): Fit returned successfully. Quality:   3 Status:   0
2017-04-02 18:57:31 INFO    GTAnalysis.fit(): LogLike:   -50238.453 DeltaLogLike:      270.794 

The dictionary returned by the fit method contains a variety of diagnostic information about the fit, including the fit quality, the relative improvement in the likelihood, and the correlations among the fit parameters. We can inspect the results of the fit by printing the source object for PG1553.


In [14]:
print('Fit Quality: ',fit_results['fit_quality'])
print(gta.roi['3FGL J1555.7+1111'])


('Fit Quality: ', 3)
Name           : 3FGL J1555.7+1111
Associations   : ['3FGL J1555.7+1111', 'PG 1553+113', '1FHL J1555.7+1111', '2FGL J1555.7+1111']
RA/DEC         :    238.936/    11.194
GLON/GLAT      :     21.918/    43.960
TS             : 3004.99
Npred          : 1116.35
Flux           : 6.172e-08 +/- 6.16e-09
EnergyFlux     : 0.0001956 +/- 2.06e-05
SpatialModel   : PointSource
SpectrumType   : LogParabola
Spectral Parameters
norm           :  6.101e-12 +/-  3.063e-13
alpha          :      1.533 +/-    0.05064
beta           :    0.04338 +/-    0.01761
Eb             :       1491 +/-        nan
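The parameter correlations reported in the fit dictionary are derived from the covariance matrix of the fit. As a generic illustration (not fermipy-specific; the matrix below is made up), a covariance matrix is converted to a correlation matrix like this:

```python
import numpy as np

# Hypothetical 3x3 covariance matrix for three fit parameters.
cov = np.array([[4.0, 1.2, 0.6],
                [1.2, 9.0, 0.3],
                [0.6, 0.3, 1.0]])

sigma = np.sqrt(np.diag(cov))        # 1-sigma parameter uncertainties
corr = cov / np.outer(sigma, sigma)  # correlation matrix, diagonal = 1

print(sigma)
print(np.round(corr, 2))
```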

You can then save the state of the ROI to an output file for later reference. The write_roi function does this; the first argument is a string that will be prepended to the names of the output files generated by this method.


In [15]:
gta.write_roi('fit0',make_plots=True)


2017-04-02 18:57:31 INFO    GTBinnedAnalysis.write_xml(): Writing /workdir/fermipy-extra/notebooks/pg1553/fit0_00.xml...
2017-04-02 18:57:31 INFO    GTAnalysis.write_fits(): Writing /workdir/fermipy-extra/notebooks/pg1553/fit0.fits...
WARNING: VerifyWarning: Keyword name 'loge_bounds' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]
2017-04-02 18:57:32 INFO    GTAnalysis.write_roi(): Writing /workdir/fermipy-extra/notebooks/pg1553/fit0.npy...

A number of diagnostic plots are also saved at the same time.


In [16]:
ls -l pg1553/*.png


-rw-r--r-- 1 root root 31928 Jul  8  2016 pg1553/3fgl_j1555.7+1111_sedlnl.png
-rw-r--r-- 1 root root 20949 Jul  8  2016 pg1553/3fgl_j1555.7+1111_sed.png
-rw-r--r-- 1 root root 74430 Apr  2 18:57 pg1553/fit0_counts_map_2.000_5.477.png
-rw-r--r-- 1 root root 25387 Apr  2 18:57 pg1553/fit0_counts_map_xproj_2.000_5.477.png
-rw-r--r-- 1 root root 25952 Apr  2 18:57 pg1553/fit0_counts_map_yproj_2.000_5.477.png
-rw-r--r-- 1 root root 58472 Apr  2 18:57 pg1553/fit0_counts_spectrum.png
-rw-r--r-- 1 root root 65177 Apr  2 18:57 pg1553/fit0_model_map_2.000_5.477.png

In [17]:
from IPython.display import Image, display
from glob import glob

In [18]:
pngs = glob('pg1553/*.png')

In [19]:
for png in pngs:
    my_image = Image(png)
    display(my_image)


Reading in the results

Since the results are saved, you can load them back up at any point (you can also access these directly within python). Here we retrieve the analysis results from the output numpy file.


In [20]:
c = np.load('pg1553/fit0.npy').flat[0]

The sources dictionary has an entry for each source in the model:


In [21]:
sorted(c['sources'].keys())


Out[21]:
['3FGL J1540.8+1449',
 '3FGL J1541.6+1414',
 '3FGL J1541.8+1105',
 '3FGL J1546.0+0818',
 '3FGL J1548.4+1455',
 '3FGL J1552.1+0852',
 '3FGL J1553.5+1256',
 '3FGL J1555.7+1111',
 '3FGL J1603.7+1106',
 '3FGL J1607.0+1551',
 '3FGL J1608.6+1029',
 '3FGL J1611.9+1404',
 'galdiff',
 'isodiff']

Let's take a look at the flux, spectral parameters, and TS.


In [22]:
c['sources']['3FGL J1555.7+1111']['flux']


Out[22]:
6.171671203405652e-08

In [23]:
print(c['sources']['3FGL J1555.7+1111']['param_names'][:4])
print(c['sources']['3FGL J1555.7+1111']['param_values'][:4])


['norm' 'alpha' 'beta' 'Eb']
[  6.10078196e-12   1.53311268e+00   4.33771623e-02   1.49138049e+03]

In [24]:
c['sources']['3FGL J1555.7+1111']['ts']


Out[24]:
3004.993862917152
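As a rough rule of thumb, for a single additional free parameter the detection significance is approximately the square root of the TS value, so a TS of ~3005 corresponds to roughly a 55-sigma detection:

```python
import numpy as np

# Approximate detection significance from the TS value above (1 d.o.f.).
ts = 3004.99
significance = np.sqrt(ts)
print('%.1f sigma' % significance)  # 54.8 sigma
```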

The SED is in there as well. We can plot it.


In [25]:
E = np.array(c['sources']['3FGL J1555.7+1111']['model_flux']['energies'])
dnde = np.array(c['sources']['3FGL J1555.7+1111']['model_flux']['dnde'])
dnde_hi = np.array(c['sources']['3FGL J1555.7+1111']['model_flux']['dnde_hi'])
dnde_lo = np.array(c['sources']['3FGL J1555.7+1111']['model_flux']['dnde_lo'])

In [26]:
plt.loglog(E, (E**2)*dnde, 'k--')
plt.loglog(E, (E**2)*dnde_hi, 'k')
plt.loglog(E, (E**2)*dnde_lo, 'k')
plt.xlabel('E [MeV]')
plt.ylabel(r'E$^2$ dN/dE [MeV cm$^{-2}$ s$^{-1}$]')
plt.show()


If you want SED points, there's a function for that as well. It has many options, which you can set in the config file or via keyword arguments of the function itself.


In [27]:
sed = gta.sed('3FGL J1555.7+1111')


2017-04-02 18:57:36 INFO    GTAnalysis.sed(): Computing SED for 3FGL J1555.7+1111
2017-04-02 18:57:37 INFO    GTAnalysis._make_sed(): Fitting SED
2017-04-02 18:57:37 INFO    GTAnalysis.free_source(): Fixing parameters for 3FGL J1555.7+1111     : ['alpha', 'beta']
2017-04-02 18:57:37 INFO    GTAnalysis.free_source(): Fixing parameters for galdiff               : ['Index']
2017-04-02 18:57:42 INFO    GTAnalysis.sed(): Finished SED

You can save the state to a yaml file or you can just access it directly. This is also the way to get at the dictionary for any individual source.


In [28]:
src = gta.roi['3FGL J1555.7+1111']
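One way to keep results like the SED around is the same numpy pattern used for fit0.npy earlier in the thread: save a plain dictionary and reload it later. The values below are made-up placeholders; the real keys like 'e_ctr' and 'e2dnde' come from the gta.sed() call above.

```python
import numpy as np

# Hypothetical subset of an SED dictionary, for illustration only.
sed_subset = {'e_ctr': np.array([150.0, 350.0]),
              'e2dnde': np.array([1.1e-4, 9.5e-5])}

np.save('sed_subset.npy', sed_subset)

# Reload the same way as fit0.npy; recent numpy versions require
# allow_pickle=True when loading object arrays like this.
loaded = np.load('sed_subset.npy', allow_pickle=True).flat[0]
print(sorted(loaded.keys()))  # ['e2dnde', 'e_ctr']
```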

In [29]:
plt.loglog(E, (E**2)*dnde, 'k--')
plt.loglog(E, (E**2)*dnde_hi, 'k')
plt.loglog(E, (E**2)*dnde_lo, 'k')
plt.errorbar(np.array(sed['e_ctr']),
             sed['e2dnde'], 
             yerr=sed['e2dnde_err'], fmt ='o')
plt.xlabel('E [MeV]')
plt.ylabel(r'E$^{2}$ dN/dE [MeV cm$^{-2}$ s$^{-1}$]')
plt.show()


Looks like those last two points should be upper limits. Let's plot those instead.


In [30]:
plt.loglog(E, (E**2)*dnde, 'k--')
plt.loglog(E, (E**2)*dnde_hi, 'k')
plt.loglog(E, (E**2)*dnde_lo, 'k')
plt.errorbar(sed['e_ctr'][:-2],
             sed['e2dnde'][:-2], 
             yerr=sed['e2dnde_err'][:-2], fmt ='o')
plt.errorbar(np.array(sed['e_ctr'][-2:]),
         sed['e2dnde_ul95'][-2:], yerr=0.2*sed['e2dnde_ul95'][-2:], 
             fmt='o', uplims=True)
plt.xlabel('E [MeV]')
plt.ylabel(r'E$^{2}$ dN/dE [MeV cm$^{-2}$ s$^{-1}$]')
plt.show()


Summary

There is a lot of other functionality, and you should look through the docs for more details. You can also inspect the GTAnalysis object for some of these features (like TS maps, extension tests, and using event types). The following threads cover some of the more advanced functionality of fermipy:

  • IC443 : Analysis to measure the angular extension of the SNR IC443.
  • Draco : DM upper limit analysis of the Draco dwarf spheroidal galaxy.