This tutorial shows how to perform a Fermi-LAT analysis with the Fermipy Python package. Many parts of this tutorial are taken directly from the Fermipy documentation page: fermipy.readthedocs. I suggest visiting the documentation page for further information.
Fermipy is a Python package created by Matthew Wood and maintained by a wide community of people. Fermipy facilitates analysis of data from the Large Area Telescope (LAT) with the Fermi Science Tools. The Fermipy package is built on the pyLikelihood interface of the Fermi Science Tools and provides a set of high-level tools for performing common analysis tasks.
Instructions on how to install Fermipy on the SLAC machines or on your laptop are available at this page: fermipy.installation. Fermipy is only compatible with Science Tools v10r0p5 or later. If you are using an earlier version, you will need to download and install the latest version from the FSSC. Note that it is recommended to use the non-ROOT binary distributions of the Science Tools. These instructions assume that you want to run Fermipy on the SLAC machines.
With these instructions you will create your own Conda installation and install all the packages needed to use Fermipy and the Science Tools. Using your own Conda installation avoids conflicts between package versions because you build your own environment.
First grab the installation and setup scripts from the fermipy github repository:
In [ ]:
$ curl -OL https://raw.githubusercontent.com/fermiPy/fermipy/master/condainstall.sh
$ curl -OL https://raw.githubusercontent.com/fermiPy/fermipy/master/slacsetup.sh
Now choose an installation path. This should be a new directory (e.g. $HOME/anaconda) with at least 2-4 GB of space available. We will assign this location to the CONDABASE environment variable, which is used by the setup script to find the location of your Python installation. To avoid setting this every time you log in, it's recommended to set CONDABASE in your .bashrc file.
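For example, assuming you choose $HOME/anaconda as the installation path (the path here is purely illustrative), you can export the variable and append it to your .bashrc in one go:
In [ ]:
$ export CONDABASE=$HOME/anaconda
$ echo 'export CONDABASE=$HOME/anaconda' >> $HOME/.bashrc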
Now run the following commands to install anaconda and fermipy. This will take about 5-10 minutes.
In [ ]:
$ export CONDABASE=<path to install directory>
$ bash condainstall.sh $CONDABASE
Once anaconda is installed you will initialize your python and ST environment by running the slacsetup function in slacsetup.sh. This function will set the appropriate environment variables needed to run the STs and python.
In [ ]:
$ source slacsetup.sh
$ slacsetup
For convenience you can also copy this function into your .bashrc file so that it will automatically be available when you launch a new shell session. By default the function will set up your environment to point to a recent version of the STs and the installation of python in CONDABASE. If CONDABASE is not defined then it will use the installation of python that is packaged with a given release of the STs. The slacsetup function takes two optional arguments which can be used to override the ST version or python installation path.
In [ ]:
# Use ST 10-00-05
$ slacsetup 10-00-05
# Use ST 11-01-01 and python distribution located at <PATH>
$ slacsetup 11-01-01 <PATH>
The installation script only installs packages that are required by fermipy and the STs. Once you've initialized your shell environment, you are free to install additional python packages with the conda package manager using conda install:
In [ ]:
$ conda install fermipy
conda can also be used to upgrade packages. For instance you can upgrade fermipy to the newest version with the conda update command:
In [ ]:
$ conda update fermipy
If you want to develop Fermipy yourself, you should get the GitHub version by running:
In [ ]:
$ git clone https://github.com/fermiPy/fermipy.git
Then you should create a branch using:
In [ ]:
$ git checkout -b mattia-dev
In this branch you can develop Fermipy and then use the standard git commands to merge your changes into the master repository.
More information on how to install Fermipy is available here: fermipy-install
The first step is to compose a configuration file that defines the data selection and analysis parameters. Fermipy uses YAML files to read and write its configuration in a persistent format. The configuration file has a hierarchical structure that groups parameters into dictionaries that are keyed to a section name (data, binning, etc.). Below is a sample configuration for an analysis of the SMC (Small Magellanic Cloud):
In [2]:
%matplotlib inline
import os
import numpy as np
from fermipy.gtanalysis import GTAnalysis
from fermipy.plotting import ROIPlotter, SEDPlotter
import matplotlib.pyplot as plt
import matplotlib
In [3]:
if os.path.isfile('../data/SMC_data.tar.gz'):
    !tar xzf ../data/SMC_data.tar.gz
else:
    !curl -OL https://raw.githubusercontent.com/fermiPy/fermipy-extras/master/data/SMC_data.tar.gz
    !tar xzf SMC_data.tar.gz
In [5]:
!cat config.yaml
The configuration file has the same structure as the configuration dictionary such that one can read/write configurations using the load/dump methods of the yaml module:
In [ ]:
import yaml
# Load a configuration
config = yaml.safe_load(open('config.yaml'))
# Update a parameter and write a new configuration
config['selection']['emin'] = 1000.
yaml.dump(config, open('new_config.yaml','w'))
The data section defines the input data set and spacecraft file for the analysis. Here evfile points to a list of FT1 files that encompass the chosen ROI, energy range, and time selection. The parameters in the binning section define the dimensions of the ROI and the spatial and energy bin size. The selection section defines parameters related to the data selection (energy range, zmax cut, and event class/type). The target parameter in this section defines the ROI center to have the same coordinates as the given source. The model section defines parameters related to the ROI model definition (diffuse templates, point sources). Fermipy gives the user the option to combine multiple data selections into a joint likelihood with the components section. For more information on this visit: http://fermipy.readthedocs.io/en/latest/quickstart.html
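For reference, a minimal configuration with these four sections might look like the following sketch (the file names and parameter values here are illustrative, not the actual SMC configuration):
data:
  evfile : ft1.lst
  scfile : ft2.fits
binning:
  roiwidth   : 10.0
  binsz      : 0.1
  binsperdec : 8
selection:
  emin    : 100
  emax    : 100000
  zmax    : 90
  evclass : 128
  evtype  : 3
  target  : 'SMC'
model:
  galdiff  : 'gll_iem_v06.fits'
  isodiff  : 'iso_P8R2_SOURCE_V6_v06.txt'
  catalogs : ['3FGL']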
Note that the setup for a joint analysis is identical to the above except for the modification to the components section. The following example shows the components configuration one would use to define a joint analysis with the four PSF event types:
components:
- { selection : { evtype : 4 } }
- { selection : { evtype : 8 } }
- { selection : { evtype : 16 } }
- { selection : { evtype : 32 } }
First of all you need to load the configuration file, create the GTAnalysis object gta, and run gta.setup(), which runs the ST applications gtselect, gtmktime, gtbin, gtexpcube2, and gtsrcmaps.
In [1]:
%matplotlib inline
import os
import numpy as np
from fermipy.gtanalysis import GTAnalysis
from fermipy.plotting import ROIPlotter, SEDPlotter
import matplotlib.pyplot as plt
import matplotlib
In [3]:
if os.path.isfile('../data/SMC_data.tar.gz'):
    !tar xzf ../data/SMC_data.tar.gz
else:
    !curl -OL https://raw.githubusercontent.com/fermiPy/fermipy-extras/master/data/SMC_data.tar.gz
    !tar xzf SMC_data.tar.gz
In [4]:
gta = GTAnalysis('config.yaml')
matplotlib.interactive(True)
gta.setup()
In [5]:
gta.print_model()
In [7]:
gta.free_sources()
gta.fit()
Out[7]:
The current state of the ROI can be written at any point by calling write_roi.
In [7]:
gta.write_roi('initial',make_plots=True,save_model_map=True)
The output file will contain all information about the state of the ROI as calculated up to that point in the analysis including model parameters and measured source characteristics (flux, TS, NPred). An XML model file will also be saved for each analysis component.
The output file can be read with load:
In [8]:
gta.load_roi('initial')
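The .npy file written by write_roi can also be inspected directly with numpy. A quick sketch (the file name follows the prefix passed to write_roi; allow_pickle is needed with recent numpy versions):
In [ ]:
import numpy as np
# Load the serialized ROI dictionary written by gta.write_roi('initial', ...)
output = np.load('initial.npy', allow_pickle=True).flat[0]
print(output.keys())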
Using gta.print_model you get an overview of the sources and components present in the ROI.
In [9]:
gta.print_model()
The sources dictionary contains one element per source, keyed to the source name. Through it you can access a lot of information about each source in the model.
In [10]:
print(gta.roi.sources[0].name)                    # name of the source
print(gta.roi[gta.roi.sources[0].name])           # summary of the source
print(gta.roi[gta.roi.sources[0].name]['glon'])   # longitude of the source
print(gta.roi[gta.roi.sources[0].name]['glat'])   # latitude of the source
print(gta.roi[gta.roi.sources[0].name]['flux'])   # flux of the source
print(gta.roi[gta.roi.sources[0].name]['npred'])  # npred of the source
Other available quantities are listed here: fermipy/sourcedictionary
In [8]:
gta.free_shape(gta.roi.sources[0].name,free=False)  # fix (or free) the shape parameters (e.g. the index) of a source
gta.get_free_source_params(gta.roi.sources[0].name)  # get the list of free parameters of a source
gta.set_parameter(gta.roi.sources[0].name,par='Index',value=2.0,scale=-1.0,bounds=[-2.,5.])  # change the value, scale, and bounds of a source parameter
You can always use gta.print_model() to get a summary of your model.
In [9]:
gta.print_model()
The ROIModel class is responsible for managing the source and diffuse components in the ROI. Configuration of the model is controlled with the model block of the YAML configuration file.
DIFFUSE AND ISOTROPIC TEMPLATES
The simplest configuration uses a single file for the galactic and isotropic diffuse components. By default the galactic diffuse and isotropic components will be named galdiff and isodiff respectively. An alias for each component will also be created with the name of the mapcube or file spectrum. For instance, a galactic diffuse model defined by the file gll_iem_v06.fits can be referred to as either galdiff or gll_iem_v06.
To define two or more galactic diffuse components you can optionally define the galdiff and isodiff parameters as lists. A separate component will be generated for each element in the list with the name galdiffXX or isodiffXX where XX is an integer position in the list.
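A sketch of a model block defining two galactic diffuse components as a list (the file names are illustrative):
model:
  galdiff :
    - 'diffuse_component_0.fits'
    - 'diffuse_component_1.fits'
  isodiff : 'iso_P8R2_SOURCE_V6_v06.txt'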
SOURCE COMPONENT
The list of sources for inclusion in the ROI model is set by defining a list of catalogs with the catalogs parameter. Catalog files can be in either XML or FITS format. Sources from the catalogs in this list that satisfy either the src_roiwidth or src_radius selections are added to the ROI model. If a source is defined in multiple catalogs the source definition from the last file in the catalogs list takes precedence.
Individual sources can also be defined within the configuration file with the sources parameter. This parameter contains a list of dictionaries that defines the spatial and spectral parameters of each source. The keys of the source dictionary map to the spectral and spatial source properties as they would be defined in the XML model file.
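An illustrative sources entry (the source name, coordinates, and parameter values here are hypothetical):
model:
  catalogs :
    - '3FGL'
  sources :
    - { name : 'SourceA', glon : 120.0, glat : -3.0,
        SpectrumType : 'PowerLaw', Index : 2.0, SpatialModel : 'PointSource' }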
Or you can add a source at runtime, while your script is running, with gta.add_source:
In [11]:
gta.delete_source(gta.roi.sources[0].name)
glon0 = gta.config['selection']['glon']
glat0 = gta.config['selection']['glat']
gta.add_source('SMC', dict(glon=glon0, glat=glat0,
                           Index=dict(value=-2.4, scale=1.0, max="-1.", min="-5."),
                           Scale=dict(value=1e3, scale=1.0, max="1e5", min="1e0"),
                           Prefactor=dict(value=1.0, scale=1e-13, max="10000.0", min="0.0001"),
                           SpectrumType='PowerLaw'),
               free=True, init_source=True, save_source_maps=True)
gta.print_model()
All sources have nan entries because we have not yet fit the ROI. Moreover, in the model above all sources are fixed. To free the parameters of the sources it is enough to call gta.free_sources().
In [12]:
gta.free_sources()
gta.print_model()
It is also possible to free only the sources within a certain angular distance of a given source. For example, below we free the sources within 3 degrees of the first source in the ROI:
In [13]:
gta.free_sources(free=False)
gta.free_sources(skydir=gta.roi[gta.roi.sources[0].name].skydir,distance=[3.0],free=True)
gta.print_model()
Source fitting with fermipy is generally performed with the optimize and fit methods. fit is a wrapper around the pyLikelihood fit method and performs a likelihood fit of all free parameters of the model. This method can be used to manually optimize the model by calling it after freeing one or more source parameters.
In [14]:
gta.print_model()
gta.free_sources(free=True)
gta.print_model()
first_fit=gta.fit()
gta.print_model()
gta.write_roi('SMC_firstfit',make_plots=True,save_model_map=True)
By default fit will repeat the fit until a fit quality of 3 is obtained. After the fit returns, all sources with free parameters will have their properties (flux, TS, NPred, etc.) updated in the ROIModel instance. The return value of the method is a dictionary containing diagnostic information about the fit.
The fit also accepts keyword arguments which can be used to configure its behavior at runtime:
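As a sketch (the optimizer name and tolerance below are illustrative choices; see the fermipy documentation for the full list of options):
In [ ]:
# Override the optimizer and fit tolerance for this call only
fit_res = gta.fit(optimizer='NEWMINUIT', tol=1e-8)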
In [15]:
print(first_fit['fit_quality'])
print(first_fit['errors'])
print(first_fit['loglike'])
print(first_fit['values'])
In [16]:
print(gta.roi.sources[0]['param_names'])
print(gta.roi.sources[0]['param_values'])
print(gta.roi.sources[0]['param_errors'])
The optimize method performs an automatic optimization of the ROI by fitting all sources with an iterative strategy. It is generally good practice to run this method once at the start of your analysis to ensure that all parameters are close to their global likelihood maxima.
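A minimal call looks like this (the 'dloglike' key, summarizing the change in log-likelihood, is assumed here; inspect the returned dictionary for the exact keys):
In [ ]:
# Iteratively optimize all components of the ROI model
opt = gta.optimize()
print(opt['dloglike'])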
In [17]:
gta.load_roi('initial')
gta.print_model()
In [18]:
gta.load_roi('SMC_firstfit')
gta.print_model()
tsmap() generates a test statistic (TS) map for an additional source component centered at each spatial bin in the ROI. The methodology is similar to that of the gttsmap ST application, but with some simplifying approximations: the background parameters are fixed to their current values and only the normalization of the test source is fit at each position.
TS Cube is a related method that can also be used to generate TS maps as well as cubes (TS vs. position and energy).
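A minimal tscube call mirrors tsmap (a sketch with default options; the prefix argument is shared by the map-generation methods):
In [ ]:
# Generate a TS cube (TS vs. position and energy)
tscube_res = gta.tscube(prefix='TScube_start')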
In [19]:
tsmap_postfit = gta.tsmap(prefix='TSmap_start',make_plots=True,write_fits=True,write_npy=True)
In [20]:
%matplotlib inline
fig = plt.figure(figsize=(14,6))
ROIPlotter(tsmap_postfit['sqrt_ts'],roi=gta.roi).plot(levels=[0,3,5,7],vmin=0,vmax=5,subplot=121,cmap='magma')
plt.gca().set_title('Sqrt(TS)')
ROIPlotter(tsmap_postfit['npred'],roi=gta.roi).plot(vmin=0,vmax=100,subplot=122,cmap='magma')
plt.gca().set_title('NPred')
Out[20]:
Looking at the TS map, it is quite clear that the model does not fit the data sufficiently well.
residmap() calculates the residual between smoothed data and model maps. Whereas a TS map is only sensitive to positive deviations with respect to the model, residmap() is sensitive to both positive and negative residuals and therefore can be useful for assessing the model goodness-of-fit.
In [21]:
resid = gta.residmap('SMC_postfit',model={'SpatialModel' : 'PointSource', 'Index' : 2.0},write_fits=True,write_npy=True,make_plots=True)
fig = plt.figure(figsize=(14,6))
ROIPlotter(resid['data'],roi=gta.roi).plot(vmin=50,vmax=400,subplot=121,cmap='magma')
plt.gca().set_title('Data')
ROIPlotter(resid['model'],roi=gta.roi).plot(vmin=50,vmax=400,subplot=122,cmap='magma')
plt.gca().set_title('Model')
fig = plt.figure(figsize=(14,6))
ROIPlotter(resid['sigma'],roi=gta.roi).plot(vmin=-5,vmax=5,levels=[-5,-3,3,5],subplot=121,cmap='RdBu_r')
plt.gca().set_title('Significance')
ROIPlotter(resid['excess'],roi=gta.roi).plot(vmin=-100,vmax=100,subplot=122,cmap='RdBu_r')
plt.gca().set_title('Excess')
Out[21]:
The localize() method can be used to spatially localize a source. Localization is performed by scanning the likelihood surface in source position in a local patch around the nominal source position. The fit to the source position proceeds in two iterations:
TS Map Scan: Obtain a first estimate of the source position by generating a likelihood map of the region with the tsmap method. In this step all background parameters are fixed to their nominal values. The size of the search region used for this step is set with the dtheta_max parameter.
Likelihood Scan: Refine the position of the source by performing a scan of the likelihood surface in a box centered on the best-fit position found in the first iteration. The size of the search region is set to encompass the 99% positional uncertainty contour. This step uses a full likelihood fit at each point in the scan and will re-fit all free parameters of the model.
If a peak is found in the search region and the positional fit succeeds, the method will update the position of the source in the model to the new best-fit position.
In [22]:
gta.free_sources(free=False)
gta.print_model()
gta.free_sources(skydir=gta.roi[gta.roi.sources[0].name].skydir,distance=[3.0],free=True)
gta.print_model()
localsmc = gta.localize(gta.roi.sources[0].name, update=True, make_plots=True)
gta.print_model()
The relocalized SMC position is offset by 0.07 deg.
In [23]:
print(localsmc['glon'])
print(localsmc['glat'])
print(localsmc['pos_r68'])
print(localsmc['pos_r95'])
print(localsmc['pos_r99'])
print(localsmc['pos_err_semimajor'])
print(localsmc['pos_err_semiminor'])
print(localsmc['dloglike_loc'])
The extension() method executes a source extension analysis for a given source by computing a likelihood ratio test with respect to the no-extension (point-source) hypothesis and a best-fit model for extension. The best-fit extension is found by performing a likelihood profile scan over the source width (68% containment) and fitting for the extension that maximizes the model likelihood. Currently this method supports two models for extension: a 2D Gaussian (RadialGaussian) or a 2D disk (RadialDisk).
By default the method will fix all background parameters before performing the extension fit. One can leave background parameters free by setting free_background=True.
In [24]:
gta.free_sources(free=False)
gta.print_model()
gta.free_sources(skydir=gta.roi[gta.roi.sources[0].name].skydir,distance=[3.0],free=True)
gta.print_model()
extensionsmc = gta.extension(gta.roi.sources[0].name,update=True,make_plots=True,sqrt_ts_threshold=3.0,spatial_model='RadialGaussian')
gta.print_model()
In this specific case the SMC is found to be extended, with TS_ext = 371 and an angular extension of 1.19 ± 0.07 deg.
In [25]:
print(extensionsmc['ext'])
print(extensionsmc['ext_err_hi'])
print(extensionsmc['ext_err_lo'])
print(extensionsmc['ext_err'])
print(extensionsmc['ext_ul95'])
print(extensionsmc['ts_ext'])
find_sources() is an iterative source-finding algorithm that uses peak detection on a TS map to find new source candidates. The procedure for adding new sources at each iteration is described in the Fermipy documentation.
Source finding is repeated up to max_iter iterations or until no peaks are found in a given iteration. Sources found by the method are added to the model and given designations PS JXXXX.X+XXXX according to their position in celestial coordinates.
In [26]:
gta.free_sources()
model = {'Index' : 2.0, 'SpatialModel' : 'PointSource'}
findsource26 = gta.find_sources(model=model,sqrt_ts_threshold=5,min_separation=0.2,tsmap_fitter='tsmap')
In [27]:
gta.print_model()
gta.write_roi('SMC_relext_TS25',make_plots=True,save_model_map=True)
In [28]:
tsmap_postfit = gta.tsmap(prefix='TSmap_relext_TS25',make_plots=True,write_fits=True,write_npy=True)
In [29]:
fig = plt.figure(figsize=(14,6))
ROIPlotter(tsmap_postfit['sqrt_ts'],roi=gta.roi).plot(levels=[0,3,5,7],vmin=0,vmax=6,subplot=121,cmap='magma')
plt.gca().set_title('Sqrt(TS)')
ROIPlotter(tsmap_postfit['npred'],roi=gta.roi).plot(vmin=0,vmax=100,subplot=122,cmap='magma')
plt.gca().set_title('NPred')
Out[29]:
In [29]:
resid = gta.residmap('TSmap_relext_TS26',model={'SpatialModel' : 'PointSource', 'Index' : 2.0},write_fits=True,write_npy=True,make_plots=True)
In [30]:
fig = plt.figure(figsize=(14,6))
ROIPlotter(resid['data'],roi=gta.roi).plot(vmin=50,vmax=400,subplot=121,cmap='magma')
plt.gca().set_title('Data')
ROIPlotter(resid['model'],roi=gta.roi).plot(vmin=50,vmax=400,subplot=122,cmap='magma')
plt.gca().set_title('Model')
fig = plt.figure(figsize=(14,6))
ROIPlotter(resid['sigma'],roi=gta.roi).plot(vmin=-5,vmax=5,levels=[-5,-3,3,5],subplot=121,cmap='RdBu_r')
plt.gca().set_title('Significance')
ROIPlotter(resid['excess'],roi=gta.roi).plot(vmin=-100,vmax=100,subplot=122,cmap='RdBu_r')
plt.gca().set_title('Excess')
Out[30]:
The sed() method computes a spectral energy distribution (SED) by performing independent fits for the flux normalization of a source in bins of energy. The normalization in each bin is fit using a power-law spectral parameterization with a fixed index. The value of this index can be set with the bin_index parameter or allowed to vary over the energy range according to the local slope of the global spectral model (with the use_local_index parameter).
The free_background, free_radius, and cov_scale parameters control how nuisance parameters are dealt with in the fit. By default the method will fix the parameters of background components in the ROI when fitting the source normalization in each energy bin (free_background=False). Setting free_background=True will profile the normalizations of all background components that were free when the method was executed. In order to minimize overfitting, background normalization parameters are constrained with priors taken from the global fit. The strength of the priors is controlled with the cov_scale parameter. A larger (smaller) value of cov_scale applies a weaker (stronger) constraint on the background amplitude. Setting cov_scale=None performs an unconstrained fit without priors.
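For example, one could profile the normalizations of nearby background sources with loose priors (the free_radius and cov_scale values below are illustrative):
In [ ]:
# Profile normalizations of free sources within 1 deg of the target,
# with the strength of the priors set by cov_scale
sed_alt = gta.sed('SMC', free_radius=1.0, cov_scale=5.0)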
In [30]:
gta.free_sources(free=False)
gta.print_model()
gta.free_sources(skydir=gta.roi[gta.roi.sources[0].name].skydir,distance=[3.0],free=True)
gta.print_model()
sedsmc = gta.sed(gta.roi.sources[0].name, bin_index=2.2, outfile='sedSMC.fits', loge_bins=None,write_npy=True,write_fits=True,make_plots=True)
In [31]:
print(sedsmc['e_min'])
print(sedsmc['e_max'])
print(sedsmc['e_ref'])
print(sedsmc['flux'])
print(sedsmc['eflux'])
print(sedsmc['e2dnde'])
print(sedsmc['dnde_ul95'])
print(sedsmc['ts'])
In [43]:
# E^2 x Differential flux ULs in each bin in units of MeV cm^{-2} s^{-1}
print(sedsmc['e2dnde_ul95'])
e2dnde_scan = sedsmc['norm_scan']*sedsmc['ref_e2dnde'][:,None]
plt.figure()
plt.plot(e2dnde_scan[0],sedsmc['dloglike_scan'][0]-np.max(sedsmc['dloglike_scan'][0]))
plt.gca().set_ylim(-5,1)
plt.gca().axvline(sedsmc['e2dnde_ul95'][0],color='k')
plt.gca().axhline(-2.71/2.,color='r')
Out[43]:
In [32]:
fig = plt.figure(figsize=(14,4))
ylim=[1E-7,1E-5]
fig.add_subplot(121)
SEDPlotter(sedsmc).plot()
plt.gca().set_ylim(ylim)
fig = plt.figure(figsize=(14,4))
fig.add_subplot(121)
SEDPlotter(sedsmc).plot(showlnl=True,ylim=ylim)
plt.gca().set_ylim(ylim)
Out[32]:
lightcurve() fits the characteristics of a source (flux, TS, etc.) in a sequence of time bins. This method uses the data selection and model of a baseline analysis (e.g. the full mission) and is therefore restricted to analyzing time bins that are encompassed by the time selection of the baseline analysis. In general when using this method it is recommended to use a baseline time selection of at least several years or more to ensure the best characterization of background sources in the ROI.
When fitting a time bin the method will initialize the model to the current parameters of the baseline analysis. The parameters to be refit in each time bin may be controlled with free_background, free_sources, free_radius, free_params, and shape_ts_threshold options.
By default the lightcurve method will run an end-to-end analysis in each time bin using the same processing steps as the baseline analysis. Depending on the data selection and ROI size each time bin may take 10-15 minutes to process. There are several options which can be used to reduce the lightcurve computation time. The multithread option splits the analysis of time bins across multiple cores.
The use_scaled_srcmap option generates an approximate source map for each time bin by scaling the source map of the baseline analysis by the relative exposure.
In [38]:
lc = gta.lightcurve('SMC', free_radius=3.0, nbins=8, multithread=True, nthread=8, use_scaled_srcmap=True)
In [39]:
print(lc['tmin'])
print(lc['tmax'])
print(lc['fit_success'])
print(lc['ts_var'])
print(lc['flux'])
print(lc['eflux'])
print(lc['flux_ul95'])
In [40]:
fig = plt.figure(figsize=(8,6))
plt.errorbar((lc['tmin']+lc['tmax'])/2., lc['flux'], yerr=lc['flux_err'],
             xerr=(lc['tmax']-lc['tmin'])/2., fmt="o", color="black")
plt.ylabel(r'$\Phi_{\gamma}$ [ph/cm$^2$/s]', fontsize=18)
plt.xlabel(r'$t$ [s]', fontsize=18)
plt.axis([lc['tmin'][0], lc['tmax'][-1], 2e-9, 6e-9])
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.grid(True)
plt.yscale('log')
plt.xscale('linear')
fig.tight_layout(pad=0.5)
plt.show()