Now we are going to discuss how we can use build a summary data product that can be used to quickly fit a wide variety of different DM spectra.
Recall that the previous example involved fitting the data across all energy bins using a powerlaw with index -2 for the dark matter target.
What we would like to do is extract the spectrum of any excess (i.e., the flux or limits associated with the various energy bins) and then fit the various DM model spectra to the observed spectra.
Something like this:
So then we would be fitting various DM models against the spectral data points, rather than the counts data, as we did in the previous example. Typically we might use the uncertainties of the data points and do a $\chi^2$ fit for the DM spectrum.
There are two main issues with that approach.
So the best way to combine the information from the various energy bins is to combine the likelihoods.
First we need to extract the log-likelihood versus flux in each energy bin seperately. In any one energy bin, the analysis is just the same as what we did in the previous example, except that we only use the data and model in a single energy bin.
For a single energy bin the results may look something like this:
Where the delta log-likelihood is being plot on the color scale.
For two energy bins the results might look like this:
And finally, for all of the energy bins the results might look like this:
This last figure is called a "Castro" plot.
Basically, the dark red bands show the regions favored by the data, and the other colors show the regions increasingly disfavored by the data.
Here is another version of the same plot, where we have added the 95% CL upper limits in each of the energy bands.
Recall: the confidence level here is not quantifying the probability of the energy flux taking a value below the given 95% limit - that would be a Bayesian statement.
What is the corect phrasing we should use when describing the meaning of these upper limits?
If we assume a particular spectral form for the DM signal, we can use the data that went into the Castro plot to construct the log-likelihood as a function of the paramaters of the function we assumed. In our case we will be assuming the annihilation channel and mass of the DM, so the only free parameter is the normalization of the signal.
Here are what the 95% CL upper limits would look like in this simulation for DM annihilating to b-quarks, for several different DM masses.
By way of comparison, here is the upper limits on the spectrum you would get if you simply required that the curve did not exceed any of the single bin upper limits, which you can see is markedly worse.
Why does this plot not tell us the correct upper limit on the spectrum normalization?
Finally, here is what a positive dection of a signal might look like:
In [ ]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import LikeFitUtils as lfu
import SedUtils as SED
# lets open the file and have a look
import yaml
f_sed = yaml.load(open("results/draco_sed.yaml"))
len(f_sed)
Ok, the file contains 24 sets of curves, one for each energy bin.
Let's have a look at one of the sets of results.
In [ ]:
f_sed[0].keys()
In [ ]:
print "Energy range of bin 0 is %.1e to %.1e MeV"%(f_sed[0]['emin'],f_sed[0]['emax'])
print "Flux values scanned range from %.1e to %.1e ph cm^-2 s^-1"%(f_sed[0]['flux'][0],f_sed[0]['flux'][-1])
print "The corresponding energy flux values range from %.1e to %.1e MeV cm^-2 s^-1"%(f_sed[0]['eflux'][0],f_sed[0]['eflux'][-1])
print "The resulting delta log-Likelihood values at the edges of the scan are %.1f and %.1f"%(f_sed[0]['logLike'][0],f_sed[0]['logLike'][-1])
print "The conversion factor from energy flux to number of predicted counts is %.1e"%f_sed[0]['eflux2npred']
So, as stated, that file contains everything we need to make the Castro plot.
I've put some utilities in SedUtils.py
. These are functions to do things like interpolate the log-likelihood in each energy bin and then sum them together. I've added a small python class to manage things.
In [ ]:
import SedUtils as SED
sed = SED.SED(f_sed)
help(sed)
Ok, lets go ahead and take a look at the SED that we have.
In [ ]:
sed.binByBinUls = None
binByBinULs = sed.BinByBinULs()
figSED,axSED = SED.PlotSED(sed.energyBins,binByBinULs)
Ok, recall that we should never plot upper limits without also giving information about the expected upper limits. There is a file in the "ancil" sub-directory that has the quantiles for the upper limits from 300 Monte Carlo simulations of the analysis.
A pretty standard way to give a sense of the consistency of the results is to show the so called "Brazil" bands for the upper limits. I.e., expectation bands made from simulating the analysis chain numerous times. Typically people show the 1 and 2 $\sigma$ expectation bands and plot them in yellow and green, thus the name "Brazil".
In [ ]:
# let's get the file with the expected upper limits
f_sed_bands = yaml.load(open("ancil/draco_sed_mc_bands.yaml"))
In [ ]:
figSED2,axSED2 = SED.PlotSED(sed.energyBins,binByBinULs,f_sed_bands)
The second file is the DM_spectra.yaml
file in the "ancil" directory. This file gives the DM spectra for several different masses for the $b\hat{b}$ and $\tau^+\tau^-$ channels. I made this file specifically to match our analysis and our energy binning.
In [ ]:
f_dmspec = yaml.load(open("ancil/DM_spectra.yaml"))
print "Channels loaded are",f_dmspec.keys()
masses_bb = f_dmspec['bb'].keys()
masses_bb.sort()
masses_tau = f_dmspec['tautau'].keys()
masses_tau.sort()
fluxVals = f_dmspec['bb'][100]
print "Masses for the bb channel are",masses_bb
print "Masses for the tautau channel are",masses_tau
print "Flux values for 100GeV bb dark matter:\n",fluxVals
Similary to the previous example, the SED object will make a function that we can then pass to the optimizer, this is the NLL_Func
function.
In [ ]:
help(sed.NLL_func)
In [ ]:
nll_func = sed.NLL_func(fluxVals)
nll_null = nll_func(0.)
nll_test = nll_func(1.) # Warning, this is in units of 10^-26 cm^3 s-1
print nll_null,nll_test
There is also a Minimize
function that finds the normalization value that minimizes the negative log likelihood:
In [ ]:
result = sed.Minimize(fluxVals,1.0)
mle = result[0][0]
nll_mle = result[1]
ts = 2.*(nll_null-nll_mle)
print "Best-fit value %.1f"%(mle)
print "Test Statistic %.1f"%(ts)
So, it looks like there is no signal and we should set upper limits. As before we construct the upper limits at the point were the delta log-likelihood reaches 1.35.
In [ ]:
import LikeFitUtils as lfu
xbounds = (1e-4,1e1)
error_level = 1.35
ul = lfu.SolveForErrorLevel(nll_func,nll_mle,error_level,mle,xbounds)
print "Upper limit on <sigma v> is %.2e cm^3 s^-1"%(1e-26*ul)
In the SedUtils.py
file you will find a small piece of code to loop over all the channels and masses and to write the output to ../results/draco_dm_results.yaml
. Let's go ahead and open that file.
In [ ]:
f_dmlims = yaml.load(open("results/draco_dm_results.yaml"))
print "Channels are:",f_dmlims.keys()
print "Data saved for each channel:",f_dmlims['bb'].keys()
In [ ]:
print "Upper limits for bb channel are:\n",1e-26*f_dmlims['bb']['UL95']
Recall the point about how presenting upper limits alone gets rid of the information about the uncertainties and if the result is consistent with the null hypothesis.
Once again, you should never present upper limits without also presenting something that allows people to determine if they think the result is consistent with the null hypothesis.
You can find the quantiles calculated from 300 Monte Carlo simulated instances of the analysis chain in the file draco_spectral_mc_bands.yaml
in the ancil
folder.
In [ ]:
# Ok, first we will load the bands
bands = yaml.load(open("ancil/draco_spectral_mc_bands.yaml"))
print "MC expectation bands for channels: ",bands.keys()
print "Quantities available are: \n",bands['bb'].keys()
Ok, there as you can see, the file has a lot more information than the simple limits. The various types of limits presented in the file are:
For each type of limit the file contains inforamation about several quantiles from the Monte Carlo simulation runs.
In [ ]:
# Ok, let's go ahead and plot the limits against the expectation
f,a = SED.PlotLimits(f_dmlims['bb']['Masses'],f_dmlims['bb']['UL95'],bands['bb']['ulimits'])
Thank you for your attention