This guide tutorial aims to get you started with some basic tasks you can accomplish using echidna.
First set up environment with convenience imports and inline plotting:
In [ ]:
%pylab inline
pylab.rc("savefig", dpi=120) # set resolution of inline figures
The Spectra
class is echidna's most fundamental class. It holds the core data structure and provides much of the core functionality required. Coincidentally, this guide will be centred around this class, how to create it and then some manipulations of the class.
We'll begin with how to create an instance of the Spectra
class. It is part of the echidna.core.spectra
module, so we will import this and make a Spectra
instance.
In [ ]:
import echidna.core.spectra as spectra
Now we need a config file to create the spectrum from. There is an example config file in echidna/config
. If we look at the contents of this yaml file, we see it tells the Spectra
class to create a data structure to hold two parameters:
energy_mc
, with lower limit 0, upper limit 10 and 1000 binsradial_mc
, with lower limit 0, upper limit 15000 and 1500 binsThis config should be fine for us. We can load it using the load_from_file
method of the SpectraConfig
class:
In [ ]:
import echidna
config = spectra.SpectraConfig.load_from_file(echidna.__echidna_base__ +
"/echidna/config/example.yml")
print config.get_pars()
Note we used the __echidna_base__
member of the echidna
module here. This module has two special members for denoting the base directory (the outermost directory of the git repository) and the home directory (the echidna
directory inside the base directory. The following lines show the current location of these directories:
In [ ]:
print echidna.__echidna_base__
print echidna.__echidna_home__
Finally before creating the spectrum, we should define the number of events it should represent:
In [ ]:
num_decays = 1000
In [ ]:
spectrum = spectra.Spectra("spectrum", num_decays, config)
print spectrum
And there you have it, we've created a Spectra
object.
Ok, so we now have a spectrum, let's fill it with some events. We'll generate random energies from a Gaussian distribution and random positions from a Uniform distribution. Much of echidna is built using the numpy
and SciPy
packages and we will use them here to generate the random numbers. We'll also generate a third random number to simulate some form rudimentary detector efficiency.
In [ ]:
# Import numpy
import numpy
In [ ]:
# Generate random energies from a Gaussin with mean (mu) and sigma (sigma)
mu = 2.5 # MeV
sigma = 0.15 # MeV
# Generate random radial position from a Uniform distribution
outer_radius = 5997 # Radius of SNO+ AV
# Detector efficiency
efficiency = 0.9 # 90%
for event in range(num_decays):
energy = numpy.random.normal(mu, sigma)
radius = numpy.random.uniform(high=outer_radius)
event_detected = (numpy.random.uniform() < efficiency)
if event_detected: # Fill spectrum with values
spectrum.fill(energy_mc=energy, radial_mc=radius)
This will have filled our Spectra
class with the events. Make sure to use the exact parameter names that were printed out above, as kewyord arguments. To check we can now use the sum
method. This returns the total number of events stored in the spectrum at a given time - the integral of the spectrum.
In [ ]:
print spectrum.sum()
The value returned by sum
, should roughly equal:
In [ ]:
print num_decays * efficiency
We can also inspect the raw data structure. This is saved in the _data
member of the Spectra
class:
In [ ]:
print spectrum._data
Note: you probably won't see any entries in the above. For large arrays, numpy only prints the first three and last three entries. Since our energy range is in the middle, all our events are in the ...
part at the moment. But we will see entries printed out later when we apply some cuts.
Another useful way to inspect the Spectra
created is to plot it. Support is available within echidna to plot using either ROOT
or matplotlib
and there are some useful plotting functions available in the plot
an plot_root
modules.
In [ ]:
import echidna.output.plot as plot
import echidna.output.plot_root as plot_root
To plot the projection of the spectrum on the energy_mc
axis:
In [ ]:
fig1 = plot.plot_projection(spectrum, "energy_mc",
fig_num=1, show_plot=False)
plt.show()
and to plot the projection on the radial_mc
axis, this time using root:
In [ ]:
plot_root.plot_projection(spectrum, "radial_mc", fig_num=2)
We can also project onto two dimensions and plot a surface:
In [ ]:
fig_3 = plot.plot_surface(spectrum, "energy_mc", "radial_mc",
fig_num=3, show_plot=False)
plt.show()
The ability to smear the event, along a parameter axis, is built into echidna in the smear
module. There are three classes in the module that allow us to create a smearer for different scenarios. There are two smearers for energy-based parameters, EnergySmearRes
and EnergySmearLY
, which allow smearing by energy resolution (e.g. $\frac{5\%}{\sqrt{(E[MeV])}}$ and light yield (e.g. 200 NHit/Mev) respectively. Then additionally the RadialSmear
class handles smearing along the axis of any radial based parameter.
We will go through an example of how to smear our spectrum by a fixed energy resolution of 5%. There are two main smearing algorithms: "weighted smear" and "random smear". The "random smear" algorithm takes each event in each bin and randomly assigns it a new energy from the Gaussian distribution for that bin - it is fast but not very accurate for low statistics. The "weighted smear" algorithm is slower but much more accurate, as re-weights each bin by taking into account all other nearby bins within a pre-defined range. We will use the "weighted smear" method in this example.
First to speed the smearing process, we will apply some loose cuts. Although, fewer bins means faster smearing, you should be wary of cutting the spectrum too tightly before smearing as you may end up cutting bins that would have influenced the smearing. Cuts can be applied using the shrink
method. (Confusingly there is also a cut
method which is almost identical to the shrink
method, but updates the number of events the spectrum represents, after the cut is applied. Unless you are sure this is what you want to do, it is probably better to use the shrink
method.) To shrink over multiple parameters, it is best to construct a dictionary of _low
and _high
values for each parameter and then pass this to the shrink method.
In [ ]:
shrink_dict = {"energy_mc_low": mu - 5.*sigma,
"energy_mc_high": mu + 5.*sigma,
"radial_mc_low": 0.0,
"radial_mc_high": 3500}
spectrum.shrink(**shrink_dict)
Using the sum
method, we can check to see how many events were cut.
In [ ]:
print spectrum.sum()
Import the smear class:
In [ ]:
import echidna.core.smear as smear
and create the smearer object.
In [ ]:
smearer = smear.EnergySmearRes()
By default the "weighted smear" method considers all bins within a $\pm 5\sigma$ range. For the sake of speed, we will reduce this to 3 here. Also set the energy resolution - 0.05 for 5%.
In [ ]:
smearer.set_num_sigma(3)
smearer.set_resolution(0.05)
To smear our original spectrum and create the new Spectra
object smeared_spectrum
:
In [ ]:
smeared_spectrum = smearer.weighted_smear(spectrum)
this should hopefully only create a couple of seconds.
The following code shows how to make a simple script, using matplotlib, to overlay the original and smeared spectra.
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
def overlay_spectra(original, smeared, dimension="energy_mc", fig_num=1):
""" Overlay original and smeared spectra.
Args:
original (echidna.core.spectra.Spectra): Original spectrum.
smeared (echidna.core.spectra.Spectra): Smeared spectrum.
dimension (string, optional): Dimension to project onto.
Default is "energy_mc".
fignum (int, optional): Figure number, if producing multiple
figures. Default is 1.
Returns:
matplotlib.figure.Figure: Figure showing overlaid spectra.
"""
fig = plt.figure(num=fig_num)
ax = fig.add_subplot(1, 1, 1)
par = original.get_config().get_par(dimension)
width = par.get_width()
# Define array of bin boundaries (1 more than number of bins)
bins = np.linspace(par._low, par._high, par._bins+1)
# Define array of bin centres
x = bins[:-1] + 0.5*width
# Overlay two spectra using projection as weight
ax.hist(x, bins, weights=original.project(dimension),
histtype="stepfilled", color="RoyalBlue",
alpha=0.5, label=original._name)
ax.hist(x, bins, weights=smeared.project(dimension),
histtype="stepfilled", color="Red",
alpha=0.5, label=smeared._name)
# Add label/style
plt.legend(loc="upper right")
plt.ylim(ymin=0.0)
plt.xlabel(dimension + " [" + par.get_unit() + "]")
plt.ylabel("Events per " + str(width) + " " + par.get_unit() + " bin")
return fig
In [ ]:
fig_4 = overlay_spectra(spectrum, smeared_spectrum, fig_num=4)
plt.show()
We now have a nice smeared version of our original spectrum. To prepare the spectrum for a final analysis there are a few final manipulations we may wish to do.
There is a special version of the shrink
method called shrink_to_roi
that can be used for ROI cuts. It saves some useful information about the ROI in the Spectra
class instance, including the efficiency i.e. integral of spectrum after cut divided by integral of spectrum before cut.
In [ ]:
roi = (mu - 0.5*sigma, mu + 1.45*sigma) # To get nice shape for rebinning
smeared_spectrum.shrink_to_roi(roi[0], roi[1], "energy_mc")
print smeared_spectrum.get_roi("energy_mc")
Our spectrum is still quite finely binned, perhaps we want to bin it in 50 keV bins instead of 10 keV bins. The rebin
method can be used to acheive this.
The rebin
method requires us to specify the new shape (tuple) of the data. With just two dimensions this is trivial, but with more dimensions, it may be better to use a construct such as:
In [ ]:
dimension = smeared_spectrum.get_config().get_pars().index("energy_mc")
old_shape = smeared_spectrum._data.shape
reduction_factor = 5 # how many bins to combine into a single bin
new_shape = tuple([j / reduction_factor if i == dimension else j
for i, j in enumerate(old_shape)])
print old_shape
print new_shape
In [ ]:
smeared_spectrum.rebin(new_shape)
Finally, we "simulated" 1000 events, but we most likely want to scale this down for to represent the number of events expected in our analysis. The Spectra
class has a scale
method to accomplish this. Remember that the scale
method should always be supplied with the number of events the full spectrum (i.e. before any cuts using shrink
or shrink_to_roi
) should represent. Lets assume that our spectrum should actually represent 104.25 events:
In [ ]:
smeared_spectrum.scale(104.25)
print smeared_spectrum.sum()
After creating, filling, convolving and various other manipulations what does our final spectrum look like?
In [ ]:
print smeared_spectrum._data
In [ ]:
fig_5 = plot.plot_projection(smeared_spectrum, "energy_mc",
fig_num=5, show_plot=False)
plt.show()