ROOT.RooStats.HistFactory.Measurement()
with MakeModelAndMeasurementFast()
meas = ROOT.RooStats.HistFactory.Measurement("meas", "meas")
# Add lots of things to the measurement
model_workspace = ROOT.RooStats.HistFactory.MakeModelAndMeasurementFast(meas)
ROOT.Measurement().AddPOI()
sample = ROOT.RooStats.HistFactory.Sample("sample")
# A sample uncertainity that affectst the entire sample
sample.AddOverallSys("sample_uncertainty", uncertainty_down, uncertainty_up)
# Shape uncertainity
sample_shape = ROOT.RooStats.HistFactory.HistoSys("sample_shape")
sample_shape.SetHistoHigh(sample_up_hist)
sample_shape.SetHistoLow(sample_down_hist)
sample.AddHistoSys(sample_shape)
Models are built by HistFactory
from measurements that consist of channels which hold samples.
The measurement is the object which collects all the information that will then be turned into a model by HistFactory
. It is the top level combination of the channels that form the model.
A channel is a representation of observed data. Each channel is required to have a histogram of measured data associated with it. [3] Loosely, channels can be though of as "regions" in an analysis (e.g., signal regions, control regions, validation regions).
Samples are used to describe the channel (data) they are associated with. They can represent signals and backgrounds. Each sample has a histogram describing its shape and a list of systematics describing how that shape transforms based on a number of parameters. [3]
In [1]:
import ROOT
ROOT.RooMsgService.instance().setGlobalKillBelow(5)
%jsroot on
import sys
import os
# Don't require pip install to test out
sys.path.append(os.getcwd() + '/../../src')
from dfgmark import histfactorybench as hfbench
#import rootpy
#from rootpy.stats.histfactory import utils as hfutils
Goal: The goal of the project is to be able to benchmark the speed/performance of the current implementation of HistFactory
to perform a maximum likelihood fit for a given binned model generated by the user. The user can do this by using the project Python library (which will essentially just be a collection of helper functions).
The binned model is supposed to be parameterized by:
If this is all being made "on the fly" then all components are necessary. However, if the channels are being loaded in from ROOT
files, then it would seem that the number of events and number of bins could be inferred(?).
As nuisance parameters are any model parameter which is not a parameter of interest, then if a parameter is not explicitly set as a POI (Measurement.SetPOI("parameter name")
) then it is treated as a nuisance parameter.
Let's build the simplest model: A counting experiment. One bin with a flat background sample and signal model.
In [2]:
help(hfbench.sample_func_histogram)
In [3]:
help(hfbench.make_sample)
In [4]:
help(hfbench.make_channel)
In [5]:
help(hfbench.make_model)
In [6]:
help(hfbench.benchmark_fit)
In [7]:
help(hfbench.benchmark_fit_with_workspace)
Now let's create the samples (1 signal and 1 background), the channel (1 signal region), and the model.
First, set the parameters for the model and make the flat 1 bin histograms
In [8]:
n_events = 1000
n_bins = 1
range_low = 0
range_high = 10
frac_sig = 0.1
frac_bkg = 1 - frac_sig
signal_hist = ROOT.TH1F("signal_hist", "signal_hist", n_bins, range_low, range_high)
signal_hist.SetBinContent(1, frac_sig * n_events)
background_hist = ROOT.TH1F("background_hist", "background_hist", n_bins, range_low, range_high)
background_hist.SetBinContent(1, frac_bkg * n_events)
data_hist = ROOT.TH1F("data_hist", "data_hist", n_bins, range_low, range_high)
data_hist.SetBinContent(1, n_events)
As this is a counting experiment, should have Poisson uncertainties ($\sqrt{n_{\text{events}}}$) on all histograms, so let's check:
In [9]:
hist_list = [signal_hist, background_hist, data_hist]
for hist in hist_list:
print("{0}: {1} events and uncertainty of {2} with sqrt({1}) being {3}".format(
hist.GetName().rsplit('_')[0],
hist.GetBinContent(1),
hist.GetBinError(1),
ROOT.TMath.Sqrt(hist.GetBinContent(1))))
Now make the samples and add them and the channel data histogram to the channel
In [10]:
samples = []
samples.append(hfbench.make_sample("signal", signal_hist, 0.01, 0.01))
samples.append(hfbench.make_sample("background", background_hist, 0.01, 0.01))
channels = []
SR = hfbench.make_channel(channel_name="SR", channel_data=data_hist, channel_samples=samples)
channels.append(SR)
Define the parameter of interest (POI), and then make the workspace and model
In [11]:
POI = ["SigXsecOverSM"]
ws, model = hfbench.make_model(n_events, n_bins, channels, POI)
In [12]:
ws.Print()
In [13]:
#model = ws.pdf("model_SR")
##hfbench.benchmark_fit(ws.var("obs_x_SR"), model, n_events, n_events, verbose=True)
#hfbench.benchmark_fit_with_workspace(ws, model, n_events, n_events, verbose=True)
In [14]:
#f = ws.var("obs_channel1").frame()
c = ROOT.TCanvas()
f = ws.var("obs_x_SR").frame()
print(ws.var("obs_x_SR"))
ws.data("obsData").plotOn(f)
f.Draw()
c.Draw()
print(ws.data("obsData"))
#h_test = super(ROOT.RooDataSet,ws.data("obsData")).createHistogram("h_test", ws.var("obs_x_SR"), ROOT.RooFit.Binning(n_bins, range_low, range_high))
roo_data_hist = ROOT.RooDataHist("h_test", "h_test", ROOT.RooArgSet(ws.var("obs_x_SR")), ws.data("obsData"))
h_test = roo_data_hist.createHistogram("h_test", ws.var("obs_x_SR"))
h_test.Sumw2(False)
h_test.Sumw2()
h_test.SetMinimum(0)
h_test.SetMaximum(h_test.GetBinContent(1) * 1.1)
print(roo_data_hist)
print(h_test)
roo_data_hist.Print("v")
c_test = ROOT.TCanvas()
h_test.Draw()
c_test.Draw()
It seems that the uncertainties on the model data (which as I understand it is stored as "obsData" in the model) have blown up here after just creating the model. So I must be doing something wrong in my make_model()
function.
In [15]:
x = ROOT.RooRealVar('x','x', range_low, range_high)
fr = x.frame()
f = ws.var("obs_x_SR").frame()
data = ws.data('obsData')
#data.plotOn(fr) #plotOn() is complaining that it can not find a working method :?
print(data) # check to make sure that it is not a null pointer
#h_data = super(ROOT.RooDataSet,data).createHistogram("data_hist", x, ROOT.RooFit.Binning(n_bins, range_low, range_high))
h_data = super(ROOT.RooDataSet,data).createHistogram("data_hist", ws.var("obs_x_SR"), ROOT.RooFit.Binning(n_bins, range_low, range_high))
h_data.Sumw2(False)
h_data.Sumw2()
h_data.SetMinimum(0)
h_data.SetMaximum(h_data.GetBinContent(1) * 1.1)
c = ROOT.TCanvas()
c.cd()
h_data.Draw("E")
c.Draw()
print("{0}: {1} events and uncertainty of {2} with sqrt({1}) being {3}".format(
h_data.GetName().rsplit('_')[0],
h_data.GetBinContent(1),
h_data.GetBinError(1),
ROOT.TMath.Sqrt(h_data.GetBinContent(1))))
In [16]:
# draw the histogram before the model was made
c = ROOT.TCanvas()
c.cd()
data_hist.Draw("E")
data_hist.SetMinimum(0)
data_hist.SetMaximum(data_hist.GetBinContent(1) * 1.1)
c.Draw()
model_config = ws.obj("ModelConfig")
hist_factory_nav = ROOT.RooStats.HistFactory.HistFactoryNavigation(model_config)
#print(hist_factory_nav)
# draw the histogram stack
stack = hist_factory_nav.GetChannelStack("SR")
#channel_data = hist_factory_nav.GetDataHist(ws.data("obsData"),"SR","channel_data")
channel_data = h_data
sig_hist = hist_factory_nav.GetSampleHist("SR", "signal", "sig_hist")
bkg_hist = hist_factory_nav.GetSampleHist("SR", "background", "bkg_hist")
c1 = ROOT.TCanvas()
stack.Draw()
channel_data.Draw("SAME")
leg = ROOT.TLegend(0.65, 0.65, 0.9, 0.9)
for hist in stack.GetHists():
samplename = hist.GetName().rsplit('_')[0]
leg.AddEntry(hist, samplename,"f")
leg.AddEntry(channel_data, "channel data","lep")
leg.Draw()
c1.Draw()
# draw the histogram stored as "obsData" in the model
c2 = ROOT.TCanvas()
channel_data.Draw()
channel_data.SetMinimum(0)
channel_data.SetMaximum(channel_data.GetBinContent(1) * 1.1)
c2.Draw()
In [17]:
# To avoid a segfault at kerenel exit
canvas_list = [c, c_test, c1, c2]
for canvas in canvas_list:
canvas.IsA().Destructor(canvas)
RooFit
and HistFactory