Defining some concepts and vocabulary [1]:
ROOT.RooStats.HistFactory.Measurement()
with MakeModelAndMeasurementFast()
meas = ROOT.RooStats.HistFactory.Measurement("meas", "meas")
# Add lots of things to the measurement
model_workspace = ROOT.RooStats.HistFactory.MakeModelAndMeasurementFast(meas)
ROOT.Measurement().AddPOI()
sample = ROOT.RooStats.HistFactory.Sample("sample")
# A sample uncertainity that affectst the entire sample
sample.AddOverallSys("sample_uncertainty", uncertainty_down, uncertainty_up)
# Shape uncertainity
sample_shape = ROOT.RooStats.HistFactory.HistoSys("sample_shape")
sample_shape.SetHistoHigh(sample_up_hist)
sample_shape.SetHistoLow(sample_down_hist)
sample.AddHistoSys(sample_shape)
Models are built by HistFactory
from measurements that consist of channels which hold samples.
The measurement is the object which collects all the information that will then be turned into a model by HistFactory
. It is the top level combination of the channels that form the model.
A channel is a representation of observed data. Each channel is required to have a histogram of measured data associated with it. [3] Loosely, channels can be though of as "regions" in an analysis (e.g., signal regions, control regions, validation regions).
Samples are used to describe the channel (data) they are associated with. They can represent signals and backgrounds. Each sample has a histogram describing its shape and a list of systematics describing how that shape transforms based on a number of parameters. [3]
Example: [6]
In [1]:
import numpy as np
import ROOT
ROOT.RooMsgService.instance().setGlobalKillBelow(5)
%jsroot on
Per Kyle's suggestion, instead of using data (which is large and needs to be stored and transferred) have information be stored in the shape of a function, and then sample the function to produce the input histograms.
Kyle suggested doing the most trivial thing possible (a line) and then moving on from there
In [2]:
# An example TF1: A line with positive slope
f1 = ROOT.TF1("f1","[0]*x + [1]", 0, 10)
f1.SetParameters(1.5, 0.5)
In [3]:
def sample_func_histogram(shape_function, n_events=1, n_bins=10, range_low=0, range_high=1, hist_name='data_hist'):
"""
Sample a function. create an pdf from it, then histogram it
Args:
shape_function (TF1): The mathematical form of the function to be sampled
n_events (int): The number of points to sample
range_low (int, float): The lower bound on the sampling range
range_high (int, float): The upper bound on the sampling range
n_bins (int): The number of bins in the generated histogram
hist_name (str): The name of the histogram object
Returns:
hist (TH1): The generated histogram of n_events
"""
x = ROOT.RooRealVar('x', 'x', range_low, range_high)
# Get the formula string of the TF1 with the parameters evaluated
formula = str(shape_function.GetFormula().GetExpFormula("P"))
pdf = ROOT.RooGenericPdf('mypdf', 'mypdf', formula, ROOT.RooArgList(x))
data = pdf.generate(ROOT.RooArgSet(x), n_events)
return super(ROOT.RooDataSet,data).createHistogram(hist_name, x, ROOT.RooFit.Binning(n_bins, range_low, range_high))
In [4]:
c = ROOT.TCanvas()
hist = sample_func_histogram(f1, 10000, 10, 0, 10)
hist.Draw()
c.Draw()
In [5]:
def generate_shape_histogram(shape_function, n_bins=10, range_low=-1, range_high=1):
"""
Sample a function over a given range and generate a histogram
Args:
shape_function (TF1): The mathematical form of the function to be sampled
range_low (int, float): The lower bound on the sampling range
range_high (int, float): The upper bound on the sampling range
n_bins (int): The number of bins in the generated histogram
Returns:
hist (TH1): The generated histogram which represents the shape of the function
"""
hist = ROOT.TH1F("hist", "", n_bins, range_low, range_high)
for bin in range(0, n_bins+1):
hist.SetBinContent(bin,shape_function.Eval(hist.GetXaxis().GetBinCenter(bin)))
return hist
In [6]:
#h1 = generate_shape_histogram(f1, 10, 0, 10)
h1 = sample_func_histogram(f1, 1000, 10, 0, 10, "sample_hist")
c1 = ROOT.TCanvas()
#f1.Draw()
c1.cd()
h1.Draw("SAME")
#f1.Draw("SAME")
c1.Draw()
Comments from Vince: Indeed building the fit model is the first (and crucial) step. It might also be nice to use this model for something. Can you generate some data from this model and maybe perform a simple fit? If you're happy with the way it performs I think it's time to start adding bins/events/channels and building out into external functions to declutter your space.
Goal: The goal of the project is to be able to benchmark the speed/performance of the current implementation of HistFactory
to perform a maximum likelihood fit for a given binned model generated by the user. The user can do this by using the project Python library (which will essentially just be a collection of helper functions).
The binned model is supposed to be parameterized by:
If this is all being made "on the fly" then all components are necessary. However, if the channels are being loaded in from ROOT
files, then it would seem that the number of events and number of bins could be inferred(?).
As nuisance parameters are any model parameter which is not a parameter of interest, then if a parameter is not explicitly set as a POI (Measurement.SetPOI("parameter name")
) then it is treated as a nuisance parameter.
To make a model, a measurement and least one channel with at least one sample is required. Let's make this concrete by just creating a simple model line by line. [3]
In [7]:
# set inputs
range_low = 0
range_high = 10
n_events = 1000
n_bins = 5
frac_sig = 0.2
frac_bkg = 1 - frac_sig
# Create the measurement
meas = ROOT.RooStats.HistFactory.Measurement("meas", "meas")
meas.SetOutputFilePrefix("./results")
meas.SetPOI("SigXsecOverSM")
meas.AddConstantParam("Lumi")
meas.SetLumi(1.0)
meas.SetLumiRelErr(0.10)
meas.SetExportOnly(False)
# Create a channel
f_channel = ROOT.TF1("f_channel","[0]*x + [1] + [2]",range_low,range_high)
f_channel.SetParameters(1.5, 0.5, 4)
channel_1 = ROOT.RooStats.HistFactory.Channel("channel_1")
channel_1.SetData(sample_func_histogram(f_channel, n_events, n_bins, range_low, range_high))
##channel_1.SetStatErrorConfig(0.05, "Poisson")
# Create the signal sample and set its value
signal = ROOT.RooStats.HistFactory.Sample("signal")
signal.SetNormalizeByTheory(False)
f_sig = ROOT.TF1("f_sig","[0]*x + [1]",-5,10)
f_sig.SetParameters(1.5, 0.5)
#signal.SetHisto(generate_shape_histogram(f_sig, n_bins, range_low, range_high))
signal.SetHisto(sample_func_histogram(f_sig, frac_sig * n_events, n_bins, range_low, range_high, "sig_hist"))
###
# Add the parmaeter of interest and a systematic
# Try to make intelligent choice of upper bound
##import math
##upper_bound = max(3 * math.ceil((data_val - background_val) / signal_val), 3)
#signal.AddNormFactor("SigXsecOverSM", 1, 0, upper_bound)
###
signal.AddNormFactor("SigXsecOverSM", 1, 0, 3)
# Add overall systematic of 5%
signal.AddOverallSys("signal_uncertainty", 1.0 - 0.05, 1.0 + 0.05)
# add signal sample to channel
channel_1.AddSample(signal)
# Create the background sample and set its value
background = ROOT.RooStats.HistFactory.Sample("background")
background.SetNormalizeByTheory(False)
f_bkg = ROOT.TF1("f_bkg","[0]*x + [1]",range_low,range_high)
f_bkg.SetParameters(0, 4)
#background.SetHisto(generate_shape_histogram(f_bkg, n_bins, range_low, range_high))
background.SetHisto(sample_func_histogram(f_bkg, frac_bkg * n_events, n_bins, range_low, range_high, "bkg_hist"))
# Add overall systematic of 2%
background.AddOverallSys("background_uncertainty", 1.0 - 0.02, 1.0 + 0.02)
# add signal sample to channel
channel_1.AddSample(background)
# Add the channel to the measurement
meas.AddChannel(channel_1)
In [8]:
# Plot a data histogram for a visual check
data_hist = sample_func_histogram(f_channel, n_events, n_bins, range_low, range_high)
c2 = ROOT.TCanvas()
data_hist.Draw()
c2.Draw()
In [9]:
# If histograms for the channels are stored in ROOT files then go and collect
# the histograms represented in the channel into the channel object [4]
#meas.CollectHistograms()
# Print the measurement tree
print('\nPrinting measurement tree:\n')
meas.PrintTree()
factory = ROOT.RooStats.HistFactory.HistoToWorkspaceFactoryFast()
print('\n\nPrinting workspace:\n')
workspace = factory.MakeCombinedModel(meas)
print('\n\nPrinting model fit information:\n')
ROOT.RooStats.HistFactory.FitModel(workspace)
workspace.SetName('HistFactoryWorkspace')
workspace.writeToFile("output/HistFactoryWorkspace_linear.root")
# build the model workspace, fit it, create output plots, and save the workspace and plots to files
#hist2workspace = ROOT.RooStats.HistFactory.HistoToWorkspaceFactoryFast(meas)
#workspace = hist2workspace.MakeCombinedModel(meas)
#workspace = hist2workspace.MakeSingleChannelModel(meas, channel_1)
#workspace = ROOT.RooStats.HistFactory.MakeModelAndMeasurementFast(meas)
Out[9]:
In [10]:
# simPdf and obsData are defualts used by HistFactory
model = workspace.pdf('simPdf')
data = workspace.data('obsData')
model.fitTo(data, ROOT.RooFit.Save(True))
Out[10]:
In [11]:
model.graphVizTree("model.dot")
In [12]:
%%bash
dot -Tpng -o Images/model.png model.dot
dot -Tpdf -o Images/model.pdf model.dot
In [13]:
# Don't know how to set minimum value properly for a THStack, but can see bkg with unzoom js button
infile = ROOT.TFile.Open("output/HistFactoryWorkspace_linear.root")
w = infile.Get("HistFactoryWorkspace")
mc = w.obj("ModelConfig")
data = w.data("obsData")
hist_factory_nav = ROOT.RooStats.HistFactory.HistFactoryNavigation(mc)
stack = hist_factory_nav.GetChannelStack("channel_1")
channel_data = hist_factory_nav.GetDataHist(data,"channel_1","channel_data")
signal_hist = hist_factory_nav.GetSampleHist("channel_1", "signal", "signal_hist")
background_hist = hist_factory_nav.GetSampleHist("channel_1", "background", "background_hist")
c1 = ROOT.TCanvas()
stack.Draw()
channel_data.Draw("SAME")
leg = ROOT.TLegend(0.65, 0.65, 0.9, 0.9)
for hist in stack.GetHists():
samplename = hist.GetName().rsplit('_')[0]
leg.AddEntry(hist, samplename,"f")
stack.SetMinimum(-5)
leg.Draw()
c1.Draw()
c2 = ROOT.TCanvas()
channel_data.Draw()
c2.Draw()
for hist in stack.GetHists():
print(hist)
print(stack)
In [14]:
print(stack.GetNhists())
for hist in stack.GetHists():
print(hist)
Now let's do that again, but more intelligently by creating functions. As measurement, channel, and sample objects can be created independently of each other, let's build everything up by constructing the sample function
In [15]:
def make_sample(sample_name="sample", sample_histogram=None,
uncertainty_down=None, uncertainty_up=None,
shape_up_hist=None, shape_down_hist=None,
has_theory_norm=False):
"""
Construct a HistFactory.Sample()
Args:
sample_name (str): The name of the signal sample
sample_histogram: A TH1 object that encodes the shape of the sample
uncertainty_down (float): The downward normalisation systematic uncertainty (1.0 - uncertainty_down)
uncertainty_up (float): The upward normalisation systematic uncertainty (1.0 + uncertainty_up)
shape_up_hist: A TH1 object that encodes the updward sample shape uncertainty
shape_down_hist: A TH1 object that encodes the downward sample shape uncertainty
has_theory_norm (bool): Is the sample normalized by theory or not. Default is False.
Returns:
sample (HistFactory.Sample()): The constructed sample. Ex: signal sample, background sample
"""
sample = ROOT.RooStats.HistFactory.Sample(sample_name)
sample.SetNormalizeByTheory(has_theory_norm)
sample.SetHisto(sample_histogram)
#
sample.AddNormFactor("SigXsecOverSM", 1, 0, 3)
# Add normalisation systematic uncertainty
if uncertainty_down and uncertainty_up:
sample.AddOverallSys(sample_name + "_uncertainty", 1.0 - uncertainty_down, 1.0 + uncertainty_up)
# Add signal shape systematic uncertainty
if shape_up_hist and shape_down_hist:
shape_systematic = ROOT.RooStats.HistFactory.HistoSys(sample_name + "_shape_sys")
shape_systematic.SetHistoHigh(shape_up_hist)
shape_systematic.SetHistoLow(shape_down_hist)
sample.AddHistoSys(shape_systematic)
return sample
In [16]:
def make_channel(channel_name="channel", channel_data=None, channel_samples=None):
"""
Make a channel with data and associated samples
Args:
channel_name (str): The name of the channel.
Example: If this channel represents a signal region, the name might be "SR"
channel_data (TH1): The data histogram that describes the channel
channel_samples (list of HistFactory.Sample()): The samples associated with the channel that describe
the model for the channel
Returns:
channel (HistFactor.Channel()): The channel object
"""
channel = ROOT.RooStats.HistFactory.Channel(channel_name)
channel.SetData(channel_data)
#channel.SetStatErrorConfig(0.05, "Poisson")
if channel_samples:
for sample in channel_samples:
channel.AddSample(sample)
return channel
In [17]:
def make_model(n_evnets, n_bins, channels, POI, workspace_name=None, workspace_save_path=None):
"""
Args:
n_events: The number of events
n_bins: The numbe of bins
channels (dictionary of HistFactory.Channel()): The channels in the model with associated samples
#n_sig_comp: The number of signal components (automatically determined from the channels)
#n_bkg_comp: The number of background components (automatically determined from the channels)
POI (list of str): The parameters of interest
#n_nuisance: The number of nuisance parameters (automatically determined from the samples)
workspace_name (str): The name of the resulting RooStats.RooWorkspace()
workspace_save_path (str): The path and filename (sans ".root") to save the workspace
Returns:
workspace (RooStats.RooWorkspace): The workspace for the measruement
model (RooStats.RooSimultaneous): The model generated from the workspace
"""
measurement = ROOT.RooStats.HistFactory.Measurement("measurement", "measurement")
for parameter in POI:
measurement.SetPOI(parameter)
# Set luminosity (arbitrary choices)
measurement.SetLumi(1.0)
measurement.SetLumiRelErr(0.02)
measurement.AddConstantParam("Lumi")
# Add channels to measurement
for channel in channels:
measurement.AddChannel(channel)
# make the factory
factory = ROOT.RooStats.HistFactory.HistoToWorkspaceFactoryFast()
workspace = factory.MakeCombinedModel(measurement)
if workspace_name:
workspace.SetName(workspace_name)
else:
workspace.SetName('HistFactoryWorkspace')
if workspace_save_path:
workspace.writeToFile(workspace_save_path + ".root")
model = workspace.pdf('simPdf') # 'simPdf' is the default name assigned
return workspace, model
Now let's create the samples (1 signal and 1 background), the channel (1 signal region), and the model:
In [18]:
# See composite model notebook
f_sig = ROOT.TF1("f_sig","[0]*x + [1]", 0, 10)
f_sig.SetParameters(1.5, 0.5)
f_bkg = ROOT.TF1("f_bkg","[0]*x + [1]",range_low,range_high)
f_bkg.SetParameters(0, 4)
samples = []
samples.append(make_sample("signal", sample_func_histogram(f_sig, 20, 10, 0, 10), 0.01, 0.01))
samples.append(make_sample("background", sample_func_histogram(f_bkg, 80, 10, 0, 10), 0.01, 0.01))
print(samples)
In [19]:
f_channel = ROOT.TF1("f_channel","[0]*x + [1] + [2]", 0, 10)
f_channel.SetParameters(1.5, 0.5, 4)
channels = []
SR = make_channel(channel_name="SR", channel_data=sample_func_histogram(f_channel, 100, 10, 0, 10), channel_samples=samples)
channels.append(SR)
print(channels)
In [20]:
POI = ["SigXsecOverSM"]
ws, model = make_model(100, 10, channels, POI)
data = ws.data('obsData') # 'obsData' is the default name assigned
In [21]:
ws.Print()
In [22]:
model.fitTo(data, ROOT.RooFit.Save(True), ROOT.RooFit.PrintLevel(-1))
#ROOT.RooStats.HistFactory.FitModel(ws) #Using HistFactory built in funciton FitModel()
Out[22]:
Generate some data from the model
In [23]:
model_config = ws.obj("ModelConfig")
##model_config.LoadSnapshot()
x = ws.var("obs_x_SR")
data = model.generate(ROOT.RooArgSet(x),10000)
In [24]:
# Get Info
hist_factory_nav = ROOT.RooStats.HistFactory.HistFactoryNavigation(model_config)
print(hist_factory_nav)
channel_data = hist_factory_nav.GetDataHist(ws.data("obsData"),"SR","channel_data")
# I don't understand the uncertainties on this histogram
c2 = ROOT.TCanvas()
channel_data.Draw()
c2.Draw()
In [25]:
def get_formula(function, range_low=0, range_high=1, hist_name='data_hist'):
# Get the formula string of the TF1 with the parameters evaluated
return str(function.GetFormula().GetExpFormula("P"))
In [26]:
g1 = ROOT.TF1("g1","[0]*exp(-0.5*((x-[1])/[2])**2)",-5,5)
g1.SetParameters(1.25,0,0.5)
g2 = ROOT.TF1("g2","[0]*exp(-0.5*((x-[1])/[2])**2)",-5,5)
g2.SetParameters(1,1,2)
g3 = ROOT.TF1("g3","[0]*exp(-0.5*((x-[1])/[2])**2)",-5,5)
g3.SetParameters(1,-1,2)
g4 = ROOT.TF1("g4","[0]*exp(-0.5*((x-[1])/[2])**2) + [3]*exp(-0.5*((x-[4])/[5])**2) + [6]*exp(-0.5*((x-[7])/[8])**2)", -5, 5)
g4.SetParameters(1.25,0,0.5,1,1,2,1,-1,2)
In [27]:
get_formula(g1) + " + " + get_formula(g2) + " + " + get_formula(g3)
Out[27]:
In [28]:
c3 = ROOT.TCanvas()
c3.cd()
g1.Draw()
g2.Draw("SAME")
g3.Draw("SAME")
g4.Draw("SAME")
c3.Draw()
In [29]:
n_events = 1000
n_bins = 10
range_low = -5
range_high = 5
samples = []
samples.append(make_sample("signal", sample_func_histogram(g1, 0.1*n_events, n_bins, range_low, range_high),
0.01, 0.01))
samples.append(make_sample("background1", sample_func_histogram(g2, 0.45*n_events, n_bins, range_low, range_high),
0.01, 0.01))
samples.append(make_sample("background2", sample_func_histogram(g3, 0.45*n_events, n_bins, range_low, range_high),
0.01, 0.01))
#print(samples)
channels = []
SR = make_channel(channel_name="SR", channel_data=sample_func_histogram(g4, n_events, n_bins, range_low, range_high), channel_samples=samples)
channels.append(SR)
#print(channels)
POI = ["SigXsecOverSM"]
ws, model = make_model(n_events, n_bins, channels, POI)
data = ws.data('obsData')
model_config = ws.obj("ModelConfig")
hist_factory_nav = ROOT.RooStats.HistFactory.HistFactoryNavigation(model_config)
#print(hist_factory_nav)
stack = hist_factory_nav.GetChannelStack("SR")
channel_data = hist_factory_nav.GetDataHist(ws.data("obsData"),"SR","channel_data")
signal_hist = hist_factory_nav.GetSampleHist("SR", "signal", "signal_hist")
bkg1_hist = hist_factory_nav.GetSampleHist("SR", "background1", "bkg1_hist")
bkg2_hist = hist_factory_nav.GetSampleHist("SR", "background2", "bkg2_hist")
c1 = ROOT.TCanvas()
stack.Draw()
channel_data.Draw("SAME")
leg = ROOT.TLegend(0.65, 0.65, 0.9, 0.9)
for hist in stack.GetHists():
samplename = hist.GetName().rsplit('_')[0]
leg.AddEntry(hist, samplename,"f")
stack.SetMinimum(-5)
leg.Draw()
c1.Draw()
c2 = ROOT.TCanvas()
channel_data.Draw()
c2.Draw()
RooFit
and HistFactory