Example Notebook for Model with Only Signal

Components of a Model

Models are built by HistFactory from measurements that consist of channels which hold samples.

The measurement is the object which collects all the information that will then be turned into a model by HistFactory. It is the top level combination of the channels that form the model.

A channel is a representation of observed data. Each channel is required to have a histogram of measured data associated with it. [3] Loosely, channels can be though of as "regions" in an analysis (e.g., signal regions, control regions, validation regions).

Samples are used to describe the channel (data) they are associated with. They can represent signals and backgrounds. Each sample has a histogram describing its shape and a list of systematics describing how that shape transforms based on a number of parameters. [3]



In [1]:

    
import ROOT
ROOT.RooMsgService.instance().setGlobalKillBelow(5)
%jsroot on

import sys
import os
# Don't require pip install to test out
sys.path.append(os.getcwd() + '/../../src')
from dfgmark import histfactorybench as hfbench









    



Welcome to JupyROOT 6.11/01

Functions



In [2]:

    
help(hfbench.sample_func_histogram)









    



Help on function sample_func_histogram in module dfgmark.histfactorybench:

sample_func_histogram(shape_function, n_events=1, n_bins=10, range_low=0, range_high=1, hist_name='data_hist')
    Sample a function. create an pdf from it, then histogram it
    
    Args:
        shape_function (TF1): The mathematical form of the function to be sampled
        n_events (int): The number of points to sample
        range_low (int, float): The lower bound on the sampling range
        range_high (int, float): The upper bound on the sampling range
        n_bins (int): The number of bins in the generated histogram
        hist_name (str): The name of the histogram object
    Returns:
        hist (TH1): The generated histogram of n_events



In [3]:

    
help(hfbench.make_sample)









    



Help on function make_sample in module dfgmark.histfactorybench:

make_sample(sample_name='sample', sample_histogram=None, uncertainty_down=None, uncertainty_up=None, shape_up_hist=None, shape_down_hist=None, has_theory_norm=False)
    Construct a HistFactory.Sample()
    
    Args:
        sample_name (str): The name of the signal sample
        sample_histogram: A TH1 object that encodes the shape of the sample
        uncertainty_down (float): The downward normalisation systematic uncertainty (1.0 - uncertainty_down)
        uncertainty_up (float): The upward normalisation systematic uncertainty (1.0 + uncertainty_up)
        shape_up_hist: A TH1 object that encodes the updward sample shape uncertainty
        shape_down_hist: A TH1 object that encodes the downward sample shape uncertainty
        has_theory_norm (bool): Is the sample normalized by theory or not. Default is False.
    Returns:
        sample (HistFactory.Sample()): The constructed sample. Ex: signal sample, background sample



In [4]:

    
help(hfbench.make_channel)









    



Help on function make_channel in module dfgmark.histfactorybench:

make_channel(channel_name='channel', channel_data=None, channel_samples=None)
    Make a channel with data and associated samples
    
    Args:
        channel_name (str): The name of the channel.
            Example: If this channel represents a signal region, the name might be "SR"
        channel_data (TH1): The data histogram that describes the channel
        channel_samples (list of HistFactory.Sample()): The samples associated with the channel that describe
                                                        the model for the channel
    
    Returns:
        channel (HistFactor.Channel()): The channel object



In [5]:

    
help(hfbench.make_model)









    



Help on function make_model in module dfgmark.histfactorybench:

make_model(n_evnets, n_bins, channels, POI, workspace_name=None, workspace_save_path=None)
    Args:
        n_events: The number of events
        n_bins: The numbe of bins
        channels (dictionary of HistFactory.Channel()): The channels in the model with associated samples
        # n_sig_comp: The number of signal components (automatically determined from the channels)
        # n_bkg_comp: The number of background components (automatically
        # determined from the channels)
        POI (list of str): The parameters of interest
        # n_nuisance: The number of nuisance parameters (automatically
        # determined from the samples)
        workspace_name (str): The name of the resulting RooStats.RooWorkspace()
        workspace_save_path (str): The path and filename (sans ".root") to save the workspace
    Returns:
        workspace (RooStats.RooWorkspace): The workspace for the measruement
        model (RooStats.RooSimultaneous): The model generated from the workspace



In [6]:

    
help(hfbench.benchmark_fit)









    



Help on function benchmark_fit in module dfgmark.histfactorybench:

benchmark_fit(var, model, n_events, n_trials, verbose=False)
    Fit the given model and return the time it take
    
    Args:
        var (RooAbsReal and inheritors): The variable with range over which to fit the model
        model (RooAbsPdf): The pdf of the model
        n_events (int): The number of data points to generate from the model
        n_trials (int): The number of times to fit the model
        verbose (bool):
            False: Don't print information
            True: Print information on fit
    Returns:
        mean_fit_time (float): The mean time it takes to fit the model over n_trials times
    Example:
        ws, model = make_model(n_events, n_bins, channels, POI)
        benchmark_fit(ws.var("obs_x_SR"), model, n_events, n_trials)



In [7]:

    
help(hfbench.benchmark_fit_with_workspace)









    



Help on function benchmark_fit_with_workspace in module dfgmark.histfactorybench:

benchmark_fit_with_workspace(work_space, model, n_events, n_trials, verbose=False)
    Fit the given model and return the time it takes
    
    Args:
        work_space (RooWorkspace): The work space for the model
        model (RooAbsPdf): The pdf of the model
        n_events (int): The number of data points to generate from the model
        n_trials (int): The number of times to fit the model
        verbose (bool):
            False: Don't print information
            True: Print information on fit
    Returns:
        mean_fit_time (float): The mean time it takes to fit the model over n_trials times
    Example:
        ws, model = make_model(n_events, n_bins, channels, POI)
        benchmark_fit_with_workspace(ws, model, n_events, n_trials)

Examples

Now let's create the samples (1 signal), the channel (1 signal region), and the model.

First, set the parameters for the model and make the signal sample histogram



In [8]:

    
n_events = 1000
n_bins = 10
range_low = 0
range_high = 10

frac_sig = 1

# pdf of a falling line
f1 = ROOT.TF1("f1","[0]*x + [1]", 0, 10)
f1.SetParameters(-1/50, 1/5)

signal_hist = hfbench.sample_func_histogram(
    f1, n_events=n_events, range_low=0, range_high=10, hist_name="signal_hist")

data_hist = hfbench.sample_func_histogram(
    f1, n_events=n_events, range_low=0, range_high=10, hist_name="data_hist")



In [9]:

    
c = ROOT.TCanvas()
signal_hist.Draw()
data_hist.SetLineColor(1)
data_hist.Draw("SAME")
c.Draw()

Now make the samples and add them and the channel data histogram to the channel



In [10]:

    
samples = []
samples.append(hfbench.make_sample("signal", signal_hist, 0.01, 0.01))

channels = []
SR = hfbench.make_channel(channel_name="SR", channel_data=data_hist, channel_samples=samples)
channels.append(SR)

Define the parameter of interest (POI) and then make the workspace and model



In [11]:

    
POI = ["SignalStrength"]

ws, model = hfbench.make_model(n_events, n_bins, channels, POI)









    



-------------------
Starting to process SR channel with 1 observables
lumi str = [1,0,10]
lumi Error str = nominalLumi[1,0,1.2],0.02
making normFactor: SignalStrength
signal_SR has no variation histograms 
processing hist signal_hist__x
-----------------------------------------
import model into workspace
RooDataSet::AsimovData[obs_x_SR,weight:binWeightAsimov] = 10 entries (1000 weighted)

RooWorkspace(SR) SR workspace contents

variables
---------
(Lumi,SignalStrength,alpha_signal_uncertainty,binWidth_obs_x_SR_0,nom_alpha_signal_uncertainty,nominalLumi,obs_x_SR,weightVar)

p.d.f.s
-------
RooRealSumPdf::SR_model[ binWidth_obs_x_SR_0 * L_x_signal_SR_overallSyst_x_Exp ] = 8/1000
RooGaussian::alpha_signal_uncertaintyConstraint[ x=alpha_signal_uncertainty mean=nom_alpha_signal_uncertainty sigma=1 ] = 1
RooGaussian::lumiConstraint[ x=Lumi mean=nominalLumi sigma=0.02 ] = 1
RooProdPdf::model_SR[ lumiConstraint * alpha_signal_uncertaintyConstraint * SR_model(obs_x_SR) ] = 8

functions
--------
RooProduct::L_x_signal_SR_overallSyst_x_Exp[ 1 * signal_SR_overallSyst_x_Exp ] = 8
RooStats::HistFactory::FlexibleInterpVar::signal_SR_epsilon[ paramList=(alpha_signal_uncertainty) ] = 1
RooHistFunc::signal_SR_nominal[ depList=(obs_x_SR) ] = 8
RooProduct::signal_SR_overallNorm_x_sigma_epsilon[ SignalStrength * signal_SR_epsilon ] = 1
RooProduct::signal_SR_overallSyst_x_Exp[ signal_SR_nominal * signal_SR_overallNorm_x_sigma_epsilon ] = 8

datasets
--------
RooDataSet::asimovData(obs_x_SR)
RooDataSet::obsData(obs_x_SR)

embedded datasets (in pdfs and functions)
-----------------------------------------
RooDataHist::signal_SRnominalDHist(obs_x_SR)

named sets
----------
ModelConfig_GlobalObservables:(nom_alpha_signal_uncertainty)
ModelConfig_Observables:(obs_x_SR)
coefList:(binWidth_obs_x_SR_0)
constraintTerms:(lumiConstraint,alpha_signal_uncertaintyConstraint)
globalObservables:(nom_alpha_signal_uncertainty)
likelihoodTerms:(SR_model)
obsAndWeight:(weightVar,obs_x_SR)
observables:(obs_x_SR)
observablesSet:(obs_x_SR)
shapeList:(L_x_signal_SR_overallSyst_x_Exp)

generic objects
---------------
RooStats::ModelConfig::ModelConfig

Setting Parameter(s) of Interest as: SignalStrength



In [12]:

    
ws.Print()









    



RooWorkspace(HistFactoryWorkspace) SR workspace contents

variables
---------
(Lumi,SignalStrength,alpha_signal_uncertainty,binWidth_obs_x_SR_0,nom_alpha_signal_uncertainty,nominalLumi,obs_x_SR,weightVar)

p.d.f.s
-------
RooRealSumPdf::SR_model[ binWidth_obs_x_SR_0 * L_x_signal_SR_overallSyst_x_Exp ] = 8/1000
RooGaussian::alpha_signal_uncertaintyConstraint[ x=alpha_signal_uncertainty mean=nom_alpha_signal_uncertainty sigma=1 ] = 1
RooGaussian::lumiConstraint[ x=Lumi mean=nominalLumi sigma=0.02 ] = 1
RooProdPdf::model_SR[ lumiConstraint * alpha_signal_uncertaintyConstraint * SR_model(obs_x_SR) ] = 8

functions
--------
RooProduct::L_x_signal_SR_overallSyst_x_Exp[ 1 * signal_SR_overallSyst_x_Exp ] = 8
RooStats::HistFactory::FlexibleInterpVar::signal_SR_epsilon[ paramList=(alpha_signal_uncertainty) ] = 1
RooHistFunc::signal_SR_nominal[ depList=(obs_x_SR) ] = 8
RooProduct::signal_SR_overallNorm_x_sigma_epsilon[ SignalStrength * signal_SR_epsilon ] = 1
RooProduct::signal_SR_overallSyst_x_Exp[ signal_SR_nominal * signal_SR_overallNorm_x_sigma_epsilon ] = 8

datasets
--------
RooDataSet::asimovData(obs_x_SR)
RooDataSet::obsData(obs_x_SR)

embedded datasets (in pdfs and functions)
-----------------------------------------
RooDataHist::signal_SRnominalDHist(obs_x_SR)

parameter snapshots
-------------------
NominalParamValues = (Lumi=1[C],nominalLumi=1[C],alpha_signal_uncertainty=0,nom_alpha_signal_uncertainty=0[C],SignalStrength=1,obs_x_SR=9.5,binWidth_obs_x_SR_0=1[C],weightVar=0)

named sets
----------
ModelConfig_GlobalObservables:(nom_alpha_signal_uncertainty)
ModelConfig_NuisParams:(alpha_signal_uncertainty)
ModelConfig_Observables:(obs_x_SR)
ModelConfig_POI:(SignalStrength)
coefList:(binWidth_obs_x_SR_0)
constraintTerms:(lumiConstraint,alpha_signal_uncertaintyConstraint)
globalObservables:(nom_alpha_signal_uncertainty)
likelihoodTerms:(SR_model)
obsAndWeight:(weightVar,obs_x_SR)
observables:(obs_x_SR)
observablesSet:(obs_x_SR)
shapeList:(L_x_signal_SR_overallSyst_x_Exp)

generic objects
---------------
RooStats::ModelConfig::ModelConfig



In [13]:

    
model.graphVizTree("model.dot")



In [14]:

    
%%bash
dot -Tpdf -o Images/model.pdf model.dot



In [15]:

    
#hfbench.benchmark_fit(ws.var("obs_x_SR"), model, n_events, n_events, verbose=True)
hfbench.benchmark_fit_with_workspace(ws, model, n_events, n_events, verbose=True);









    



model_SR was fit 1000 times in 2.089689254760742 seconds
The average fit time is 0.002089689254760742 seconds



In [16]:

    
# check information on the fit
hfbench.benchmark_fit_with_workspace(ws, model, n_events, 1, verbose=True);









    



model_SR was fit 1 times in 0.0046193599700927734 seconds
The average fit time is 0.0046193599700927734 seconds
 **********
 **12002 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 SignalStrength   1.00000e+00  3.31632e-02    0.00000e+00  3.00000e+00
     2 alpha_signal_uncertainty  -4.59140e-09  9.93347e-01   -5.00000e+00  5.00000e+00
 **********
 **12003 **SET ERR         0.5
 **********
 **********
 **12004 **SET PRINT           1
 **********
 **********
 **12005 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **12006 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=-3826.88 FROM MIGRAD    STATUS=INITIATE        4 CALLS           5 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  SignalStrength   1.00000e+00   3.31632e-02   2.34529e-02  -5.90578e-04
   2  alpha_signal_uncertainty  -4.59140e-09   9.93347e-01   2.00000e-01  -7.01333e-06
                               ERR DEF= 0.5
 MIGRAD FAILS TO FIND IMPROVEMENT
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3826.88 FROM HESSE     STATUS=OK             14 CALLS          30 TOTAL
                     EDM=1.34697e-13    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  SignalStrength   1.00000e+00   3.31632e-02   1.00192e-03  -2.27188e-05
   2  alpha_signal_uncertainty  -4.59140e-09   9.93358e-01   8.54409e-03  -2.71174e-07
                               ERR DEF= 0.5
 MIGRAD FAILS TO FIND IMPROVEMENT
 MIGRAD MINIMIZATION HAS CONVERGED.
 FCN=-3826.88 FROM MIGRAD    STATUS=CONVERGED      38 CALLS          39 TOTAL
                     EDM=1.34697e-13    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  SignalStrength   1.00000e+00   3.31632e-02   0.00000e+00  -2.27188e-05
   2  alpha_signal_uncertainty  -4.59140e-09   9.93358e-01  -0.00000e+00  -2.71174e-07
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  1.100e-03 -1.000e-02 
 -1.000e-02  1.000e+00 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.30151   1.000 -0.302
        2  0.30151  -0.302  1.000
 **********
 **12007 **SET ERR         0.5
 **********
 **********
 **12008 **SET PRINT           1
 **********
 **********
 **12009 **HESSE        1000
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-3826.88 FROM HESSE     STATUS=OK             14 CALLS          53 TOTAL
                     EDM=4.2157e-16    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  SignalStrength   1.00000e+00   3.31632e-02   2.00384e-04  -3.39837e-01
   2  alpha_signal_uncertainty  -4.59140e-09   9.93347e-01   1.70882e-03  -9.18280e-10
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  1.100e-03 -1.000e-02 
 -1.000e-02  1.000e+00 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.30151   1.000 -0.302
        2  0.30151  -0.302  1.000



In [17]:

    
model.Print("v")









    



--- RooAbsArg ---
  Value State: DIRTY
  Shape State: DIRTY
  Attributes: 
  Address: 0x7059610
  Clients: 
  Servers: 
    (0x5270a10,V-) RooGaussian::lumiConstraint "lumiConstraint"
    (0x703fde0,V-) RooGaussian::alpha_signal_uncertaintyConstraint "alpha_signal_uncertaintyConstraint"
    (0x7043380,V-) RooRealSumPdf::SR_model "SR_model"
  Proxies: 
    !pdfs -> 
      1)                      lumiConstraint
      2)  alpha_signal_uncertaintyConstraint
      3)                            SR_model
--- RooAbsReal ---

  Plot label is "model_SR"
--- RooAbsPdf ---
Cached value = 0



In [18]:

    
mc = ws.obj("ModelConfig")
mc.LoadSnapshot()

pl = ROOT.RooStats.ProfileLikelihoodCalculator(ws.data("obsData"), mc)
ROOT.SetOwnership(pl, False)

##ws.data("obsData").Print("v")
##mc.Print("v")

model.Print("v")

x = ws.var("obs_x_SR")
#pl.SetConfidenceLevel(0.99)
pl.SetConfidenceLevel(0.683)

interval = pl.GetInterval()

firstPOI = mc.GetParametersOfInterest().first()
lowerLimit = interval.LowerLimit(firstPOI)
upperLimit = interval.UpperLimit(firstPOI)

print("68% interval on {0} is [{1},{2}]\n".format(firstPOI.GetName(),lowerLimit,upperLimit))

plot = ROOT.RooStats.LikelihoodIntervalPlot(interval)
plot.SetRange(-3,3)
plot.SetNPoints(50)
plot.SetMaximum(0.1)
c = ROOT.TCanvas()
#c.SetLogy(True)
plot.Draw()
c.Draw()
c.SaveAs("nll.pdf")
c.IsA().Destructor(c) # As currently not exiting properly









    



68% interval on SignalStrength is [0.967217085094437,1.0335970649191333]

--- RooAbsArg ---
  Value State: DIRTY
  Shape State: DIRTY
  Attributes: 
  Address: 0x7059610
  Clients: 
  Servers: 
    (0x5270a10,V-) RooGaussian::lumiConstraint "lumiConstraint"
    (0x703fde0,V-) RooGaussian::alpha_signal_uncertaintyConstraint "alpha_signal_uncertaintyConstraint"
    (0x7043380,V-) RooRealSumPdf::SR_model "SR_model"
  Proxies: 
    !pdfs -> 
      1)                      lumiConstraint
      2)  alpha_signal_uncertaintyConstraint
      3)                            SR_model
--- RooAbsReal ---

  Plot label is "model_SR"
--- RooAbsPdf ---
Cached value = 0






    



Info in <TCanvas::Print>: pdf file nll.pdf has been created



In [19]:

    
c = ROOT.TCanvas()
f = ws.var("obs_x_SR").frame()
#print(ws.var("obs_x_SR"))
ws.data("obsData").plotOn(f)
f.Draw()
c.Draw()
print(ws.data("obsData"))

#h_test = super(ROOT.RooDataSet,ws.data("obsData")).createHistogram("h_test", ws.var("obs_x_SR"), ROOT.RooFit.Binning(n_bins, range_low, range_high))
h_test = hfbench.roodataset_to_hist("h_test", ws.var("obs_x_SR"), ws.data("obsData"))

c_test = ROOT.TCanvas()
h_test.Draw()
c_test.Draw()









    



<ROOT.RooDataSet object ("obsData") at 0x6737d8c>

Plot the channel (signal sample and data) together



In [20]:

    
model_config = ws.obj("ModelConfig")

hist_factory_nav = ROOT.RooStats.HistFactory.HistFactoryNavigation(model_config)
ROOT.SetOwnership(hist_factory_nav, False)

# draw the histogram stack
stack = hist_factory_nav.GetChannelStack("SR")
channel_data = hist_factory_nav.GetDataHist(ws.data("obsData"),"SR","channel_data")
channel_data.Sumw2(False)
channel_data.Sumw2()
sig_hist = hist_factory_nav.GetSampleHist("SR", "signal", "sig_hist")

fit_frame = ws.var("obs_x_SR").frame()
model.plotOn(fit_frame, ROOT.RooFit.Normalization(n_events), ROOT.RooFit.LineStyle(1)) #kDashed = 2

c1 = ROOT.TCanvas()
stack.Draw()
channel_data.Draw("SAME")
fit_frame.Draw("SAME")
leg = ROOT.TLegend(0.65, 0.65, 0.9, 0.9)
for hist in stack.GetHists():
    samplename = hist.GetName().rsplit('_')[0]
    leg.AddEntry(hist, samplename,"f")
leg.AddEntry(channel_data, "channel data","lep")
#leg.AddEntry(fit_frame, "fit to data","lep")
leg.AddEntry("", "fit to data","lep")
leg.Draw()
c1.Draw()



In [21]:

    
# To avoid a segfault at kerenel exit
canvas_list = [c, c_test, c1]
for canvas in canvas_list:
    canvas.IsA().Destructor(canvas)

References

RooFit/RooStats tutorials for INFN School of Statistics 2013
Vince Croft's Jupyter Notebooks on RooFit and HistFactory
Building HistFactory models using C++ and Python
RooStats::HistFactory::Measurement Class Reference
Baak, et. al, HistFitter software framework for statistical data analysis
Lukas Heinrich, Plans for HistFactory integration in HEPData, "(Re)interpreting the results of new physics searches at the LHC" workshop, June 2016