order: An introduction

In this example we get to know the most important classes of order and how they are related in order to describe your analysis and all external data. We will set up a simple but scalable example analysis that involves most of the API. For more information, see the full API documentation.

Classes and Relations

Name      Purpose
Analysis  Represents the central object of a physics analysis.
Campaign  Provides data of a well-defined range of data-taking, detector alignment, MC settings, datasets, etc.
Config    Holds analysis information related to a campaign instance (most configuration happens here!).
Dataset   Definition of a dataset, produced for / measured in a campaign.
Process   Physics process with cross sections for multiple center-of-mass energies, labels, etc.
Channel   Analysis channel, often defined by a particular decay resulting in distinct final-state objects.
Category  Category definition, (optionally) within the phase space of an analysis channel.
Variable  Generic variable description providing expression and selection statements, titles, binning, etc.
Shift     Represents a systematic shift with a name, direction, and type.

Relations between these classes are the glue that holds an analysis together. If you have ever performed a HEP analysis, they might look pretty familiar to you.

Analysis ↔ Campaign ↔ Config

  1. An analysis is not limited to a single campaign (e.g. for combining results across several data-taking periods or even experiments).
  2. A campaign is independent of analyses it is used in. In general, it could be defined externally / centrally.
  3. An analysis stores campaign-related data in config objects.
  4. An analysis can store multiple config objects that are related to the same campaign.

Campaign, Config ↔ Dataset

  1. A campaign can contain all datasets that were recorded / produced for its era and settings.
  2. A config contains a subset of its campaign's datasets, depending on what is required in its analysis.
  3. A dataset belongs to a campaign, and since a config is uniquely assigned to a campaign, a dataset is also related to a config.

Dataset ↔ Process

  1. A dataset contains physics processes.
  2. A process can be contained in multiple datasets.
  3. Processes can have child and parent processes.

Channel ↔ Category

  1. A category describes a sub-phase-space of a channel; therefore, it belongs to a channel and channels have categories.
  2. Channels can have child channels and a parent channel.
  3. Categories can have child and parent categories.
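Channels and categories thus form trees, and lookups can walk into all descendants (as the `deep=True` lookups used later in this example do). A minimal sketch of this idea in plain Python, using hypothetical dicts instead of the order API:

```python
def find_deep(category, name):
    """Return the (sub)category with the given name, or None if absent."""
    if category["name"] == name:
        return category
    for child in category.get("categories", []):
        found = find_deep(child, name)
        if found is not None:
            return found
    return None

# hypothetical toy tree: a category with two child categories
cat_6j = {
    "name": "ge6j",
    "categories": [
        {"name": "ge6j_eq3b", "categories": []},
        {"name": "ge6j_ge4b", "categories": []},
    ],
}

print(find_deep(cat_6j, "ge6j_ge4b")["name"])  # ge6j_ge4b
```

This is only an illustration of the tree-shaped relation; the actual order objects provide richer accessors on top of the same structure.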

Config ↔ Channel, Variable, Shift

  1. A config has channels.
  2. A config has variables.
  3. A config has shifts.

Example Analysis

In this example, we define a toy $t\bar{t}H$ analysis with a signal dataset, a $t\bar{t}$ background and real data.

Imports

In [1]:
import order as od
import scinum as sn
General, Analysis-unrelated Setup

Define a campaign, its datasets and link processes. This could be done externally or even via importing a centrally maintained repository.


In [2]:
# campaign
c_2017 = od.Campaign("2017_13Tev_25ns", 1, ecm=13, bx=25)

# processes
p_data = od.Process("data", 1,
    is_data=True,
    label="data",
)
p_ttH = od.Process("ttH", 2,
    label=r"$t\bar{t}H$",
    xsecs={
        13: sn.Number(0.5071, {"scale": (sn.Number.REL, 0.058, 0.092)}),
    },
)
p_tt = od.Process("tt", 3,
    label=r"$t\bar{t}$",
    xsecs={
        13: sn.Number(831.76, {"scale": (19.77, 29.20)}),
    },
)

# datasets
d_data = od.Dataset("data", 1,
    campaign=c_2017,
    is_data=True,
    n_files=100,
    n_events=200000,
    keys=["/data/2017.../AOD"],
)
d_ttH = od.Dataset("ttH", 2,
    campaign=c_2017,
    n_files=50,
    n_events=100000,
    keys=["/ttH_powheg.../.../AOD"],
)
d_tt = od.Dataset("tt", 3,
    campaign=c_2017,
    n_files=500,
    n_events=87654321,
    keys=["/tt_powheg.../.../AOD"],
)
d_WW = od.Dataset("WW", 4,
    campaign=c_2017,
    n_files=100,
    n_events=54321,
    keys=["/WW_madgraph.../.../AOD"],
)

# link processes to datasets
d_data.add_process(p_data)
d_ttH.add_process(p_ttH)
d_tt.add_process(p_tt)
print([len(d.processes) for d in [d_data, d_ttH, d_tt]])


[1, 1, 1]

Task: Get the cross section of the process in the ttH dataset at the center-of-mass energy of its campaign.


In [3]:
d_ttH.get_process("ttH").get_xsec(d_ttH.campaign.ecm)


Out[3]:
$0.5071\;^{+0.0294118}_{-0.0466532}\;\left(\text{scale}\right)$
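The absolute uncertainties in the output follow directly from the relative scale uncertainties (+5.8% / -9.2%) passed to the ttH process above. A quick plain-Python check, independent of scinum:

```python
xsec = 0.5071  # ttH cross section at 13 TeV (from the cell above)

# relative scale uncertainties declared via sn.Number.REL
up = xsec * 0.058
down = xsec * 0.092

print(round(up, 7), round(down, 7))  # 0.0294118 0.0466532
```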
Analysis Setup

Now, define the analysis object and create a config for the 2017_13Tev_25ns campaign:


In [4]:
ana = od.Analysis("ttH", 1)

# create a config by passing the campaign, so id and name will be identical
cfg = ana.add_config(c_2017)


Add processes we're interested in and datasets that we want to use:


In [5]:
# add processes manually
cfg.add_process(p_data)
cfg.add_process(p_ttH)
cfg.add_process(p_tt)

# add datasets in a loop
for name in ["data", "ttH", "tt"]:
    cfg.add_dataset(c_2017.get_dataset(name))

Task: Get the mean number of events per file in the ttH dataset.


In [6]:
cfg.get_dataset("ttH").n_events / float(cfg.get_dataset("ttH").n_files)


Out[6]:
2000.0


Define channels and categories:


In [7]:
ch_bb = cfg.add_channel("ttH_bb", 1)
cat_5j = ch_bb.add_category("eq5j",
    label="5 jets",
    selection="n_jets == 5",
)
cat_6j = ch_bb.add_category("ge6j",
    label=r"$\geq$ 6 jets",
    selection="n_jets >= 6",
)

# divide the 6j category further
cat_6j_3b = cat_6j.add_category("ge6j_eq3b",
    label=r"$\geq$ 6 jets, 3 b-tags",
    selection="n_jets >= 6 && n_btags == 3",
)
cat_6j_4b = cat_6j.add_category("ge6j_ge4b",
    label=r"$\geq$ 6 jets, $\geq$ 4 b-tags",
    selection="n_jets >= 6 && n_btags >= 4",
)

Task: Get the ROOT-latex label of the 6j4b category by using only the config.


In [8]:
cfg.get_channel("ttH_bb").get_category("ge6j_ge4b", deep=True).label_root


Out[8]:
'#geq 6 jets, #geq 4 b-tags'
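Conceptually, the `label_root` conversion maps LaTeX markup to ROOT's TLatex syntax. A simplified sketch of that mapping (hypothetical helper; the real implementation covers many more commands):

```python
def to_root_latex(label):
    # simplified: map \geq to ROOT's #geq and drop math-mode dollars
    return label.replace(r"\geq", "#geq").replace("$", "")

print(to_root_latex(r"$\geq$ 6 jets, $\geq$ 4 b-tags"))
# #geq 6 jets, #geq 4 b-tags
```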


Systematic shifts we're going to study:


In [9]:
cfg.add_shift("nominal", 1)
cfg.add_shift("lumi_up", 2, type="rate")
cfg.add_shift("lumi_down", 3, type="rate")
cfg.add_shift("scale_up", 4, type="shape")
cfg.add_shift("scale_down", 5, type="shape")
print(len(cfg.shifts))


5

Task: Determine all shift objects that share the source of the scale_down shift.


In [10]:
shifts = [s for s in cfg.shifts if s.source == "scale"]
print(shifts)


[<Shift at 0x10b8d2050, name=scale_up, id=4, context=shift>, <Shift at 0x10b8c7b50, name=scale_down, id=5, context=shift>]
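A shift name encodes both its source and its direction, e.g. "scale_down" has source "scale" and direction "down". A sketch of how such names can be split (an assumption about the naming convention, not the actual order implementation):

```python
def split_shift_name(name):
    """Split a shift name into (source, direction)."""
    # "nominal" carries no direction suffix
    if name == "nominal":
        return name, "nominal"
    source, direction = name.rsplit("_", 1)
    return source, direction

print(split_shift_name("scale_down"))  # ('scale', 'down')
print(split_shift_name("lumi_up"))     # ('lumi', 'up')
```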


Add some variables that we want to project via ROOT trees (or numpy arrays / pandas dataframes with numexpr).


In [11]:
cfg.add_variable("jet1_pt",
    expression="Reco__jet1__pt",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{T}$",
)
cfg.add_variable("jet1_px",
    expression="Reco__jet1__pt * cos(Reco__jet1__phi)",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{x}$",
)
print(len(cfg.variables))


2

Task: Get the full ROOT histogram title (i.e. + axis labels) of the jet1_px variable.


In [12]:
cfg.get_variable("jet1_px").get_full_title(root=True)


Out[12]:
'jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV'
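The "Entries / 20.0 GeV" part of the title follows from the binning defined above: 25 equal-width bins over [0, 500] give a bin width of 20 GeV.

```python
# binning of jet1_px as defined above: (n_bins, x_min, x_max)
n_bins, x_min, x_max = 25, 0., 500.
bin_width = (x_max - x_min) / n_bins
print(bin_width)  # 20.0
```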


Add "soft" information as auxiliary data.


In [13]:
cfg.set_aux("lumi", 40.)
cfg.set_aux(("globalTag", "data"), "80X_dataRun2...")
cfg.set_aux(("globalTag", "mc"), "80X_mcRun2...")
print(len(cfg.aux))


3

Task: Get the MC global tag.


In [14]:
print(cfg.get_aux(("globalTag", "mc")))


80X_mcRun2...
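Auxiliary data behaves like a plain mapping whose keys can be any hashable object, which is why tuple keys such as ("globalTag", "mc") work and why len(cfg.aux) is 3 above. A minimal stand-in with an ordinary dict:

```python
# plain-dict sketch of the aux container (not the order API)
aux = {}
aux["lumi"] = 40.
aux[("globalTag", "data")] = "80X_dataRun2..."
aux[("globalTag", "mc")] = "80X_mcRun2..."

print(len(aux))                  # 3
print(aux[("globalTag", "mc")])  # 80X_mcRun2...
```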


Now, we can start to use the analysis objects in a "framework" ...