PyTreeReader: Looping on TTrees in Python, fast.


The PyTreeReader class solves the problem of looping in a performant way on TTrees in Python. This is achieved just in time compiling a C++ class tailored to the branches the user wants to read and interfacing it conveniently to Python. Usability is a key. The high usability of the old way of looping on trees in Python is preserved, implying for the user as little overhead as possible.

Preparation

We include ROOT and the PyTreeReader class. Clearly this will have to be provided more conviniently, e.g. as part of the pythonisations PyROOT already provides for TTree.


In [1]:
import ROOT
from PyTreeReader import PyTreeReader


Welcome to JupyROOT 6.07/07

This is to create an histogram and read a tree.


In [2]:
h = ROOT.TH1F("h","h",1024,-256,256)
fill = h.Fill

f = ROOT.TFile(ROOT.gROOT.GetTutorialsDir()+"/hsimple.root")
tree = f.ntuple

Traditional looping

Now we establish the baseline: banchmark of the old way for looping.


In [3]:
%%timeit -n 1 -r 1
for event in tree:
    fill(event.px*event.py*event.pz*event.random)


1 loop, best of 1: 725 ms per loop

Enters the PyTreeReader

This is how we use for the first time the PyTreeReader and how we benchmark it. The only difference within the loop body consists in invoking functions called as the branches rather than data members called as the branches themselves. Even for this trivial tree, the difference is impressive.

Actually, having little read, decompression and deserialisation is better: the difference in time is really due to the superior performance of the PyTreeReader.


In [4]:
%%timeit -n 1 -r 1
for event in PyTreeReader(tree):
    fill(event.px()*event.py()*event.pz()*event.random())


1 loop, best of 1: 182 ms per loop

Caching in memory the entire tree

It is possible to instruct the PyTreeReader to cache in memory the content of the tree. We'll see two benchmarks: the time needed to cache and the time needed to loop.

Caching benchmark


In [5]:
%%timeit -n 1 -r 1
ptr = PyTreeReader(tree, cache=True)


1 loop, best of 1: 165 ms per loop

In [6]:
ptr = PyTreeReader(tree, cache=True)

Loop on the cached values


In [7]:
%%timeit -n 1 -r 1
for event in ptr:
    fill(event.px()*event.py()*event.pz()*event.random())


1 loop, best of 1: 128 ms per loop

Selecting only some branches

It can be inconvenient to construct methods to access all the branches, even more to cache all of their contents by default. This is the reason why PyTreeReader provides the pattern constructor argument. For example, in order to consider only branches whose names start with "p", one can do:


In [8]:
%%timeit -n 1 -r 1
ptr = PyTreeReader(tree, cache=True, pattern="p[*]")


1 loop, best of 1: 17.7 ms per loop

Pattern is indeed the pattern to match the desired branches by name (fnmatch is used). Its default value is simple, "*";

Accessing a vector of cached values

Caching values in memory has a price but once this step has been accomplished, more flexibility is available. For example, all the values of a particular branch can be retrieved in bulk:


In [9]:
for py in ptr.py_array()[:10]: print py # Better stopping after a few of them :)


0.0118254804984
0.0378782115877
-0.0105458786711
0.0484584420919
0.509063661098
-1.8647428751
0.397898256779
0.509420633316
0.837935447693
-1.09520089626

Wondering about the type returned by py_array? A widely adopted C++ data structure contiguous in memory:


In [10]:
ptr.py_array()


Out[10]:
<ROOT.vector<float> object at 0x2e74118>

What can be done more?

This is the first implementation of the class. A lot can be achieved with seizable performance improvements with respect to the traditional way of looping. The usability is preserved.

However, some interesting features could be added:

  • If the TTree is a TNtuple, interface to Pandas. For example by getting a panda series.
  • We want to get rid of all loops. Ok, PyTreeReader speeds them up but can be used as the engine to remove them :) One could think of adding to our histogram classes a method to fill the histogram with a collection of points which is simpler than TH1::FillN (no size, just the collection).
    • One could even think about doing the same for graphs.