We import ROOT, the DataFrame class and the PyTreeReader class. DataFrame uses PyTreeReader to fill histograms and filter entries, and almost all of the computation inside the DataFrame class goes through PyTreeReader. This part will likely be reworked once it is clear how PyTreeReader will be used in ROOT. PyTreeReader can be found at https://github.com/dpiparo/pytreereader
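As a rough illustration of what "filling histograms through the reader" means, the sketch below replaces a PyTreeReader entry with a hypothetical `Event` class that exposes each branch as a method, the same way the notebook's lambdas call `e.Children()` or `e.Age()`. The helper `fill_histo` is an assumption, not the DataFrame API: it splits a spec such as `'Age:Cost'` at the colon, calls both accessors for every event and counts events per pair of bins in a plain `Counter` instead of a ROOT histogram. The bin widths and values are made-up illustration parameters.

```python
from collections import Counter


class Event:
    """Hypothetical stand-in for one tree entry: each branch is exposed as a method."""

    def __init__(self, **branches):
        self._branches = branches

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. branch accessors like e.Age
        try:
            value = self._branches[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda: value


def fill_histo(events, spec, x_width, y_width):
    """Hypothetical helper: count events per 2D bin for a spec like 'Age:Cost'."""
    first, second = spec.split(':')
    counts = Counter()
    for e in events:
        # Read both branches through the entry's accessor methods
        x = getattr(e, first)()
        y = getattr(e, second)()
        # Integer bin index along each variable
        counts[(x // x_width, y // y_width)] += 1
    return counts


events = [Event(Age=46, Cost=8600), Event(Age=47, Cost=8700), Event(Age=60, Cost=9100)]
h = fill_histo(events, 'Age:Cost', x_width=10, y_width=1000)
# h[(4, 8)] == 2 and h[(6, 9)] == 1
```

Which variable ends up on which axis when ROOT draws such a spec is deliberately left out here; the sketch only shows that every fill is a pair of accessor calls per entry.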
In [1]:
import ROOT
from PyTreeReader import PyTreeReader
from functional import DataFrame
from ROOT import TFile
Open the test file cernstaff.root and get the tree named T from it.
In [2]:
testFile = TFile('cernstaff.root')
testTree = testFile.Get('T')
Here we create the DataFrame object from the tree.
In [3]:
dataFrame = DataFrame(testTree)
As you can see, the constructor also creates a PyTreeReader; this is why PyTreeReader is a mandatory dependency of the class.
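One way to picture why the reader is mandatory: the frame owns a reader, transformations such as `filter` only record work, and actions such as `head` run the event loop. The following is a minimal pure-Python sketch of that idea, not the author's code. `FakeReader` is a hypothetical stand-in for PyTreeReader that yields one `Event` per row of a list, `MiniDataFrame` is likewise hypothetical, and whether the real `filter` is lazy in exactly this way is an assumption.

```python
import itertools


class Event:
    """Hypothetical stand-in for one tree entry: each branch is exposed as a method."""

    def __init__(self, **branches):
        self._branches = branches

    def __getattr__(self, name):
        try:
            value = self._branches[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda: value


class FakeReader:
    """Hypothetical stand-in for PyTreeReader: iterating yields one Event per entry."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        for row in self._rows:
            yield Event(**row)


class MiniDataFrame:
    """Hypothetical sketch of a lazy functional chain; not the author's DataFrame."""

    def __init__(self, rows, filters=()):
        # The constructor builds the reader that every action will loop over
        self._rows = rows
        self._reader = FakeReader(rows)
        self._filters = tuple(filters)

    def filter(self, predicate):
        # Transformation: nothing is read yet, a new chain with one more predicate is returned
        return MiniDataFrame(self._rows, self._filters + (predicate,))

    def head(self, n):
        # Action: run the event loop lazily and stop after n entries pass every filter
        passing = (e for e in self._reader if all(f(e) for f in self._filters))
        return list(itertools.islice(passing, n))


rows = [dict(Age=40, Children=5), dict(Age=50, Children=1),
        dict(Age=45, Children=6), dict(Age=30, Children=7)]
df = MiniDataFrame(rows)
first_two = df.filter(lambda e: e.Children() > 4).head(2)
print([e.Age() for e in first_two])  # [40, 45]
```

Because `head` pulls entries through `itertools.islice`, the loop stops reading as soon as enough entries have passed, which is the usual reason to keep transformations lazy.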
In [4]:
%%time
dataFrame.filter(lambda e : e.Children() > 4).head(5)
In [8]:
dataFrame.resetcache()
In [9]:
%%time
dataFrame.filter(lambda e : e.Children() > 4).cache().head(5)
In [11]:
%%time
dataFrame.filter(lambda e : e.Children() > 4).cache().filter(lambda e : e.Age() < 47).head(5)
The SWAN service caches files to some extent, which affects these timings, but the point is that the first and second runs differ considerably in speed.
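A minimal sketch of why a cached chain can be much faster on a second run, assuming a design in which the first run loops over every entry once and keeps the survivors in memory, and later runs reuse them until the cache is reset. The notebook's `resetcache()` calls before switching analysis are consistent with this, but the real mechanism is not shown here. `Event`, `CachedFrame`, `filter_and_cache` and `reset_cache` are hypothetical names; the predicate records each evaluation so the saved work is visible without relying on timings.

```python
class Event:
    """Hypothetical stand-in for one tree entry: each branch is exposed as a method."""

    def __init__(self, **branches):
        self._branches = branches

    def __getattr__(self, name):
        try:
            value = self._branches[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda: value


class CachedFrame:
    """Hypothetical sketch of filter(...).cache() with a single cache slot."""

    def __init__(self, entries):
        self._entries = entries
        self._cache = None  # single slot, emptied only by reset_cache()

    def reset_cache(self):
        self._cache = None

    def filter_and_cache(self, predicate):
        if self._cache is None:
            # First run: loop over every entry once and keep the survivors in memory
            self._cache = [e for e in self._entries if predicate(e)]
        # Later runs return the stored survivors without evaluating the predicate again.
        # With one slot, a different predicate would also get these stale results
        # until reset_cache() is called.
        return self._cache


calls = []


def many_children(e):
    calls.append(e)  # record every evaluation of the predicate
    return e.Children() > 4


frame = CachedFrame([Event(Children=c) for c in [5, 1, 6, 2, 7]])
first = frame.filter_and_cache(many_children)   # evaluates the predicate 5 times
second = frame.filter_and_cache(many_children)  # served from the cache, no new calls
```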
In [12]:
dataFrame.resetcache()
In [14]:
%%time
dataFrame.filter(lambda e : e.Age() > 45).cache().histo('Age:Cost').Draw('COLZ')
ROOT.gPad.Draw()
Rerun the same analysis and compare the timing.
In [15]:
%%time
dataFrame.filter(lambda e : e.Age() > 45).cache().histo('Age:Cost').Draw('COLZ')
ROOT.gPad.Draw()
Let's add one more filter after the cache and see how the timing changes.
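A filter placed after `cache()` should only need to look at the entries that survived the first pass, not at the whole tree. The sketch below illustrates that cost argument with plain lists, under the assumption that the cache holds the surviving entries in memory. `Event`, `counted` and `select` are hypothetical helpers, and the values are made up for illustration rather than taken from cernstaff.root.

```python
class Event:
    """Hypothetical stand-in for one tree entry: each branch is exposed as a method."""

    def __init__(self, **branches):
        self._branches = branches

    def __getattr__(self, name):
        try:
            value = self._branches[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda: value


def counted(predicate):
    """Wrap a predicate and count how many entries it is evaluated on."""
    def wrapped(e):
        wrapped.calls += 1
        return predicate(e)
    wrapped.calls = 0
    return wrapped


def select(entries, predicate):
    """Keep the entries for which the predicate is true."""
    return [e for e in entries if predicate(e)]


# Made-up entries for illustration
entries = [Event(Age=a, Cost=c) for a, c in
           [(40, 9000), (46, 8000), (50, 9000), (55, 8600), (30, 8700)]]

older = counted(lambda e: e.Age() > 45)
cached = select(entries, older)       # first pass: every entry is visited

expensive = counted(lambda e: e.Cost() > 8500)
result = select(cached, expensive)    # post-cache filter: only cached entries are visited

print(older.calls, expensive.calls, len(result))  # 5 3 2
```

The second predicate runs 3 times instead of 5; on a real tree the ratio between all entries and cached entries is what makes the extra filter cheap.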
In [16]:
%%time
dataFrame.filter(lambda e : e.Age() > 45).cache().filter(lambda e: e.Cost() > 8500).histo('Age:Cost').Draw('COLZ')
ROOT.gPad.Draw()
This is a first implementation of the class and of functional chains.
Usability can be improved by adding more transformations and actions.
Using PyTreeReader already achieves a lot, with sizeable performance improvements.
However, there are some minor flaws in the class: