Zero-copy communication between C++ and Python

Numpy arrays are just C arrays wrapped with metadata in Python. Thus, we can share data between C and Python without even copying. In general, this communication is

  • not overwrite safe: Python protects against out-of-range indexes, but C/C++ does not;
  • not type safe: no guarantee that C and Python will interpret bytes in memory the same way, including endianness;
  • not thread safe: no protection at all against concurrent access.

But without much overhead, we can wrap a shared array (or collection of arrays) in two APIs— one in C++, one in Python— to provide these protections.

commonblock is a nascent library to do this. It passes array lengths and types from Python to C++ via ctypes and uses librt.so (wrapped by prwlock in Python) to implement locks that are usable on both sides.


In [1]:
import numpy
import commonblock

tracks = commonblock.NumpyCommonBlock(
    trackermu_qoverp     = numpy.zeros(1000, dtype=numpy.double),
    trackermu_qoverp_err = numpy.zeros(1000, dtype=numpy.double),
    trackermu_phi        = numpy.zeros(1000, dtype=numpy.double),
    trackermu_eta        = numpy.zeros(1000, dtype=numpy.double),
    trackermu_dxy        = numpy.zeros(1000, dtype=numpy.double),
    trackermu_dz         = numpy.zeros(1000, dtype=numpy.double),
    globalmu_qoverp      = numpy.zeros(1000, dtype=numpy.double),
    globalmu_qoverp_err  = numpy.zeros(1000, dtype=numpy.double))

hits = commonblock.NumpyCommonBlock(
    detid      = numpy.zeros(5000, dtype=numpy.uint64),
    localx     = numpy.zeros(5000, dtype=numpy.double),
    localy     = numpy.zeros(5000, dtype=numpy.double),
    localx_err = numpy.zeros(5000, dtype=numpy.double),
    localy_err = numpy.zeros(5000, dtype=numpy.double))

Using it in CMSSW

CMSSW can be executed within a Python process, thanks to Chris's PR #17236. Since the configuration language is also in Python, you can build the configuration and start CMSSW in the same Python process.

We can get our common block into CMSSW by passing its pointer as part of a ParameterSet. Since this is all one process, that pointer address is still valid when CMSSW launches.


In [2]:
import FWCore.ParameterSet.Config as cms

process = cms.Process("Demo")

process.load("FWCore.MessageService.MessageLogger_cfi")

process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(1000))

process.source = cms.Source(
    "PoolSource", fileNames = cms.untracked.vstring("file:MuAlZMuMu-2016H-002590494DA0.root"))

process.demo = cms.EDAnalyzer(
    "DemoAnalyzer",
    tracks = cms.uint64(tracks.pointer()),   # pass the arrays to C++ as a pointer
    hits   = cms.uint64(hits.pointer()))

process.p = cms.Path(process.demo)

On the C++ side

NumpyCommonBlock.h is a header-only library that defines the interface. We pick up the object by casting the pointer:

   tracksBlock = (NumpyCommonBlock*)iConfig.getParameter<unsigned long long>("tracks");
   hitsBlock   = (NumpyCommonBlock*)iConfig.getParameter<unsigned long long>("hits");

and then get safe accessors to each array with a templated method that checks C++'s compiled type against Python's runtime type.

   trackermu_qoverp     = tracksBlock->newAccessor<double>("trackermu_qoverp");
   trackermu_qoverp_err = tracksBlock->newAccessor<double>("trackermu_qoverp_err");
   trackermu_phi        = tracksBlock->newAccessor<double>("trackermu_phi");
   trackermu_eta        = tracksBlock->newAccessor<double>("trackermu_eta");
   trackermu_dxy        = tracksBlock->newAccessor<double>("trackermu_dxy");
   trackermu_dz         = tracksBlock->newAccessor<double>("trackermu_dz");
   globalmu_qoverp      = tracksBlock->newAccessor<double>("globalmu_qoverp");
   globalmu_qoverp_err  = tracksBlock->newAccessor<double>("globalmu_qoverp_err");

   detid      = hitsBlock->newAccessor<uint64_t>("detid");
   localx     = hitsBlock->newAccessor<double>("localx");
   localy     = hitsBlock->newAccessor<double>("localy");
   localx_err = hitsBlock->newAccessor<double>("localx_err");
   localy_err = hitsBlock->newAccessor<double>("localy_err");

Running CMSSW

Chris's PythonEventProcessor.run() method blocks, so I put it in a thread to let CMSSW and Python run at the same time.

I had to release the GIL with PR #18683 to make this work, and that feature will work its way into releases eventually.


In [3]:
import threading
import libFWCorePythonFramework
import libFWCorePythonParameterSet

class CMSSWThread(threading.Thread):
    def __init__(self, process):
        super(CMSSWThread, self).__init__()
        self.process = process

    def run(self):
        processDesc = libFWCorePythonParameterSet.ProcessDesc()
        self.process.fillProcessDesc(processDesc.pset())

        cppProcessor = libFWCorePythonFramework.PythonEventProcessor(processDesc)
        cppProcessor.run()

Demonstration

In this demo, I loop over AlCaZMuMu muons and fill the arrays with track parameters (before and after adding muon hits to the fit) and display them as Pandas DataFrames as soon as they're full (before CMSSW finishes).

The idea is that one would stream data from CMSSW into some Python thing in large blocks (1000 tracks/5000 hits at a time in this example).

Bi-directional communication is possible, but I don't know what it could be used for.


In [4]:
cmsswThread = CMSSWThread(process)
cmsswThread.start()

tracks.wait(1)   # CMSSW notifies that it has filled the tracks array
tracks.pandas()


Out[4]:
globalmu_qoverp globalmu_qoverp_err trackermu_dxy trackermu_dz trackermu_eta trackermu_phi trackermu_qoverp trackermu_qoverp_err
0 -0.010567 0.000176 -0.068008 5.044248 1.345596 -2.698669 -0.010630 0.000154
1 0.006895 0.000151 0.054830 4.506279 1.833096 0.569517 0.006876 0.000148
2 0.019243 0.000203 0.049390 -1.901605 0.103585 0.658894 0.019275 0.000230
3 -0.023387 0.000218 -0.095338 -1.904487 -0.249766 -2.945127 -0.023375 0.000224
4 -0.004139 0.000135 0.116625 -8.595935 -1.881879 -0.706054 -0.004042 0.000107
5 0.015591 0.000239 -0.066035 -8.682809 -1.252449 -2.672938 0.015586 0.000239
6 0.037974 0.000364 0.032850 -0.111745 -0.257494 -1.806014 0.037933 0.000375
7 -0.009422 0.000226 0.009904 -0.572965 2.108640 0.986081 -0.009398 0.000228
8 0.018069 0.000190 0.119466 -1.273930 0.374561 -0.530472 0.018046 0.000195
9 -0.022957 0.000241 -0.111564 -1.262620 0.426754 2.919330 -0.022932 0.000246
10 0.010840 0.000186 -0.060381 8.158009 1.290994 -2.586959 0.010848 0.000205
11 -0.020447 0.000201 0.116348 7.963290 0.320124 -0.273392 -0.020432 0.000231
12 0.015693 0.000163 -0.021478 0.218570 0.654600 1.256718 0.015729 0.000161
13 -0.015592 0.000306 -0.005073 0.464836 1.073493 -2.113072 -0.016101 0.000235
14 0.009935 0.000172 0.089779 -7.842827 1.467717 -1.240792 0.009917 0.000180
15 -0.022529 0.000216 -0.088659 -8.054338 0.559999 1.905623 -0.022532 0.000246
16 -0.013605 0.000238 0.059533 -5.468162 -1.721838 0.558813 -0.013599 0.000238
17 0.035048 0.000375 -0.087184 -5.721601 0.530253 -2.885311 0.034984 0.000367
18 0.016558 0.000278 0.085424 -8.886459 -1.426541 -1.280423 0.016702 0.000287
19 -0.028493 0.000283 -0.086348 -8.803782 0.918577 1.748048 -0.028482 0.000310
20 -0.036947 0.000348 0.118730 5.219053 0.260863 -0.292811 -0.036946 0.000356
21 0.010269 0.000228 -0.101764 4.986706 -2.079050 -3.091943 0.010254 0.000262
22 0.005757 0.000149 0.082549 3.768002 2.071252 0.314001 0.005749 0.000150
23 -0.015071 0.000245 -0.071253 4.271885 1.184207 -2.715034 -0.015120 0.000230
24 0.021482 0.000215 0.029551 6.644899 0.339417 -1.842905 0.021482 0.000222
25 -0.018125 0.000317 -0.035147 6.936323 -1.742731 1.361164 -0.018122 0.000331
26 0.005912 0.000114 0.028987 -1.408655 -1.540506 -1.824010 0.005888 0.000112
27 -0.004827 0.000108 -0.116181 -1.256072 -2.060348 2.897079 -0.004877 0.000130
28 -0.009855 0.000187 -0.111878 -0.973026 1.779793 2.173956 -0.009821 0.000182
29 0.036934 0.000287 0.117619 -0.828305 -0.105899 -0.749916 0.036988 0.000316
... ... ... ... ... ... ... ... ...
970 0.009206 0.000296 -0.059400 0.395068 2.347421 1.570078 0.009215 0.000325
971 -0.059931 0.000963 0.032175 0.828887 -1.087645 -1.837424 -0.058968 0.000838
972 0.004374 0.000128 -0.012923 -2.343376 2.226633 1.142010 0.004423 0.000133
973 -0.005360 0.000149 -0.001796 -1.326975 2.108116 -2.045045 -0.005353 0.000153
974 0.003408 0.000124 -0.030739 1.371909 2.244316 1.323628 0.003418 0.000129
975 -0.017310 0.000292 -0.066125 2.067206 1.211065 -2.633213 -0.017213 0.000290
976 -0.018864 0.000197 0.051302 0.453377 0.323696 -1.599724 -0.018843 0.000207
977 0.019193 0.000187 -0.031497 0.325793 0.647963 1.349336 0.019190 0.000185
978 -0.014932 0.000180 0.016256 0.196339 -0.573654 -1.934022 -0.014935 0.000192
979 0.028336 0.000252 -0.027164 0.278182 -0.090457 1.289689 0.028368 0.000259
980 -0.018804 0.000189 -0.005167 -4.677931 0.235975 -2.216133 -0.018764 0.000202
981 0.028439 0.000303 -0.036017 -4.630476 -0.614698 1.345122 0.028418 0.000335
982 -0.019854 0.000172 0.091969 -2.437110 0.048532 -1.183622 -0.019909 0.000189
983 0.018293 0.000341 -0.103844 -2.532153 1.057075 2.105705 0.018580 0.000309
984 -0.025145 0.000279 0.119201 2.486174 0.029890 -0.487306 -0.025090 0.000280
985 0.014813 0.000251 -0.123532 2.472967 -1.611795 2.635718 0.014803 0.000261
986 0.029102 0.000281 0.092381 -2.665610 -0.566803 0.157805 0.029085 0.000281
987 -0.014338 0.000245 -0.054387 -2.441701 1.695919 -2.574079 -0.014351 0.000264
988 0.004974 0.000120 -0.112054 -2.261815 -1.986994 2.255399 0.004967 0.000122
989 -0.032564 0.000328 0.019351 -2.475638 -0.466075 -1.919595 -0.032567 0.000332
990 -0.016973 0.000177 -0.120753 1.165448 0.612474 2.725368 -0.016959 0.000182
991 0.021225 0.000202 0.119872 1.141800 0.619098 -0.516619 0.021255 0.000225
992 0.003270 0.000134 0.094847 -9.621575 2.326024 -1.165964 0.003250 0.000129
993 -0.052976 0.000433 -0.121242 -10.006914 0.318874 2.445030 -0.052954 0.000435
994 0.007059 0.000363 -0.117198 3.398886 -1.971428 2.348829 0.005861 0.000209
995 0.054462 0.000440 0.066841 3.227137 0.606553 0.445651 0.054662 0.000537
996 -0.006828 0.000133 0.119212 -2.403269 -1.802110 -0.225948 -0.006804 0.000128
997 0.007302 0.000185 -0.087562 -2.763320 -1.855936 -2.901717 0.007285 0.000190
998 0.005690 0.000187 -0.089730 -7.122958 2.327386 2.004788 0.005674 0.000196
999 -0.026393 0.000244 0.083845 -6.713851 0.544398 -1.325145 -0.026395 0.000244

1000 rows × 8 columns


In [5]:
hits.pandas()


Out[5]:
detid localx localx_err localy localy_err
0 604018352 -14.661780 0.000339 8.819818 0.141031
1 637612422 15.698437 1.642616 0.000000 200.085754
2 637579686 16.581907 2.337946 0.000000 317.243591
3 604021936 -15.579300 0.000682 -75.140610 0.308281
4 604026032 17.155981 0.000775 -1.725272 0.355754
5 637645258 -17.506124 2.727962 0.000000 304.015594
6 604030128 18.839199 0.000826 47.635704 0.334478
7 637612522 19.774876 3.734178 0.000000 480.071472
8 604017696 4.622838 0.000176 9.101110 0.141533
9 604021264 27.893734 0.000855 21.453674 0.048724
10 604029456 -34.848495 0.006552 73.947083 0.436826
11 637644902 -4.864298 1.115486 0.000000 191.200974
12 637632661 1.144447 1.512373 0.000000 1306.253296
13 637634709 -8.583336 1.736142 0.000000 1306.253296
14 637634741 -66.428574 2.516252 0.000000 1306.253296
15 637636821 -40.523811 3.584546 0.000000 1306.253296
16 637636853 51.562500 4.911988 0.000000 1306.253296
17 637567765 69.811119 1.512373 0.000000 1138.800903
18 637571921 -63.428574 3.584546 0.000000 1138.800903
19 637571953 -2.062500 4.911988 0.000000 1138.800903
20 604050696 -1.569143 0.000092 7.911637 0.138749
21 604058248 -21.679737 0.000622 32.137383 0.048726
22 604062344 -24.531935 0.000574 49.761459 0.033840
23 604051112 -24.160847 0.000662 66.750565 0.168120
24 637575556 -23.787813 2.267907 0.000000 238.525208
25 637575588 24.235094 2.816015 0.000000 317.246429
26 604054696 -26.346043 0.001318 -10.343246 0.355760
27 604058792 28.434036 0.001474 41.254997 0.391294
28 637608392 -28.014406 4.042261 0.000000 480.073517
29 604062888 30.674231 0.002239 97.117676 0.622246
... ... ... ... ... ...
4970 637616586 18.126968 3.645632 0.000000 480.070892
4971 604030136 -21.122742 0.000730 137.415466 0.355813
4972 604025368 -13.811366 0.000459 -24.786297 0.048720
4973 604029464 -14.701804 0.000441 -12.107164 0.033837
4974 637566993 -3.433331 1.512373 0.000000 1138.800903
4975 637569041 9.809521 1.736142 0.000000 1138.800903
4976 637599793 -60.622223 2.191935 0.000000 525.363403
4977 637634609 69.380951 2.516252 0.000000 1306.253296
4978 637636689 37.000000 3.584546 0.000000 1306.253296
4979 637636721 -57.750000 4.911988 0.000000 1306.253296
4980 637600313 -75.777771 2.191935 0.000000 525.363403
4981 637635129 81.190475 2.516252 0.000000 1306.253296
4982 637633113 -59.904762 3.584546 0.000000 1306.253296
4983 637633309 78.966667 1.512373 0.000000 1306.253296
4984 637641130 43.765308 4.606113 0.000000 304.030762
4985 604021928 -50.614330 0.004653 65.216415 0.445777
4986 604026024 55.193279 0.003115 123.115700 0.355785
4987 604026016 -59.897228 0.004034 139.919006 0.355805
4988 637575626 -56.585171 5.306306 0.000000 224.493042
4989 604050992 -21.435236 0.000544 65.873123 0.168077
4990 637587460 21.409033 2.267907 0.000000 238.525208
4991 604054576 -27.498405 0.001344 -7.830546 0.355649
4992 604058672 32.534130 0.001379 76.730690 0.355798
4993 637620296 -31.310219 4.283083 0.000000 480.075073
4994 604062768 35.145294 0.001449 134.050430 0.355801
4995 637587560 34.681236 4.495598 0.000000 224.478928
4996 604051024 -22.841208 0.000670 -46.986721 0.107425
4997 637644932 22.577343 3.722775 0.000000 525.368042
4998 637644964 25.294344 2.123088 0.000000 191.206726
4999 604054608 -26.345749 0.001978 -141.827637 0.355711

5000 rows × 5 columns


In [7]:
%matplotlib inline

tracks.pandas().plot.hist()


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0f1415d950>

In [21]:
df = hits.pandas()

df[numpy.abs(df.localy) > 0].plot.hexbin(x="localx", y="localy", gridsize=25)


Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0ef6669810>

Higher-level interfaces

This example uses a notify/wait mechanism like basic Java. (The EDAnalyzer says notify(1) when the arrays are full and the Python code above blocks with wait(1).)

This is lock-level programming and is easy to get wrong. Some common use-patterns, like streaming data out of CMSSW in the above example, would be better served by canned classes that look like a Python generator that yields a block of data while CMSSW fills the next. Also, I need to modernize my C++.

Possible uses

  • Running machine learning algorithms from Scikit-Learn et al in an EDAnalyzer (for a user's analysis) or an EDProducer (jet substructure, track identification, etc., though I'd be wary of using such an exotic method in production).
  • Applying last-minute jet energy corrections in Oli & Matteo's analysis on Spark without rewriting those algorithms for the JVM. CMSSW would have to be installed on the Spark cluster (analytix.cern.ch).
  • Alignment and calibrations! When I was last involved in AlCa, each group effectively invented their own map-reduce framework because alignment and calibration jobs have this form. In alignment, the mapping phase consists of refitting tracks with a new geometry, which must be done in CMSSW, and the reducing phase consists of collecting residuals by DetId, fitting, and updating the global geometry. If a TTreeCache can keep the input tracks and hits in memory on a cluster of CMSSW worker nodes, AlCa algorithms could see the factor-of-100 speed up from avoiding disk in fitting iterations.

In [ ]: