Introduction to Deep Learning with Keras

Michela Paganini - Yale University

High Energy Phenomenology, Experiment and Cosmology Seminar Series

This is the second of two notebooks for this tutorial. If you are unfamiliar with data handling techniques in a Python environment, see the previous notebook.

What is Keras? Why Use Keras?

  • Modular, powerful and intuitive Deep Learning Python library built on TensorFlow and Theano

    • These backends let you call efficient C++ code while using the syntax of a higher-level programming language
    • They offer arbitrary tensor function definition, symbolic graph building, autodifferentiation, optimization, and parallelism
  • Keras doesn’t itself handle any tensor ops; it relies on these tensor manipulation libraries

  • Connects to both of them via the abstract Keras backend (see the short sketch after this list)
  • Provides a high-level API with simple neural network building blocks
  • Now an integral part of TensorFlow via the `tf.keras` module
  • Model and data parallelism across different devices
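
As a quick illustration of the backend idea, here is a minimal sketch (not part of the tutorial code) of the abstract backend API; the same symbolic operations are dispatched to whichever backend is configured:

import numpy as np
from keras import backend as K

a = K.variable(np.random.random((3, 4)))   # backend tensor wrapping a numpy array
b = K.variable(np.random.random((4, 2)))
c = K.dot(a, b)                            # symbolic matrix product on the backend
print(K.eval(c).shape)                     # evaluates the graph -> (3, 2)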

Other Properties of Keras

Developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

https://keras.io
  • Minimalist, user-friendly interface
  • Extremely well documented, lots of working examples
  • Very shallow learning curve $\rightarrow$ by far one of the best tools for both beginners and experts
  • Open-source, developed and maintained by a community of contributors, and publicly hosted on GitHub
  • Extensible: possibility to customize layers
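
For example, a custom layer can be written by subclassing `Layer`; the sketch below is adapted from the custom-layer pattern in the Keras documentation (the layer name and sizes are made up for illustration):

from keras import backend as K
from keras.engine.topology import Layer

class MyLayer(Layer):
    '''A hypothetical custom layer: a trainable linear map.'''
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # create a trainable weight for this layer
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)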


Load and Plot Data


In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
%matplotlib inline

Let's load the utility functions we defined in the previous notebook.


In [2]:
import glob
from root_numpy import root2array
from numpy.lib.recfunctions import stack_arrays

def root2pandas(files_path, tree_name, **kwargs):
    '''
    Args:
    -----
        files_path: a string like './data/*.root', for example
        tree_name: a string like 'Collection_Tree' corresponding to the name of the folder inside the root 
                   file that we want to open
        kwargs: arguments taken by root2array, such as branches to consider, start, stop, step, etc
    Returns:
    --------    
        output_panda: a pandas dataframe like allbkg_df in which all the info from the root file will be stored
    
    Note:
    -----
        if you are working with .root files that contain different branches, you might have to mask your data
        in that case, return pd.DataFrame(ss.data)
    '''
    # -- create list of .root files to process
    files = glob.glob(files_path)
    
    # -- process ntuples into rec arrays
    ss = stack_arrays([root2array(fpath, tree_name, **kwargs).view(np.recarray) for fpath in files])

    try:
        return pd.DataFrame(ss)
    except Exception:
        return pd.DataFrame(ss.data)
    
    
def flatten(column):
    '''
    Args:
    -----
        column: a column of a pandas df whose entries are lists (or regular entries -- in which case nothing is done)
                e.g.: my_df['some_variable'] 

    Returns:
    --------    
        flattened out version of the column. 

        For example, it will turn:
        [1791, 2719, 1891]
        [1717, 1, 0, 171, 9181, 537, 12]
        [82, 11]
        ...
        into:
        1791, 2719, 1891, 1717, 1, 0, 171, 9181, 537, 12, 82, 11, ...
    '''
    try:
        return np.array([v for e in column for v in e])
    except (TypeError, ValueError):
        return column


Welcome to JupyROOT 6.08/02

Load all data in memory into dataframes.


In [3]:
# MC signal:
ttbar = root2pandas('files/ttbar.root', 'events') 
# MC backgrounds:
dy = root2pandas('files/dy.root', 'events')
wj = root2pandas('files/wjets.root', 'events')
ww = root2pandas('files/ww.root', 'events')
wz = root2pandas('files/wz.root', 'events')
zz = root2pandas('files/zz.root', 'events')
singletop = root2pandas('files/single_top.root', 'events')
qcd = root2pandas('files/qcd.root', 'events')
# data:
data = root2pandas('files/data.root', 'events')

All samples except the $t\bar{t}$ one were produced using a muon trigger. To enforce the same selection on the $t\bar{t}$ sample as well, we use the branch called `triggerIsoMu24`, which contains a boolean indicating whether each event would have passed the specified trigger.


In [4]:
# -- step by step:
ttbar['triggerIsoMu24']


Out[4]:
0        False
1        False
2        False
3        False
4        False
5        False
6        False
7         True
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16        True
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24        True
25       False
26       False
27       False
28       False
29       False
         ...  
36911    False
36912    False
36913    False
36914    False
36915    False
36916    False
36917    False
36918    False
36919    False
36920    False
36921    False
36922    False
36923    False
36924     True
36925    False
36926    False
36927    False
36928    False
36929    False
36930    False
36931    False
36932     True
36933    False
36934    False
36935    False
36936     True
36937    False
36938    False
36939    False
36940    False
Name: triggerIsoMu24, dtype: bool

In [5]:
ttbar[ttbar['triggerIsoMu24']] # slicing doesn't automatically reset the indices


Out[5]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_px MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight
7 3 [-39.2862, 49.3453, -7.69329] [-32.3978, 5.15671, 41.4265] [-65.3803, -95.7182, -70.6679] [83.5105, 108.047, 82.8404] [2.42398, -1.0, -1.0] [True, True, True] 1 [-34.0692] [5.64295] ... -34.540741 5.717676 -10.324351 -13 43.943020 4.711743 -16.245531 4 True 0.279545
16 5 [184.591, 10.8253, -72.3689, -43.3258, 2.57552] [117.452, -166.958, 48.44, -5.91005, 42.4344] [379.501, -51.8267, 452.426, 33.3998, -77.0341] [438.532, 176.045, 461.069, 56.1953, 88.4542] [-1.0, 3.20327, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [42.3929] [-35.9437] ... 43.464474 -37.058472 -30.771671 13 -21.166794 19.246643 -13.815475 4 True 0.279545
24 5 [-92.04, 29.9482, 24.2262, 42.4436, -18.0844] [51.4753, -84.1564, 48.5667, -8.55752, 24.7327] [-52.9665, -181.678, -10.7411, -31.6566, -43.3... [121.717, 203.098, 57.0088, 53.9551, 53.2984] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [26.0595] [0.139715] ... 27.888340 -2.613286 -114.308418 13 -43.849094 -32.715321 -46.295307 8 True 0.299037
37 3 [92.6861, -21.7365, 21.6693] [-126.292, 52.302, 31.9107] [88.5404, -12.5433, -18.5118] [180.864, 58.5981, 43.3167] [2.59038, 2.73967, -1.0] [True, True, True] 1 [-13.8492] [38.0815] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
40 3 [-97.5345, -19.9276, 2.89748] [-13.185, -80.3989, 30.9127] [-11.907, 51.528, -91.2712] [101.636, 98.2916, 96.7967] [2.89632, -1.0, -1.0] [True, True, True] 1 [72.6027] [-9.59597] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 16 True 0.010469
51 4 [13.2568, 84.261, -4.9345, -33.4614] [188.902, -129.347, -121.279, -62.2332] [-67.8184, 185.424, 164.392, 53.9148] [202.167, 242.673, 205.04, 89.4054] [2.31161, -1.0, -1.0, 3.40863] [True, True, True, True] 2 [-41.887, 8.97538] [108.296, 21.3534] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 6 True 0.293529
57 4 [86.7032, -84.3351, -60.7989, 38.2517] [0.745445, -0.972272, 49.1331, -22.5668] [-77.6728, -19.5303, 3.48934, 48.336] [117.088, 87.3901, 78.8732, 66.4349] [1.94137, -1.0, -1.0, -1.0] [True, True, True, True] 0 [] [] ... -51.339859 47.421997 9.806538 -13 7.199954 33.037640 64.709595 13 True 0.026791
58 0 [] [] [] [] [] [] 1 [-73.5007] [47.405] ... -77.774635 49.710968 -163.584732 13 -14.959000 -17.021935 -129.785049 20 True 0.010469
68 3 [-72.5721, -37.3589, 54.3353] [-21.6138, 44.3861, -11.165] [-203.388, -141.191, -148.702] [217.305, 152.979, 159.159] [1.64367, 2.30578, -1.0] [True, True, True] 1 [-1.18855] [-36.1914] ... -1.134651 -35.692688 -109.194305 13 67.564972 -8.594560 -78.256340 6 True 0.286201
80 3 [394.21, -66.0473, 8.33537] [-174.289, 96.3359, -45.8379] [-363.242, 1.3422, -32.5724] [574.341, 117.572, 57.576] [2.80352, 1.69599, -1.0] [True, True, True] 1 [-66.4606] [-1.64239] ... -100.356377 -2.219039 -10.412981 -13 -246.632401 112.364296 -81.893837 3 True 0.271386
86 4 [65.0804, 47.7328, -41.3639, -15.2447] [-84.0002, 73.0816, -21.3092, 30.1185] [-19.6535, -286.044, -80.5942, -45.1493] [109.014, 299.209, 93.831, 56.4601] [3.79264, -1.0, -1.0, -1.0] [True, True, True, True] 1 [9.05585] [-24.4455] ... 8.528934 -24.427439 27.513771 13 -51.809479 8.691632 -20.814299 11 True 0.105165
96 2 [-40.122, 27.6997] [-44.2169, -41.4156] [10.1575, 29.0914] [62.1224, 58.2074] [-1.0, 1.66511] [True, True] 1 [42.9513] [45.6999] ... 44.989742 46.907940 50.970924 -13 13.318195 -31.906128 19.553530 10 True 0.299981
107 5 [-100.62, 32.0011, -35.6917, 28.3667, 3.8716] [-4.6241, -39.0739, 26.2937, -25.9119, 34.3113] [-135.39, 38.0203, -23.802, -58.4347, 29.4247] [171.725, 64.9216, 51.6265, 70.5236, 45.8437] [-1.0, -1.0, -1.0, 3.47004, -1.0] [True, True, True, True, True] 1 [32.2751] [-59.3023] ... 33.404510 -60.193768 -54.171921 13 44.509659 19.568098 -44.442898 1 True 0.270251
122 3 [10.4771, -43.1103, 23.1228] [65.1101, -9.28895, 33.4949] [-64.5278, 20.2662, 20.8267] [92.8245, 48.7273, 46.165] [4.1136, -1.0, -1.0] [True, True, True] 1 [-0.298645] [-52.4379] ... -2.104305 -95.161293 -309.347382 -15 -8.769296 -14.520910 4.469729 6 True 0.286201
133 3 [-104.634, 48.1258, 12.1684] [29.6702, -66.15, 34.2625] [-72.2929, -229.216, 13.3559] [131.668, 243.703, 38.9503] [3.45501, -1.0, -1.0] [True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
144 3 [-75.713, 58.5136, 31.0808] [44.9549, -38.7253, -41.9073] [-195.356, -125.271, -35.4589] [214.638, 144.113, 64.1693] [2.9908, 3.62441, -1.0] [True, True, True] 1 [-45.2115] [5.31683] ... -44.851936 5.279506 -0.475743 13 20.658106 36.965542 -45.506985 6 True 0.286201
148 4 [-113.872, -32.332, 34.4798, 10.6671] [-23.9376, 35.5097, -6.87441, 30.7233] [87.0612, 45.5352, 72.4027, -44.7416] [145.824, 66.5828, 80.8721, 55.7227] [2.06658, -1.0, -1.0, 3.2202] [True, True, True, True] 1 [74.4426] [-47.3361] ... 74.646675 -48.276752 -21.981726 13 -14.983271 -19.674046 -33.766380 19 True 0.014366
157 2 [39.0372, 18.8909] [56.7757, -41.463] [-154.183, -153.292] [169.354, 160.027] [-1.0, 2.21366] [True, True] 1 [-81.4154] [11.1185] ... -88.916168 11.958924 -100.715332 -13 -8.342011 -37.894085 -57.389412 5 True 0.279545
168 4 [-23.9239, -6.59758, -14.6238, 45.4713] [60.4995, -51.068, 46.1813, -10.2181] [57.153, -172.568, -82.0874, -18.6692] [87.7235, 180.256, 95.6667, 50.63] [3.08056, -1.0, -1.0, -1.0] [True, True, True, True] 1 [42.5798] [-37.9202] ... 42.677971 -38.684910 25.826906 13 -35.230225 -32.083885 29.348593 4 True 0.274432
171 2 [-18.0533, 45.1819] [75.6471, -25.322] [-59.945, -22.2292] [98.8365, 57.0231] [1.58192, -1.0] [True, True] 1 [-2.2919] [-55.9058] ... -2.266898 -56.387188 84.049629 13 17.545727 17.956202 95.806236 3 True 0.279545
175 2 [-62.7099, -16.6717] [62.0686, -68.4154] [-56.106, -164.618] [105.655, 179.547] [-1.0, -1.0] [True, True] 1 [52.0206] [100.328] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 3 True 0.293529
188 2 [10.1792, -26.8712] [55.9482, -16.998] [-89.4134, 72.4999] [106.531, 79.6631] [-1.0, 2.40431] [True, True] 1 [-65.1655] [22.1073] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 6 True 0.293529
194 3 [69.7044, 28.6426, -30.5756] [-21.1823, 21.4084, -5.88853] [189.136, 125.371, 28.3306] [203.069, 130.501, 42.8366] [2.25507, -1.0, -1.0] [True, True, True] 1 [-84.0036] [-38.9238] ... -79.938377 -35.873989 -74.092720 13 -14.628411 30.902468 -51.598186 5 True 0.293529
197 4 [-81.0861, 35.7697, 45.4883, -7.59584] [-22.679, -41.042, 24.9598, 37.9333] [22.1584, -50.1099, -306.429, -61.9009] [87.7846, 74.7338, 311.244, 73.3563] [4.60365, 1.62578, -1.0, -1.0] [True, True, True, True] 1 [-21.7847] [10.4468] ... -21.555775 10.569125 16.874752 13 66.121849 17.503078 1.514331 5 True 0.286201
203 4 [-92.6663, -29.3145, 43.617, 22.991] [-9.88577, -39.3474, 0.0992902, 32.9985] [-19.9208, 24.1042, -28.9692, 73.4268] [96.1515, 55.6889, 53.1124, 84.3405] [2.15808, -1.0, -1.0, -1.0] [True, True, True, True] 1 [51.9452] [-34.679] ... 51.943840 -35.272839 -15.022160 13 -14.474733 10.930140 20.586962 11 True 0.160826
204 1 [21.5621] [-30.0244] [160.531] [164.844] [-1.0] [True] 2 [109.452, -9.8738] [191.047, -12.5724] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 4 True 0.274432
219 2 [55.6175, 32.9164] [39.0514, -11.1956] [-44.6516, -31.1788] [82.3264, 47.1459] [2.18723, -1.0] [True, True] 1 [-17.0189] [47.6267] ... -17.056755 47.598736 0.132445 -13 -18.238882 4.040609 71.326790 4 True 0.279545
223 2 [-63.6374, 21.9639] [58.9319, -21.1155] [131.927, 79.1002] [159.269, 85.0185] [-1.0, -1.0] [True, True] 1 [39.4852] [-28.8416] ... 40.519344 -29.519676 67.463295 -13 2.028334 13.116418 -16.692287 25 True 0.001839
235 1 [-22.13] [-44.9609] [-56.7319] [77.0054] [-1.0] [True] 1 [18.7418] [37.8549] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.005580
268 3 [40.9751, -26.4305, 8.81603] [0.197465, 19.2683, -30.2287] [-10.3833, 126.664, 35.724] [42.9108, 130.983, 47.8053] [-1.0, 1.85155, -1.0] [True, True, True] 0 [] [] ... -53.865688 5.485102 147.813019 13 6.192799 -14.861897 285.897095 7 True 0.293529
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
36671 4 [63.6192, 15.5392, -24.603, -22.424] [46.3087, -54.6324, 33.719, 33.788] [-201.403, 121.437, -185.224, -41.4796] [216.771, 134.415, 189.95, 59.4113] [-1.0, 2.30456, -1.0, -1.0] [True, True, True, True] 1 [44.2237] [1.19632] ... 44.393093 4.441749 17.649075 13 -37.079338 4.005792 22.262280 13 True 0.192474
36674 3 [-63.3877, -42.8766, 42.7456] [16.5414, 9.93516, 3.20814] [57.6559, 93.1895, 27.8355] [87.8452, 103.46, 51.6133] [2.389, -1.0, -1.0] [True, True, True] 1 [-19.4274] [-29.8691] ... -19.507946 -30.081053 47.349102 -13 57.827797 -55.537346 26.413647 16 True 0.105165
36678 4 [-42.7893, 41.4222, -12.0665, 17.7019] [-36.8382, -15.6533, 37.0771, 30.5058] [-41.9049, -4.82216, 5.23354, -70.8694] [70.8273, 45.4714, 39.5967, 79.481] [3.14142, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-1.03951] [-37.0587] ... -0.284460 -37.510612 -173.577927 -13 -17.592842 -28.825563 -12.680113 11 True 0.192474
36691 4 [-58.3385, -18.4484, 15.4408, 37.792] [-4.25548, -43.2476, 42.527, 3.44773] [145.612, 57.8691, 70.6434, -58.7726] [157.184, 75.1443, 84.1751, 70.7867] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-45.0365] [-3.44792] ... -46.260303 -3.330531 -12.245513 13 34.181591 14.543386 6.494123 4 True 0.279545
36697 0 [] [] [] [] [] [] 2 [72.8115, -32.1987] [-41.4184, 46.7349] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.036131
36698 2 [49.845, -38.1155] [-31.4604, -12.8079] [-105.783, -222.021] [121.478, 225.912] [-1.0, -1.0] [True, True] 1 [39.8958] [-8.11494] ... 41.240803 -6.950862 -118.234520 -13 -24.029903 32.869717 -56.536091 5 True 0.274432
36704 3 [44.5244, -22.9694, -5.49336] [52.0617, -22.2976, -29.943] [-123.699, 45.3537, -114.566] [141.7, 56.1422, 118.853] [-1.0, -1.0, -1.0] [True, True, True] 1 [-18.3659] [38.4539] ... -19.019670 38.604202 -15.343031 -13 -85.043907 6.205180 -118.760750 8 True 0.253713
36705 4 [95.7703, -91.2611, -22.8162, 11.3628] [-61.2077, -23.8256, -90.0745, 70.7234] [14.1595, 200.652, 21.1822, 282.059] [116.274, 222.271, 96.4095, 291.297] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-68.7636] [41.4596] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 0.293189
36723 2 [-29.0196, 40.0986] [-28.8848, -6.73161] [-10.89, 10.5535] [43.0001, 42.402] [1.72804, -1.0] [True, True] 1 [48.2104] [10.7573] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 3 True 0.271386
36728 4 [-37.2348, -19.6443, 35.2389, 36.7954] [-58.4847, 53.7423, 23.7087, 0.172772] [-140.538, -103.277, -81.6349, -79.4282] [157.469, 118.497, 92.416, 87.6949] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.293529
36733 2 [90.0657, -63.0919] [57.5107, -48.7488] [-176.365, 13.9594] [207.504, 81.852] [2.16326, 3.64163] [True, True] 1 [29.2887] [-13.7026] ... 29.956720 -13.935602 5.496824 13 -3.975806 18.813091 100.464325 4 True 0.293529
36761 5 [-72.7327, 87.8201, -48.6379, 35.3672, -29.3188] [-71.4104, 25.7807, -24.6549, 37.8814, 26.399] [-25.5412, -27.7219, 69.2121, 166.603, -101.946] [105.677, 96.5188, 88.2745, 174.643, 109.666] [2.64415, -1.0, 2.74785, -1.0, -1.0] [True, True, True, True, True] 1 [-15.0587] [-19.3267] ... -14.784895 -19.563482 -41.428085 13 44.433407 46.769634 -128.488770 9 True 0.277205
36773 2 [42.3535, 7.84763] [7.22512, -38.6475] [24.7581, 90.8231] [50.1142, 99.2029] [2.67679, -1.0] [True, True] 1 [-72.876] [53.7722] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.286201
36777 5 [-99.6067, -3.34446, 32.1012, 42.7883, -11.6452] [-25.3014, 92.4234, -32.9203, 0.946451, 36.2387] [155.932, 174.57, 38.0382, 25.5367, 18.2656] [187.385, 198.147, 60.2857, 50.5626, 42.798] [-1.0, 3.07208, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [23.5196] [-33.3833] ... 19.755816 -44.092232 172.988556 -13 28.066628 -17.915775 9.881391 6 True 0.293529
36779 3 [-163.892, -104.229, 43.5479] [15.7904, -1.47729, -1.82685] [-387.744, -61.9309, -63.0709] [423.49, 123.1, 77.3157] [-1.0, -1.0, 2.4724] [True, True, True] 1 [204.758] [3.03201] ... 194.055786 2.588275 31.950975 -13 36.918297 -33.213650 -13.392416 17 True 0.014366
36845 2 [56.3493, 19.3332] [-62.0922, 23.8897] [98.1019, 8.61856] [129.654, 32.2253] [2.90254, -1.0] [True, True] 1 [-58.4214] [49.5135] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 0.277205
36846 2 [63.5405, 61.2992] [33.7838, 6.70778] [-10.9178, 69.2736] [73.4685, 93.2766] [-1.0, -1.0] [True, True] 1 [-79.0116] [-19.7364] ... -77.586594 -19.407608 -66.658394 13 -72.814438 -36.851547 17.604248 6 True 0.293529
36854 6 [23.5022, -29.0738, 27.8149, 27.4214, 0.959874... [-79.9664, -53.5415, 44.6099, 26.0418, 33.4664... [7.103, 22.0331, -156.058, -22.6882, -58.1608,... [85.9293, 65.432, 165.279, 44.604, 67.6031, 13... [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True, True] 1 [-32.9304] [34.6853] ... -32.763885 34.414585 -20.146509 13 -34.493790 9.358992 -145.859955 19 True 0.026791
36862 4 [65.2048, 26.1962, -27.163, 38.1059] [-77.5005, 70.3165, -45.9601, 33.931] [-55.8269, -3.97152, 95.8268, 287.091] [117.233, 75.8283, 110.272, 291.82] [2.6823, 2.13792, -1.0, -1.0] [True, True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.277205
36864 4 [-105.143, 96.585, -95.1253, 80.7612] [-153.554, 39.1784, 37.6713, -11.9996] [-238.466, -25.2841, -9.86635, -101.42] [304.335, 108.426, 103.746, 130.728] [-1.0, 4.3339, 4.28816, -1.0] [True, True, True, True] 0 [] [] ... 18.866938 -15.149240 7.708319 -13 1.578786 85.071571 34.158749 11 True 0.160826
36873 4 [237.71, -193.887, 35.7849, 28.4387] [-69.9921, 0.191406, 13.5153, -21.4827] [-152.786, 436.524, -0.336736, 2.15682] [293.461, 478.051, 39.3145, 36.544] [-1.0, 3.04585, -1.0, 3.54336] [True, True, True, True] 1 [-45.9498] [59.5273] ... -48.707134 60.437092 219.109711 -13 -32.495464 14.442889 15.860429 9 True 0.293189
36884 3 [68.6953, -68.5364, 46.7583] [22.5236, 14.8628, -42.646] [15.618, -29.9445, 35.7689] [75.5679, 77.1019, 73.1977] [2.38775, 2.44255, 1.78135] [True, True, True] 1 [-10.2928] [25.1898] ... -10.396090 25.239723 -11.126865 13 -42.048183 -10.150770 74.172813 3 True 0.270077
36888 2 [-99.3072, -29.3348] [49.2617, -29.6785] [155.757, -6.78339] [191.845, 42.8853] [-1.0, 4.18899] [True, True] 1 [114.711] [-47.2675] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 8 True 0.299981
36892 4 [-158.273, -67.7991, -33.143, -1.02856] [201.345, -24.0705, -49.9765, -31.1794] [-56.5885, 108.054, -133.396, -40.5998] [270.567, 130.381, 146.749, 51.5353] [-1.0, 4.01285, -1.0, -1.0] [True, True, True, True] 1 [-101.12] [-152.441] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.277205
36897 3 [-107.69, -46.7154, -15.9445] [-24.167, 8.60087, -26.171] [-6.4825, 41.4792, 96.2556] [111.979, 63.7637, 101.178] [2.38118, -1.0, -1.0] [True, True, True] 1 [137.716] [21.6976] ... 146.659790 22.631247 36.031197 13 51.945667 -31.295113 -22.553812 9 True 0.277205
36900 4 [36.9388, 27.8715, 35.8173, -23.6561] [35.4823, 34.4244, -21.416, -19.9391] [-53.2023, -18.4092, -60.2866, 21.9757] [74.1006, 48.2303, 74.2057, 38.408] [2.69591, 2.68174, -1.0, -1.0] [True, True, True, True] 1 [-31.3944] [30.6773] ... -31.754948 30.867546 0.585443 -13 -73.327507 -50.762230 -15.318438 5 True 0.271386
36906 3 [60.7559, -6.67839, 10.8504] [-23.811, -47.7918, -33.6265] [78.5075, 165.349, 5.05383] [103.611, 172.752, 36.5233] [-1.0, 1.40379, 4.02371] [True, True, True] 1 [-78.0497] [14.8932] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 19 True 0.036131
36924 1 [44.4301] [22.9213] [-53.3913] [74.282] [-1.0] [True] 1 [14.2536] [-34.5261] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.105165
36932 2 [-189.224, 161.017] [289.557, -116.198] [-246.186, -161.373] [425.525, 257.883] [-1.0, -1.0] [True, True] 1 [-23.1623] [-215.169] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.293529
36936 4 [130.752, -88.6566, 52.0555, -38.6537] [-25.4033, 73.8251, -74.2651, -30.6417] [-120.235, -287.902, -124.879, -128.15] [180.098, 310.715, 154.604, 137.425] [-1.0, 2.17192, -1.0, -1.0] [True, True, True, True] 1 [-44.8889] [42.0714] ... -42.990280 40.279636 -78.633194 -13 13.631536 -7.189492 9.089453 12 True 0.019638

4515 rows × 51 columns


In [6]:
ttbar = ttbar[ttbar['triggerIsoMu24']].reset_index(drop=True)

In [8]:
ttbar.head()


Out[8]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_px MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight
0 3 [-39.2862, 49.3453, -7.69329] [-32.3978, 5.15671, 41.4265] [-65.3803, -95.7182, -70.6679] [83.5105, 108.047, 82.8404] [2.42398, -1.0, -1.0] [True, True, True] 1 [-34.0692] [5.64295] ... -34.540741 5.717676 -10.324351 -13 43.943020 4.711743 -16.245531 4 True 0.279545
1 5 [184.591, 10.8253, -72.3689, -43.3258, 2.57552] [117.452, -166.958, 48.44, -5.91005, 42.4344] [379.501, -51.8267, 452.426, 33.3998, -77.0341] [438.532, 176.045, 461.069, 56.1953, 88.4542] [-1.0, 3.20327, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [42.3929] [-35.9437] ... 43.464474 -37.058472 -30.771671 13 -21.166794 19.246643 -13.815475 4 True 0.279545
2 5 [-92.04, 29.9482, 24.2262, 42.4436, -18.0844] [51.4753, -84.1564, 48.5667, -8.55752, 24.7327] [-52.9665, -181.678, -10.7411, -31.6566, -43.3... [121.717, 203.098, 57.0088, 53.9551, 53.2984] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [26.0595] [0.139715] ... 27.888340 -2.613286 -114.308418 13 -43.849094 -32.715321 -46.295307 8 True 0.299037
3 3 [92.6861, -21.7365, 21.6693] [-126.292, 52.302, 31.9107] [88.5404, -12.5433, -18.5118] [180.864, 58.5981, 43.3167] [2.59038, 2.73967, -1.0] [True, True, True] 1 [-13.8492] [38.0815] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
4 3 [-97.5345, -19.9276, 2.89748] [-13.185, -80.3989, 30.9127] [-11.907, 51.528, -91.2712] [101.636, 98.2916, 96.7967] [2.89632, -1.0, -1.0] [True, True, True] 1 [72.6027] [-9.59597] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 16 True 0.010469

5 rows × 51 columns

Let's say we want to start by training a simple model that only relies on event-level variables. The ones available in these samples are:


In [9]:
# names of event-level branches
npart = ['NJet', 'NMuon', 'NElectron', 'NPhoton', 'MET_px', 'MET_py']

If you are a physicist, the first thing you might want to do is to plot them. We do so very easily using `matplotlib`:


In [10]:
for key in npart: # loop through the event-level branches and plot them on separate histograms
    
    # -- set font and canvas size (optional)
    matplotlib.rcParams.update({'font.size': 16})
    fig = plt.figure(figsize=(8,8), dpi=100)
    
    # -- declare common binning strategy (otherwise every histogram will have its own binning)
    bins = np.linspace(min(ttbar[key]), max(ttbar[key]) + 1, 30)
    
    # plot!
    _ = plt.hist(ttbar[key], histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$', linewidth=2)
    _ = plt.hist(dy[key], histtype='step', normed=False, bins=bins, weights=dy['EventWeight'], label='Drell Yan')
    _ = plt.hist(wj[key], histtype='step', normed=False, bins=bins, weights=wj['EventWeight'], label=r'$W$ + jets')
    _ = plt.hist(ww[key], histtype='step', normed=False, bins=bins, weights=ww['EventWeight'], label=r'$WW$')
    _ = plt.hist(wz[key], histtype='step', normed=False, bins=bins, weights=wz['EventWeight'], label=r'$WZ$')
    _ = plt.hist(zz[key], histtype='step', normed=False, bins=bins, weights=zz['EventWeight'], label=r'$ZZ$')
    _ = plt.hist(singletop[key], histtype='step', normed=False, bins=bins, weights=singletop['EventWeight'], label=r'single $t$')
    _ = plt.hist(qcd[key], histtype='step', normed=False, bins=bins, weights=qcd['EventWeight'], label='QCD', color='salmon')
    
    plt.xlabel(key)
    plt.yscale('log')
    plt.legend(loc='best')
    plt.show()


Stack all the backgrounds (shown only for one branch here):


In [11]:
import matplotlib.cm as cm
colors = cm.cool(np.linspace(0, 1, 7))

In [12]:
plt.figure(figsize=(7,7))
bins = np.linspace(0, 10, 11)
_ = plt.hist([
                dy['NJet'], wj['NJet'], ww['NJet'], wz['NJet'], zz['NJet'], singletop['NJet'], qcd['NJet']
             ],
             histtype='stepfilled',
             normed=False,
             bins=bins,
             weights=[
                 dy['EventWeight'], wj['EventWeight'], ww['EventWeight'], wz['EventWeight'], zz['EventWeight'], singletop['EventWeight'], qcd['EventWeight']
             ],
             label=[ r'Drell Yan', r'$W$ + jets', r'$WW$', r'$WZ$', r'$ZZ$', r'single $t$', 'QCD'
             ],
             stacked=True,
             color=colors)

plt.hist(ttbar['NJet'],
         histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$',
         linewidth=2, color='black', linestyle='dashed')

plt.yscale('log')
plt.xlabel('Number of Jets')
plt.legend()


Out[12]:
<matplotlib.legend.Legend at 0x122178c10>

Plot 2D histogram of $t\bar{t}$ MET_px and MET_py:


In [13]:
from matplotlib.colors import LogNorm

plt.figure(figsize=(5,5)) # square canvas
_ = plt.hist2d(ttbar['MET_px'], ttbar['MET_py'], bins=40, cmap='binary', norm=LogNorm())

# add decorations
plt.xlabel(r'MET$_{p_x}$') # LaTeX!
plt.ylabel(r'MET$_{p_y}$', fontsize=20)
_ = plt.colorbar()


Turn Data into ML-Compatible Inputs

What are ML-Compatible Inputs?

In the simplest scenario of tabular data, Keras, just like scikit-learn, takes the following objects as inputs:

  • $X$

    an ndarray of dimensions [nb_examples, nb_features] containing the distributions to be used as inputs to the model. Each row is an object to classify, each column corresponds to a specific variable.
  • $y$

    an array of dimensions [nb_examples] containing the truth labels indicating the class each object belongs to (for classification), or the continuous target values (for regression).
  • $w$

    (optional) an array of dimensions [nb_examples] containing the weights to be assigned to each example

The indices of these objects must map to the same examples. In general, you will want to shuffle and split them into a training and test set.
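
For concreteness, here is a toy sketch of the expected shapes (synthetic numbers, not our physics data):

import numpy as np

X = np.random.random((100, 6))     # 100 examples, 6 features each
y = np.random.randint(0, 3, 100)   # one integer class label per example (3 classes)
w = np.ones(100)                   # one weight per example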

Forward Pass

If we want to stack multiple dataframes into a single one, we can "concatenate" them. To simplify our classification problem, in this tutorial I will only focus on a three-class classification task, in which we aim to separate $t\bar{t}$ events from two of the main background sources: Drell Yan and W+jets events.


In [14]:
del ww, wz, zz, singletop, qcd, data

In [15]:
# -- this will only contain TTbar, Drell Yan and W+jets events (all branches)
df_full = pd.concat((ttbar, dy, wj), ignore_index=True)

However, we decided we were only going to train on event-level variables, so this would be a more useful df:


In [16]:
# recall: npart was the list of branch names corresponding to event-level variables
df =  pd.concat((ttbar[npart], dy[npart], wj[npart]), ignore_index=True)
df.head()


Out[16]:
NJet NMuon NElectron NPhoton MET_px MET_py
0 3 1 0 0 20.238705 1.876128
1 5 1 0 0 -29.892015 16.498444
2 5 1 0 0 -24.056999 -46.160618
3 3 1 0 0 -30.039965 5.851555
4 3 1 0 0 4.518302 117.570068

Now, turn your new df into the desired ndarray $X$ that can be directly used for ML applications, using this handy pandas function:


In [17]:
X = df.as_matrix()

In [18]:
type(X)


Out[18]:
numpy.ndarray

In [19]:
X.shape


Out[19]:
(191981, 6)

The weight array $w$ can also easily be extracted using a similar procedure:


In [20]:
w =  pd.concat((ttbar['EventWeight'], dy['EventWeight'], wj['EventWeight']), ignore_index=True).values

In [22]:
type(w)


Out[22]:
numpy.ndarray

Finally, generate an array of truth labels $y$ to distinguish among the different classes in the problem:


In [23]:
y = []
for _df, ID in [(ttbar, 0), (dy, 1), (wj, 2)]:
    y.extend([ID] * _df.shape[0])
y = np.array(y)

In [24]:
# -- check what we created
y


Out[24]:
array([0, 0, 0, ..., 2, 2, 2])

Extra Pre-Processing Steps: Shuffling, Splitting into Train-Test, Scaling Inputs

The sklearn function `train_test_split` will randomly shuffle and automatically split all your objects into train and test subsets.


In [25]:
ix = range(X.shape[0]) # array of indices, just to keep track of them for safety reasons and future checks
X_train, X_test, y_train, y_test, w_train, w_test, ix_train, ix_test = train_test_split(X, y, w, ix, train_size=0.6)


/Users/mp744/venvs/keras2tf1/lib/python2.7/site-packages/sklearn/model_selection/_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
  FutureWarning)

It is common practice to scale the inputs to Neural Nets such that they have approximately similar ranges. Without this step, you might end up with variables whose values span very different orders of magnitude, which hinders NN convergence by causing wild fluctuations in the magnitude of the internal weights. To take care of the scaling, we use the sklearn StandardScaler:


In [26]:
from sklearn.preprocessing import StandardScaler

In [27]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Training a NN using Keras

Neural Networks

  • Stack of tensor operators
  • Series of linear and non-linear transformations with the goal of finding optimal parameters to transform inputs and approximate targets
  • Deep nets > shallow nets

1. Multilayer Perceptron (MLP)

Dense

  • Densely connected layer, where all inputs are connected to all outputs
  • Core layer of an MLP
  • Linear transformation of the input vector $x \in \mathbb{R}^n$, which can be expressed using the $m \times n$ matrix $W \in \mathbb{R}^{m \times n}$ as:

    $u = Wx + b$
    where $b \in \mathbb{R}^m$ is the bias unit (a small numerical check of this follows below)

  • All entries in both $W$ and $b$ are trainable

  • In Keras:

    keras.layers.Dense(
                    units,
                    activation=None,
                    use_bias=True,
                    kernel_initializer='glorot_uniform',
                    bias_initializer='zeros',
                    kernel_regularizer=None,
                    bias_regularizer=None,
                    activity_regularizer=None,
                    kernel_constraint=None,
                    bias_constraint=None
    )
  • input_dim (or input_shape) is a necessary argument for the 1st layer of the net:

# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
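
As a small numerical check of the affine transformation above, here is a sketch building a fresh single-layer model (note that Keras stores the kernel as the transpose of the $W$ written earlier):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

check = Sequential()
check.add(Dense(32, input_shape=(16,)))                  # single Dense layer, linear activation by default
W, b = check.layers[0].get_weights()                     # kernel of shape (16, 32), bias of shape (32,)
x = np.random.random((1, 16))
print(np.allclose(check.predict(x), np.dot(x, W) + b))   # True: Dense computes an affine map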

Activation Functions

  • Mathematical way of quantifying the activation state of a node $\rightarrow$ whether it's firing or not
  • Non-linear activation functions are the key to Deep Learning
  • Allow NNs to learn complex, non-linear transformations of the inputs
  • Some popular choices: softmax, elu, selu, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear. Activations that are more complex than a simple TensorFlow/Theano/CNTK function (e.g. learnable activations, which maintain a state) are available as Advanced Activation layers in the module keras.layers.advanced_activations; these include PReLU and LeakyReLU (see the sketch below).
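
A minimal sketch (with made-up layer sizes) of how an advanced activation is used as its own layer:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU

m = Sequential()
m.add(Dense(32, input_shape=(16,), activation='relu'))  # simple activations go directly in the layer
m.add(Dense(32))                                        # no activation here...
m.add(LeakyReLU(alpha=0.1))                             # ...LeakyReLU is applied as a separate layer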

Regularization

  • Series of methods to avoid overfitting
  • Mathematical encouragement towards simpler models
  • Explicitly penalize weights that get too large
  • Two main categories: explicit weight penalties (such as L1/L2 regularization) and stochastic techniques such as Dropout
  • In Keras, Dropout is added as a layer. It masks the outputs of the previous layer such that some of them randomly become inactive and do not contribute to information propagation:
    keras.layers.Dropout(rate, noise_shape=None, seed=None)
    

Build a simple Keras classifier


In [28]:
from keras.models import Model
from keras.layers import Dense, Dropout, Input


Using TensorFlow backend.
2017-11-14 23:55:27.843006: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843031: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843040: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

In [29]:
inputs = Input(shape=(X_train.shape[1], )) # placeholder

hidden = Dense(10, activation='relu')(inputs)
hidden = Dropout(0.2)(hidden)
hidden = Dense(20, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
hidden = Dense(30, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
outputs = Dense(3, activation='softmax')(hidden)
# last layer has to have the same dimensionality as the number of classes we want to predict, here 3

model = Model(inputs, outputs)

In [30]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 6)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                70        
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 20)                220       
_________________________________________________________________
dropout_2 (Dropout)          (None, 20)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 30)                630       
_________________________________________________________________
dropout_3 (Dropout)          (None, 30)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 93        
=================================================================
Total params: 1,013
Trainable params: 1,013
Non-trainable params: 0
_________________________________________________________________

In [31]:
from keras.utils.vis_utils import plot_model
plot_model(model, 'temp.png', show_shapes=True)

Satisfied with your net?

Objective / Loss / Cost

  • Mathematical way of quantifying how much ŷ deviates from y
  • Dictates how strongly we penalize certain types of mistakes
  • Cost of inaccurately classifying an event (“cost function”)
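
As a tiny numerical illustration (plain numpy with made-up numbers, not Keras code), the cross-entropy loss used below penalizes assigning low probability to the true class:

import numpy as np

y_true = 0                             # true class index
y_pred = np.array([0.7, 0.2, 0.1])     # softmax output of the network for one event
loss = -np.log(y_pred[y_true])         # ~0.36; grows as the true class receives less probability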

Optimizers


Now you need to declare what loss function and optimizer to use. We pass these as arguments to compile:


In [26]:
model.compile('adam', 'sparse_categorical_crossentropy')

Training


In [27]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

In [28]:
from collections import Counter
Counter(y_test)
# uneven classes


Out[28]:
Counter({0: 1796, 1: 31146, 2: 43851})

In [31]:
print 'Training:'
try:
    model.fit(
        X_train, y_train, class_weight={ # rebalance class representation
            0 : 0.20 * (float(len(y)) / (y == 0).sum()),
            1 : 0.40 * (float(len(y)) / (y == 1).sum()),
            2 : 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks = [
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/tutorial-progress.h5', monitor='val_loss', verbose=True, save_best_only=True)
        ],
        epochs=20, 
        validation_split = 0.2,
        verbose=True
) 
except KeyboardInterrupt:
    print 'Training ended early.'


Training:
Train on 92150 samples, validate on 23038 samples
Epoch 1/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4397Epoch 00001: val_loss improved from inf to 0.42322, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4398 - val_loss: 0.4232
Epoch 2/20
92032/92150 [============================>.] - ETA: 0s - loss: 0.4388Epoch 00002: val_loss improved from 0.42322 to 0.41837, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4388 - val_loss: 0.4184
Epoch 3/20
91680/92150 [============================>.] - ETA: 0s - loss: 0.4400Epoch 00003: val_loss did not improve
92150/92150 [==============================] - 8s 88us/step - loss: 0.4401 - val_loss: 0.4207
Epoch 4/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4416Epoch 00004: val_loss improved from 0.41837 to 0.41834, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4416 - val_loss: 0.4183
Epoch 5/20
91648/92150 [============================>.] - ETA: 0s - loss: 0.4420Epoch 00005: val_loss improved from 0.41834 to 0.41505, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 87us/step - loss: 0.4429 - val_loss: 0.4150
Epoch 6/20
91680/92150 [============================>.] - ETA: 0s - loss: 0.4401Epoch 00006: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4405 - val_loss: 0.4218
Epoch 7/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4383Epoch 00007: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4383 - val_loss: 0.4208
Epoch 8/20
91552/92150 [============================>.] - ETA: 0s - loss: 0.4384Epoch 00008: val_loss did not improve
92150/92150 [==============================] - 8s 85us/step - loss: 0.4386 - val_loss: 0.4232
Epoch 9/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4391Epoch 00009: val_loss did not improve
92150/92150 [==============================] - 8s 87us/step - loss: 0.4391 - val_loss: 0.4194
Epoch 10/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4384Epoch 00010: val_loss improved from 0.41505 to 0.41338, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 87us/step - loss: 0.4383 - val_loss: 0.4134
Epoch 11/20
91936/92150 [============================>.] - ETA: 0s - loss: 0.4409Epoch 00011: val_loss did not improve
92150/92150 [==============================] - 8s 87us/step - loss: 0.4407 - val_loss: 0.4186
Epoch 12/20
92032/92150 [============================>.] - ETA: 0s - loss: 0.4404Epoch 00012: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4403 - val_loss: 0.4164
Epoch 13/20
92128/92150 [============================>.] - ETA: 0s - loss: 0.4405Epoch 00013: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4404 - val_loss: 0.4312
Epoch 14/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4389Epoch 00014: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4390 - val_loss: 0.4165
Epoch 15/20
91840/92150 [============================>.] - ETA: 0s - loss: 0.4416Epoch 00015: val_loss improved from 0.41338 to 0.41242, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 85us/step - loss: 0.4417 - val_loss: 0.4124
Epoch 16/20
91616/92150 [============================>.] - ETA: 0s - loss: 0.4365Epoch 00016: val_loss improved from 0.41242 to 0.41008, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 85us/step - loss: 0.4366 - val_loss: 0.4101
Epoch 17/20
91744/92150 [============================>.] - ETA: 0s - loss: 0.4374Epoch 00017: val_loss did not improve
92150/92150 [==============================] - 8s 85us/step - loss: 0.4378 - val_loss: 0.4219
Epoch 18/20
91584/92150 [============================>.] - ETA: 0s - loss: 0.4395Epoch 00018: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4393 - val_loss: 0.4243
Epoch 19/20
91872/92150 [============================>.] - ETA: 0s - loss: 0.4410Epoch 00019: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4412 - val_loss: 0.4162
Epoch 20/20
91584/92150 [============================>.] - ETA: 0s - loss: 0.4349Epoch 00020: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4348 - val_loss: 0.4209

In [32]:
# -- load in best network
model.load_weights('./models/tutorial-progress.h5')

# -- Save network weights and structure
print 'Saving model...'
model.save_weights('./models/tutorial.h5', overwrite=True)
json_string = model.to_json()
open('./models/tutorial.json', 'w').write(json_string)
print 'Done'


Saving model...
Done

Testing


In [33]:
print 'Testing...'
yhat = model.predict(X_test, verbose = True, batch_size = 512)


Testing...
76793/76793 [==============================] - 0s 3us/step

In [34]:
# predictions
yhat


Out[34]:
array([[  1.80225354e-04,   9.99819815e-01,   2.92606961e-09],
       [  6.79788063e-04,   2.35158458e-01,   7.64161706e-01],
       [  3.93254653e-04,   2.32965067e-01,   7.66641676e-01],
       ..., 
       [  9.69352317e-04,   2.21174046e-01,   7.77856588e-01],
       [  4.15727874e-04,   2.39787340e-01,   7.59796917e-01],
       [  2.84072390e-04,   2.67122030e-01,   7.32593954e-01]], dtype=float32)

In [35]:
# -- turn them into classes
yhat_cls = np.argmax(yhat, axis=1)

Visualize performance with confusion matrix:


In [36]:
import itertools
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [37]:
# Compute confusion matrix
cnf_matrix = confusion_matrix(y_test, yhat_cls, sample_weight=w_test)
np.set_printoptions(precision=2)

In [38]:
plot_confusion_matrix(cnf_matrix, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
                      normalize=True,
                      title='Normalized confusion matrix')


Normalized confusion matrix
[[ 0.94  0.03  0.03]
 [ 0.03  0.67  0.3 ]
 [ 0.03  0.05  0.92]]

In [39]:
# signal eff = weighted tpr --> out of all signal events, what % do we classify as signal?
print 'Signal efficiency:', w_test[(y_test == 0) & (yhat_cls == 0)].sum() / w_test[y_test == 0].sum()

# bkg eff = weighted fpr --> out of all bkg events, what % do we classify as signal?
b_eff = w_test[(y_test != 0) & (yhat_cls == 0)].sum() / w_test[y_test != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff


Signal efficiency: 0.941405
Background efficiency: 0.0267202
Background rej: 37.4248125521

If you want to, you could further analyze the results by, for example, looking at all events that were assigned to class 0 ($t\bar{t}$) and learning more about their physical characteristics. This could provide insight into what the network is focusing on in its decision-making process.


In [42]:
# -- events that got assigned to class 0
predicted_ttbar = df_full.iloc[np.array(ix_test)[yhat_cls == 0]]

In [43]:
predicted_ttbar['true flavor'] = y_test[yhat_cls == 0]


/home/ubuntu/venv/micky-work/lib/python2.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

In [44]:
predicted_ttbar.head()


Out[44]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight true flavor
71068 2 [-39.515, -32.7724] [-7.06055, 4.96038] [-59.9577, 1.15056] [72.4383, 33.8005] [-1.0, -1.0] [True, True] 1 [36.5116] [-27.2592] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 7 True 0.540515 1
1249 5 [60.9035, -111.827, -27.5385, 52.0547, 52.3307] [117.735, -9.45221, -51.6361, 13.4452, 7.99291] [151.038, -72.7721, -35.0566, 143.468, -8.85212] [202.914, 135.274, 70.2694, 153.809, 55.7241] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 0 [] [] ... -27.402386 -25.099463 13 -129.215714 -97.106895 43.848606 14 True 0.192474 0
143500 2 [-53.283, -38.1472] [-76.5362, -9.61194] [-86.3619, 32.1003] [128.014, 51.698] [-1.0, -1.0] [True, True] 1 [39.2392] [16.2889] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 2.393087 2
1186 3 [-17.622, -42.9345, -26.1204] [-59.8529, 38.5907, 16.7819] [-13.7983, -1.9199, 128.346] [64.3067, 58.404, 132.271] [1.52388, -1.0, -1.0] [True, True, True] 2 [63.2695, 34.3971] [3.4336, -23.4732] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.299981 0
28224 2 [49.646, 34.655] [19.6789, -29.2292] [-194.703, -144.563] [202.0, 151.752] [-1.0, -1.0] [True, True] 2 [-60.5106, -35.1742] [37.2119, -34.9203] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 20 True 0.005656 1

5 rows × 52 columns

In a 3-class classifier, the outputs lie on a 2D triangle with vertices at (1,0,0), (0,1,0), (0,0,1).


In [45]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')
for label, color, marker in [(0, 'r', '^'), (1, 'b', 'o'), (2, 'yellow', 'x')]:
    ax.scatter(yhat[y_test == label, 0], yhat[y_test == label, 1], yhat[y_test == label, 2], c=color, marker=marker, alpha=0.01)

ax.set_xlabel(r'$P(t\bar{t})$')
ax.set_ylabel('P(Drell Yan)')
ax.set_zlabel('P(W + jets)')
ax.view_init(30, 30)
plt.show()


To do some model introspection, you can add intermediate hidden layers to the list of model outputs and inspect their activations:


In [42]:
# model = Model(inputs, [outputs, hidden])
# model.compile('adam', 'sparse_categorical_crossentropy') 
# model.load_weights('./models/tutorial-progress.h5')
# _, last_layer = model.predict(X_test, verbose = True, batch_size = 512)

2. Recurrent Neural Networks

Let's look at a fancier way of solving the same classification problem. In this case we will use Recurrent Neural Networks. These allow you to process variable-length sequences of data. For example, we can use them to describe an event in terms of the properties of its jets, whose number varies event by event. We could also describe the same event using the properties of its muons, or any other particle that appears in it. Because the number of particles of each type changes in each event, we need the flexibility of RNNs to process this type of data.
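
To make the variable-length idea concrete, here is a toy sketch (made-up numbers) of how events with different numbers of jets end up in a single fixed-size, padded array; the `Masking` layer used further below tells the network to ignore the -999 entries:

import numpy as np

event1 = [[10.0, 1.0], [20.0, 2.0]]     # 2 jets, 2 features each
event2 = [[5.0, 0.5]]                   # 1 jet
padded = np.full((2, 2, 2), -999.0)     # shape: (nb_events, max_jets, nb_features)
padded[0, :2] = event1
padded[1, :1] = event2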

We will build an event-level classifier with 4 recurrent branches for the 4 different types of particles in the event.


In [66]:
jetvars = [key for key in df_full.keys() if key.startswith('Jet')]
jetvars.remove('Jet_ID')
print jetvars

muonvars = [key for key in df_full.keys() if key.startswith('Muon')]
print muonvars

photonvars = [key for key in df_full.keys() if key.startswith('Photon')]
print photonvars

electronvars = [key for key in df_full.keys() if key.startswith('Electron')]
print electronvars


['Jet_Px', 'Jet_Py', 'Jet_Pz', 'Jet_E', 'Jet_btag']
['Muon_Px', 'Muon_Py', 'Muon_Pz', 'Muon_E', 'Muon_Charge', 'Muon_Iso']
['Photon_Px', 'Photon_Py', 'Photon_Pz', 'Photon_E', 'Photon_Iso']
['Electron_Px', 'Electron_Py', 'Electron_Pz', 'Electron_E', 'Electron_Charge', 'Electron_Iso']

In [67]:
df_jets = df_full[jetvars].copy()
df_electrons = df_full[electronvars].copy()
df_muons = df_full[muonvars].copy()
df_photons = df_full[photonvars].copy()

In [68]:
num_electrons = max([len(e) for e in df_electrons.Electron_E])
num_electrons


Out[68]:
2

In [69]:
num_muons = max([len(m) for m in df_muons.Muon_E])
num_muons


Out[69]:
3

In [70]:
num_photons = max([len(gam) for gam in df_photons.Photon_E])
num_photons


Out[70]:
2

In [71]:
num_jets = max([len(j) for j in df_jets.Jet_E])
num_jets


Out[71]:
9

Just for the sake of variety, you can either have class labels like (0, 1, 2, 3, ...) and train using sparse_categorical_crossentropy as your loss function -- like we did before -- or, equivalently, you can have class labels like ([1, 0, 0, 0, ...], [0, 1, 0, 0, ...], ...) and train using categorical_crossentropy.


In [72]:
from keras.utils.np_utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_train


Out[72]:
array([[ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       ..., 
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])

In [73]:
def create_stream(df, num_obj, sort_col):
   
    n_variables = df.shape[1]
    var_names = df.keys()

    data = np.zeros((df.shape[0], num_obj, n_variables), dtype='float32')

    # -- call functions to build X (a.k.a. data)                                                                                                                                                                      
    sort_objects(df, data, sort_col, num_obj)
    
    # -- ix_{train, test} from above or from previously stored ordering
    Xobj_train = data[ix_train]
    Xobj_test = data[ix_test]
    
    #print 'Scaling features ...'
    scale(Xobj_train, var_names, savevars=True) # scale training sample and save scaling
    scale(Xobj_test, var_names, savevars=False) # apply scaling to test set
    return Xobj_train, Xobj_test

In [74]:
def sort_objects(df, data, SORT_COL, max_nobj):
    ''' 
    sort objects using your preferred variable
    
    Args:
    -----
        df: a dataframe with event-level structure where each event is described by a sequence of jets, muons, etc.
        data: an array of shape (nb_events, nb_particles, nb_features)
        SORT_COL: a string representing the column to sort the objects by
        max_nobj: maximum number of objects to keep per event; longer sequences are truncated, shorter ones are padded with -999
    
    Returns:
    --------
        modifies @a data in place. Pads with -999
    
    '''
    import tqdm
    # i = event number, event = all the variables for that event 
    for i, event in tqdm.tqdm(df.iterrows(), total=df.shape[0]): 

        # objs = [[pt's], [eta's], ...] of particles for each event 
        objs = np.array(
                [v.tolist() for v in event.get_values()], 
                dtype='float32'
            )[:, (np.argsort(event[SORT_COL]))[::-1]]

        # total number of objects in this event
        nobjs = objs.shape[1] 

        # take all objects unless there are more than max_nobj
        data[i, :(min(nobjs, max_nobj)), :] = objs.T[:(min(nobjs, max_nobj)), :] 

        # default value for missing objects
        data[i, (min(nobjs, max_nobj)):, :  ] = -999

In [75]:
def scale(data, var_names, savevars, VAR_FILE_NAME='scaling.json'):
    ''' 
    Args:
    -----
        data: a numpy array of shape (nb_events, nb_particles, n_variables)
        var_names: list of keys to be used for the model
        savevars: bool -- True for training, False for testing
                  it decides whether we want to fit on data to find mean and std 
                  or if we want to use those stored in the json file 
    
    Returns:
    --------
        modifies data in place, writes out scaling dictionary
    '''
    import json
    
    scale = {}
    if savevars: 
        for v, name in enumerate(var_names):
            #print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m, s = slc.mean(), slc.std()
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')
            scale[name] = {'mean' : float(m), 'sd' : float(s)}
            
        with open(VAR_FILE_NAME, 'wb') as varfile:
            json.dump(scale, varfile)

    else:
        with open(VAR_FILE_NAME, 'rb') as varfile:
            varinfo = json.load(varfile)

        for v, name in enumerate(var_names):
            #print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m = varinfo[name]['mean']
            s = varinfo[name]['sd']
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')

In [77]:
Xjet_train, Xjet_test = create_stream(df_jets, num_jets, sort_col='Jet_btag')


100%|██████████| 191981/191981 [00:22<00:00, 8468.88it/s]

In [78]:
Xphoton_train, Xphoton_test = create_stream(df_photons, num_photons, sort_col='Photon_E')


100%|██████████| 191981/191981 [00:24<00:00, 7774.78it/s]

In [79]:
Xmuon_train, Xmuon_test = create_stream(df_muons, num_muons, sort_col='Muon_E')


100%|██████████| 191981/191981 [00:23<00:00, 8088.31it/s]

In [80]:
Xelectron_train, Xelectron_test = create_stream(df_electrons, num_electrons, sort_col='Electron_E')


100%|██████████| 191981/191981 [00:23<00:00, 8074.72it/s]

In [89]:
from keras.layers import Masking, GRU, concatenate

In [83]:
JET_SHAPE = Xjet_train.shape[1:]
MUON_SHAPE = Xmuon_train.shape[1:]
ELECTRON_SHAPE = Xelectron_train.shape[1:]
PHOTON_SHAPE = Xphoton_train.shape[1:]

In [84]:
jet_input = Input(JET_SHAPE)
jet_channel = Masking(mask_value=-999, name='jet_masking')(jet_input)
jet_channel = GRU(25, name='jet_gru')(jet_channel)
jet_channel = Dropout(0.3, name='jet_dropout')(jet_channel)

muon_input = Input(MUON_SHAPE)
muon_channel = Masking(mask_value=-999, name='muon_masking')(muon_input)
muon_channel = GRU(10, name='muon_gru')(muon_channel)
muon_channel = Dropout(0.3, name='muon_dropout')(muon_channel)

electron_input = Input(ELECTRON_SHAPE)
electron_channel = Masking(mask_value=-999, name='electron_masking')(electron_input)
electron_channel = GRU(10, name='electron_gru')(electron_channel)
electron_channel = Dropout(0.3, name='electron_dropout')(electron_channel)

photon_input = Input(PHOTON_SHAPE)
photon_channel = Masking(mask_value=-999, name='photon_masking')(photon_input)
photon_channel = GRU(10, name='photon_gru')(photon_channel)
photon_channel = Dropout(0.3, name='photon_dropout')(photon_channel)

In [90]:
# merge the four per-collection summaries and classify with a small
# fully-connected network (softmax over the 3 classes)
combined = concatenate([jet_channel, muon_channel, electron_channel, photon_channel])
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined_outputs = Dense(3, activation='softmax')(combined)

In [91]:
combined_rnn = Model(inputs=[jet_input, muon_input, electron_input, photon_input], outputs=combined_outputs)
combined_rnn.summary()


__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            (None, 9, 5)         0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            (None, 3, 6)         0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 2, 6)         0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 2, 5)         0                                            
__________________________________________________________________________________________________
jet_masking (Masking)           (None, 9, 5)         0           input_2[0][0]                    
__________________________________________________________________________________________________
muon_masking (Masking)          (None, 3, 6)         0           input_3[0][0]                    
__________________________________________________________________________________________________
electron_masking (Masking)      (None, 2, 6)         0           input_4[0][0]                    
__________________________________________________________________________________________________
photon_masking (Masking)        (None, 2, 5)         0           input_5[0][0]                    
__________________________________________________________________________________________________
jet_gru (GRU)                   (None, 25)           2325        jet_masking[0][0]                
__________________________________________________________________________________________________
muon_gru (GRU)                  (None, 10)           510         muon_masking[0][0]               
__________________________________________________________________________________________________
electron_gru (GRU)              (None, 10)           510         electron_masking[0][0]           
__________________________________________________________________________________________________
photon_gru (GRU)                (None, 10)           480         photon_masking[0][0]             
__________________________________________________________________________________________________
jet_dropout (Dropout)           (None, 25)           0           jet_gru[0][0]                    
__________________________________________________________________________________________________
muon_dropout (Dropout)          (None, 10)           0           muon_gru[0][0]                   
__________________________________________________________________________________________________
electron_dropout (Dropout)      (None, 10)           0           electron_gru[0][0]               
__________________________________________________________________________________________________
photon_dropout (Dropout)        (None, 10)           0           photon_gru[0][0]                 
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 55)           0           jet_dropout[0][0]                
                                                                 muon_dropout[0][0]               
                                                                 electron_dropout[0][0]           
                                                                 photon_dropout[0][0]             
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 64)           3584        concatenate_3[0][0]              
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 64)           0           dense_6[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 64)           4160        dropout_4[0][0]                  
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 64)           0           dense_7[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 64)           4160        dropout_5[0][0]                  
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 64)           0           dense_8[0][0]                    
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 3)            195         dropout_6[0][0]                  
==================================================================================================
Total params: 15,924
Trainable params: 15,924
Non-trainable params: 0
__________________________________________________________________________________________________

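Besides summary(), Keras can also draw the multi-input topology to an image file. This is optional and assumes the pydot and graphviz packages are installed:

In [ ]:
# optional: visualize the four-stream architecture (requires pydot + graphviz)
from keras.utils import plot_model
plot_model(combined_rnn, to_file='./models/combinedrnn_tutorial.png', show_shapes=True)
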
In [92]:
combined_rnn.compile('adam', 'categorical_crossentropy')

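The fit call below reweights the three classes: each weight is a chosen prior (0.2 / 0.4 / 0.4) multiplied by the inverse class frequency, so the loss is not dominated by the most abundant process. The same numbers can be computed up front, for example:

In [ ]:
# class weight = prior * (total events / events in that class); y holds the integer labels
priors = {0: 0.20, 1: 0.40, 2: 0.40}
class_weights = {c: p * float(len(y)) / (y == c).sum() for c, p in priors.items()}
print class_weights
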
In [93]:
print 'Training:'
try:
    combined_rnn.fit([Xjet_train, Xmuon_train, Xelectron_train, Xphoton_train], y_train, batch_size=16,
            class_weight={
                0 : 0.20 * (float(len(y)) / (y == 0).sum()),
                1 : 0.40 * (float(len(y)) / (y == 1).sum()),
                2 : 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks = [
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/combinedrnn_tutorial-progress', monitor='val_loss', verbose=True, save_best_only=True)
        ],
    epochs=30, 
    validation_split = 0.2) 

except KeyboardInterrupt:
    print 'Training ended early.'


Training:
/home/ubuntu/venv/micky-work/lib/python2.7/site-packages/ipykernel_launcher.py:14: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  
Train on 92150 samples, validate on 23038 samples
Epoch 1/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.5260Epoch 00001: val_loss improved from inf to 0.43021, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 96s 1ms/step - loss: 0.5260 - val_loss: 0.4302
Epoch 2/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4370Epoch 00002: val_loss improved from 0.43021 to 0.42557, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 94s 1ms/step - loss: 0.4370 - val_loss: 0.4256
Epoch 3/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4305Epoch 00003: val_loss did not improve
92150/92150 [==============================] - 86s 932us/step - loss: 0.4306 - val_loss: 0.4304
Epoch 4/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4315Epoch 00004: val_loss did not improve
92150/92150 [==============================] - 84s 910us/step - loss: 0.4315 - val_loss: 0.4307
Epoch 5/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4333Epoch 00005: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4332 - val_loss: 0.4455
Epoch 6/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4316Epoch 00006: val_loss improved from 0.42557 to 0.42022, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 83s 897us/step - loss: 0.4316 - val_loss: 0.4202
Epoch 7/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4289Epoch 00007: val_loss did not improve
92150/92150 [==============================] - 84s 915us/step - loss: 0.4289 - val_loss: 0.4412
Epoch 8/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4317Epoch 00008: val_loss did not improve
92150/92150 [==============================] - 83s 898us/step - loss: 0.4317 - val_loss: 0.4434
Epoch 9/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4327Epoch 00009: val_loss did not improve
92150/92150 [==============================] - 83s 904us/step - loss: 0.4327 - val_loss: 0.4300
Epoch 10/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4270Epoch 00010: val_loss improved from 0.42022 to 0.41622, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 84s 915us/step - loss: 0.4270 - val_loss: 0.4162
Epoch 11/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4276Epoch 00011: val_loss did not improve
92150/92150 [==============================] - 85s 925us/step - loss: 0.4276 - val_loss: 0.4386
Epoch 12/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4262Epoch 00012: val_loss did not improve
92150/92150 [==============================] - 83s 897us/step - loss: 0.4262 - val_loss: 0.4508
Epoch 13/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4310Epoch 00013: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4310 - val_loss: 0.4336
Epoch 14/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4251Epoch 00014: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4251 - val_loss: 0.4474
Epoch 15/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4289Epoch 00015: val_loss did not improve
92150/92150 [==============================] - 84s 907us/step - loss: 0.4289 - val_loss: 0.4300
Epoch 16/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4266Epoch 00016: val_loss did not improve
92150/92150 [==============================] - 83s 898us/step - loss: 0.4265 - val_loss: 0.4411
Epoch 17/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4262Epoch 00017: val_loss did not improve
92150/92150 [==============================] - 83s 895us/step - loss: 0.4262 - val_loss: 0.4372
Epoch 18/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4283Epoch 00018: val_loss did not improve
92150/92150 [==============================] - 83s 896us/step - loss: 0.4283 - val_loss: 0.4370
Epoch 19/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4239Epoch 00019: val_loss did not improve
92150/92150 [==============================] - 83s 905us/step - loss: 0.4239 - val_loss: 0.4401
Epoch 20/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4298Epoch 00020: val_loss did not improve
92150/92150 [==============================] - 83s 902us/step - loss: 0.4298 - val_loss: 0.4304
Epoch 00020: early stopping

In [95]:
# -- load in best network
combined_rnn.load_weights('./models/combinedrnn_tutorial-progress')
print 'Saving weights...'
combined_rnn.save_weights('./models/combinedrnn_tutorial.h5', overwrite=True)

json_string = combined_rnn.to_json()
with open('./models/combinedrnn_tutorial.json', 'w') as jsonfile:
    jsonfile.write(json_string)


Saving weights...

In [96]:
yhat_rnn = combined_rnn.predict([Xjet_test, Xmuon_test, Xelectron_test, Xphoton_test], verbose = True, batch_size = 512)


76793/76793 [==============================] - 2s 27us/step

In [101]:
# Compute confusion matrix
cnf_matrix_rnn = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(yhat_rnn, axis=1), sample_weight=w_test)
np.set_printoptions(precision=2)
plot_confusion_matrix(cnf_matrix_rnn, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
                      normalize=True,
                      title='Normalized confusion matrix')


Normalized confusion matrix
[[ 0.93  0.02  0.05]
 [ 0.03  0.6   0.37]
 [ 0.02  0.03  0.95]]

In [102]:
# -- turn the predictions back into class labels
yhat_rnn_cls = np.argmax(yhat_rnn, axis=1)

In [103]:
# -- do the same for the truth labels
y_test_cls = np.argmax(y_test, axis=1)

In [104]:
print 'Signal efficiency:', w_test[(y_test_cls == 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls == 0].sum()


Signal efficiency: 0.934622

In [105]:
b_eff = w_test[(y_test_cls != 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff


Background efficiency: 0.0179076
Background rej: 55.8421979781
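
The single working point above can be extended to a full curve by scanning a threshold on the signal-class score. A minimal sketch using scikit-learn, treating class 0 (the $t\overline{t}$ sample) as signal:

In [ ]:
# background rejection vs. signal efficiency, scanning the class-0 score
from sklearn.metrics import roc_curve

fpr, tpr, _ = roc_curve((y_test_cls == 0).astype(int), yhat_rnn[:, 0], sample_weight=w_test)
nonzero = fpr > 0                      # avoid dividing by zero at the tightest cuts
plt.plot(tpr[nonzero], 1. / fpr[nonzero])
plt.xlabel('Signal efficiency')
plt.ylabel('Background rejection')
plt.yscale('log')
plt.show()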

Further Readings

  1. List of all IML meetings
  2. CERN Academic Training lecture series on Machine Learning (slides and videos)
  3. "Machine Learning Algorithms for b-Jet Tagging at the ATLAS Experiment" (ATL-PHYS-PROC-2017-211)
  4. "Deep Learning in Flavour Tagging at the ATLAS Experiment (ATL-PHYS-PROC-2017-191)
  5. "Machine and Deep Learning Techniques in Heavy-Ion Collisions with ALICE" (arXiv)
  6. "Deep learning approach to the Higgs boson CP measurement in H → τ τ decay and associated systematics" (arXiv)
  7. Luke de Oliveira's DataScience@LHC 2015 talk on "A ground-up construction of deep learning" (video)
  8. Peter Sadowski's DataScience@LHC 2015 deep learning tutorial (video)
  9. Pierre Baldi's DataScience@LHC 2015 talk on "Deep learning and its applications to the natural sciences" (video)
  10. Yann LeCun's CERN colloquium (video)
  11. Fernanda Psihas's 2017 IML Workshop talk on "Deep convolutional networks for event reconstruction and particle tagging on NOvA and DUNE" (video)
