Introduction to Deep Learning with Keras

Michela Paganini - Yale University

High Energy Phenomenology, Experiment and Cosmology Seminar Series

This is the second of two notebooks for this tutorial. If you are unfamiliar with data handling techniques in a Python environment, see the previous notebook.

What is Keras? Why Use Keras?

  • Modular, powerful and intuitive Deep Learning Python library built on TensorFlow and Theano

    • These backends let you call efficient C++ code while using the syntax of a higher-level programming language
    • They offer arbitrary tensor function definition, symbolic graph building, autodifferentiation, optimization, and parallelism
  • Keras doesn’t itself handle any tensor ops; it relies on these tensor manipulation libraries

  • Connects to both of them via the abstract Keras backend (see the short sketch after this list)
  • Provides a high-level API with simple neural network building blocks
  • Now an integral part of TensorFlow via the `tf.keras` module
  • Model and data parallelism across different devices
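
As a quick illustration of the backend idea, here is a minimal sketch (not part of the tutorial code) of the abstract backend API; the same symbolic operations are dispatched to whichever backend is configured:

import numpy as np
from keras import backend as K

a = K.variable(np.random.random((3, 4)))   # backend tensor wrapping a numpy array
b = K.variable(np.random.random((4, 2)))
c = K.dot(a, b)                            # symbolic matrix product on the backend
print(K.eval(c).shape)                     # evaluates the graph -> (3, 2)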

Other Properties of Keras

Developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

https://keras.io
  • Minimalist, user-friendly interface
  • Extremely well documented, lots of working examples
  • Very shallow learning curve $\rightarrow$ by far one of the best tools for both beginners and experts
  • Open-source, developed and maintained by a community of contributors, and publicly hosted on GitHub
  • Extensible: possibility to customize layers
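
For example, a custom layer can be written by subclassing `Layer`; the sketch below is adapted from the custom-layer pattern in the Keras documentation (the layer name and sizes are made up for illustration):

from keras import backend as K
from keras.engine.topology import Layer

class MyLayer(Layer):
    '''A hypothetical custom layer: a trainable linear map.'''
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # create a trainable weight for this layer
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)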


Load and Plot Data


In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
%matplotlib inline

Let's load the utility functions we defined in the previous notebook.


In [2]:
import glob
from root_numpy import root2array
from numpy.lib.recfunctions import stack_arrays

def root2pandas(files_path, tree_name, **kwargs):
    '''
    Args:
    -----
        files_path: a string like './data/*.root', for example
        tree_name: a string like 'Collection_Tree' corresponding to the name of the folder inside the root 
                   file that we want to open
        kwargs: arguments taken by root2array, such as branches to consider, start, stop, step, etc
    Returns:
    --------    
        output_panda: a pandas dataframe like allbkg_df in which all the info from the root file will be stored
    
    Note:
    -----
        if you are working with .root files that contain different branches, you might have to mask your data
        in that case, return pd.DataFrame(ss.data)
    '''
    # -- create list of .root files to process
    files = glob.glob(files_path)
    
    # -- process ntuples into rec arrays
    ss = stack_arrays([root2array(fpath, tree_name, **kwargs).view(np.recarray) for fpath in files])

    try:
        return pd.DataFrame(ss)
    except Exception:
        return pd.DataFrame(ss.data)
    
    
def flatten(column):
    '''
    Args:
    -----
        column: a column of a pandas df whose entries are lists (or regular entries -- in which case nothing is done)
                e.g.: my_df['some_variable'] 

    Returns:
    --------    
        flattened out version of the column. 

        For example, it will turn:
        [1791, 2719, 1891]
        [1717, 1, 0, 171, 9181, 537, 12]
        [82, 11]
        ...
        into:
        1791, 2719, 1891, 1717, 1, 0, 171, 9181, 537, 12, 82, 11, ...
    '''
    try:
        return np.array([v for e in column for v in e])
    except (TypeError, ValueError):
        return column


Welcome to JupyROOT 6.08/02

Load all data in memory into dataframes.


In [3]:
# MC signal:
ttbar = root2pandas('files/ttbar.root', 'events') 
# MC backgrounds:
dy = root2pandas('files/dy.root', 'events')
wj = root2pandas('files/wjets.root', 'events')
ww = root2pandas('files/ww.root', 'events')
wz = root2pandas('files/wz.root', 'events')
zz = root2pandas('files/zz.root', 'events')
singletop = root2pandas('files/single_top.root', 'events')
qcd = root2pandas('files/qcd.root', 'events')
# data:
data = root2pandas('files/data.root', 'events')

All samples except the $t\bar{t}$ one were produced using a muon trigger. To enforce the same selection on the $t\bar{t}$ sample as well, we use the branch called `triggerIsoMu24`, which contains a boolean indicating whether each event would have passed the specified trigger.


In [4]:
# -- step by step:
ttbar['triggerIsoMu24']


Out[4]:
0        False
1        False
2        False
3        False
4        False
5        False
6        False
7         True
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16        True
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24        True
25       False
26       False
27       False
28       False
29       False
         ...  
36911    False
36912    False
36913    False
36914    False
36915    False
36916    False
36917    False
36918    False
36919    False
36920    False
36921    False
36922    False
36923    False
36924     True
36925    False
36926    False
36927    False
36928    False
36929    False
36930    False
36931    False
36932     True
36933    False
36934    False
36935    False
36936     True
36937    False
36938    False
36939    False
36940    False
Name: triggerIsoMu24, dtype: bool

In [5]:
ttbar[ttbar['triggerIsoMu24']] # slicing doesn't automatically reset the indices


Out[5]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_px MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight
7 3 [-39.2862, 49.3453, -7.69329] [-32.3978, 5.15671, 41.4265] [-65.3803, -95.7182, -70.6679] [83.5105, 108.047, 82.8404] [2.42398, -1.0, -1.0] [True, True, True] 1 [-34.0692] [5.64295] ... -34.540741 5.717676 -10.324351 -13 43.943020 4.711743 -16.245531 4 True 0.279545
16 5 [184.591, 10.8253, -72.3689, -43.3258, 2.57552] [117.452, -166.958, 48.44, -5.91005, 42.4344] [379.501, -51.8267, 452.426, 33.3998, -77.0341] [438.532, 176.045, 461.069, 56.1953, 88.4542] [-1.0, 3.20327, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [42.3929] [-35.9437] ... 43.464474 -37.058472 -30.771671 13 -21.166794 19.246643 -13.815475 4 True 0.279545
24 5 [-92.04, 29.9482, 24.2262, 42.4436, -18.0844] [51.4753, -84.1564, 48.5667, -8.55752, 24.7327] [-52.9665, -181.678, -10.7411, -31.6566, -43.3... [121.717, 203.098, 57.0088, 53.9551, 53.2984] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [26.0595] [0.139715] ... 27.888340 -2.613286 -114.308418 13 -43.849094 -32.715321 -46.295307 8 True 0.299037
37 3 [92.6861, -21.7365, 21.6693] [-126.292, 52.302, 31.9107] [88.5404, -12.5433, -18.5118] [180.864, 58.5981, 43.3167] [2.59038, 2.73967, -1.0] [True, True, True] 1 [-13.8492] [38.0815] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
40 3 [-97.5345, -19.9276, 2.89748] [-13.185, -80.3989, 30.9127] [-11.907, 51.528, -91.2712] [101.636, 98.2916, 96.7967] [2.89632, -1.0, -1.0] [True, True, True] 1 [72.6027] [-9.59597] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 16 True 0.010469
51 4 [13.2568, 84.261, -4.9345, -33.4614] [188.902, -129.347, -121.279, -62.2332] [-67.8184, 185.424, 164.392, 53.9148] [202.167, 242.673, 205.04, 89.4054] [2.31161, -1.0, -1.0, 3.40863] [True, True, True, True] 2 [-41.887, 8.97538] [108.296, 21.3534] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 6 True 0.293529
57 4 [86.7032, -84.3351, -60.7989, 38.2517] [0.745445, -0.972272, 49.1331, -22.5668] [-77.6728, -19.5303, 3.48934, 48.336] [117.088, 87.3901, 78.8732, 66.4349] [1.94137, -1.0, -1.0, -1.0] [True, True, True, True] 0 [] [] ... -51.339859 47.421997 9.806538 -13 7.199954 33.037640 64.709595 13 True 0.026791
58 0 [] [] [] [] [] [] 1 [-73.5007] [47.405] ... -77.774635 49.710968 -163.584732 13 -14.959000 -17.021935 -129.785049 20 True 0.010469
68 3 [-72.5721, -37.3589, 54.3353] [-21.6138, 44.3861, -11.165] [-203.388, -141.191, -148.702] [217.305, 152.979, 159.159] [1.64367, 2.30578, -1.0] [True, True, True] 1 [-1.18855] [-36.1914] ... -1.134651 -35.692688 -109.194305 13 67.564972 -8.594560 -78.256340 6 True 0.286201
80 3 [394.21, -66.0473, 8.33537] [-174.289, 96.3359, -45.8379] [-363.242, 1.3422, -32.5724] [574.341, 117.572, 57.576] [2.80352, 1.69599, -1.0] [True, True, True] 1 [-66.4606] [-1.64239] ... -100.356377 -2.219039 -10.412981 -13 -246.632401 112.364296 -81.893837 3 True 0.271386
86 4 [65.0804, 47.7328, -41.3639, -15.2447] [-84.0002, 73.0816, -21.3092, 30.1185] [-19.6535, -286.044, -80.5942, -45.1493] [109.014, 299.209, 93.831, 56.4601] [3.79264, -1.0, -1.0, -1.0] [True, True, True, True] 1 [9.05585] [-24.4455] ... 8.528934 -24.427439 27.513771 13 -51.809479 8.691632 -20.814299 11 True 0.105165
96 2 [-40.122, 27.6997] [-44.2169, -41.4156] [10.1575, 29.0914] [62.1224, 58.2074] [-1.0, 1.66511] [True, True] 1 [42.9513] [45.6999] ... 44.989742 46.907940 50.970924 -13 13.318195 -31.906128 19.553530 10 True 0.299981
107 5 [-100.62, 32.0011, -35.6917, 28.3667, 3.8716] [-4.6241, -39.0739, 26.2937, -25.9119, 34.3113] [-135.39, 38.0203, -23.802, -58.4347, 29.4247] [171.725, 64.9216, 51.6265, 70.5236, 45.8437] [-1.0, -1.0, -1.0, 3.47004, -1.0] [True, True, True, True, True] 1 [32.2751] [-59.3023] ... 33.404510 -60.193768 -54.171921 13 44.509659 19.568098 -44.442898 1 True 0.270251
122 3 [10.4771, -43.1103, 23.1228] [65.1101, -9.28895, 33.4949] [-64.5278, 20.2662, 20.8267] [92.8245, 48.7273, 46.165] [4.1136, -1.0, -1.0] [True, True, True] 1 [-0.298645] [-52.4379] ... -2.104305 -95.161293 -309.347382 -15 -8.769296 -14.520910 4.469729 6 True 0.286201
133 3 [-104.634, 48.1258, 12.1684] [29.6702, -66.15, 34.2625] [-72.2929, -229.216, 13.3559] [131.668, 243.703, 38.9503] [3.45501, -1.0, -1.0] [True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
144 3 [-75.713, 58.5136, 31.0808] [44.9549, -38.7253, -41.9073] [-195.356, -125.271, -35.4589] [214.638, 144.113, 64.1693] [2.9908, 3.62441, -1.0] [True, True, True] 1 [-45.2115] [5.31683] ... -44.851936 5.279506 -0.475743 13 20.658106 36.965542 -45.506985 6 True 0.286201
148 4 [-113.872, -32.332, 34.4798, 10.6671] [-23.9376, 35.5097, -6.87441, 30.7233] [87.0612, 45.5352, 72.4027, -44.7416] [145.824, 66.5828, 80.8721, 55.7227] [2.06658, -1.0, -1.0, 3.2202] [True, True, True, True] 1 [74.4426] [-47.3361] ... 74.646675 -48.276752 -21.981726 13 -14.983271 -19.674046 -33.766380 19 True 0.014366
157 2 [39.0372, 18.8909] [56.7757, -41.463] [-154.183, -153.292] [169.354, 160.027] [-1.0, 2.21366] [True, True] 1 [-81.4154] [11.1185] ... -88.916168 11.958924 -100.715332 -13 -8.342011 -37.894085 -57.389412 5 True 0.279545
168 4 [-23.9239, -6.59758, -14.6238, 45.4713] [60.4995, -51.068, 46.1813, -10.2181] [57.153, -172.568, -82.0874, -18.6692] [87.7235, 180.256, 95.6667, 50.63] [3.08056, -1.0, -1.0, -1.0] [True, True, True, True] 1 [42.5798] [-37.9202] ... 42.677971 -38.684910 25.826906 13 -35.230225 -32.083885 29.348593 4 True 0.274432
171 2 [-18.0533, 45.1819] [75.6471, -25.322] [-59.945, -22.2292] [98.8365, 57.0231] [1.58192, -1.0] [True, True] 1 [-2.2919] [-55.9058] ... -2.266898 -56.387188 84.049629 13 17.545727 17.956202 95.806236 3 True 0.279545
175 2 [-62.7099, -16.6717] [62.0686, -68.4154] [-56.106, -164.618] [105.655, 179.547] [-1.0, -1.0] [True, True] 1 [52.0206] [100.328] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 3 True 0.293529
188 2 [10.1792, -26.8712] [55.9482, -16.998] [-89.4134, 72.4999] [106.531, 79.6631] [-1.0, 2.40431] [True, True] 1 [-65.1655] [22.1073] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 6 True 0.293529
194 3 [69.7044, 28.6426, -30.5756] [-21.1823, 21.4084, -5.88853] [189.136, 125.371, 28.3306] [203.069, 130.501, 42.8366] [2.25507, -1.0, -1.0] [True, True, True] 1 [-84.0036] [-38.9238] ... -79.938377 -35.873989 -74.092720 13 -14.628411 30.902468 -51.598186 5 True 0.293529
197 4 [-81.0861, 35.7697, 45.4883, -7.59584] [-22.679, -41.042, 24.9598, 37.9333] [22.1584, -50.1099, -306.429, -61.9009] [87.7846, 74.7338, 311.244, 73.3563] [4.60365, 1.62578, -1.0, -1.0] [True, True, True, True] 1 [-21.7847] [10.4468] ... -21.555775 10.569125 16.874752 13 66.121849 17.503078 1.514331 5 True 0.286201
203 4 [-92.6663, -29.3145, 43.617, 22.991] [-9.88577, -39.3474, 0.0992902, 32.9985] [-19.9208, 24.1042, -28.9692, 73.4268] [96.1515, 55.6889, 53.1124, 84.3405] [2.15808, -1.0, -1.0, -1.0] [True, True, True, True] 1 [51.9452] [-34.679] ... 51.943840 -35.272839 -15.022160 13 -14.474733 10.930140 20.586962 11 True 0.160826
204 1 [21.5621] [-30.0244] [160.531] [164.844] [-1.0] [True] 2 [109.452, -9.8738] [191.047, -12.5724] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 4 True 0.274432
219 2 [55.6175, 32.9164] [39.0514, -11.1956] [-44.6516, -31.1788] [82.3264, 47.1459] [2.18723, -1.0] [True, True] 1 [-17.0189] [47.6267] ... -17.056755 47.598736 0.132445 -13 -18.238882 4.040609 71.326790 4 True 0.279545
223 2 [-63.6374, 21.9639] [58.9319, -21.1155] [131.927, 79.1002] [159.269, 85.0185] [-1.0, -1.0] [True, True] 1 [39.4852] [-28.8416] ... 40.519344 -29.519676 67.463295 -13 2.028334 13.116418 -16.692287 25 True 0.001839
235 1 [-22.13] [-44.9609] [-56.7319] [77.0054] [-1.0] [True] 1 [18.7418] [37.8549] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.005580
268 3 [40.9751, -26.4305, 8.81603] [0.197465, 19.2683, -30.2287] [-10.3833, 126.664, 35.724] [42.9108, 130.983, 47.8053] [-1.0, 1.85155, -1.0] [True, True, True] 0 [] [] ... -53.865688 5.485102 147.813019 13 6.192799 -14.861897 285.897095 7 True 0.293529
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
36671 4 [63.6192, 15.5392, -24.603, -22.424] [46.3087, -54.6324, 33.719, 33.788] [-201.403, 121.437, -185.224, -41.4796] [216.771, 134.415, 189.95, 59.4113] [-1.0, 2.30456, -1.0, -1.0] [True, True, True, True] 1 [44.2237] [1.19632] ... 44.393093 4.441749 17.649075 13 -37.079338 4.005792 22.262280 13 True 0.192474
36674 3 [-63.3877, -42.8766, 42.7456] [16.5414, 9.93516, 3.20814] [57.6559, 93.1895, 27.8355] [87.8452, 103.46, 51.6133] [2.389, -1.0, -1.0] [True, True, True] 1 [-19.4274] [-29.8691] ... -19.507946 -30.081053 47.349102 -13 57.827797 -55.537346 26.413647 16 True 0.105165
36678 4 [-42.7893, 41.4222, -12.0665, 17.7019] [-36.8382, -15.6533, 37.0771, 30.5058] [-41.9049, -4.82216, 5.23354, -70.8694] [70.8273, 45.4714, 39.5967, 79.481] [3.14142, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-1.03951] [-37.0587] ... -0.284460 -37.510612 -173.577927 -13 -17.592842 -28.825563 -12.680113 11 True 0.192474
36691 4 [-58.3385, -18.4484, 15.4408, 37.792] [-4.25548, -43.2476, 42.527, 3.44773] [145.612, 57.8691, 70.6434, -58.7726] [157.184, 75.1443, 84.1751, 70.7867] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-45.0365] [-3.44792] ... -46.260303 -3.330531 -12.245513 13 34.181591 14.543386 6.494123 4 True 0.279545
36697 0 [] [] [] [] [] [] 2 [72.8115, -32.1987] [-41.4184, 46.7349] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.036131
36698 2 [49.845, -38.1155] [-31.4604, -12.8079] [-105.783, -222.021] [121.478, 225.912] [-1.0, -1.0] [True, True] 1 [39.8958] [-8.11494] ... 41.240803 -6.950862 -118.234520 -13 -24.029903 32.869717 -56.536091 5 True 0.274432
36704 3 [44.5244, -22.9694, -5.49336] [52.0617, -22.2976, -29.943] [-123.699, 45.3537, -114.566] [141.7, 56.1422, 118.853] [-1.0, -1.0, -1.0] [True, True, True] 1 [-18.3659] [38.4539] ... -19.019670 38.604202 -15.343031 -13 -85.043907 6.205180 -118.760750 8 True 0.253713
36705 4 [95.7703, -91.2611, -22.8162, 11.3628] [-61.2077, -23.8256, -90.0745, 70.7234] [14.1595, 200.652, 21.1822, 282.059] [116.274, 222.271, 96.4095, 291.297] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 1 [-68.7636] [41.4596] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 0.293189
36723 2 [-29.0196, 40.0986] [-28.8848, -6.73161] [-10.89, 10.5535] [43.0001, 42.402] [1.72804, -1.0] [True, True] 1 [48.2104] [10.7573] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 3 True 0.271386
36728 4 [-37.2348, -19.6443, 35.2389, 36.7954] [-58.4847, 53.7423, 23.7087, 0.172772] [-140.538, -103.277, -81.6349, -79.4282] [157.469, 118.497, 92.416, 87.6949] [-1.0, -1.0, -1.0, -1.0] [True, True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.293529
36733 2 [90.0657, -63.0919] [57.5107, -48.7488] [-176.365, 13.9594] [207.504, 81.852] [2.16326, 3.64163] [True, True] 1 [29.2887] [-13.7026] ... 29.956720 -13.935602 5.496824 13 -3.975806 18.813091 100.464325 4 True 0.293529
36761 5 [-72.7327, 87.8201, -48.6379, 35.3672, -29.3188] [-71.4104, 25.7807, -24.6549, 37.8814, 26.399] [-25.5412, -27.7219, 69.2121, 166.603, -101.946] [105.677, 96.5188, 88.2745, 174.643, 109.666] [2.64415, -1.0, 2.74785, -1.0, -1.0] [True, True, True, True, True] 1 [-15.0587] [-19.3267] ... -14.784895 -19.563482 -41.428085 13 44.433407 46.769634 -128.488770 9 True 0.277205
36773 2 [42.3535, 7.84763] [7.22512, -38.6475] [24.7581, 90.8231] [50.1142, 99.2029] [2.67679, -1.0] [True, True] 1 [-72.876] [53.7722] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.286201
36777 5 [-99.6067, -3.34446, 32.1012, 42.7883, -11.6452] [-25.3014, 92.4234, -32.9203, 0.946451, 36.2387] [155.932, 174.57, 38.0382, 25.5367, 18.2656] [187.385, 198.147, 60.2857, 50.5626, 42.798] [-1.0, 3.07208, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [23.5196] [-33.3833] ... 19.755816 -44.092232 172.988556 -13 28.066628 -17.915775 9.881391 6 True 0.293529
36779 3 [-163.892, -104.229, 43.5479] [15.7904, -1.47729, -1.82685] [-387.744, -61.9309, -63.0709] [423.49, 123.1, 77.3157] [-1.0, -1.0, 2.4724] [True, True, True] 1 [204.758] [3.03201] ... 194.055786 2.588275 31.950975 -13 36.918297 -33.213650 -13.392416 17 True 0.014366
36845 2 [56.3493, 19.3332] [-62.0922, 23.8897] [98.1019, 8.61856] [129.654, 32.2253] [2.90254, -1.0] [True, True] 1 [-58.4214] [49.5135] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 0.277205
36846 2 [63.5405, 61.2992] [33.7838, 6.70778] [-10.9178, 69.2736] [73.4685, 93.2766] [-1.0, -1.0] [True, True] 1 [-79.0116] [-19.7364] ... -77.586594 -19.407608 -66.658394 13 -72.814438 -36.851547 17.604248 6 True 0.293529
36854 6 [23.5022, -29.0738, 27.8149, 27.4214, 0.959874... [-79.9664, -53.5415, 44.6099, 26.0418, 33.4664... [7.103, 22.0331, -156.058, -22.6882, -58.1608,... [85.9293, 65.432, 165.279, 44.604, 67.6031, 13... [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True, True] 1 [-32.9304] [34.6853] ... -32.763885 34.414585 -20.146509 13 -34.493790 9.358992 -145.859955 19 True 0.026791
36862 4 [65.2048, 26.1962, -27.163, 38.1059] [-77.5005, 70.3165, -45.9601, 33.931] [-55.8269, -3.97152, 95.8268, 287.091] [117.233, 75.8283, 110.272, 291.82] [2.6823, 2.13792, -1.0, -1.0] [True, True, True, True] 0 [] [] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.277205
36864 4 [-105.143, 96.585, -95.1253, 80.7612] [-153.554, 39.1784, 37.6713, -11.9996] [-238.466, -25.2841, -9.86635, -101.42] [304.335, 108.426, 103.746, 130.728] [-1.0, 4.3339, 4.28816, -1.0] [True, True, True, True] 0 [] [] ... 18.866938 -15.149240 7.708319 -13 1.578786 85.071571 34.158749 11 True 0.160826
36873 4 [237.71, -193.887, 35.7849, 28.4387] [-69.9921, 0.191406, 13.5153, -21.4827] [-152.786, 436.524, -0.336736, 2.15682] [293.461, 478.051, 39.3145, 36.544] [-1.0, 3.04585, -1.0, 3.54336] [True, True, True, True] 1 [-45.9498] [59.5273] ... -48.707134 60.437092 219.109711 -13 -32.495464 14.442889 15.860429 9 True 0.293189
36884 3 [68.6953, -68.5364, 46.7583] [22.5236, 14.8628, -42.646] [15.618, -29.9445, 35.7689] [75.5679, 77.1019, 73.1977] [2.38775, 2.44255, 1.78135] [True, True, True] 1 [-10.2928] [25.1898] ... -10.396090 25.239723 -11.126865 13 -42.048183 -10.150770 74.172813 3 True 0.270077
36888 2 [-99.3072, -29.3348] [49.2617, -29.6785] [155.757, -6.78339] [191.845, 42.8853] [-1.0, 4.18899] [True, True] 1 [114.711] [-47.2675] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 8 True 0.299981
36892 4 [-158.273, -67.7991, -33.143, -1.02856] [201.345, -24.0705, -49.9765, -31.1794] [-56.5885, 108.054, -133.396, -40.5998] [270.567, 130.381, 146.749, 51.5353] [-1.0, 4.01285, -1.0, -1.0] [True, True, True, True] 1 [-101.12] [-152.441] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.277205
36897 3 [-107.69, -46.7154, -15.9445] [-24.167, 8.60087, -26.171] [-6.4825, 41.4792, 96.2556] [111.979, 63.7637, 101.178] [2.38118, -1.0, -1.0] [True, True, True] 1 [137.716] [21.6976] ... 146.659790 22.631247 36.031197 13 51.945667 -31.295113 -22.553812 9 True 0.277205
36900 4 [36.9388, 27.8715, 35.8173, -23.6561] [35.4823, 34.4244, -21.416, -19.9391] [-53.2023, -18.4092, -60.2866, 21.9757] [74.1006, 48.2303, 74.2057, 38.408] [2.69591, 2.68174, -1.0, -1.0] [True, True, True, True] 1 [-31.3944] [30.6773] ... -31.754948 30.867546 0.585443 -13 -73.327507 -50.762230 -15.318438 5 True 0.271386
36906 3 [60.7559, -6.67839, 10.8504] [-23.811, -47.7918, -33.6265] [78.5075, 165.349, 5.05383] [103.611, 172.752, 36.5233] [-1.0, 1.40379, 4.02371] [True, True, True] 1 [-78.0497] [14.8932] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 19 True 0.036131
36924 1 [44.4301] [22.9213] [-53.3913] [74.282] [-1.0] [True] 1 [14.2536] [-34.5261] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 15 True 0.105165
36932 2 [-189.224, 161.017] [289.557, -116.198] [-246.186, -161.373] [425.525, 257.883] [-1.0, -1.0] [True, True] 1 [-23.1623] [-215.169] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 5 True 0.293529
36936 4 [130.752, -88.6566, 52.0555, -38.6537] [-25.4033, 73.8251, -74.2651, -30.6417] [-120.235, -287.902, -124.879, -128.15] [180.098, 310.715, 154.604, 137.425] [-1.0, 2.17192, -1.0, -1.0] [True, True, True, True] 1 [-44.8889] [42.0714] ... -42.990280 40.279636 -78.633194 -13 13.631536 -7.189492 9.089453 12 True 0.019638

4515 rows × 51 columns


In [6]:
ttbar = ttbar[ttbar['triggerIsoMu24']].reset_index(drop=True)

In [8]:
ttbar.head()


Out[8]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_px MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight
0 3 [-39.2862, 49.3453, -7.69329] [-32.3978, 5.15671, 41.4265] [-65.3803, -95.7182, -70.6679] [83.5105, 108.047, 82.8404] [2.42398, -1.0, -1.0] [True, True, True] 1 [-34.0692] [5.64295] ... -34.540741 5.717676 -10.324351 -13 43.943020 4.711743 -16.245531 4 True 0.279545
1 5 [184.591, 10.8253, -72.3689, -43.3258, 2.57552] [117.452, -166.958, 48.44, -5.91005, 42.4344] [379.501, -51.8267, 452.426, 33.3998, -77.0341] [438.532, 176.045, 461.069, 56.1953, 88.4542] [-1.0, 3.20327, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [42.3929] [-35.9437] ... 43.464474 -37.058472 -30.771671 13 -21.166794 19.246643 -13.815475 4 True 0.279545
2 5 [-92.04, 29.9482, 24.2262, 42.4436, -18.0844] [51.4753, -84.1564, 48.5667, -8.55752, 24.7327] [-52.9665, -181.678, -10.7411, -31.6566, -43.3... [121.717, 203.098, 57.0088, 53.9551, 53.2984] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 1 [26.0595] [0.139715] ... 27.888340 -2.613286 -114.308418 13 -43.849094 -32.715321 -46.295307 8 True 0.299037
3 3 [92.6861, -21.7365, 21.6693] [-126.292, 52.302, 31.9107] [88.5404, -12.5433, -18.5118] [180.864, 58.5981, 43.3167] [2.59038, 2.73967, -1.0] [True, True, True] 1 [-13.8492] [38.0815] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 1 True 0.270251
4 3 [-97.5345, -19.9276, 2.89748] [-13.185, -80.3989, 30.9127] [-11.907, 51.528, -91.2712] [101.636, 98.2916, 96.7967] [2.89632, -1.0, -1.0] [True, True, True] 1 [72.6027] [-9.59597] ... 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000 16 True 0.010469

5 rows × 51 columns

Let's say we want to start by training a simple model that only relies on event-level variables. The ones available in these samples are:


In [9]:
# names of event-level branches
npart = ['NJet', 'NMuon', 'NElectron', 'NPhoton', 'MET_px', 'MET_py']

If you are a physicist, the first thing you might want to do is to plot them. We do so very easily using `matplotlib`:


In [10]:
for key in npart: # loop through the event-level branches and plot them on separate histograms
    
    # -- set font and canvas size (optional)
    matplotlib.rcParams.update({'font.size': 16})
    fig = plt.figure(figsize=(8,8), dpi=100)
    
    # -- declare common binning strategy (otherwise every histogram will have its own binning)
    bins = np.linspace(min(ttbar[key]), max(ttbar[key]) + 1, 30)
    
    # plot!
    _ = plt.hist(ttbar[key], histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$', linewidth=2)
    _ = plt.hist(dy[key], histtype='step', normed=False, bins=bins, weights=dy['EventWeight'], label='Drell Yan')
    _ = plt.hist(wj[key], histtype='step', normed=False, bins=bins, weights=wj['EventWeight'], label=r'$W$ + jets')
    _ = plt.hist(ww[key], histtype='step', normed=False, bins=bins, weights=ww['EventWeight'], label=r'$WW$')
    _ = plt.hist(wz[key], histtype='step', normed=False, bins=bins, weights=wz['EventWeight'], label=r'$WZ$')
    _ = plt.hist(zz[key], histtype='step', normed=False, bins=bins, weights=zz['EventWeight'], label=r'$ZZ$')
    _ = plt.hist(singletop[key], histtype='step', normed=False, bins=bins, weights=singletop['EventWeight'], label=r'single $t$')
    _ = plt.hist(qcd[key], histtype='step', normed=False, bins=bins, weights=qcd['EventWeight'], label='QCD', color='salmon')
    
    plt.xlabel(key)
    plt.yscale('log')
    plt.legend(loc='best')
    plt.show()


Stack all the backgrounds (shown only for one branch here):


In [11]:
import matplotlib.cm as cm
colors = cm.cool(np.linspace(0, 1, 7))

In [12]:
plt.figure(figsize=(7,7))
bins = np.linspace(0, 10, 11)
_ = plt.hist([
                dy['NJet'], wj['NJet'], ww['NJet'], wz['NJet'], zz['NJet'], singletop['NJet'], qcd['NJet']
             ],
             histtype='stepfilled',
             normed=False,
             bins=bins,
             weights=[
                 dy['EventWeight'], wj['EventWeight'], ww['EventWeight'], wz['EventWeight'], zz['EventWeight'], singletop['EventWeight'], qcd['EventWeight']
             ],
             label=[ r'Drell Yan', r'$W$ + jets', r'$WW$', r'$WZ$', r'$ZZ$', r'single $t$', 'QCD'
             ],
             stacked=True,
             color=colors)

plt.hist(ttbar['NJet'],
         histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$',
         linewidth=2, color='black', linestyle='dashed')

plt.yscale('log')
plt.xlabel('Number of Jets')
plt.legend()


Out[12]:
<matplotlib.legend.Legend at 0x122178c10>

Plot 2D histogram of $t\bar{t}$ MET_px and MET_py:


In [13]:
from matplotlib.colors import LogNorm

plt.figure(figsize=(5,5)) # square canvas
_ = plt.hist2d(ttbar['MET_px'], ttbar['MET_py'], bins=40, cmap='binary', norm=LogNorm())

# add decorations
plt.xlabel(r'MET$_{p_x}$') # LaTeX!
plt.ylabel(r'MET$_{p_y}$', fontsize=20)
_ = plt.colorbar()


Turn Data into ML-Compatible Inputs

What are ML-Compatible Inputs?

In the simplest scenario of tabular data, Keras, just like scikit-learn, takes the following objects as inputs:

  • $X$

    an ndarray of dimensions [nb_examples, nb_features] containing the distributions to be used as inputs to the model. Each row is an object to classify, each column corresponds to a specific variable.
  • $y$

    an array of dimensions [nb_examples] containing the truth labels indicating the class each object belongs to (for classification), or the continuous target values (for regression).
  • $w$

    (optional) an array of dimensions [nb_examples] containing the weights to be assigned to each example

The indices of these objects must map to the same examples. In general, you will want to shuffle and split them into a training and test set.
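
For concreteness, here is a toy sketch of the expected shapes (synthetic numbers, not our physics data):

import numpy as np

X = np.random.random((100, 6))     # 100 examples, 6 features each
y = np.random.randint(0, 3, 100)   # one integer class label per example (3 classes)
w = np.ones(100)                   # one weight per example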

Forward Pass

If we want to stack multiple dataframes into a single one, we can "concatenate" them. To simplify our classification problem, in this tutorial I will only focus on a three-class classification task, in which we aim to separate $t\bar{t}$ events from two of the main background sources: Drell Yan and W+jets events.


In [14]:
del ww, wz, zz, singletop, qcd, data

In [15]:
# -- this will only contain TTbar, Drell Yan and W+jets events (all branches)
df_full = pd.concat((ttbar, dy, wj), ignore_index=True)

However, we decided we were only going to train on event-level variables, so this would be a more useful df:


In [16]:
# recall: npart was the list of branch names corresponding to event-level variables
df =  pd.concat((ttbar[npart], dy[npart], wj[npart]), ignore_index=True)
df.head()


Out[16]:
NJet NMuon NElectron NPhoton MET_px MET_py
0 3 1 0 0 20.238705 1.876128
1 5 1 0 0 -29.892015 16.498444
2 5 1 0 0 -24.056999 -46.160618
3 3 1 0 0 -30.039965 5.851555
4 3 1 0 0 4.518302 117.570068

Now, turn your new df into the desired ndarray $X$ that can be directly used for ML applications, using this handy pandas function:


In [17]:
X = df.as_matrix()

In [18]:
type(X)


Out[18]:
numpy.ndarray

In [19]:
X.shape


Out[19]:
(191981, 6)

The weight array $w$ can also easily be extracted using a similar procedure:


In [20]:
w =  pd.concat((ttbar['EventWeight'], dy['EventWeight'], wj['EventWeight']), ignore_index=True).values

In [22]:
type(w)


Out[22]:
numpy.ndarray

Finally, generate an array of truth labels $y$ to distinguish among the different classes in the problem:


In [23]:
y = []
for _df, ID in [(ttbar, 0), (dy, 1), (wj, 2)]:
    y.extend([ID] * _df.shape[0])
y = np.array(y)

In [24]:
# -- check what we created
y


Out[24]:
array([0, 0, 0, ..., 2, 2, 2])

Extra Pre-Processing Steps: Shuffling, Splitting into Train-Test, Scaling Inputs

The sklearn function `train_test_split` will randomly shuffle and automatically split all your objects into train and test subsets.


In [25]:
ix = range(X.shape[0]) # array of indices, just to keep track of them for safety reasons and future checks
X_train, X_test, y_train, y_test, w_train, w_test, ix_train, ix_test = train_test_split(X, y, w, ix, train_size=0.6)


/Users/mp744/venvs/keras2tf1/lib/python2.7/site-packages/sklearn/model_selection/_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
  FutureWarning)

It is common practice to scale the inputs to Neural Nets such that they have approximately similar ranges. Without this step, you might end up with variables whose values span very different orders of magnitude, which hinders NN convergence by causing wild fluctuations in the magnitude of the internal weights. To take care of the scaling, we use the sklearn StandardScaler:


In [26]:
from sklearn.preprocessing import StandardScaler

In [27]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Training a NN using Keras

Neural Networks

  • Stack of tensor operators
  • Series of linear and non-linear transformations with the goal of finding optimal parameters to transform inputs and approximate targets
  • Deep nets > shallow nets

1. Multilayer Perceptron (MLP)

Dense

  • Densely connected layer, where all inputs are connected to all outputs
  • Core layer of an MLP
  • Linear transformation of the input vector $x \in \mathbb{R}^n$, which can be expressed using the $m \times n$ matrix $W \in \mathbb{R}^{m \times n}$ as:

    $u = Wx + b$
    where $b \in \mathbb{R}^m$ is the bias unit (a small numerical check of this follows below)

  • All entries in both $W$ and $b$ are trainable

  • In Keras:

    keras.layers.Dense(
                    units,
                    activation=None,
                    use_bias=True,
                    kernel_initializer='glorot_uniform',
                    bias_initializer='zeros',
                    kernel_regularizer=None,
                    bias_regularizer=None,
                    activity_regularizer=None,
                    kernel_constraint=None,
                    bias_constraint=None
    )
  • input_dim (or input_shape) is a necessary argument for the 1st layer of the net:

# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)

# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
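
As a small numerical check of the affine transformation above, here is a sketch building a fresh single-layer model (note that Keras stores the kernel as the transpose of the $W$ written earlier):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

check = Sequential()
check.add(Dense(32, input_shape=(16,)))                  # single Dense layer, linear activation by default
W, b = check.layers[0].get_weights()                     # kernel of shape (16, 32), bias of shape (32,)
x = np.random.random((1, 16))
print(np.allclose(check.predict(x), np.dot(x, W) + b))   # True: Dense computes an affine map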

Activation Functions

  • Mathematical way of quantifying the activation state of a node $\rightarrow$ whether it's firing or not
  • Non-linear activation functions are the key to Deep Learning
  • Allow NNs to learn complex, non-linear transformations of the inputs
  • Some popular choices: softmax, elu, selu, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear. Activations that are more complex than a simple TensorFlow/Theano/CNTK function (e.g. learnable activations, which maintain a state) are available as Advanced Activation layers in the module keras.layers.advanced_activations; these include PReLU and LeakyReLU (see the sketch below).
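
A minimal sketch (with made-up layer sizes) of how an advanced activation is used as its own layer:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU

m = Sequential()
m.add(Dense(32, input_shape=(16,), activation='relu'))  # simple activations go directly in the layer
m.add(Dense(32))                                        # no activation here...
m.add(LeakyReLU(alpha=0.1))                             # ...LeakyReLU is applied as a separate layer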

Regularization

  • Series of methods to avoid overfitting
  • Mathematical encouragement towards simpler models
  • Explicitly penalize weights that get too large
  • Two main categories: explicit weight penalties (such as L1/L2 regularization) and stochastic techniques such as Dropout
  • In Keras, Dropout is added as a layer. It masks the outputs of the previous layer such that some of them randomly become inactive and do not contribute to information propagation:
    keras.layers.Dropout(rate, noise_shape=None, seed=None)
    

Build a simple Keras classifier


In [28]:
from keras.models import Model
from keras.layers import Dense, Dropout, Input


Using TensorFlow backend.
2017-11-14 23:55:27.843006: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843031: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843040: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 23:55:27.843046: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

In [29]:
inputs = Input(shape=(X_train.shape[1], )) # placeholder

hidden = Dense(10, activation='relu')(inputs)
hidden = Dropout(0.2)(hidden)
hidden = Dense(20, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
hidden = Dense(30, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
outputs = Dense(3, activation='softmax')(hidden)
# last layer has to have the same dimensionality as the number of classes we want to predict, here 3

model = Model(inputs, outputs)

In [30]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 6)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                70        
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 20)                220       
_________________________________________________________________
dropout_2 (Dropout)          (None, 20)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 30)                630       
_________________________________________________________________
dropout_3 (Dropout)          (None, 30)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 93        
=================================================================
Total params: 1,013
Trainable params: 1,013
Non-trainable params: 0
_________________________________________________________________

In [31]:
from keras.utils.vis_utils import plot_model
plot_model(model, 'temp.png', show_shapes=True)

Satisfied with your net?

Objective / Loss / Cost

  • Mathematical way of quantifying how much ŷ deviates from y
  • Dictates how strongly we penalize certain types of mistakes
  • Cost of inaccurately classifying an event (“cost function”)
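
As a tiny numerical illustration (plain numpy with made-up numbers, not Keras code), the cross-entropy loss used below penalizes assigning low probability to the true class:

import numpy as np

y_true = 0                             # true class index
y_pred = np.array([0.7, 0.2, 0.1])     # softmax output of the network for one event
loss = -np.log(y_pred[y_true])         # ~0.36; grows as the true class receives less probability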

Optimizers


Now you need to declare what loss function and optimizer to use. We pass these as arguments to compile:


In [26]:
model.compile('adam', 'sparse_categorical_crossentropy')

Training


In [27]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

In [28]:
from collections import Counter
Counter(y_test)
# uneven classes


Out[28]:
Counter({0: 1796, 1: 31146, 2: 43851})

In [31]:
print 'Training:'
try:
    model.fit(
        X_train, y_train, class_weight={ # rebalance class representation
            0 : 0.20 * (float(len(y)) / (y == 0).sum()),
            1 : 0.40 * (float(len(y)) / (y == 1).sum()),
            2 : 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks = [
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/tutorial-progress.h5', monitor='val_loss', verbose=True, save_best_only=True)
        ],
        epochs=20, 
        validation_split = 0.2,
        verbose=True
) 
except KeyboardInterrupt:
    print 'Training ended early.'


Training:
Train on 92150 samples, validate on 23038 samples
Epoch 1/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4397Epoch 00001: val_loss improved from inf to 0.42322, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4398 - val_loss: 0.4232
Epoch 2/20
92032/92150 [============================>.] - ETA: 0s - loss: 0.4388Epoch 00002: val_loss improved from 0.42322 to 0.41837, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4388 - val_loss: 0.4184
Epoch 3/20
91680/92150 [============================>.] - ETA: 0s - loss: 0.4400Epoch 00003: val_loss did not improve
92150/92150 [==============================] - 8s 88us/step - loss: 0.4401 - val_loss: 0.4207
Epoch 4/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4416Epoch 00004: val_loss improved from 0.41837 to 0.41834, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 86us/step - loss: 0.4416 - val_loss: 0.4183
Epoch 5/20
91648/92150 [============================>.] - ETA: 0s - loss: 0.4420Epoch 00005: val_loss improved from 0.41834 to 0.41505, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 87us/step - loss: 0.4429 - val_loss: 0.4150
Epoch 6/20
91680/92150 [============================>.] - ETA: 0s - loss: 0.4401Epoch 00006: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4405 - val_loss: 0.4218
Epoch 7/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4383Epoch 00007: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4383 - val_loss: 0.4208
Epoch 8/20
91552/92150 [============================>.] - ETA: 0s - loss: 0.4384Epoch 00008: val_loss did not improve
92150/92150 [==============================] - 8s 85us/step - loss: 0.4386 - val_loss: 0.4232
Epoch 9/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4391Epoch 00009: val_loss did not improve
92150/92150 [==============================] - 8s 87us/step - loss: 0.4391 - val_loss: 0.4194
Epoch 10/20
92096/92150 [============================>.] - ETA: 0s - loss: 0.4384Epoch 00010: val_loss improved from 0.41505 to 0.41338, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 87us/step - loss: 0.4383 - val_loss: 0.4134
Epoch 11/20
91936/92150 [============================>.] - ETA: 0s - loss: 0.4409Epoch 00011: val_loss did not improve
92150/92150 [==============================] - 8s 87us/step - loss: 0.4407 - val_loss: 0.4186
Epoch 12/20
92032/92150 [============================>.] - ETA: 0s - loss: 0.4404Epoch 00012: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4403 - val_loss: 0.4164
Epoch 13/20
92128/92150 [============================>.] - ETA: 0s - loss: 0.4405Epoch 00013: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4404 - val_loss: 0.4312
Epoch 14/20
91968/92150 [============================>.] - ETA: 0s - loss: 0.4389Epoch 00014: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4390 - val_loss: 0.4165
Epoch 15/20
91840/92150 [============================>.] - ETA: 0s - loss: 0.4416Epoch 00015: val_loss improved from 0.41338 to 0.41242, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 85us/step - loss: 0.4417 - val_loss: 0.4124
Epoch 16/20
91616/92150 [============================>.] - ETA: 0s - loss: 0.4365Epoch 00016: val_loss improved from 0.41242 to 0.41008, saving model to ./models/tutorial-progress.h5
92150/92150 [==============================] - 8s 85us/step - loss: 0.4366 - val_loss: 0.4101
Epoch 17/20
91744/92150 [============================>.] - ETA: 0s - loss: 0.4374Epoch 00017: val_loss did not improve
92150/92150 [==============================] - 8s 85us/step - loss: 0.4378 - val_loss: 0.4219
Epoch 18/20
91584/92150 [============================>.] - ETA: 0s - loss: 0.4395Epoch 00018: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4393 - val_loss: 0.4243
Epoch 19/20
91872/92150 [============================>.] - ETA: 0s - loss: 0.4410Epoch 00019: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4412 - val_loss: 0.4162
Epoch 20/20
91584/92150 [============================>.] - ETA: 0s - loss: 0.4349Epoch 00020: val_loss did not improve
92150/92150 [==============================] - 8s 86us/step - loss: 0.4348 - val_loss: 0.4209

In [32]:
# -- load in best network
model.load_weights('./models/tutorial-progress.h5')

# -- Save network weights and structure
print 'Saving model...'
model.save_weights('./models/tutorial.h5', overwrite=True)
json_string = model.to_json()
open('./models/tutorial.json', 'w').write(json_string)
print 'Done'


Saving model...
Done

Testing


In [33]:
print 'Testing...'
yhat = model.predict(X_test, verbose = True, batch_size = 512)


Testing...
76793/76793 [==============================] - 0s 3us/step

In [34]:
# predictions
yhat


Out[34]:
array([[  1.80225354e-04,   9.99819815e-01,   2.92606961e-09],
       [  6.79788063e-04,   2.35158458e-01,   7.64161706e-01],
       [  3.93254653e-04,   2.32965067e-01,   7.66641676e-01],
       ..., 
       [  9.69352317e-04,   2.21174046e-01,   7.77856588e-01],
       [  4.15727874e-04,   2.39787340e-01,   7.59796917e-01],
       [  2.84072390e-04,   2.67122030e-01,   7.32593954e-01]], dtype=float32)

In [35]:
# -- turn them into classes
yhat_cls = np.argmax(yhat, axis=1)

Visualize performance with confusion matrix:


In [36]:
import itertools
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [37]:
# Compute confusion matrix
cnf_matrix = confusion_matrix(y_test, yhat_cls, sample_weight=w_test)
np.set_printoptions(precision=2)

In [38]:
plot_confusion_matrix(cnf_matrix, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
                      normalize=True,
                      title='Normalized confusion matrix')


Normalized confusion matrix
[[ 0.94  0.03  0.03]
 [ 0.03  0.67  0.3 ]
 [ 0.03  0.05  0.92]]

In [39]:
# signal eff = weighted tpr --> out of all signal events, what % do we classify as signal?
print 'Signal efficiency:', w_test[(y_test == 0) & (yhat_cls == 0)].sum() / w_test[y_test == 0].sum()

# bkg eff = weighted fpr --> out of all bkg events, what % do we classify as signal?
b_eff = w_test[(y_test != 0) & (yhat_cls == 0)].sum() / w_test[y_test != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff


Signal efficiency: 0.941405
Background efficiency: 0.0267202
Background rej: 37.4248125521

If you want to, you could further analyze the results by, for example, looking at all events that were assigned to class 0 ($t\bar{t}$) and learning more about their physical characteristics. This could provide insight into what the network is focusing on in its decision-making process.


In [42]:
# -- events that got assigned to class 0
predicted_ttbar = df_full.iloc[np.array(ix_test)[yhat_cls == 0]]

In [43]:
predicted_ttbar['true flavor'] = y_test[yhat_cls == 0]


/home/ubuntu/venv/micky-work/lib/python2.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

In [44]:
predicted_ttbar.head()


Out[44]:
NJet Jet_Px Jet_Py Jet_Pz Jet_E Jet_btag Jet_ID NMuon Muon_Px Muon_Py ... MClepton_py MClepton_pz MCleptonPDGid MCneutrino_px MCneutrino_py MCneutrino_pz NPrimaryVertices triggerIsoMu24 EventWeight true flavor
71068 2 [-39.515, -32.7724] [-7.06055, 4.96038] [-59.9577, 1.15056] [72.4383, 33.8005] [-1.0, -1.0] [True, True] 1 [36.5116] [-27.2592] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 7 True 0.540515 1
1249 5 [60.9035, -111.827, -27.5385, 52.0547, 52.3307] [117.735, -9.45221, -51.6361, 13.4452, 7.99291] [151.038, -72.7721, -35.0566, 143.468, -8.85212] [202.914, 135.274, 70.2694, 153.809, 55.7241] [-1.0, -1.0, -1.0, -1.0, -1.0] [True, True, True, True, True] 0 [] [] ... -27.402386 -25.099463 13 -129.215714 -97.106895 43.848606 14 True 0.192474 0
143500 2 [-53.283, -38.1472] [-76.5362, -9.61194] [-86.3619, 32.1003] [128.014, 51.698] [-1.0, -1.0] [True, True] 1 [39.2392] [16.2889] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 10 True 2.393087 2
1186 3 [-17.622, -42.9345, -26.1204] [-59.8529, 38.5907, 16.7819] [-13.7983, -1.9199, 128.346] [64.3067, 58.404, 132.271] [1.52388, -1.0, -1.0] [True, True, True] 2 [63.2695, 34.3971] [3.4336, -23.4732] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 9 True 0.299981 0
28224 2 [49.646, 34.655] [19.6789, -29.2292] [-194.703, -144.563] [202.0, 151.752] [-1.0, -1.0] [True, True] 2 [-60.5106, -35.1742] [37.2119, -34.9203] ... 0.000000 0.000000 0 0.000000 0.000000 0.000000 20 True 0.005656 1

5 rows × 52 columns

In a 3-class classifier, the outputs lie on a 2D triangle with vertices at (1,0,0), (0,1,0), (0,0,1).


In [45]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')
for label, color, marker in [(0, 'r', '^'), (1, 'b', 'o'), (2, 'yellow', 'x')]:
    ax.scatter(yhat[y_test == label, 0], yhat[y_test == label, 1], yhat[y_test == label, 2], c=color, marker=marker, alpha=0.01)

ax.set_xlabel(r'$P(t\bar{t})$')
ax.set_ylabel('P(Drell Yan)')
ax.set_zlabel('P(W + jets)')
ax.view_init(30, 30)
plt.show()


To do some model introspection, you can add intermediate hidden layers to the list of model outputs and inspect their activations:


In [42]:
# model = Model(inputs, [outputs, hidden])
# model.compile('adam', 'sparse_categorical_crossentropy') 
# model.load_weights('./models/tutorial-progress.h5')
# _, last_layer = model.predict(X_test, verbose = True, batch_size = 512)

2. Recurrent Neural Networks

Let's look at a fancier way of solving the same classification problem. In this case we will use Recurrent Neural Networks. These allow you to process variable-length sequences of data. For example, we can use them to describe an event in terms of the properties of its jets, whose number varies event by event. We could also describe the same event using the properties of its muons, or any other particle that appears in it. Because the number of particles of each type changes in each event, we need the flexibility of RNNs to process this type of data.
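
To make the variable-length idea concrete, here is a toy sketch (made-up numbers) of how events with different numbers of jets end up in a single fixed-size, padded array; the `Masking` layer used further below tells the network to ignore the -999 entries:

import numpy as np

event1 = [[10.0, 1.0], [20.0, 2.0]]     # 2 jets, 2 features each
event2 = [[5.0, 0.5]]                   # 1 jet
padded = np.full((2, 2, 2), -999.0)     # shape: (nb_events, max_jets, nb_features)
padded[0, :2] = event1
padded[1, :1] = event2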

We will build an event-level classifier with 4 recurrent branches for the 4 different types of particles in the event.


In [66]:
jetvars = [key for key in df_full.keys() if key.startswith('Jet')]
jetvars.remove('Jet_ID')
print jetvars

muonvars = [key for key in df_full.keys() if key.startswith('Muon')]
print muonvars

photonvars = [key for key in df_full.keys() if key.startswith('Photon')]
print photonvars

electronvars = [key for key in df_full.keys() if key.startswith('Electron')]
print electronvars


['Jet_Px', 'Jet_Py', 'Jet_Pz', 'Jet_E', 'Jet_btag']
['Muon_Px', 'Muon_Py', 'Muon_Pz', 'Muon_E', 'Muon_Charge', 'Muon_Iso']
['Photon_Px', 'Photon_Py', 'Photon_Pz', 'Photon_E', 'Photon_Iso']
['Electron_Px', 'Electron_Py', 'Electron_Pz', 'Electron_E', 'Electron_Charge', 'Electron_Iso']

In [67]:
df_jets = df_full[jetvars].copy()
df_electrons = df_full[electronvars].copy()
df_muons = df_full[muonvars].copy()
df_photons = df_full[photonvars].copy()

In [68]:
num_electrons = max([len(e) for e in df_electrons.Electron_E])
num_electrons


Out[68]:
2

In [69]:
num_muons = max([len(m) for m in df_muons.Muon_E])
num_muons


Out[69]:
3

In [70]:
num_photons = max([len(gam) for gam in df_photons.Photon_E])
num_photons


Out[70]:
2

In [71]:
num_jets = max([len(j) for j in df_jets.Jet_E])
num_jets


Out[71]:
9

Just for the sake of variety, you can either have class labels like (0, 1, 2, 3, ...) and train using sparse_categorical_crossentropy as your loss function -- like we did before -- or, equivalently, you can have class labels like ([1, 0, 0, 0, ...], [0, 1, 0, 0, ...], ...) and train using categorical_crossentropy.


In [72]:
from keras.utils.np_utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_train


Out[72]:
array([[ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       ..., 
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])

In [73]:
def create_stream(df, num_obj, sort_col):
   
    n_variables = df.shape[1]
    var_names = df.keys()

    data = np.zeros((df.shape[0], num_obj, n_variables), dtype='float32')

    # -- call functions to build X (a.k.a. data)                                                                                                                                                                      
    sort_objects(df, data, sort_col, num_obj)
    
    # -- ix_{train, test} from above or from previously stored ordering
    Xobj_train = data[ix_train]
    Xobj_test = data[ix_test]
    
    #print 'Scaling features ...'
    scale(Xobj_train, var_names, savevars=True) # scale training sample and save scaling
    scale(Xobj_test, var_names, savevars=False) # apply scaling to test set
    return Xobj_train, Xobj_test

In [74]:
def sort_objects(df, data, SORT_COL, max_nobj):
    ''' 
    sort objects using your preferred variable
    
    Args:
    -----
        df: a dataframe with event-level structure where each event is described by a sequence of jets, muons, etc.
        data: an array of shape (nb_events, nb_particles, nb_features)
        SORT_COL: a string representing the column to sort the objects by
        max_nobj: maximum number of objects to keep per event; longer sequences are truncated, shorter ones are padded with -999
    
    Returns:
    --------
        modifies @a data in place. Pads with -999
    
    '''
    import tqdm
    # i = event number, event = all the variables for that event 
    for i, event in tqdm.tqdm(df.iterrows(), total=df.shape[0]): 

        # objs = [[pt's], [eta's], ...] of particles for each event 
        objs = np.array(
                [v.tolist() for v in event.get_values()], 
                dtype='float32'
            )[:, (np.argsort(event[SORT_COL]))[::-1]]

        # total number of objects in this event
        nobjs = objs.shape[1] 

        # take all objects unless there are more than max_nobj
        data[i, :(min(nobjs, max_nobj)), :] = objs.T[:(min(nobjs, max_nobj)), :] 

        # default value for missing objects
        data[i, (min(nobjs, max_nobj)):, :  ] = -999

In [75]:
def scale(data, var_names, savevars, VAR_FILE_NAME='scaling.json'):
    ''' 
    Args:
    -----
        data: a numpy array of shape (nb_events, nb_particles, n_variables)
        var_names: list of keys to be used for the model
        savevars: bool -- True for training, False for testing
                  it decides whether we want to fit on data to find mean and std 
                  or if we want to use those stored in the json file 
    
    Returns:
    --------
        modifies data in place, writes out scaling dictionary
    '''
    import json
    
    scale = {}
    if savevars: 
        for v, name in enumerate(var_names):
            #print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m, s = slc.mean(), slc.std()
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')
            scale[name] = {'mean' : float(m), 'sd' : float(s)}
            
        with open(VAR_FILE_NAME, 'wb') as varfile:
            json.dump(scale, varfile)

    else:
        with open(VAR_FILE_NAME, 'rb') as varfile:
            varinfo = json.load(varfile)

        for v, name in enumerate(var_names):
            #print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m = varinfo[name]['mean']
            s = varinfo[name]['sd']
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')

In [77]:
Xjet_train, Xjet_test = create_stream(df_jets, num_jets, sort_col='Jet_btag')


100%|██████████| 191981/191981 [00:22<00:00, 8468.88it/s]

In [78]:
Xphoton_train, Xphoton_test = create_stream(df_photons, num_photons, sort_col='Photon_E')


100%|██████████| 191981/191981 [00:24<00:00, 7774.78it/s]

In [79]:
Xmuon_train, Xmuon_test = create_stream(df_muons, num_muons, sort_col='Muon_E')


100%|██████████| 191981/191981 [00:23<00:00, 8088.31it/s]

In [80]:
Xelectron_train, Xelectron_test = create_stream(df_electrons, num_electrons, sort_col='Electron_E')


100%|██████████| 191981/191981 [00:23<00:00, 8074.72it/s]

In [89]:
from keras.layers import Masking, GRU, concatenate

In [83]:
JET_SHAPE = Xjet_train.shape[1:]
MUON_SHAPE = Xmuon_train.shape[1:]
ELECTRON_SHAPE = Xelectron_train.shape[1:]
PHOTON_SHAPE = Xphoton_train.shape[1:]

In [84]:
jet_input = Input(JET_SHAPE)
jet_channel = Masking(mask_value=-999, name='jet_masking')(jet_input)
jet_channel = GRU(25, name='jet_gru')(jet_channel)
jet_channel = Dropout(0.3, name='jet_dropout')(jet_channel)

muon_input = Input(MUON_SHAPE)
muon_channel = Masking(mask_value=-999, name='muon_masking')(muon_input)
muon_channel = GRU(10, name='muon_gru')(muon_channel)
muon_channel = Dropout(0.3, name='muon_dropout')(muon_channel)

electron_input = Input(ELECTRON_SHAPE)
electron_channel = Masking(mask_value=-999, name='electron_masking')(electron_input)
electron_channel = GRU(10, name='electron_gru')(electron_channel)
electron_channel = Dropout(0.3, name='electron_dropout')(electron_channel)

photon_input = Input(PHOTON_SHAPE)
photon_channel = Masking(mask_value=-999, name='photon_masking')(photon_input)
photon_channel = GRU(10, name='photon_gru')(photon_channel)
photon_channel = Dropout(0.3, name='photon_dropout')(photon_channel)

In [90]:
# merge the four per-collection summaries and classify with a small
# fully-connected network (softmax over the 3 classes)
combined = concatenate([jet_channel, muon_channel, electron_channel, photon_channel])
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined_outputs = Dense(3, activation='softmax')(combined)

In [91]:
combined_rnn = Model(inputs=[jet_input, muon_input, electron_input, photon_input], outputs=combined_outputs)
combined_rnn.summary()


__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            (None, 9, 5)         0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            (None, 3, 6)         0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 2, 6)         0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 2, 5)         0                                            
__________________________________________________________________________________________________
jet_masking (Masking)           (None, 9, 5)         0           input_2[0][0]                    
__________________________________________________________________________________________________
muon_masking (Masking)          (None, 3, 6)         0           input_3[0][0]                    
__________________________________________________________________________________________________
electron_masking (Masking)      (None, 2, 6)         0           input_4[0][0]                    
__________________________________________________________________________________________________
photon_masking (Masking)        (None, 2, 5)         0           input_5[0][0]                    
__________________________________________________________________________________________________
jet_gru (GRU)                   (None, 25)           2325        jet_masking[0][0]                
__________________________________________________________________________________________________
muon_gru (GRU)                  (None, 10)           510         muon_masking[0][0]               
__________________________________________________________________________________________________
electron_gru (GRU)              (None, 10)           510         electron_masking[0][0]           
__________________________________________________________________________________________________
photon_gru (GRU)                (None, 10)           480         photon_masking[0][0]             
__________________________________________________________________________________________________
jet_dropout (Dropout)           (None, 25)           0           jet_gru[0][0]                    
__________________________________________________________________________________________________
muon_dropout (Dropout)          (None, 10)           0           muon_gru[0][0]                   
__________________________________________________________________________________________________
electron_dropout (Dropout)      (None, 10)           0           electron_gru[0][0]               
__________________________________________________________________________________________________
photon_dropout (Dropout)        (None, 10)           0           photon_gru[0][0]                 
__________________________________________________________________________________________________
concatenate_3 (Concatenate)     (None, 55)           0           jet_dropout[0][0]                
                                                                 muon_dropout[0][0]               
                                                                 electron_dropout[0][0]           
                                                                 photon_dropout[0][0]             
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 64)           3584        concatenate_3[0][0]              
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 64)           0           dense_6[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 64)           4160        dropout_4[0][0]                  
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 64)           0           dense_7[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 64)           4160        dropout_5[0][0]                  
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 64)           0           dense_8[0][0]                    
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 3)            195         dropout_6[0][0]                  
==================================================================================================
Total params: 15,924
Trainable params: 15,924
Non-trainable params: 0
__________________________________________________________________________________________________

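Besides summary(), Keras can also draw the multi-input topology to an image file. This is optional and assumes the pydot and graphviz packages are installed:

In [ ]:
# optional: visualize the four-stream architecture (requires pydot + graphviz)
from keras.utils import plot_model
plot_model(combined_rnn, to_file='./models/combinedrnn_tutorial.png', show_shapes=True)
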
In [92]:
combined_rnn.compile('adam', 'categorical_crossentropy')

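The fit call below reweights the three classes: each weight is a chosen prior (0.2 / 0.4 / 0.4) multiplied by the inverse class frequency, so the loss is not dominated by the most abundant process. The same numbers can be computed up front, for example:

In [ ]:
# class weight = prior * (total events / events in that class); y holds the integer labels
priors = {0: 0.20, 1: 0.40, 2: 0.40}
class_weights = {c: p * float(len(y)) / (y == c).sum() for c, p in priors.items()}
print class_weights
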
In [93]:
print 'Training:'
try:
    combined_rnn.fit([Xjet_train, Xmuon_train, Xelectron_train, Xphoton_train], y_train, batch_size=16,
            class_weight={
                0 : 0.20 * (float(len(y)) / (y == 0).sum()),
                1 : 0.40 * (float(len(y)) / (y == 1).sum()),
                2 : 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks = [
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/combinedrnn_tutorial-progress', monitor='val_loss', verbose=True, save_best_only=True)
        ],
    epochs=30, 
    validation_split = 0.2) 

except KeyboardInterrupt:
    print 'Training ended early.'


Training:
/home/ubuntu/venv/micky-work/lib/python2.7/site-packages/ipykernel_launcher.py:14: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  
Train on 92150 samples, validate on 23038 samples
Epoch 1/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.5260Epoch 00001: val_loss improved from inf to 0.43021, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 96s 1ms/step - loss: 0.5260 - val_loss: 0.4302
Epoch 2/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4370Epoch 00002: val_loss improved from 0.43021 to 0.42557, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 94s 1ms/step - loss: 0.4370 - val_loss: 0.4256
Epoch 3/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4305Epoch 00003: val_loss did not improve
92150/92150 [==============================] - 86s 932us/step - loss: 0.4306 - val_loss: 0.4304
Epoch 4/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4315Epoch 00004: val_loss did not improve
92150/92150 [==============================] - 84s 910us/step - loss: 0.4315 - val_loss: 0.4307
Epoch 5/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4333Epoch 00005: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4332 - val_loss: 0.4455
Epoch 6/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4316Epoch 00006: val_loss improved from 0.42557 to 0.42022, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 83s 897us/step - loss: 0.4316 - val_loss: 0.4202
Epoch 7/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4289Epoch 00007: val_loss did not improve
92150/92150 [==============================] - 84s 915us/step - loss: 0.4289 - val_loss: 0.4412
Epoch 8/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4317Epoch 00008: val_loss did not improve
92150/92150 [==============================] - 83s 898us/step - loss: 0.4317 - val_loss: 0.4434
Epoch 9/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4327Epoch 00009: val_loss did not improve
92150/92150 [==============================] - 83s 904us/step - loss: 0.4327 - val_loss: 0.4300
Epoch 10/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4270Epoch 00010: val_loss improved from 0.42022 to 0.41622, saving model to ./models/combinedrnn_tutorial-progress
92150/92150 [==============================] - 84s 915us/step - loss: 0.4270 - val_loss: 0.4162
Epoch 11/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4276Epoch 00011: val_loss did not improve
92150/92150 [==============================] - 85s 925us/step - loss: 0.4276 - val_loss: 0.4386
Epoch 12/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4262Epoch 00012: val_loss did not improve
92150/92150 [==============================] - 83s 897us/step - loss: 0.4262 - val_loss: 0.4508
Epoch 13/30
92128/92150 [============================>.] - ETA: 0s - loss: 0.4310Epoch 00013: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4310 - val_loss: 0.4336
Epoch 14/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4251Epoch 00014: val_loss did not improve
92150/92150 [==============================] - 82s 895us/step - loss: 0.4251 - val_loss: 0.4474
Epoch 15/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4289Epoch 00015: val_loss did not improve
92150/92150 [==============================] - 84s 907us/step - loss: 0.4289 - val_loss: 0.4300
Epoch 16/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4266Epoch 00016: val_loss did not improve
92150/92150 [==============================] - 83s 898us/step - loss: 0.4265 - val_loss: 0.4411
Epoch 17/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4262Epoch 00017: val_loss did not improve
92150/92150 [==============================] - 83s 895us/step - loss: 0.4262 - val_loss: 0.4372
Epoch 18/30
92096/92150 [============================>.] - ETA: 0s - loss: 0.4283Epoch 00018: val_loss did not improve
92150/92150 [==============================] - 83s 896us/step - loss: 0.4283 - val_loss: 0.4370
Epoch 19/30
92112/92150 [============================>.] - ETA: 0s - loss: 0.4239Epoch 00019: val_loss did not improve
92150/92150 [==============================] - 83s 905us/step - loss: 0.4239 - val_loss: 0.4401
Epoch 20/30
92144/92150 [============================>.] - ETA: 0s - loss: 0.4298Epoch 00020: val_loss did not improve
92150/92150 [==============================] - 83s 902us/step - loss: 0.4298 - val_loss: 0.4304
Epoch 00020: early stopping

In [95]:
# -- load in best network
combined_rnn.load_weights('./models/combinedrnn_tutorial-progress')
print 'Saving weights...'
combined_rnn.save_weights('./models/combinedrnn_tutorial.h5', overwrite=True)

json_string = combined_rnn.to_json()
with open('./models/combinedrnn_tutorial.json', 'w') as jsonfile:
    jsonfile.write(json_string)


Saving weights...

In [96]:
yhat_rnn = combined_rnn.predict([Xjet_test, Xmuon_test, Xelectron_test, Xphoton_test], verbose = True, batch_size = 512)


76793/76793 [==============================] - 2s 27us/step

In [101]:
# Compute confusion matrix
cnf_matrix_rnn = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(yhat_rnn, axis=1), sample_weight=w_test)
np.set_printoptions(precision=2)
plot_confusion_matrix(cnf_matrix_rnn, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
                      normalize=True,
                      title='Normalized confusion matrix')


Normalized confusion matrix
[[ 0.93  0.02  0.05]
 [ 0.03  0.6   0.37]
 [ 0.02  0.03  0.95]]

In [102]:
# -- turn the predictions back into class labels
yhat_rnn_cls = np.argmax(yhat_rnn, axis=1)

In [103]:
# -- do the same for the truth labels
y_test_cls = np.argmax(y_test, axis=1)

In [104]:
print 'Signal efficiency:', w_test[(y_test_cls == 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls == 0].sum()


Signal efficiency: 0.934622

In [105]:
b_eff = w_test[(y_test_cls != 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff


Background efficiency: 0.0179076
Background rej: 55.8421979781
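
The single working point above can be extended to a full curve by scanning a threshold on the signal-class score. A minimal sketch using scikit-learn, treating class 0 (the $t\overline{t}$ sample) as signal:

In [ ]:
# background rejection vs. signal efficiency, scanning the class-0 score
from sklearn.metrics import roc_curve

fpr, tpr, _ = roc_curve((y_test_cls == 0).astype(int), yhat_rnn[:, 0], sample_weight=w_test)
nonzero = fpr > 0                      # avoid dividing by zero at the tightest cuts
plt.plot(tpr[nonzero], 1. / fpr[nonzero])
plt.xlabel('Signal efficiency')
plt.ylabel('Background rejection')
plt.yscale('log')
plt.show()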

Further Readings

  1. List of all IML meetings
  2. CERN Academic Training lecture series on Machine Learning (slides and videos)
  3. "Machine Learning Algorithms for b-Jet Tagging at the ATLAS Experiment" (ATL-PHYS-PROC-2017-211)
  4. "Deep Learning in Flavour Tagging at the ATLAS Experiment (ATL-PHYS-PROC-2017-191)
  5. "Machine and Deep Learning Techniques in Heavy-Ion Collisions with ALICE" (arXiv)
  6. "Deep learning approach to the Higgs boson CP measurement in H → τ τ decay and associated systematics" (arXiv)
  7. Luke de Oliveira's DataScience@LHC 2015 talk on "A ground-up construction of deep learning" (video)
  8. Peter Sadowski's DataScience@LHC 2015 deep learning tutorial (video)
  9. Pierre Baldi's DataScience@LHC 2015 talk on "Deep learning and its applications to the natural sciences" (video)
  10. Yann LeCun's CERN colloquium (video)
  11. Fernanda Psihas's 2017 IML Workshop talk on "Deep convolutional networks for event reconstruction and particle tagging on NOvA and DUNE" (video)
