This is the second of two notebooks for this tutorial. If you are unfamiliar with data handling techniques in a Python environment, see the previous notebook.
From the Keras website (https://keras.io):
Keras is a modular, powerful and intuitive Deep Learning Python library built on top of backend tensor-manipulation libraries; Keras doesn't itself handle any tensor ops, it relies on these backends for them.
It was developed with a focus on enabling fast experimentation: being able to go from idea to result with the least possible delay is key to doing good research.
In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
%matplotlib inline
Let's load the utility functions we defined in the previous notebook.
In [2]:
import glob
from root_numpy import root2array
from numpy.lib.recfunctions import stack_arrays
def root2pandas(files_path, tree_name, **kwargs):
    '''
    Args:
    -----
        files_path: a string like './data/*.root', for example
        tree_name: a string like 'Collection_Tree' corresponding to the name of the folder inside the root
                   file that we want to open
        kwargs: arguments taken by root2array, such as branches to consider, start, stop, step, etc.
    Returns:
    --------
        output_panda: a pandas dataframe like allbkg_df in which all the info from the root file will be stored
    Note:
    -----
        if you are working with .root files that contain different branches, you might have to mask your data
        in that case, return pd.DataFrame(ss.data)
    '''
    # -- create list of .root files to process
    files = glob.glob(files_path)

    # -- process ntuples into rec arrays
    ss = stack_arrays([root2array(fpath, tree_name, **kwargs).view(np.recarray) for fpath in files])
    try:
        return pd.DataFrame(ss)
    except Exception:
        return pd.DataFrame(ss.data)
def flatten(column):
    '''
    Args:
    -----
        column: a column of a pandas df whose entries are lists (or regular entries -- in which case nothing is done)
                e.g.: my_df['some_variable']
    Returns:
    --------
        flattened out version of the column.
        For example, it will turn:
        [1791, 2719, 1891]
        [1717, 1, 0, 171, 9181, 537, 12]
        [82, 11]
        ...
        into:
        1791, 2719, 1891, 1717, 1, 0, 171, 9181, 537, 12, 82, 11, ...
    '''
    try:
        return np.array([v for e in column for v in e])
    except (TypeError, ValueError):
        return column
Load all data in memory into dataframes.
In [3]:
# MC signal:
ttbar = root2pandas('files/ttbar.root', 'events')
# MC backgrounds:
dy = root2pandas('files/dy.root', 'events')
wj = root2pandas('files/wjets.root', 'events')
ww = root2pandas('files/ww.root', 'events')
wz = root2pandas('files/wz.root', 'events')
zz = root2pandas('files/zz.root', 'events')
singletop = root2pandas('files/single_top.root', 'events')
qcd = root2pandas('files/qcd.root', 'events')
# data:
data = root2pandas('files/data.root', 'events')
All samples except the $t\bar{t}$ one are produced using a muon trigger. To enforce the same selection on the $t\bar{t}$ sample, we use the branch called triggerIsoMu24, which contains a boolean indicating whether each event would have passed the specified trigger.
In [4]:
# -- step by step:
ttbar['triggerIsoMu24']
Out[4]:
In [5]:
ttbar[ttbar['triggerIsoMu24']] # slicing doesn't automatically reset the indices
Out[5]:
In [6]:
ttbar = ttbar[ttbar['triggerIsoMu24']].reset_index(drop=True)
In [8]:
ttbar.head()
Out[8]:
Let's say we want to start by training a simple model that only relies on event-level variables. The ones available in these samples are:
In [9]:
# names of event-level branches
npart = ['NJet', 'NMuon', 'NElectron', 'NPhoton', 'MET_px', 'MET_py']
If you are a physicist, the first thing you might want to do is to plot them. We do so very easily using `matplotlib`:
In [10]:
for key in npart:  # loop through the event-level branches and plot them on separate histograms
    # -- set font and canvas size (optional)
    matplotlib.rcParams.update({'font.size': 16})
    fig = plt.figure(figsize=(8, 8), dpi=100)

    # -- declare common binning strategy (otherwise every histogram will have its own binning)
    bins = np.linspace(min(ttbar[key]), max(ttbar[key]) + 1, 30)

    # plot!
    _ = plt.hist(ttbar[key], histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$', linewidth=2)
    _ = plt.hist(dy[key], histtype='step', normed=False, bins=bins, weights=dy['EventWeight'], label='Drell Yan')
    _ = plt.hist(wj[key], histtype='step', normed=False, bins=bins, weights=wj['EventWeight'], label=r'$W$ + jets')
    _ = plt.hist(ww[key], histtype='step', normed=False, bins=bins, weights=ww['EventWeight'], label=r'$WW$')
    _ = plt.hist(wz[key], histtype='step', normed=False, bins=bins, weights=wz['EventWeight'], label=r'$WZ$')
    _ = plt.hist(zz[key], histtype='step', normed=False, bins=bins, weights=zz['EventWeight'], label=r'$ZZ$')
    _ = plt.hist(singletop[key], histtype='step', normed=False, bins=bins, weights=singletop['EventWeight'], label=r'single $t$')
    _ = plt.hist(qcd[key], histtype='step', normed=False, bins=bins, weights=qcd['EventWeight'], label='QCD', color='salmon')

    plt.xlabel(key)
    plt.yscale('log')
    plt.legend(loc='best')
    plt.show()
Stack all the backgrounds (shown only for one branch here):
In [11]:
import matplotlib.cm as cm
colors = cm.cool(np.linspace(0, 1, 7))
In [12]:
plt.figure(figsize=(7,7))
bins = np.linspace(0, 10, 11)
_ = plt.hist([
dy['NJet'], wj['NJet'], ww['NJet'], wz['NJet'], zz['NJet'], singletop['NJet'], qcd['NJet']
],
histtype='stepfilled',
normed=False,
bins=bins,
weights=[
dy['EventWeight'], wj['EventWeight'], ww['EventWeight'], wz['EventWeight'], zz['EventWeight'], singletop['EventWeight'], qcd['EventWeight']
],
label=[ r'Drell Yan', r'$W$ + jets', r'$WW$', r'$WZ$', r'$ZZ$', r'single $t$', 'QCD'
],
stacked=True,
color=colors)
plt.hist(ttbar['NJet'],
histtype='step', normed=False, bins=bins, weights=ttbar['EventWeight'], label=r'$t\overline{t}$',
linewidth=2, color='black', linestyle='dashed')
plt.yscale('log')
plt.xlabel('Number of Jets')
plt.legend()
Out[12]:
Plot 2D histogram of $t\bar{t}$ MET_px and MET_py:
In [13]:
from matplotlib.colors import LogNorm
plt.figure(figsize=(5,5)) # square canvas
_ = plt.hist2d(ttbar['MET_px'], ttbar['MET_py'], bins=40, cmap='binary', norm=LogNorm())
# add decorations
plt.xlabel(r'MET$_{p_x}$') # LaTeX!
plt.ylabel(r'MET$_{p_y}$', fontsize=20)
_ = plt.colorbar()
In the simplest scenarios of tabular data, Keras, just like scikit-learn, takes as inputs the following objects:
- an ndarray of dimensions [nb_examples, nb_features] containing the distributions to be used as inputs to the model; each row is an object to classify, each column corresponds to a specific variable;
- an array of dimensions [nb_examples] containing the truth labels indicating the class each object belongs to (for classification), or the continuous target values (for regression);
- an array of dimensions [nb_examples] containing the weights to be assigned to each example.
The indices of these objects must map to the same examples. In general, you will want to shuffle and split them into a training and test set.
If we want to stack multiple dataframes into a single one, we can "concatenate" them. To simplify our classification problem, in this tutorial I will only focus on a three-class classification task, in which we aim to separate $t\bar{t}$ events from two of the main background sources: Drell Yan and W+jets events.
In [14]:
del ww, wz, zz, singletop, qcd, data
In [15]:
# -- this will only contain TTbar, Drell Yan and W+jets events (all branches)
df_full = pd.concat((ttbar, dy, wj), ignore_index=True)
However, we decided we were only going to train on event-level variables, so this would be a more useful df:
In [16]:
# recall: npart was the list of branch names corresponding to event-level variables
df = pd.concat((ttbar[npart], dy[npart], wj[npart]), ignore_index=True)
df.head()
Out[16]:
Now, turn your new df into the desired ndarray $X$ that can be directly used for ML applications using this handy pandas method:
In [17]:
X = df.as_matrix()
In [18]:
type(X)
Out[18]:
In [19]:
X.shape
Out[19]:
The weight array $w$ can also easily be extracted using a similar procedure:
In [20]:
w = pd.concat((ttbar['EventWeight'], dy['EventWeight'], wj['EventWeight']), ignore_index=True).values
In [22]:
type(w)
Out[22]:
Finally, generate an array of truth labels $y$ to distinguish among the different classes in the problem:
In [23]:
y = []
for _df, ID in [(ttbar, 0), (dy, 1), (wj, 2)]:
    y.extend([ID] * _df.shape[0])
y = np.array(y)
In [24]:
# -- check what we created
y
Out[24]:
The sklearn function `train_test_split` will randomly shuffle and automatically split all your objects into train and test subsets.
In [25]:
ix = range(X.shape[0]) # array of indices, just to keep track of them for safety reasons and future checks
X_train, X_test, y_train, y_test, w_train, w_test, ix_train, ix_test = train_test_split(X, y, w, ix, train_size=0.6)
It is common practice to scale the inputs to Neural Nets such that they have approximately similar ranges. Without this step, you might end up with variables whose values span very different orders of magnitude. This will create problems in the NN convergence due to very wild fluctuations in the magnitude of the internal weights. To take care of the scaling, we use the sklearn StandardScaler:
In [26]:
from sklearn.preprocessing import StandardScaler
In [27]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
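As a quick sanity check (a minimal sketch using the arrays defined above): after this step every column of X_train should have approximately zero mean and unit standard deviation, while X_test is transformed using the statistics computed on the training set.
# -- minimal check of the scaling
print X_train.mean(axis=0), X_train.std(axis=0)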
A Dense layer applies a linear transformation to the input vector $x \in \mathbb{R}^n$, which can be expressed using the $n \times m$ matrix $W \in \mathbb{R}^{n \times m}$ and the bias vector $b \in \mathbb{R}^{m}$ as:
$$y = W^{T}x + b, \qquad y \in \mathbb{R}^{m}$$
All entries in both $W$ and $b$ are trainable.
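As a sanity check, here is a minimal NumPy sketch of that same operation (the names x, W, b, y below are just illustrative, not Keras API):
n, m = 4, 3                  # input and output dimensionality
x = np.random.randn(n)       # input vector, x in R^n
W = np.random.randn(n, m)    # trainable kernel, W in R^{n x m}
b = np.random.randn(m)       # trainable bias, b in R^m
y = x.dot(W) + b             # linear transformation, equivalent to W^T x + b, y in R^m
# a Dense layer would additionally apply an element-wise activation to y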
In Keras:
keras.layers.Dense(
    units,
    activation=None,
    use_bias=True,
    kernel_initializer='glorot_uniform',
    bias_initializer='zeros',
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None
)
input_dim (or input_shape) is a required argument for the 1st layer of the net:
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
More advanced activation functions are available in keras.layers.advanced_activations; these include PReLU and LeakyReLU. Dropout is available as keras.layers.Dropout(rate, noise_shape=None, seed=None).
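For example, here is a minimal sketch of how these could be plugged into the functional API (the layer sizes are just illustrative; note that advanced activations are added as their own layers rather than through the activation argument):
from keras.layers import Input, Dense, Dropout
from keras.layers.advanced_activations import LeakyReLU

x = Input(shape=(6,))          # e.g. 6 event-level features
h = Dense(10)(x)               # no activation here ...
h = LeakyReLU(alpha=0.1)(h)    # ... the advanced activation is its own layer
h = Dropout(0.2)(h)            # randomly drop 20% of the units during training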
In [28]:
from keras.models import Model
from keras.layers import Dense, Dropout, Input
In [29]:
inputs = Input(shape=(X_train.shape[1], )) # placeholder
hidden = Dense(10, activation='relu')(inputs)
hidden = Dropout(0.2)(hidden)
hidden = Dense(20, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
hidden = Dense(30, activation='relu')(hidden)
hidden = Dropout(0.2)(hidden)
outputs = Dense(3, activation='softmax')(hidden)
# last layer has to have the same dimensionality as the number of classes we want to predict, here 3
model = Model(inputs, outputs)
In [30]:
model.summary()
In [31]:
from keras.utils.vis_utils import plot_model
plot_model(model, 'temp.png', show_shapes=True)
Satisfied with your net?
(animations from A. Radford)
Now you need to declare what loss function and optimizer to use. We pass these as arguments to compile:
In [26]:
model.compile('adam', 'sparse_categorical_crossentropy')
In [27]:
from keras.callbacks import EarlyStopping, ModelCheckpoint
In [28]:
from collections import Counter
Counter(y_test)
# uneven classes
Out[28]:
In [31]:
print 'Training:'
try:
    model.fit(
        X_train, y_train,
        class_weight={  # rebalance class representation
            0: 0.20 * (float(len(y)) / (y == 0).sum()),
            1: 0.40 * (float(len(y)) / (y == 1).sum()),
            2: 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks=[
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/tutorial-progress.h5', monitor='val_loss', verbose=True, save_best_only=True)
        ],
        epochs=20,
        validation_split=0.2,
        verbose=True
    )
except KeyboardInterrupt:
    print 'Training ended early.'
In [32]:
# -- load in best network
model.load_weights('./models/tutorial-progress.h5')
# -- Save network weights and structure
print 'Saving model...'
model.save_weights('./models/tutorial.h5', overwrite=True)
json_string = model.to_json()
open('./models/tutorial.json', 'w').write(json_string)
print 'Done'
In [33]:
print 'Testing...'
yhat = model.predict(X_test, verbose = True, batch_size = 512)
In [34]:
# predictions
yhat
Out[34]:
In [35]:
# -- turn them into classes
yhat_cls = np.argmax(yhat, axis=1)
Visualize performance with confusion matrix:
In [36]:
import itertools
from sklearn.metrics import confusion_matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
In [37]:
# Compute confusion matrix
cnf_matrix = confusion_matrix(y_test, yhat_cls, sample_weight=w_test)
np.set_printoptions(precision=2)
In [38]:
plot_confusion_matrix(cnf_matrix, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
normalize=True,
title='Normalized confusion matrix')
In [39]:
# signal eff = weighted tpr --> out of all signal events, what % do we classify as signal?
print 'Signal efficiency:', w_test[(y_test == 0) & (yhat_cls == 0)].sum() / w_test[y_test == 0].sum()
# bkg eff = weighted fpr --> out of all bkg events, what % do we classify as signal?
b_eff = w_test[(y_test != 0) & (yhat_cls == 0)].sum() / w_test[y_test != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff
If you want to, you could further analyze the results by, for example, looking at all events that were assigned to class 0 ($t\bar{t}$) and learning more about their physical characteristics. This could provide insight into what the network is focusing on in its decision-making process.
In [42]:
# -- events that got assigned to class 0
predicted_ttbar = df_full.iloc[np.array(ix_test)[yhat_cls == 0]]
In [43]:
predicted_ttbar['true flavor'] = y_test[yhat_cls == 0]
In [44]:
predicted_ttbar.head()
Out[44]:
In a 3-class classifier, the outputs lie on a 2D triangle with vertices at (1,0,0), (0,1,0), (0,0,1).
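Since softmax outputs are non-negative and sum to one for every event, we can quickly check that the predictions indeed live on this simplex:
print yhat.min() >= 0                      # all probabilities are non-negative
print np.allclose(yhat.sum(axis=1), 1.0)   # each row sums to one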
In [45]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')
for label, color, marker in [(0, 'r', '^'), (1, 'b', 'o'), (2, 'yellow', 'x')]:
    ax.scatter(yhat[y_test == label, 0], yhat[y_test == label, 1], yhat[y_test == label, 2], c=color, marker=marker, alpha=0.01)
ax.set_xlabel(r'$P(t\bar{t})$')
ax.set_ylabel('P(Drell Yan)')
ax.set_zlabel('P(W + jets)')
ax.view_init(30, 30)
plt.show()
To do some model introspection, you can add intermediate hidden layers to the model's outputs:
In [42]:
# model = Model(inputs, [outputs, hidden])
# model.compile('adam', 'sparse_categorical_crossentropy')
# model.load_weights('./models/tutorial-progress.h5')
# _, last_layer = model.predict(X_test, verbose = True, batch_size = 512)
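If you uncomment and run the cell above, last_layer will contain the activations of the last hidden layer for every test event. One purely illustrative way to peek at them would be to histogram a single unit per class (kept commented here, like the cell above):
# -- illustrative sketch, assuming last_layer from the cell above
# for label, name in zip(range(3), [r'$t\bar{t}$', 'Drell Yan', r'$W$ + jets']):
#     _ = plt.hist(last_layer[y_test == label, 0], bins=50, histtype='step', label=name)
# plt.xlabel('activation of hidden unit 0')
# plt.legend(loc='best')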
Let's look at a fancier way of solving the same classification problem. In this case we will use Recurrent Neural Networks (RNNs). These allow you to process variable-length sequences of data. For example, we can use them to describe an event in terms of the properties of its jets, whose number varies event by event. We could also describe the same event using the properties of its muons, or any other particle that appears in it. Because the number of particles of each type changes in each event, we need the flexibility of RNNs to process this type of data.
We will build an event-level classifier with 4 recurrent branches for the 4 different types of particles in the event.
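Before building the streams, it helps to picture the format the recurrent layers expect: each event becomes a fixed-size (max_objects x n_features) slice of a 3D array, truncated if it has too many objects and padded with a sentinel value (-999 here) that a Masking layer can later ignore. Here is a toy sketch of that idea (the numbers below are made up purely for illustration):
# two toy events with 2 and 1 jets respectively, each jet described by (pt, eta)
event_jets = [
    [[50., 0.3], [30., -1.2]],
    [[80., 2.1]],
]
max_jets, n_features = 2, 2
padded = -999 * np.ones((len(event_jets), max_jets, n_features), dtype='float32')
for i, jets in enumerate(event_jets):
    padded[i, :len(jets), :] = jets   # real objects first, -999 padding after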
In [66]:
jetvars = [key for key in df_full.keys() if key.startswith('Jet')]
jetvars.remove('Jet_ID')
print jetvars
muonvars = [key for key in df_full.keys() if key.startswith('Muon')]
print muonvars
photonvars = [key for key in df_full.keys() if key.startswith('Photon')]
print photonvars
electronvars = [key for key in df_full.keys() if key.startswith('Electron')]
print electronvars
In [67]:
df_jets = df_full[jetvars].copy()
df_electrons = df_full[electronvars].copy()
df_muons = df_full[muonvars].copy()
df_photons = df_full[photonvars].copy()
In [68]:
num_electrons = max([len(e) for e in df_electrons.Electron_E])
num_electrons
Out[68]:
In [69]:
num_muons = max([len(m) for m in df_muons.Muon_E])
num_muons
Out[69]:
In [70]:
num_photons = max([len(gam) for gam in df_photons.Photon_E])
num_photons
Out[70]:
In [71]:
num_jets = max([len(j) for j in df_jets.Jet_E])
num_jets
Out[71]:
Just for the sake of variety: you can either have class labels like (0, 1, 2, 3, ...) and train using sparse_categorical_crossentropy as your loss function -- like we did before -- or, equivalently, you can have one-hot class labels like ([1, 0, 0, 0, ...], [0, 1, 0, 0, ...], ...) and train using categorical_crossentropy.
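For instance, to_categorical (used in the next cell) maps integer labels to one-hot rows, roughly like this:
from keras.utils.np_utils import to_categorical
to_categorical(np.array([0, 1, 2, 1]))
# -> [[1, 0, 0],
#     [0, 1, 0],
#     [0, 0, 1],
#     [0, 1, 0]]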
In [72]:
from keras.utils.np_utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
y_train
Out[72]:
In [73]:
def create_stream(df, num_obj, sort_col):
    n_variables = df.shape[1]
    var_names = df.keys()
    data = np.zeros((df.shape[0], num_obj, n_variables), dtype='float32')

    # -- call functions to build X (a.k.a. data)
    sort_objects(df, data, sort_col, num_obj)

    # -- ix_{train, test} from above or from previously stored ordering
    Xobj_train = data[ix_train]
    Xobj_test = data[ix_test]

    # print 'Scaling features ...'
    scale(Xobj_train, var_names, savevars=True)   # scale training sample and save scaling
    scale(Xobj_test, var_names, savevars=False)   # apply scaling to test set

    return Xobj_train, Xobj_test
In [74]:
def sort_objects(df, data, SORT_COL, max_nobj):
    '''
    sort objects using your preferred variable
    Args:
    -----
        df: a dataframe with event-level structure where each event is described by a sequence of jets, muons, etc.
        data: an array of shape (nb_events, nb_particles, nb_features)
        SORT_COL: a string representing the column to sort the objects by
        max_nobj: maximum number of objects per event; if an event has more, truncate; if fewer, pad with -999
    Returns:
    --------
        modifies data in place, padding with -999
    '''
    import tqdm

    # i = event number, event = all the variables for that event
    for i, event in tqdm.tqdm(df.iterrows(), total=df.shape[0]):
        # objs = [[pt's], [eta's], ...] of the particles in this event, sorted by SORT_COL in descending order
        objs = np.array(
            [v.tolist() for v in event.get_values()],
            dtype='float32'
        )[:, (np.argsort(event[SORT_COL]))[::-1]]

        # total number of objects in this event
        nobjs = objs.shape[1]

        # take all objects unless there are more than max_nobj
        data[i, :(min(nobjs, max_nobj)), :] = objs.T[:(min(nobjs, max_nobj)), :]

        # default value for missing objects
        data[i, (min(nobjs, max_nobj)):, :] = -999
In [75]:
def scale(data, var_names, savevars, VAR_FILE_NAME='scaling.json'):
    '''
    Args:
    -----
        data: a numpy array of shape (nb_events, nb_particles, n_variables)
        var_names: list of keys to be used for the model
        savevars: bool -- True for training, False for testing
                  it decides whether we want to fit on data to find mean and std
                  or if we want to use those stored in the json file
    Returns:
    --------
        modifies data in place, writes out scaling dictionary
    '''
    import json

    scale = {}
    if savevars:
        for v, name in enumerate(var_names):
            # print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m, s = slc.mean(), slc.std()
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')
            scale[name] = {'mean': float(m), 'sd': float(s)}
        with open(VAR_FILE_NAME, 'wb') as varfile:
            json.dump(scale, varfile)
    else:
        with open(VAR_FILE_NAME, 'rb') as varfile:
            varinfo = json.load(varfile)
        for v, name in enumerate(var_names):
            # print 'Scaling feature %s of %s (%s).' % (v + 1, len(var_names), name)
            f = data[:, :, v]
            slc = f[f != -999]
            m = varinfo[name]['mean']
            s = varinfo[name]['sd']
            slc -= m
            slc /= s
            data[:, :, v][f != -999] = slc.astype('float32')
In [77]:
Xjet_train, Xjet_test = create_stream(df_jets, num_jets, sort_col='Jet_btag')
In [78]:
Xphoton_train, Xphoton_test = create_stream(df_photons, num_photons, sort_col='Photon_E')
In [79]:
Xmuon_train, Xmuon_test = create_stream(df_muons, num_muons, sort_col='Muon_E')
In [80]:
Xelectron_train, Xelectron_test = create_stream(df_electrons, num_electrons, sort_col='Electron_E')
In [89]:
from keras.layers import Masking, GRU, concatenate
In [83]:
JET_SHAPE = Xjet_train.shape[1:]
MUON_SHAPE = Xmuon_train.shape[1:]
ELECTRON_SHAPE = Xelectron_train.shape[1:]
PHOTON_SHAPE = Xphoton_train.shape[1:]
In [84]:
jet_input = Input(JET_SHAPE)
jet_channel = Masking(mask_value=-999, name='jet_masking')(jet_input)
jet_channel = GRU(25, name='jet_gru')(jet_channel)
jet_channel = Dropout(0.3, name='jet_dropout')(jet_channel)
muon_input = Input(MUON_SHAPE)
muon_channel = Masking(mask_value=-999, name='muon_masking')(muon_input)
muon_channel = GRU(10, name='muon_gru')(muon_channel)
muon_channel = Dropout(0.3, name='muon_dropout')(muon_channel)
electron_input = Input(ELECTRON_SHAPE)
electron_channel = Masking(mask_value=-999, name='electron_masking')(electron_input)
electron_channel = GRU(10, name='electron_gru')(electron_channel)
electron_channel = Dropout(0.3, name='electron_dropout')(electron_channel)
photon_input = Input(PHOTON_SHAPE)
photon_channel = Masking(mask_value=-999, name='photon_masking')(photon_input)
photon_channel = GRU(10, name='photon_gru')(photon_channel)
photon_channel = Dropout(0.3, name='photon_dropout')(photon_channel)
In [90]:
combined = concatenate([jet_channel, muon_channel, electron_channel, photon_channel])
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined = Dense(64, activation = 'relu')(combined)
combined = Dropout(0.3)(combined)
combined_outputs = Dense(3, activation='softmax')(combined)
In [91]:
combined_rnn = Model(inputs=[jet_input, muon_input, electron_input, photon_input], outputs=combined_outputs)
combined_rnn.summary()
In [92]:
combined_rnn.compile('adam', 'categorical_crossentropy')
In [93]:
print 'Training:'
try:
    combined_rnn.fit(
        [Xjet_train, Xmuon_train, Xelectron_train, Xphoton_train], y_train,
        batch_size=16,
        class_weight={
            0: 0.20 * (float(len(y)) / (y == 0).sum()),
            1: 0.40 * (float(len(y)) / (y == 1).sum()),
            2: 0.40 * (float(len(y)) / (y == 2).sum())
        },
        callbacks=[
            EarlyStopping(verbose=True, patience=10, monitor='val_loss'),
            ModelCheckpoint('./models/combinedrnn_tutorial-progress', monitor='val_loss', verbose=True, save_best_only=True)
        ],
        epochs=30,
        validation_split=0.2
    )
except KeyboardInterrupt:
    print 'Training ended early.'
In [95]:
# -- load in best network
combined_rnn.load_weights('./models/combinedrnn_tutorial-progress')
print 'Saving weights...'
combined_rnn.save_weights('./models/combinedrnn_tutorial.h5', overwrite=True)
json_string = combined_rnn.to_json()
open('./models/combinedrnn_tutorial.json', 'w').write(json_string)
In [96]:
yhat_rnn = combined_rnn.predict([Xjet_test, Xmuon_test, Xelectron_test, Xphoton_test], verbose = True, batch_size = 512)
In [101]:
# Compute confusion matrix
cnf_matrix_rnn = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(yhat_rnn, axis=1), sample_weight=w_test)
np.set_printoptions(precision=2)
plot_confusion_matrix(cnf_matrix_rnn, classes=[r'$t\overline{t}$', 'Drell Yan', r'$W$ + jets'],
normalize=True,
title='Normalized confusion matrix')
In [102]:
# -- turn the predictions back into class labels
yhat_rnn_cls = np.argmax(yhat_rnn, axis=1)
In [103]:
# -- do the same for the truth labels
y_test_cls = np.argmax(y_test, axis=1)
In [104]:
print 'Signal efficiency:', w_test[(y_test_cls == 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls == 0].sum()
In [105]:
b_eff = w_test[(y_test_cls != 0) & (yhat_rnn_cls == 0)].sum() / w_test[y_test_cls != 0].sum()
print 'Background efficiency:', b_eff
print 'Background rej:', 1 / b_eff
In [ ]: