
Previous analysis

Jet Parton

JetParton is a truth-level feature giving the parton flavour of the jet. Its values are:

+-5 = b
+-4 = c
+-3 = s
+-2 = u
+-1 = d
0 = gluon
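
Since the analysis further down measures yields of b, c, and light jets, the codes above can be collapsed into three labels. A minimal sketch, assuming "light" means u, d, s, and gluon; the function name is hypothetical and not part of the original notebook:

```python
def flavour_from_parton(jet_parton):
    """Map a truth-level JetParton code to a coarse flavour label.

    Assumption: 'light' groups d, u, s quarks (codes 1 to 3) and gluons (0).
    """
    code = abs(int(jet_parton))  # the sign only distinguishes quark from antiquark
    if code == 5:
        return "b"
    if code == 4:
        return "c"
    if code in (0, 1, 2, 3):
        return "light"
    raise ValueError("unexpected JetParton code: %r" % (jet_parton,))
```

With a pandas DataFrame, this could be applied as `data['JetParton'].map(flavour_from_parton)`.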

Features

We should not use the absolute JetP and JetPT. JetPT looks different for b-jets in the training sample only because we used some top events, which have higher jet PT. In the real analysis we want to measure the yields of b, c, and light jets in bins of JetPT, so the classifier should not use PT (or P) as a feature.

SV - secondary vertex

SVM: SV mass;
SVMC: SV corrected mass;
SVR: minimum radial distance from PV to any 2-body SV combo in the n-body SV;
SVPT: this is actually pt(SV)/pt(jet) so goes from 0 to 1;
SVDR: Delta R between the SV direction of flight and the jet axis;
SVN: number of tracks in the SV (>= 2);
SVQ: abs value of the net charge of tracks in the SV;
SVFDChi2: FD chi2 of the SV from the PV - we used log of this in our algorithm;
SVSumIPChi2: sum of IPChi2 of all tracks in the SV - we used log of this in our algorithm.

Jet features (used by CMS):

JetQ: This is a pt-weighted "jet charge" observable (each particle contributes its electric charge, weighted by the particle's PT relative to the jet PT). On average it roughly corresponds to the charge of the initial quark or gluon (+-1/3, +-2/3, or 0).
JetSigma1 and JetSigma2: These are the major and minor jet "widths" (the jet is a cone, but the energy is distributed within the cone unevenly, so one can define elliptical axes and define the spread along them).
JetMult: Number of particles in the jet. In MC this is the best quark-gluon discriminant. In reality it should still be a powerful one, but we know that the MC is overly optimistic.
JetPTHard: Fraction of jet PT carried by highest PT particle in the jet.
JetPTD: This is another pt-weighted variable.
JetNDis: I added this feature mainly for b vs c vs light. It's possible to have displaced tracks in the jet that do not enter the SV. This can especially be true for b jets. This feature may help in identifying b and c jets.
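
To make the pt-weighted definitions above concrete, here is a minimal sketch that computes JetQ, JetMult, JetPTHard, and JetPTD from a list of jet constituents. Several things are assumptions for illustration only: the `(charge, pt)` input layout, the use of the scalar pt sum as the jet PT, and the JetPTD formula sqrt(sum pt_i^2) / sum pt_i, which is a commonly used definition but is not stated in these notes.

```python
import math


def jet_observables(constituents):
    """Compute simple jet observables from (charge, pt) pairs.

    The input layout and the scalar-sum jet pT are assumptions made for
    this illustration only.
    """
    if not constituents:
        raise ValueError("a jet needs at least one constituent")
    pts = [pt for _, pt in constituents]
    jet_pt = sum(pts)  # assumption: scalar sum stands in for the jet pT
    return {
        # each particle's charge weighted by its share of the jet pT
        "JetQ": sum(q * pt for q, pt in constituents) / jet_pt,
        # number of particles in the jet
        "JetMult": len(constituents),
        # fraction of the jet pT carried by the hardest particle
        "JetPTHard": max(pts) / jet_pt,
        # assumed definition: sqrt(sum pt_i^2) / sum pt_i
        "JetPTD": math.sqrt(sum(pt * pt for pt in pts)) / jet_pt,
    }
```

For example, `jet_observables([(1, 6.0), (-1, 3.0), (0, 1.0)])` describes a three-particle jet in which the hardest particle carries 60% of the PT.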

Check if they help:

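# jet invariant mass from energy and momentum magnitude
d['jetM'] = numpy.sqrt(d['JetE'] ** 2 - d['jetP'] ** 2)
# SV mass and corrected mass relative to the jet mass
d['SV_jet_M_rel'] = d['SVM'] / d['jetM']
d['SV_jet_MC_rel'] = d['SVMC'] / d['jetM']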
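
For nearly massless jets, rounding can make E^2 - P^2 slightly negative, which gives NaN from the square root, and a zero jet mass makes the ratios infinite. A minimal sketch of a guarded version follows; the helper names `jet_mass` and `safe_ratio` are hypothetical, and clipping to zero and returning NaN for a non-positive denominator are choices made here, not something the original analysis specifies:

```python
import numpy as np


def jet_mass(energy, momentum):
    """Invariant mass sqrt(E^2 - P^2), with negative E^2 - P^2 clipped to 0."""
    m2 = np.asarray(energy, dtype=float) ** 2 - np.asarray(momentum, dtype=float) ** 2
    return np.sqrt(np.clip(m2, 0.0, None))


def safe_ratio(num, den):
    """Element-wise num / den, returning NaN wherever den is not positive."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.full(np.broadcast(num, den).shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out
```

These could replace the plain expressions, e.g. `d['jetM'] = jet_mass(d['JetE'], d['jetP'])` followed by `d['SV_jet_M_rel'] = safe_ratio(d['SVM'], d['jetM'])`.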

Hard to calibrate:

JetPx,
JetPy,
JetPz,
JetE

Training TODO

Compare separately on:

SV features
jet features
SV+jet features, to see how much the combination helps
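
The comparison above can be organised as a loop over named feature sets. This is a sketch only: the feature lists are taken from the descriptions earlier in these notes, while `compare_feature_sets` and the `train_and_score` callable are hypothetical placeholders for whatever training and evaluation routine is actually used.

```python
# Feature groups as described in the notes above (raw column names).
SV_FEATURES = ["SVM", "SVMC", "SVR", "SVPT", "SVDR", "SVN", "SVQ",
               "SVFDChi2", "SVSumIPChi2"]
JET_FEATURES = ["JetQ", "JetSigma1", "JetSigma2", "JetMult",
                "JetPTHard", "JetPTD", "JetNDis"]

FEATURE_SETS = {
    "SV": SV_FEATURES,
    "jet": JET_FEATURES,
    "SV+jet": SV_FEATURES + JET_FEATURES,
}


def compare_feature_sets(train_and_score, feature_sets=FEATURE_SETS):
    """Call train_and_score(features) once per feature set.

    train_and_score is a hypothetical user-supplied callable that trains a
    classifier on the given columns and returns a quality metric.
    """
    return {name: train_and_score(list(features))
            for name, features in feature_sets.items()}
```

Keeping the three sets in one dictionary makes it easy to check that the same training settings are used for each comparison.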

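# chi2-like quantities span orders of magnitude, so use their logs
data['log_SVFDChi2'] = numpy.log(data['SVFDChi2'].values)
data['log_SVSumIPChi2'] = numpy.log(data['SVSumIPChi2'].values)

# combinations of SV mass and corrected mass
data['SVM_diff'] = numpy.log(data['SVMC'] ** 2 - data['SVM'] ** 2)
data['SV_theta'] = (data['SVMC'] ** 2 - data['SVM'] ** 2) / data['SVPT']
data['SVM_rel'] = numpy.log(data['SVM'] / data['SVMC'] + 0.01)

# radial distance relative to FD chi2, squashed into [0, 1) by tanh
data['SV_R_FD_rel'] = numpy.tanh(data['SVR'] / data['SVFDChi2'])

# net charge per track in the SV
data['SV_Q_N_rel'] = 1. * data['SVQ'] / data['SVN']

# the raw chi2 columns are replaced by their logs
data = data.drop(['SVFDChi2', 'SVSumIPChi2'], axis=1)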
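
Some of these transforms can produce non-finite values: for instance, the logarithm in SVM_diff gives -inf when SVMC equals SVM and NaN if the difference is negative. A minimal sketch of a check that could be run before training; the helper name `count_non_finite` is hypothetical, and it accepts any mapping from column name to array-like values:

```python
import numpy as np


def count_non_finite(columns):
    """Return, per column name, how many entries are NaN or +-inf."""
    return {
        name: int((~np.isfinite(np.asarray(values, dtype=float))).sum())
        for name, values in columns.items()
    }
```

With a pandas DataFrame of numeric columns this could be called as `count_non_finite({c: data[c].values for c in data.columns})`.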