We should not use the absolute JetP and JetPT. JetPT looks different for b-jets in the training sample simply because we used some top events, which have higher jet PT. In the real analysis we want to measure the yields of b, c, and light jets in bins of JetPT, so we don't want the classifier to use PT (or P) as a feature.
JetParton is a truth-level feature. The other features are:
SVM: SV mass;
SVMC: SV corrected mass;
SVR: minimum radial distance from PV to any 2-body SV combo in the n-body SV;
SVPT: this is actually pt(SV)/pt(jet) so goes from 0 to 1;
SVDR: Delta R between the SV direction of flight and the jet axis;
SVN: number of tracks in the SV (>= 2);
SVQ: abs value of the net charge of tracks in the SV;
SVFDChi2: FD chi2 of the SV from the PV - we used log of this in our algorithm;
SVSumIPChi2: sum of IPChi2 of all tracks in the SV - we used log of this in our algorithm.
JetQ: This is a pt-weighted "jet charge" observable (each particle contributes its electric charge but weighted by the particle's PT relative to the jet PT). It roughly on average corresponds to the initial quark or gluon charge (which is either +- 1/3, +- 2/3 or 0).
JetSigma1 and JetSigma2: These are the major and minor jet "widths" (the jet is a cone, but the energy is distributed unevenly within the cone, so one can define elliptical axes and the spread along them).
JetMult: Number of particles in the jet. In MC this is the best quark-gluon discriminant. In reality it should be a powerful one, but we know that the MC is overly optimistic.
JetPTHard: Fraction of the jet PT carried by the highest-PT particle in the jet.
JetPTD: This is another pt-weighted variable.
JetNDis: I added this feature mainly for b vs c vs light. It's possible to have displaced tracks in the jet that do not enter the SV. This can especially be true for b jets. This feature may help in identifying b and c jets.
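To make the JetQ, JetMult and JetPTHard definitions above concrete, here is a minimal sketch that computes them for one jet from per-particle charges and PTs. The function name `jet_observables`, its arguments, and the toy numbers are all made up for illustration; this is not the code that produced the actual features.

```python
import numpy as np

def jet_observables(charges, pts, jet_pt):
    """Toy versions of JetQ, JetMult and JetPTHard for a single jet.

    charges, pts: per-particle electric charge and PT (hypothetical inputs)
    jet_pt: PT of the jet, used to normalise the particle PTs
    """
    charges = np.asarray(charges, dtype=float)
    pts = np.asarray(pts, dtype=float)
    weights = pts / jet_pt  # each particle's PT relative to the jet PT
    return {
        "JetQ": float(np.sum(charges * weights)),  # pt-weighted jet charge
        "JetMult": int(len(pts)),                  # number of particles
        "JetPTHard": float(np.max(weights)),       # PT fraction of hardest particle
    }

# Toy jet with four particles (made-up values).
obs = jet_observables(charges=[1, -1, 0, 1], pts=[10.0, 5.0, 3.0, 2.0], jet_pt=20.0)
```

For this toy jet the hardest particle carries half of the jet PT, so JetPTHard is 0.5.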
import numpy
# Jet mass from energy and momentum, then the SV masses relative to it
d['jetM'] = numpy.sqrt(d['JetE'] ** 2 - d['JetP'] ** 2)
d['SV_jet_M_rel'] = d['SVM'] / d['jetM']
d['SV_jet_MC_rel'] = d['SVMC'] / d['jetM']
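One thing to watch with the jet mass: E^2 - P^2 can come out slightly negative from rounding, and a zero mass makes the ratios blow up. Below is a hedged sketch of a guarded version. The helper name `add_jet_mass_features` and the toy values are hypothetical, and the column name `JetP` is an assumption about how the momentum column is spelled.

```python
import numpy as np
import pandas as pd

def add_jet_mass_features(d):
    """Return a copy of d with jetM and the SV-to-jet mass ratios added."""
    d = d.copy()
    m2 = d["JetE"] ** 2 - d["JetP"] ** 2
    # Clip small negative values so the square root does not return NaN.
    d["jetM"] = np.sqrt(m2.clip(lower=0.0))
    # Where jetM is zero the ratio is undefined: use NaN instead of inf.
    safe_m = d["jetM"].where(d["jetM"] > 0)
    d["SV_jet_M_rel"] = d["SVM"] / safe_m
    d["SV_jet_MC_rel"] = d["SVMC"] / safe_m
    return d

# Toy input (made-up values); the second jet has zero mass.
toy = pd.DataFrame({
    "JetE": [5.0, 3.0],
    "JetP": [3.0, 3.0],
    "SVM": [2.0, 1.0],
    "SVMC": [4.0, 2.0],
})
out = add_jet_mass_features(toy)
```

Whether NaN is the right placeholder depends on the classifier used downstream; some handle missing values natively and others need an explicit fill.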
Compare separately on:
# Use the log of the chi2 features
data['log_SVFDChi2'] = numpy.log(data['SVFDChi2'])
data['log_SVSumIPChi2'] = numpy.log(data['SVSumIPChi2'])
# Combinations of SV mass and corrected mass (SVM_diff is -inf where SVMC == SVM)
data['SVM_diff'] = numpy.log(data['SVMC'] ** 2 - data['SVM'] ** 2)
data['SV_theta'] = (data['SVMC'] ** 2 - data['SVM'] ** 2) / data['SVPT']
data['SVM_rel'] = numpy.log(data['SVM'] / data['SVMC'] + 0.01)
# Ratios of related SV quantities
data['SV_R_FD_rel'] = numpy.tanh(data['SVR'] / data['SVFDChi2'])
data['SV_Q_N_rel'] = 1. * data['SVQ'] / data['SVN']
# Drop the raw chi2 columns now that their logs are stored
data = data.drop(['SVFDChi2', 'SVSumIPChi2'], axis=1)
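Following the note that the classifier should not see the absolute JetP and JetPT, and that JetParton is truth-level, one possible way to build the training inputs is sketched below. The helper `split_features_and_label`, the constants, and the toy values (including the JetParton codes) are hypothetical; using JetParton as the training label is an assumption here, not something stated in these notes.

```python
import pandas as pd

# Columns kept out of the classifier inputs: the truth-level column
# (assumed to be the label) and the absolute kinematic columns.
LABEL = "JetParton"
EXCLUDED = ["JetP", "JetPT"]

def split_features_and_label(data):
    """Return (X, y): X has no label or excluded kinematic columns."""
    y = data[LABEL]
    # errors="ignore" lets this run even if an excluded column is absent.
    X = data.drop(columns=[LABEL] + EXCLUDED, errors="ignore")
    return X, y

# Toy table with made-up values.
toy = pd.DataFrame({
    "JetParton": [5, 4, 0],
    "JetPT": [40.0, 25.0, 22.0],
    "JetP": [120.0, 90.0, 60.0],
    "SVM": [2.1, 1.3, 0.6],
    "JetQ": [-0.2, 0.3, 0.0],
})
X, y = split_features_and_label(toy)
```

Ratio features such as SVPT, which is already pt(SV)/pt(jet), stay in X because they do not carry the absolute jet PT directly.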