Inclusive B-tagging

Authors:

  • Tatiana Likhomanenko (contact)
  • Alexey Rogozhnikov
  • Denis Derkach

Data (from working group):

  • real data $B^{\pm} \to J/\psi K^{\pm}$ (RECO 14), 2012
  • real data $B_d \to J/\psi K^*$ (RECO 14), 2012 (use EPM for asymmetry estimation)

Apply sPlot to obtain sWeight ~ P(B)

Monte Carlo:

  • MC $B^{\pm} \to J/\psi K^{\pm}$ for training
  • MC for cross check
    • $B_d \to J/\psi K_S$
    • $B_d \to J/\psi K^*$

In [2]:
from IPython.display import Image
import pandas

Old tagging

https://github.com/tata-antares/tagging_LHCb/blob/master/old-tagging.ipynb

We first tested the current algorithm (OS taggers: muon, electron, kaon, vertex). The original TMVA method was compared with XGBoost.

  • isotonic symmetric calibration
  • use different train-test divisions to calculate $D^2$
  • compute mean and std
  • details (the same formulas) are given below

Data

Taggers: electron, muon, kaon and vertex


In [3]:
pandas.set_option('display.precision', 4)
pandas.read_csv('img/old-tagging-parts.csv').drop(['AUC, with untag', '$\Delta$ AUC, with untag'], axis=1)


Out[3]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$
0 vtx_xgboost 18.2008 0.0495 0.0499 0.0013 0.9084 0.0234
1 vtx_tmva 18.2008 0.0495 0.0425 0.0008 0.7727 0.0150
2 $K$_xgboost 19.2642 0.0509 0.0520 0.0009 1.0009 0.0173
3 $K$_tmva 19.2642 0.0509 0.0480 0.0011 0.9237 0.0214
4 $e$_xgboost 1.8382 0.0157 0.1674 0.0068 0.3077 0.0127
5 $e$_tmva 1.8382 0.0157 0.1609 0.0068 0.2957 0.0127
6 $\mu$_xgboost 5.7366 0.0278 0.1661 0.0038 0.9527 0.0224
7 $\mu$_tmva 5.7366 0.0278 0.1610 0.0032 0.9234 0.0191

MC

Taggers: electron, muon, kaon and vertex


In [4]:
pandas.set_option('display.precision', 4)
pandas.read_csv('img/old-tagging-parts-MC.csv').drop(['AUC, with untag', '$\Delta$ AUC, with untag'], axis=1)


Out[4]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$
0 vtx_xgboost 9.8330 0.0257 0.1150 8.7538e-05 1.1306 0.0031
1 vtx_tmva 9.8330 0.0257 0.1080 7.1570e-04 1.0618 0.0076
2 $K$_xgboost 17.7597 0.0345 0.1124 5.4507e-05 1.9958 0.0040
3 $K$_tmva 17.7597 0.0345 0.1064 3.0603e-05 1.8905 0.0037
4 $e$_xgboost 2.0230 0.0117 0.1303 2.1707e-04 0.2636 0.0016
5 $e$_tmva 2.0230 0.0117 0.1212 2.3217e-04 0.2452 0.0015
6 $\mu$_xgboost 5.0538 0.0184 0.1639 1.1755e-04 0.8283 0.0031
7 $\mu$_tmva 5.0538 0.0184 0.1597 7.0126e-05 0.8070 0.0030

Taggers combination

We then tested a combination with two calibrations for individual taggers:

  • isotonic regression
  • logistic regression.

The combination was calibrated using isotonic regression.


In [5]:
pandas.set_option('display.precision', 4)
pandas.read_csv('img/old-tagging.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[5]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 iso-xgb_combined 34.4404 0.0681 0.0683 0.0016 2.3531 0.0539 56.9479
1 iso-tmva_combined 34.4405 0.0681 0.0666 0.0019 2.2941 0.0643 56.8452
2 log-xgb_combined 34.4405 0.0681 0.0717 0.0008 2.4710 0.0289 56.9369
3 log-tmva_combined 34.4405 0.0681 0.0672 0.0009 2.3137 0.0319 56.8070

In [6]:
pandas.set_option('display.precision', 4)
pandas.read_csv('img/old-tagging-MC.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[6]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 mu 5.0538 0.0184 0.1581 0.0014 0.7993 0.0075 51.8453
1 vtx 12.2650 0.0287 0.0843 0.0004 1.0344 0.0053 53.2321
2 K 17.7597 0.0345 0.1055 0.0004 1.8743 0.0078 55.1437
3 e 2.0230 0.0117 0.1165 0.0013 0.2357 0.0029 50.6600
4 tmva combination 29.0435 0.0442 0.1092 0.0004 3.1703 0.0129 57.7786
5 mu 5.0538 0.0184 0.1627 0.0016 0.8221 0.0084 51.8434
6 vtx 12.2650 0.0287 0.1000 0.0005 1.2261 0.0065 53.2433
7 K 17.7597 0.0345 0.1120 0.0004 1.9883 0.0082 55.1665
8 e 2.0230 0.0117 0.1278 0.0013 0.2585 0.0031 50.6583
9 xgboost combination 29.0435 0.0442 0.1163 0.0005 3.3789 0.0147 57.8737
10 K* K 17.9499 0.0645 0.1135 0.0000 2.0373 0.0073 55.1643
11 K* e 2.0657 0.0219 0.1259 0.0000 0.2600 0.0028 50.6415
12 K* mu 5.0497 0.0342 0.1623 0.0000 0.8196 0.0056 51.6945
13 K* vtx 12.4130 0.0536 0.1002 0.0000 1.2440 0.0054 53.2801
14 K* combination 29.3082 0.0824 0.1170 0.0000 3.4291 0.0096 57.7637
15 Ks K 17.3906 0.1153 0.1150 0.0000 1.9997 0.0133 54.8647
16 Ks e 1.9698 0.0388 0.1307 0.0000 0.2575 0.0051 50.6315
17 Ks mu 4.9352 0.0614 0.1658 0.0000 0.8185 0.0102 51.8175
18 Ks vtx 11.9824 0.0957 0.1036 0.0000 1.2411 0.0099 52.8970
19 Ks combination 28.3874 0.1473 0.1190 0.0000 3.3776 0.0175 57.3558

Additional information

See details in the previous presentation: https://indico.cern.ch/event/369520/contribution/3/attachments/1178333/1704665/15.10.28.Tagging.pdf

$\epsilon_{tag}$ calculation

$$N (\text{B events, passed selection}) = \sum_{\text{B events, passed selection}} sw_i$$

$$N (\text{all B events}) = \sum_{\text{all B events}} sw_i,$$

where $sw_i$ is the sPlot weight.

$$\epsilon_{tag} = \frac{N (\text{passed selection})} {N (\text{all events})} \qquad \Delta\epsilon_{tag} = \frac{\sqrt{N (\text{passed selection})}} {N (\text{all events})}$$
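The two formulas above can be sketched directly in numpy (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def tagging_efficiency(sweights, passed):
    """epsilon_tag = N(passed) / N(all), with N the sWeighted counts;
    Delta epsilon_tag = sqrt(N(passed)) / N(all)."""
    n_all = np.sum(sweights)
    n_passed = np.sum(sweights[passed])
    return n_passed / n_all, np.sqrt(n_passed) / n_all
```

With unit weights this reduces to the usual counting efficiency.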

Data for training

  • data_sw_passed - tracks/vertices with B-sWeight > 1; used for training
  • data_sw_not_passed - tracks/vertices with B-sWeight <= 1; tagged after training
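A minimal sketch of this split, assuming the sWeight column is called `N_sig_sw` (the actual column name may differ):

```python
import pandas as pd

# toy track table; only the sWeight column matters for the split
data = pd.DataFrame({'N_sig_sw': [1.5, 0.3, 2.0, 0.8],
                     'partPt':   [1.1, 0.7, 2.3, 0.5]})

data_sw_passed = data[data['N_sig_sw'] > 1]        # used for training
data_sw_not_passed = data[data['N_sig_sw'] <= 1]   # tagged after training
```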

Training

Track features (sig = signal, part = tagger track):

  • cos_diff_phi = $\cos(\phi^{sig} - \phi^{\rm part})$
  • diff_pt = $\max(p_T^{part}) - p_T^{part}$
  • partPt= $p_T^{part}$
  • max_PID_e_mu = $\max(PIDNN(e), PIDNN(\mu))^{part}$
  • partP = $p^{part}$
  • nnkrec = Number of reconstructed vertices
  • diff_eta = $(\eta^{sig} - \eta^{\rm part})$
  • EOverP = E/P (from CALO)
  • sum_PID_k_mu = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(\mu))$
  • ptB = $p_T^{sig}$
  • sum_PID_e_mu = $\sum\limits_{i\in part}(PIDNN(e)+PIDNN(\mu))$
  • sum_PID_k_e = $\sum\limits_{i\in part}(PIDNN(K)+PIDNN(e))$
  • proj = $(\vec{p}^{sig},\vec{p}^{part})$
  • PIDNNe = $PIDNN(e)$
  • PIDNNk = $PIDNN(K)$
  • PIDNNm = $PIDNN(\mu)$
  • phi = $\phi^{part}$
  • IP = impact parameter of the track
  • max_PID_k_mu = $\max(PIDNN(K)+PIDNN(\mu))$
  • IPerr = error of IP
  • IPs = IP/IPerr
  • veloch = dE/dx track charge from the VELO system
  • max_PID_k_e = $\max(PIDNN(K)+PIDNN(e))$
  • diff_phi = $(\phi^{sig} - \phi^{\rm part})$
  • ghostProb = ghost probability
  • IPPU = impact parameter with respect to any other reconstructed primary vertex.
  • eta = pseudorapidity of the track particle
  • partlcs = chi2PerDoF for a track

Vertex Selections

  • All selections are removed except the DaVinci probability cuts

Vertex Features:

  • mult = multiplicity in the event
  • nnkrec = number of reconstructed vertices
  • ptB = signal B transverse momentum
  • vflag = number of tracks in the vertex
  • ipsmean = mean of tracks IPs
  • ptmean = mean pt of the tracks
  • vcharge = charge of the vertex weighted by pt
  • svm = mass of the vertex
  • svp = momentum of the vertex
  • BDphiDir = angle between the B and the vertex
  • svtau = lifetime of the vertex
  • docamax = mean DOCA of the tracks

Classifier

Define the B sign from the track/vertex sign (i.e., determine whether they have the same or opposite signs).

target = signB * signTrack/signVertex > 0

  • classifier returns
$$P(\text{track/vertex same sign as B} \mid \text{B sign}) = P(\text{B same sign as track/vertex} \mid \text{track/vertex sign})$$

Calibration of $P(\text{track/vertex same sign as B| B sign})$

  • use 2-folding logistic/isotonic calibration for the track/vertex classifier's predictions
  • compare with isotonic/logistic calibration
  • compare with no calibration (bad: predictions are shifted)
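The 2-folding scheme can be sketched with scikit-learn's IsotonicRegression (a sketch, not the analysis code; the symmetrisation step is omitted):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def two_fold_calibrate(probs, labels, random_state=42):
    """2-fold isotonic calibration: each half is calibrated by a model
    fitted on the other half, so no event calibrates itself."""
    rng = np.random.RandomState(random_state)
    fold = rng.randint(2, size=len(probs))
    calibrated = np.empty_like(probs, dtype=float)
    for k in (0, 1):
        iso = IsotonicRegression(y_min=0, y_max=1, out_of_bounds='clip')
        iso.fit(probs[fold != k], labels[fold != k])
        calibrated[fold == k] = iso.predict(probs[fold == k])
    return calibrated
```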

Computation of $p(B^+)$ using $P(\text{track/vertex same sign as B| B sign})$

Compute $p(B^+)$ using this probabilistic model representation (similar to the previous tagging combination):

$$ \frac{P(B^+)}{P(B^-)} = \prod_{track,\, vertex} \frac{P(\text{track/vertex} \mid B^+)} {P(\text{track/vertex} \mid B^-)} = \alpha \qquad \Rightarrow \qquad P(B^+) = \frac {\alpha}{1+\alpha}, \qquad \qquad [1] $$

where

$$ \frac{P(B^+)}{P(B^-)} = \prod_{track,\, vertex} \begin{cases} \frac{P(\text{track/vertex same sign as } B \mid B)}{P(\text{track/vertex opposite sign to } B \mid B)}, \text{if track/vertex}^+ \\ \\ \frac{P(\text{track/vertex opposite sign to } B \mid B)}{P(\text{track/vertex same sign as } B \mid B)}, \text{if track/vertex}^- \end{cases} $$

$$p_{mistag} = \min(p(B^+), p(B^-))$$
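Formula [1] can be sketched as follows, combining per-track classifier outputs via log-likelihood ratios (hypothetical helper; track charges are ±1):

```python
import numpy as np

def combine_p_bplus(p_same_sign, charges):
    """P(B+) from per-track P(track same sign as B) via formula [1].
    A positive track contributes p/(1-p), a negative one (1-p)/p;
    summing log-ratios keeps the product numerically stable."""
    p = np.clip(p_same_sign, 1e-6, 1 - 1e-6)
    log_alpha = np.sum(charges * (np.log(p) - np.log(1 - p)))
    alpha = np.exp(log_alpha)
    p_bplus = alpha / (1 + alpha)
    return p_bplus, min(p_bplus, 1 - p_bplus)  # P(B+), p_mistag
```

A single positive track with p = 0.8 gives alpha = 4 and hence P(B+) = 0.8; two tracks of opposite charge with equal p cancel to 0.5.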

Intermediate estimation $ < D^2 > $ for tracking

Do calibration of $p(B^+)$ and compute $ < D^2 > $ :

  • use isotonic calibration (a generalization of fitting in bins): a piecewise-constant monotonic function
  • randomly divide events into two parts (1 - train, 2 - calibrate)
  • fit the symmetric isotonic regression on the train part and compute $ < D^2 > $ on the test part
  • take the mean and std of the computed $ < D^2 > $ values

$ < D^2 > $ formula for a sample: $$ < D^2 > = \frac{\sum_i [2(p^{mistag}_i - 0.5)]^2 \, sw_i}{\sum_i sw_i} = \frac{\sum_i [2(p_i(B^+) - 0.5)]^2 \, sw_i}{\sum_i sw_i}$$

The formula is symmetric under $p \to 1 - p$, so it is not necessary to compute the mistag probability explicitly.
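The weighted $ < D^2 > $ is a one-liner (sketch; function name is illustrative):

```python
import numpy as np

def mean_d2(p_bplus, sweights):
    """<D^2> = sum_i [2(p_i - 0.5)]^2 sw_i / sum_i sw_i.
    Symmetric under p -> 1 - p, so p(B+) works in place of the mistag."""
    d2 = (2.0 * (np.asarray(p_bplus) - 0.5)) ** 2
    return np.sum(d2 * sweights) / np.sum(sweights)
```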

Preliminary estimation

$\epsilon$ calculation

$$\epsilon = < D^2 > \cdot \, \epsilon_{tag}$$

$$\Delta \epsilon = \epsilon \sqrt{ \left(\frac{\Delta < D^2 > }{ < D^2 > }\right)^2 + \left(\frac{\Delta \epsilon_{tag} }{\epsilon_{tag}} \right)^2 }$$
  • Combine track-based and vertex-based tagging using formula [1]
  • symmetric isotonic calibration on random subsample with $D^2$ calculation
  • take mean and std for computed $ < D^2 > $
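The error propagation for $\epsilon$ can be sketched as (relative uncertainties added in quadrature):

```python
import numpy as np

def effective_efficiency(d2, delta_d2, eps_tag, delta_eps_tag):
    """eps = <D^2> * eps_tag; relative errors added in quadrature."""
    eps = d2 * eps_tag
    delta_eps = eps * np.sqrt((delta_d2 / d2) ** 2
                              + (delta_eps_tag / eps_tag) ** 2)
    return eps, delta_eps
```

For example, the first combination row above ($D^2 = 0.0683 \pm 0.0016$, $\epsilon_{tag} = 34.44 \pm 0.068$) gives $\epsilon \approx 2.352 \pm 0.055$, matching the table up to rounding of the inputs.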

Full estimation of systematic error

  • set random state
  • train the best model (track and vertex taggers with 2-folding with fixed random state)
  • do calibration for track and vertex taggers with 2-folding with fixed random state
  • compute $p(B^+)$
  • do calibration with isotonic 2-folding (random state is fixed)
  • compute $ < D^2 > $

This procedure is repeated (from scratch) for 30 different random states; we then compute the mean and std of these 30 values of $ < D^2 > $.

Check calibration of mistag

  • x axis: predicted mistag probability $$p_{mistag} = \min(p(B^+), p(B^-))$$
  • y axis: true mistag probability (computed per bin) $$p_{mistag} = \frac{N_{wrong}} {N_{wrong} + N_{right}} \qquad \Delta p_{mistag} = \frac{\sqrt{N_{wrong} N_{right}}} {(N_{wrong} + N_{right})^{1.5}}$$
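The per-bin quantities can be sketched as:

```python
import numpy as np

def true_mistag_in_bin(n_wrong, n_right):
    """True mistag and its binomial uncertainty for one bin."""
    n = n_wrong + n_right
    return n_wrong / n, np.sqrt(n_wrong * n_right) / n ** 1.5
```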

Stability of calibration

Add random noise after isotonic calibration of $p(B^+)$ for stability:

$$ 0.001 \cdot \mathcal{N}(0, 1)$$
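Isotonic calibration outputs a piecewise-constant function, so many events share exactly the same calibrated value; a tiny Gaussian jitter breaks these ties (toy values):

```python
import numpy as np

rng = np.random.RandomState(0)
p_calibrated = np.array([0.3, 0.3, 0.3, 0.7, 0.7])  # toy isotonic output
# 0.001 * N(0, 1) smearing for stability of the subsequent binning
p_smeared = p_calibrated + 0.001 * rng.normal(size=p_calibrated.size)
```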

Inclusive tagging (NEW)

  • Check "OS" and "SS" regions separately (to check that tagging includes "SS" and "OS")
  • Check dependences on lifetime, lifetime error, number of tracks, momentum, transverse momentum, mass
  • Asymmetry of charges in events: understanding of high tagging quality or what information we use

Tracking "OS" tagging

https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-OS.ipynb

Take all possible tracks for all B-events.

Apply:

  • (IPs > 3) & ((abs(diff_eta) > 0.6) | (abs(diff_phi) > 0.825)) - geometrical cuts
  • (PIDNNp < 0.5) & (PIDNNpi < 0.5) & (ghostProb < 0.4)
  • ((PIDNNk > trk) | (PIDNNm > trm) | (PIDNNe > tre)), trk=0., trm=0., tre=0.
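The cuts above map directly onto boolean masks on the track table (toy table; thresholds trk = trm = tre = 0 as above):

```python
import pandas as pd

# toy track table with only the columns used by the OS selection
tracks = pd.DataFrame({
    'IPs': [5.0, 1.0], 'diff_eta': [0.8, 0.1], 'diff_phi': [0.1, 0.2],
    'PIDNNp': [0.2, 0.6], 'PIDNNpi': [0.1, 0.2], 'ghostProb': [0.1, 0.1],
    'PIDNNk': [0.5, 0.5], 'PIDNNm': [0.0, 0.0], 'PIDNNe': [0.0, 0.0],
})
trk, trm, tre = 0.0, 0.0, 0.0

sel = ((tracks.IPs > 3)
       & ((tracks.diff_eta.abs() > 0.6) | (tracks.diff_phi.abs() > 0.825))
       & (tracks.PIDNNp < 0.5) & (tracks.PIDNNpi < 0.5)
       & (tracks.ghostProb < 0.4)
       & ((tracks.PIDNNk > trk) | (tracks.PIDNNm > trm) | (tracks.PIDNNe > tre)))
selected = tracks[sel]
```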

B mass before sWeight cut

B mass after sWeight cut

Number of tracks in event

PIDNN distributions after selection


In [4]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/eff_OS.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[4]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging, PID less 86.70309 0.10803 0.02494 0.00033 2.16214 0.02875 57.20222

Check calibration of mistag

before calibration

Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)

Tracking "SS" tagging

https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-SS.ipynb

Take all possible tracks for all B-events.

Apply:

  • (IPs < 3) & (abs(diff_eta) < 0.6) & (abs(diff_phi) < 0.825) & (ghostProb < 0.4)
  • ((PIDNNk > {trk}) | (PIDNNm > {trm}) | (PIDNNe > {tre}) | (PIDNNpi > {trpi}) | (PIDNNp > {trp})), trk=0, trm=0, tre=0, trpi=0, trp=0

B mass before sWeight cut

B mass after sWeight cut

PIDNN distributions after selection


In [5]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/eff_tracking_SS.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[5]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging, PID less 72.39764 0.09872 0.03077 0.00035 2.22756 0.02573 57.419

Check calibration of mistag

before calibration

Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)

Tracking inclusive tagging

https://github.com/tata-antares/tagging_LHCb/blob/master/track-based-tagging-PID-less.ipynb

Take all possible tracks for all B-events.

Apply:

  • (ghostProb < 0.4)
  • ((PIDNNk > {trk}) | (PIDNNm > {trm}) | (PIDNNe > {tre}) | (PIDNNpi > {trpi}) | (PIDNNp > {trp})), trk=0, trm=0, tre=0, trpi=0, trp=0

B mass before sWeight cut

B mass after sWeight cut

Number of tracks in event

PIDNN distributions after selection

Dependence on PIDNN cuts

  • (PIDNNp < 0.6) & (PIDNNpi < 0.6) & (ghostProb < 0.4)
  • ( (PIDNNk > 0.7) | (PIDNNm > 0.4) | (PIDNNe > 0.6) )

In [6]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[6]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging 77.78995 0.10233 0.03449 0.00046 2.68331 0.03576 57.92576
  • (PIDNNp < 0.6) & (PIDNNpi < 0.6) & (ghostProb < 0.4)
  • ( (PIDNNk > 0.1) | (PIDNNm > 0.1) | (PIDNNe > 0.1) )

In [7]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging_relax1.csv')


Out[7]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging 97.0983 0.1143 0.0384 0.0003 3.7256 0.0306 60.5811
  • (PIDNNpi < 0.6) & (ghostProb < 0.4)
  • ( (PIDNNk > 0.) | (PIDNNm > 0.) | (PIDNNe > 0.) )

In [8]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging_relax2.csv')


Out[8]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging 99.208 0.1156 0.0408 0.0004 4.05 0.0356 61.2362

In [9]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging-PID-less.csv').drop(['$\Delta$ AUC, with untag'], axis=1)


Out[9]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging, PID less 99.98595 0.11601 0.05873 0.00043 5.87239 0.04359 64.08899

In [10]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/new-tagging_full_tracks.csv')


Out[10]:
name $\epsilon_{tag}, \%$ $\Delta \epsilon_{tag}, \%$ $D^2$ $\Delta D^2$ $\epsilon, \%$ $\Delta \epsilon, \%$ AUC, with untag
0 Inclusive tagging 99.98595 0.11601 0.06303 0.00051 6.30254 0.05125 64.43919

Checks on track: OS+SS, OS vertex model

Check calibration of mistag

for signal (B-like events)

for background

before calibration

Symmetric isotonic calibration + random noise * 0.001 (noise for stability of bins)

Tagging power dependence on ...

  • For B mass, B momentum, B transverse momentum, and B lifetime, use the sidebands as background and the peak region as signal:
    • mask_signal = ((Bmass > 5.27) & (Bmass < 5.3))
    • mask_bck = ((Bmass < 5.25) | (Bmass > 5.32))
  • For the B lifetime error and the number of tracks, use sWeights

Procedure:

  • divide variable into 5 percentile bins
  • for each bin plot mistag vs true mistag

Signal dependence

Background dependence

Why is the effective efficiency so high for this model (combining track probabilities to obtain the B probability)?

Let's look at the following characteristic of the event:

$$ -\sum_{track} charge_{track}$$

It seems that for a $B^+$ event it should be around $+1$ plus a constant (because we exclude the signal part)
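This characteristic is just the sign-flipped sum of track charges per event (sketch; column names `event_id` and `signTrack` are assumed):

```python
import pandas as pd

# toy table: non-signal tracks of two events
tracks = pd.DataFrame({'event_id':  [0, 0, 0, 1, 1],
                       'signTrack': [1, 1, -1, -1, -1]})
# - sum of track charges, computed per event
charge_sum = -tracks.groupby('event_id')['signTrack'].sum()
```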

Regions:

  • 'OS' region: (IP > 3) & ((abs(diff_eta) > 0.6) | (abs(diff_phi) > 0.825))
  • 'SS' region: (IP < 3) & (abs(diff_eta) < 0.6) & (abs(diff_phi) < 0.825)
  • full data

"OS" data

"SS" data

Full sample

Add signal track

"OS" data

"SS" data

Full sample

Means of distributions (with signal track and without it)


In [11]:
pandas.set_option('display.precision', 5)
pandas.read_csv('img/track_signs_assymetry_means.csv', index_col='name')


Out[11]:
$B^+$ $B^+$, with signal part $B^-$ $B^-$, with signal part ROC AUC ROC AUC, with signal part
name
full 0.44341 -0.55659 -0.57216 0.42784 0.57158 0.56915
OS 0.11117 -0.88883 -0.15727 0.84273 0.52953 0.68460
SS 0.17597 -0.82403 -0.17810 0.82190 0.56770 0.77769

This ROC AUC score is similar to that of the current tagging implementation.

Charge asymmetry checks on the MC sample

"OS" sample

"SS" sample

Full sample

Means of distributions for MC and data (with signal track and without it)


In [12]:
pandas.set_option('display.precision', 5)
pandas.concat([pandas.read_csv('img/track_signs_assymetry_means.csv', index_col='name'),
               pandas.read_csv('img/track_signs_assymetry_means_mc.csv', index_col='name')])


Out[12]:
$B^+$ $B^+$, with signal part $B^-$ $B^-$, with signal part ROC AUC ROC AUC, with signal part
name
full 0.44341 -0.55659 -0.57216 0.42784 0.57158 0.56915
OS 0.11117 -0.88883 -0.15727 0.84273 0.52953 0.68460
SS 0.17597 -0.82403 -0.17810 0.82190 0.56770 0.77769
full_mc 0.31778 -0.68222 -0.77006 0.22994 0.58490 0.57159
OS_mc 0.04728 -0.95272 -0.28497 0.71503 0.54030 0.69439
SS_mc 0.15520 -0.84480 -0.18414 0.81586 0.56807 0.78832

The algorithm uses this information when combining track probabilities:

$$ \frac{P(B^+)}{P(B^-)} = \prod_{track,\, vertex} \begin{cases} \frac{P(\text{track/vertex same sign as } B \mid B)}{P(\text{track/vertex opposite sign to } B \mid B)}, \text{if track/vertex}^+ \\ \\ \frac{P(\text{track/vertex opposite sign to } B \mid B)}{P(\text{track/vertex same sign as } B \mid B)}, \text{if track/vertex}^- \end{cases} $$
  • Can we indeed use this information?
  • What is the source of this asymmetry?

The current tagging algorithm also implicitly uses this information!

The asymmetry plays a discriminative role even if we choose a random track in the event!

Random is not random!

Checked

  • modify the loss function during vertex training to use the track tagging output as a baseline (try to correct the track predictions using vertex information): doesn't help, but perhaps all vertices per event are needed
  • different calibrations: bins, logistic, isotonic
  • normalize the numbers of positive and negative tracks for $B^+$ and $B^-$ separately to remove the track charge asymmetry; the quality drops:

    • ROC AUC: 0.62
    • $\epsilon$: 4.5
  • use the track sign as a feature during training (bad quality):

    • ROC AUC: 0.611
    • $\epsilon$: 3.6
  • check, as a discriminative variable, the sum of track charges weighted by $p_T$:
$$ -\frac{\sum_{track} charge_{track} Pt_{track}} {\sum_{track} Pt_{track}}$$

    • ROC AUC for all regions ("OS", "SS", full sample) < 0.5006
    • doesn't discriminate
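The $p_T$-weighted variant checked above is (sketch):

```python
import numpy as np

def pt_weighted_charge(charges, pts):
    """-sum(q_i * pT_i) / sum(pT_i) over the tracks of one event."""
    charges = np.asarray(charges, dtype=float)
    pts = np.asarray(pts, dtype=float)
    return -np.sum(charges * pts) / np.sum(pts)
```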

TODO

  • check the efficiency of inclusive tagging on other decays (please send us your tuples with flavour tagging checker info)
  • understand the asymmetry of the sum of charges (maybe somebody understands it already?)
  • get all vertices from DaVinci (to check whether the vertex is indeed needed to improve the tagger)