About

This notebook is devoted to uniformGradientBoosting, which is gradient boosting on trees with a custom loss function:

$\text{loss} = \sum_i w_i \exp \left[- \sum_j a_{ij} \textrm{score}_j y_j \right] $

Here $y_j \in \{+1, -1\}$ and $\textrm{score}_j \in \mathbb{R}$ are the true class and the score prediction of the $j$-th event in the training set.

The weights $w_i$ are all ones so far; the main problem is to choose an appropriate matrix $a_{ij}$, because there are plenty of variants.

If we take $a_{ij}$ to be the identity matrix, this is simply the AdaBoost loss.
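
To make the definition concrete, here is a minimal numpy/scipy sketch (not the ugb implementation; exp_matrix_loss is just an illustrative name) that evaluates this loss for a given sparse matrix A and checks that the identity matrix indeed reproduces the AdaBoost exponential loss:

import numpy
from scipy import sparse

def exp_matrix_loss(score, y, w, A):
    # loss = sum_i w_i * exp(- sum_j a_ij * score_j * y_j)
    margins = A.dot(score * y)               # vector with components sum_j a_ij * score_j * y_j
    return numpy.sum(w * numpy.exp(-margins))

# with A = identity this reduces to the plain AdaBoost exponential loss
n = 5
score = numpy.random.normal(size=n)
y = numpy.where(numpy.random.uniform(size=n) > 0.5, 1., -1.)
w = numpy.ones(n)
A = sparse.identity(n, format='csr')
assert numpy.allclose(exp_matrix_loss(score, y, w, A), numpy.sum(w * numpy.exp(-y * score)))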

Particular cases of this loss tested in the current notebook

SimpleKnnLoss(knn) is a particular case where in each row we set ones for the knn closest events of the same class and zeros for all others (a rough sketch of how such a matrix can be built is given after this list).
The matrix is square; if we take knn=1, this is the same as the Ada loss.

PairwiseKnnLossFunction(knn): we take the knn neighbours of each event, and for each pair of neighbouring events we create a separate row in the matrix, with ones placed in the columns corresponding to the two events (thus each row has only two 1's). This one gives poor uniformity and doesn't seem to have any advantages. If knn=1, it is also equivalent to the Ada loss.

RandomKnnLossFunction(nrows, knn, knnfactor=3): the resulting A matrix will have nrows rows, each generated as follows:
we take a random event from the training dataset, from its knn * knnfactor closest neighbours we pick knn at random, and place ones in the corresponding columns. Each row has knn 1's.
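
To illustrate how these matrices look, here is a rough sketch of a SimpleKnnLoss-style square matrix built with sklearn's NearestNeighbors (simple_knn_matrix is a hypothetical helper, not the actual SimpleKnnLossFunction code; the pairwise and random variants differ only in how the rows are generated). With knn=1 every event's closest same-class neighbour is itself, so the matrix degenerates to the identity and the loss becomes the Ada loss, as stated above.

import numpy
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def simple_knn_matrix(uniform_features, y, knn=20):
    """Square A matrix: row i has ones in the columns of the knn closest
    events of the same class (distances measured in the uniform variables)."""
    n = len(y)
    rows, cols = [], []
    for label in numpy.unique(y):
        idx = numpy.where(y == label)[0]
        nn = NearestNeighbors(n_neighbors=min(knn, len(idx))).fit(uniform_features[idx])
        _, neighbours = nn.kneighbors(uniform_features[idx])
        for i, neigh in zip(idx, neighbours):
            rows.extend([i] * len(neigh))
            cols.extend(idx[neigh])
    return sparse.csr_matrix((numpy.ones(len(rows)), (rows, cols)), shape=(n, n))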

MSE variation is the currently used measure of flatness

I don't rely on it too much (though it seems to be an adequate measure), so I sometimes print plots to compare.
For a given target efficiency the MSE variation is computed as follows:

We have some target efficiencies; we split the signal events into bins over the uniform variables and compute
$\text{mse}(\text{eff}) = \cfrac{1}{\text{n\_bins} \times \text{particles}} \sum_{\text{bin}} (\text{mean\_eff} - \text{bin\_eff})^2 \times \text{particles\_in\_bin}$

To obtain a single measure of nonuniformity, we take the average of mse(eff) over several target efficiencies (e.g. [0.6, 0.7, 0.8, 0.9]).

This form is chosen because it is more or less independent of the number of bins and the number of events.
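
For reference, a hedged sketch of this computation for one target efficiency (the reports module may differ in details such as event weighting or how bins are constructed; mse_variation and bin_indices are illustrative names):

import numpy

def mse_variation(signal_scores, bin_indices, target_efficiency=0.7):
    """bin_indices[i] is the bin (over the uniform variables) of the i-th signal event."""
    # global cut that keeps the target fraction of signal events
    cut = numpy.percentile(signal_scores, 100. * (1. - target_efficiency))
    passed = signal_scores >= cut
    result, n_bins = 0., 0
    for bin_id in numpy.unique(bin_indices):
        in_bin = (bin_indices == bin_id)
        # target_efficiency plays the role of mean_eff, since the global cut keeps that fraction
        result += (target_efficiency - numpy.mean(passed[in_bin])) ** 2 * numpy.sum(in_bin)
        n_bins += 1
    return result / (n_bins * len(signal_scores))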


In [1]:
import pandas, numpy
from sklearn.cross_validation import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from IPython.display import display_html
from collections import OrderedDict

import uniformgradientboosting as ugb
import commonutils as utils
import reports
from reports import ClassifiersDict
from uboost import uBoostBDT, uBoostClassifier
from supplementaryclassifiers import HidingClassifier
from config import ipc_profile


binner is ok
correction function is ok
computeSignalKnnIndices is ok
MSE variation is ok
loss is ok
0.802 0.7945 0.794 0.791 0.7925 0.796 0.755
uniform gradient boosting is ok
uboost is ok

Loading data


In [2]:
used_columns = ["Y1", "Y2", "Y3", "M2AB", "M2AC"]

signalDF  = pandas.read_csv('datasets/dalitzplot/signal.csv', sep='\t', usecols=used_columns)
signal5e5DF = pandas.read_csv('datasets/dalitzplot/signal5e5.csv', sep='\t', usecols=used_columns)
bgDF      = pandas.read_csv('datasets/dalitzplot/bkgd.csv', sep='\t', usecols=used_columns)

answers5e5 = numpy.ones(len(signal5e5DF))

assert set(signalDF.columns) == set(signal5e5DF.columns) == set(bgDF.columns), "columns are different"

Distribution of events in different files in the Dalitz variables


In [3]:
def plotDistribution2D(var_name1, var_name2, data_frame, bins=40):
    """The function to plot 2D distribution histograms"""
    H, x, y = pylab.histogram2d(data_frame[var_name1], data_frame[var_name2], bins=bins)
    pylab.xlabel(var_name1)
    pylab.ylabel(var_name2)
    pylab.pcolor(x, y, H, cmap=cm.Blues)
    pylab.colorbar()

pylab.figure(figsize=(18, 6))
subplot(1, 3, 1), pylab.title("signal"),       plotDistribution2D("M2AB", "M2AC", signalDF)
subplot(1, 3, 2), pylab.title("background"),   plotDistribution2D("M2AB", "M2AC", bgDF)
subplot(1, 3, 3), pylab.title("dense signal"), plotDistribution2D("M2AB", "M2AC", signal5e5DF)
pass



In [13]:
def smallReport(classifiers, roc_stages=[50, 100], mse_stages=[100], parallelize=True):
    used_ipc = ipc_profile if parallelize else None
    test_preds = classifiers.fit(trainX, trainY, ipc_profile=used_ipc).test_on(testX, testY, low_memory=True)
    pylab.figure(figsize=(17, 7))

    pylab.subplot(121), pylab.title('Learning curves'), test_preds.learning_curves()
    pylab.subplot(122), pylab.title('Staged MSE'), test_preds.mse_curves(uniform_variables)
    show()
    test_preds.roc(stages=roc_stages).show()
    classifiers.test_on(signal5e5DF, answers5e5, low_memory=True)\
        .efficiency(uniform_variables, stages=mse_stages, target_efficiencies=[0.7])

Preparation of train/test datasets


In [5]:
trainX, trainY, testX, testY = utils.splitOnTestAndTrain(signalDF, bgDF)
train_variables = ["Y1", "Y2", "Y3"]
uniform_variables  = ["M2AB", "M2AC"]

Comparison of uGB with other classifiers

AdaBoost, uBoost, and uniformGradientBoosting

uBoost shows quality that is at least not worse, with essentially better signal flatness, while being a little less uniform in the background.

The latter is no surprise, because uBoost was never designed to be flat in the background.


In [38]:
base_estimator = DecisionTreeClassifier(max_depth=4)
n_estimators = 150 + 1

var_classifiers = ClassifiersDict()
 
var_classifiers['AdaBoost'] = HidingClassifier(train_variables=train_variables, 
                                    base_estimator=AdaBoostClassifier(base_estimator=base_estimator, n_estimators=n_estimators))

knnloss1 = ugb.SimpleKnnLossFunction(uniform_variables, knn=20)
var_classifiers['unifGB20'] = ugb.MyGradientBoostingClassifier(loss=knnloss1, max_depth=4, n_estimators=n_estimators, 
                                                           learning_rate=.5, train_variables=train_variables)

var_classifiers['uBoost']   = uBoostClassifier(uniform_variables=uniform_variables, base_estimator=base_estimator,
                                               n_estimators=n_estimators, train_variables=train_variables, efficiency_steps=12)

flatness_loss = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=0.05, bins=13)
var_classifiers['uGB+FL'] = ugb.MyGradientBoostingClassifier(loss=flatness_loss, max_depth=4, n_estimators=n_estimators, 
                                                           learning_rate=.4, train_variables=train_variables)

var_classifiers.fit(trainX, trainY, ipc_profile=ipc_profile)
pass

In [8]:
var_classifiers.test_on(testX, testY).learning_curves().show().roc(stages=[75, 150]).show()


Out[8]:
<reports.Predictions at 0x615e490>

In [41]:
var_pred5e5 = var_classifiers.test_on(signal5e5DF, answers5e5)
var_pred5e5.mse_curves(uniform_variables).show().efficiency(uniform_variables, stages=[75, 150], target_efficiencies=[0.7])


Stage 75, efficiency=0.70
Stage 150, efficiency=0.70
Out[41]:
<reports.Predictions at 0xf25137d0>

Looking at MSE as a measure of nonuniformity (the lower the MSE, the better)

Here are some examples to help understand how MSE values correlate with the efficiency distributions over the Dalitz plot.


In [43]:
effs = [0.6, 0.7, 0.85]
for eff in effs:
    var_pred5e5.efficiency(uniform_variables, target_efficiencies=[eff]) \
        .print_mse(uniform_variables, stages=[100], efficiencies=[eff])
    
display_html("<b>After summing over efficiencies {0} </b>".format(effs), raw=True)
var_pred5e5.print_mse(uniform_variables, stages=[100], efficiencies=effs)


# display_html(reports.computeStagedMseVariation(answers5e5, signal5e5DF, uniform_variables, var_sig5e5_probas_dict, 
#                                             stages=[100], target_efficiencies=effs) )


Stage result, efficiency=0.60
Staged MSE variation
        AdaBoost  unifGB20    uBoost    uGB+FL
100     1.710059   0.70985  0.763969  0.451347

Stage result, efficiency=0.70
Staged MSE variation
        AdaBoost  unifGB20    uBoost    uGB+FL
100     1.769065  1.007765  0.769616   0.56769

Stage result, efficiency=0.85
Staged MSE variation
        AdaBoost  unifGB20    uBoost    uGB+FL
100     1.290776  0.990239  0.576563  0.622514

After summing over efficiencies [0.6, 0.7, 0.85]
Staged MSE variation
        AdaBoost  unifGB20    uBoost    uGB+FL
100     1.604161   0.91288  0.709079  0.551818

Out[43]:
<reports.Predictions at 0xf25137d0>

Looking at the background efficiency


In [15]:
pred = var_classifiers.test_on(testX, testY)
pred.mse_curves(uniform_variables, on_signal=False)


Out[15]:
<reports.Predictions at 0x65ba4d0>

Comparison of SimpleKnnLoss with different values of 'knn' parameter

Flatness vs. efficiency: a large sknn gives much worse quality with little advantage in flatness;
it seems there is some limit in flatness that the method cannot overcome.


In [16]:
sknn_classifiers = ClassifiersDict()
for knn in [1, 5, 10, 20, 30, 60]:
    knnloss = ugb.SimpleKnnLossFunction(uniform_variables, knn=knn)
    sknn_classifiers["sknn=%i" % knn] = ugb.MyGradientBoostingClassifier(loss=knnloss, max_depth=4, n_estimators=201, 
                                    learning_rate=.5, train_variables = train_variables)
smallReport(sknn_classifiers, roc_stages=[100, 200], mse_stages=[200])


We spent 100.69 seconds on parallel training
Stage 200, efficiency=0.70

Comparison of SimpleKnnLoss with different diagonal parameter

We augment the A matrix by adding the identity matrix multiplied by some number:

$A := A + \text{diagonal} \times I$, where diagonal is a real number.
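
A short sketch of this augmentation for any sparse square matrix (the names are illustrative, not the ugb API):

import numpy
from scipy import sparse

A = sparse.csr_matrix(numpy.ones((4, 4)))    # stand-in for a SimpleKnnLoss matrix
diagonal = 2.0
A_augmented = A + diagonal * sparse.identity(A.shape[0], format='csr')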


In [17]:
sknn2_classifiers = ClassifiersDict()
for diagonal in [0, 1, 2]:
    knnloss = ugb.SimpleKnnLossFunction(uniform_variables, knn=25, diagonal=diagonal)
    sknn2_classifiers["diag=%i" % diagonal] = ugb.MyGradientBoostingClassifier(loss=knnloss, max_depth=4, n_estimators=101, 
                                    learning_rate=.5, train_variables = train_variables)
smallReport(sknn2_classifiers)


We spent 45.52 seconds on parallel training
Stage 100, efficiency=0.70

Comparison of the pairwise loss with different knn

It works just a bit flatter than AdaBoost; unfortunately it didn't give any essential advantage.


In [18]:
pw_classifiers = ClassifiersDict()
for knn in [5, 15]:
    pw_loss = ugb.PairwiseKnnLossFunction(uniform_variables, knn=knn)
    pw_classifiers["pw_knn=%i" % knn] = ugb.MyGradientBoostingClassifier(loss=pw_loss, max_depth=4, n_estimators=101, 
                                    learning_rate=.5, train_variables = train_variables)
smallReport(pw_classifiers)


We spent 61.33 seconds on parallel training
Stage 100, efficiency=0.70

Comparison of RandomKnnLoss with different knn parameter

a good demonstration of the quality/flatness tradeoff


In [19]:
rknn_classifiers = ClassifiersDict()
for knn in [1, 6, 10, 20, 30]:
    rknn_loss = ugb.RandomKnnLossFunction(uniform_variables, knn=knn, n_rows=len(trainX) * 3, large_preds_penalty=0.)
    rknn_classifiers["rknn=%i" % knn] = ugb.MyGradientBoostingClassifier(loss=rknn_loss, max_depth=4, n_estimators=101, 
                                    learning_rate=.5, train_variables = train_variables)
    
smallReport(rknn_classifiers)


We spent 80.32 seconds on parallel training
Stage 100, efficiency=0.70

Comparison of RandomKnnLoss with different nrows parameter

Surprisingly, almost nothing depends on the number of rows: neither the uniformity nor the quality.


In [20]:
rknn2_classifiers = ClassifiersDict()
for factor in [0.5, 1, 2, 4, 8]:
    n_rows = int(factor * len(trainX))
    rknn2_loss = ugb.RandomKnnLossFunction(uniform_variables, knn=20, n_rows=n_rows)
    rknn2_classifiers["rknn2=%1.1f" % factor] = ugb.MyGradientBoostingClassifier(loss=rknn2_loss, max_depth=4, n_estimators=101, 
                                    learning_rate=.5, train_variables = train_variables)
smallReport(rknn2_classifiers)


We spent 106.34 seconds on parallel training
Stage 100, efficiency=0.70

Comparison with / without subsampling

Usually subsampling speeds up convergence and prevents overfitting, but that is not the case here.
Subsample = 0.5 behaves quite randomly: sometimes it gives better flatness, sometimes worse.


In [21]:
ss_classifiers = ClassifiersDict()
for subsample in [0.5, .7, 0.8, 1.]:
    knnloss = ugb.SimpleKnnLossFunction(uniform_variables, knn=20)
    ss_classifiers["subsample=%1.2f" % subsample] = ugb.MyGradientBoostingClassifier(loss=knnloss, max_depth=4, n_estimators=151, 
                                    learning_rate=.5, train_variables=train_variables, subsample=subsample)

smallReport(ss_classifiers)


We spent 88.21 seconds on parallel training
Stage 100, efficiency=0.70

SimpleKnnLoss with / without distinguishing classes

When we do not distinguish classes, the classifier does not tend towards uniformity: in every region it only tries to make the signal probabilities greater than the background ones (and that is all).

This may serve as a good feature for subsequent use in some other classifier, because it tries to give an equal difference in predictions over the Dalitz variables.


In [22]:
sknn3_classifiers = ClassifiersDict()
for distinguish_classes in [True, False]:
    knnloss = ugb.SimpleKnnLossFunction(uniform_variables, knn=25, distinguish_classes=distinguish_classes)
    sknn3_classifiers["dist=%s" % str(distinguish_classes)] = \
        ugb.MyGradientBoostingClassifier(loss=knnloss, max_depth=4, n_estimators=101, 
                                         learning_rate=.5, train_variables = train_variables)
smallReport(sknn3_classifiers)


We spent 62.06 seconds on parallel training
Stage 100, efficiency=0.70

More variables


In [23]:
full_used_columns = ["M2AB", "M2AC", "Y1", "Y2", "Y3", "Y4", "XA", "XB", "XC"]

full_signalDF  = pandas.read_csv('datasets/dalitzplot/signal.csv', sep='\t', usecols=full_used_columns)
full_signal5e5DF = pandas.read_csv('datasets/dalitzplot/signal5e5.csv', sep='\t', usecols=full_used_columns)
full_bgDF      = pandas.read_csv('datasets/dalitzplot/bkgd.csv', sep='\t', usecols=full_used_columns)

# preparation of train/test
full_trainX, full_trainY, full_testX, full_testY = utils.splitOnTestAndTrain(full_signalDF, full_bgDF)

Results of AdaBoost trained on 3, 4, 5, 6 variables


In [24]:
base_estimator = DecisionTreeClassifier(max_depth=4)
uniform_variables  = ["M2AB", "M2AC"]
n_estimators = 101
full_train_variables = ["Y1", "Y2", "Y3", "Y4", "XA", "XB", "XC"]

full_ada_classifiers = ClassifiersDict()
for n_features in [3, 4, 5, 6]:
    full_ada_classifiers['Ada_Feat=%i' % n_features] = HidingClassifier(train_variables=full_train_variables[:n_features], 
                base_estimator=AdaBoostClassifier(base_estimator=base_estimator, n_estimators=n_estimators))

full_preds = full_ada_classifiers.fit(full_trainX, full_trainY, ipc_profile=ipc_profile).test_on(full_testX, full_testY)
figure(figsize=(17, 7))
subplot(121), full_preds.learning_curves()
subplot(122), full_preds.mse_curves(uniform_variables)


We spent 54.68 seconds on parallel training
Out[24]:
(<matplotlib.axes.AxesSubplot at 0x51319d0>,
 <reports.Predictions at 0x24320910>)

Training classifiers on 4 variables

It seems reasonable to try training on 4 variables (because 5 and 6 variables give nearly ideal classification)


In [25]:
n_estimators = 101
full_classifiers = ClassifiersDict()
full4_train_vars = full_train_variables[:4]

full_classifiers['AdaBoost'] = HidingClassifier(train_variables=full4_train_vars, 
                                    base_estimator=AdaBoostClassifier(base_estimator=base_estimator, n_estimators=n_estimators))

knnloss1 = ugb.SimpleKnnLossFunction(uniform_variables, knn=10)
full_classifiers['unifGB'] = ugb.MyGradientBoostingClassifier(loss=knnloss1, max_depth=4, n_estimators=n_estimators, 
                                                           learning_rate=.5, train_variables=full4_train_vars)

full_classifiers['uBoost']   = uBoostClassifier(uniform_variables=uniform_variables, base_estimator=base_estimator,
                                               n_estimators=n_estimators, train_variables=full4_train_vars, 
                                               efficiency_steps=12, ipc_profile=ipc_profile)

flatness_loss1 = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=0.05, bins=17)
full_classifiers['uGB+FL'] = ugb.MyGradientBoostingClassifier(loss=flatness_loss1, max_depth=4, n_estimators=n_estimators, 
                                                           learning_rate=0.4, train_variables=full4_train_vars)

full_preds = full_classifiers.fit(full_trainX, full_trainY, ipc_profile=ipc_profile).test_on(full_testX, full_testY)
figure(figsize=(17, 7))
subplot(121), full_preds.learning_curves()
subplot(122), full_preds.mse_curves(uniform_variables)


We spent 70.78 seconds on parallel training
Out[25]:
(<matplotlib.axes.AxesSubplot at 0x24881410>,
 <reports.Predictions at 0x6530b50>)

Distance-dependent algorithms

Square A matrix is constructed by taking $$a_{ij} = \begin{cases} 0, & \text{if class}_i \neq \text{class}_j \\ 0, & \text{if j-th event is not in knn of i-th event }\\ f(r), & \text{otherwise, where $r$ is distance(i,j)} \end{cases}$$

$f(r)$ is a function we are free to choose.
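
A rough sketch of this construction (not the DistanceBasedKnnFunction implementation; distance_knn_matrix is a hypothetical helper): compared to the ones-based matrices above, the only change is that the stored value is $f(r)$ instead of 1.

import numpy
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def distance_knn_matrix(uniform_features, y, f, knn=150):
    """Square A matrix with a_ij = f(distance(i, j)) for same-class knn pairs, 0 elsewhere."""
    n = len(y)
    rows, cols, vals = [], [], []
    for label in numpy.unique(y):
        idx = numpy.where(y == label)[0]
        nn = NearestNeighbors(n_neighbors=min(knn, len(idx))).fit(uniform_features[idx])
        distances, neighbours = nn.kneighbors(uniform_features[idx])
        for i, dist, neigh in zip(idx, distances, neighbours):
            rows.extend([i] * len(neigh))
            cols.extend(idx[neigh])
            vals.extend(f(dist))
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))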

The results of these experiments are too unstable to be reliable: after retraining (or reshuffling the datasets) the results may change significantly.

More experiments are needed here (or better, a good theoretical argument for why some function should be preferred); there are too many things to play with.

A matrix as is (unnormalized)

NB: clip(x, a, b) is a function that clips x to the range [a, b] (to avoid singularities from log, 1/r and so on).


In [26]:
functions = {'exp': lambda r: numpy.exp(-50*r),
             'exp2': lambda r: numpy.exp(-1000*r*r),
             'exp3': lambda r: numpy.exp(-2000*r*r),
             'log': lambda r: numpy.clip(-numpy.log(r), 0, 7),
             '1/r': lambda r: numpy.clip(1/r, 0, 100),
             '1/sqrt(r)': lambda r: numpy.clip(r ** -0.5, 0, 10)
            }

dist_classifiers = ClassifiersDict()
for name, func in functions.iteritems():
    loss = ugb.DistanceBasedKnnFunction(uniform_variables, knn=150, distance_dependence=func, row_normalize=False)
    dist_classifiers[name] = \
        ugb.MyGradientBoostingClassifier(loss=loss, max_depth=4, n_estimators=101, 
                                         learning_rate=.5, train_variables = train_variables)
smallReport(dist_classifiers, parallelize=False)


Classifier          log is learnt in 74.59 seconds
Classifier         exp3 is learnt in 71.20 seconds
Classifier         exp2 is learnt in 70.21 seconds
Classifier          exp is learnt in 68.71 seconds
Classifier    1/sqrt(r) is learnt in 70.67 seconds
Classifier          1/r is learnt in 73.01 seconds
Stage 100, efficiency=0.70

Normed (over the row) matrix

In this case we rescale each row of the A matrix so that the sum of the elements in each row equals 1.
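
A short sketch of the row normalization for a sparse matrix (illustrative, not the ugb code):

import numpy
from scipy import sparse

A = sparse.csr_matrix(numpy.random.rand(4, 4))
row_sums = numpy.asarray(A.sum(axis=1)).flatten()
A_normed = sparse.diags(1. / row_sums, 0).dot(A)    # every row of A_normed now sums to 1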


In [27]:
dist2_classifiers = ClassifiersDict()
for name, func in functions.iteritems():
    loss = ugb.DistanceBasedKnnFunction(uniform_variables, knn=150, distance_dependence=func, row_normalize=True)
    dist2_classifiers[name + "+norm"] = \
        ugb.MyGradientBoostingClassifier(loss=loss, max_depth=4, n_estimators=101, 
                                         learning_rate=.5, train_variables = train_variables)
smallReport(dist2_classifiers, parallelize=False)


Classifier     log+norm is learnt in 71.02 seconds
Classifier    exp3+norm is learnt in 69.35 seconds
Classifier    exp2+norm is learnt in 76.84 seconds
Classifier     exp+norm is learnt in 63.91 seconds
Classifier 1/sqrt(r)+norm is learnt in 66.47 seconds
Classifier     1/r+norm is learnt in 74.16 seconds
Stage 100, efficiency=0.70

Dense matrices


In [28]:
functions = {'exp': lambda r: numpy.exp(-50*r),
             'exp2': lambda r: numpy.exp(-1000*r*r),
             'exp3': lambda r: numpy.exp(-2000*r*r),
             'log': lambda r: numpy.clip(-numpy.log(r), 0, 7),
             '1/r': lambda r: numpy.clip(1/r, 0, 100),
             '1/sqrt(r)': lambda r: numpy.clip(r ** -0.5, 0, 10)
            }

dist3_classifiers = ClassifiersDict()
for name, func in functions.iteritems():
    loss = ugb.DistanceBasedKnnFunction(uniform_variables, knn=100, distance_dependence=func, row_normalize=True)
    dist3_classifiers[name] = \
        ugb.MyGradientBoostingClassifier(loss=loss, max_depth=4, n_estimators=101, 
                                         learning_rate=.5, train_variables = train_variables)
smallReport(dist3_classifiers, parallelize=False)


Classifier          log is learnt in 61.00 seconds
Classifier         exp3 is learnt in 59.19 seconds
Classifier         exp2 is learnt in 67.67 seconds
Classifier          exp is learnt in 60.78 seconds
Classifier    1/sqrt(r) is learnt in 53.70 seconds
Classifier          1/r is learnt in 47.64 seconds
Stage 100, efficiency=0.70

The average distance to n-th neighbour

just to get a feel for the real scale of the distances


In [29]:
from sklearn.neighbors import NearestNeighbors
data = trainX[trainY > 0.5]
knn = 100
r, inds = NearestNeighbors(n_neighbors=knn).fit(data).kneighbors(data)
plot(numpy.arange(knn), r.mean(axis=0))


Out[29]:
[<matplotlib.lines.Line2D at 0x7f52078f4210>]

Testing with a constant $f(r)$ function

This should give the same results as sknn.


In [30]:
dist4_classifiers = ClassifiersDict()
for knn in [1, 10, 20, 30]:
    loss = ugb.DistanceBasedKnnFunction(uniform_variables, knn=knn, distance_dependence=lambda r: (r + 1e5)**0, row_normalize=True)
    dist4_classifiers['knn=%i' %knn] = ugb.MyGradientBoostingClassifier(loss=loss, max_depth=4, n_estimators=101, 
                                                               learning_rate=.5, train_variables=train_variables)
smallReport(dist4_classifiers, parallelize=False)


Classifier        knn=1 is learnt in 21.93 seconds
Classifier       knn=10 is learnt in 25.28 seconds
Classifier       knn=20 is learnt in 27.02 seconds
Classifier       knn=30 is learnt in 29.31 seconds
Stage 100, efficiency=0.70

GradientBoosting+flatnessLoss

different values of learning_rate


In [31]:
fl_classifiers = ClassifiersDict()
knn_loss = ugb.SimpleKnnLossFunction(uniform_variables, knn=25)
fl_classifiers["sknn25"] = ugb.MyGradientBoostingClassifier(loss=knn_loss, max_depth=4, n_estimators=101, 
                                    learning_rate=.5, train_variables = train_variables)
for learning_rate in [ 0.1, 0.2, 0.4, 0.6]:
    flatness_loss = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=0.05, bins=13)
    fl_classifiers["fl+lr=%1.3f" % learning_rate] = ugb.MyGradientBoostingClassifier(loss=flatness_loss, max_depth=4, 
                                n_estimators=101, learning_rate=learning_rate, train_variables = train_variables)

smallReport(fl_classifiers, parallelize=False)


Classifier       sknn25 is learnt in 30.09 seconds
Classifier  fl+lr=0.100 is learnt in 25.11 seconds
Classifier  fl+lr=0.200 is learnt in 24.85 seconds
Classifier  fl+lr=0.400 is learnt in 25.62 seconds
Classifier  fl+lr=0.600 is learnt in 25.87 seconds
Stage 100, efficiency=0.70

Different values of 'ada_coefficient'. In general, the loss is FlatnessLoss + ada_coefficient * AdaLoss.
The greater the ada_coefficient, the more we tend to minimize AdaLoss (quality) rather than FlatnessLoss; this coefficient serves as a kind of tradeoff parameter between flatness and quality.
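
Schematically (a hedged sketch of how the two pieces could be combined; the actual FlatnessLossFunction internals may scale things differently, and combined_negative_gradient / flatness_grad are illustrative names), the per-event negative gradient used to fit each new tree looks like this:

import numpy

def combined_negative_gradient(flatness_grad, scores, y, ada_coefficient=0.05):
    # AdaBoost part: the negative gradient of exp(-y * score) w.r.t. score is y * exp(-y * score)
    ada_grad = y * numpy.exp(-y * scores)
    return flatness_grad + ada_coefficient * ada_grad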


In [32]:
fl2_classifiers = ClassifiersDict()
knn_loss = ugb.SimpleKnnLossFunction(uniform_variables, knn=25)
fl2_classifiers["sknn25"] = ugb.MyGradientBoostingClassifier(loss=knn_loss, max_depth=4, n_estimators=151, 
                                    learning_rate=.5, train_variables = train_variables)
for ada_coeff in [0.01, 0.02, 0.05, 0.1, 0.2]:
    flatness_loss = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=ada_coeff, bins=23)
    fl2_classifiers["fl2=%1.2f" % ada_coeff] = ugb.MyGradientBoostingClassifier(loss=flatness_loss, max_depth=4, 
                                n_estimators=151, learning_rate=0.2, train_variables = train_variables)

smallReport(fl2_classifiers)


We spent 113.51 seconds on parallel training
Stage 100, efficiency=0.70

Dependence on the number of bins

the dependence should be quite small


In [33]:
fl3_classifiers = ClassifiersDict()
knn_loss = ugb.SimpleKnnLossFunction(uniform_variables, knn=25)
fl3_classifiers["sknn25"] = ugb.MyGradientBoostingClassifier(loss=knn_loss, max_depth=4, n_estimators=151, 
                                    learning_rate=.5, train_variables = train_variables)
for bins in [8, 15, 25]:
    flatness_loss = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=0.1, bins=bins)
    fl3_classifiers["fl3=%i" % bins] = ugb.MyGradientBoostingClassifier(loss=flatness_loss, max_depth=4, 
                                n_estimators=151, learning_rate=0.2, train_variables = train_variables)

smallReport(fl3_classifiers)


We spent 42.33 seconds on parallel training
Stage 100, efficiency=0.70

Different powers for flatness loss


In [34]:
fl4_classifiers = ClassifiersDict()
for power in [1., 1.5, 2., 3.]:
    flatness_loss = ugb.FlatnessLossFunction(uniform_variables, ada_coefficient=0.1, power=power)
    fl4_classifiers["fl4=%.1f" % power] = ugb.MyGradientBoostingClassifier(loss=flatness_loss, max_depth=4, 
                                n_estimators=151, learning_rate=0.2, train_variables = train_variables)

smallReport(fl4_classifiers)


We spent 78.75 seconds on parallel training
Stage 100, efficiency=0.70

In [45]:
fl4_classifiers.test_on(testX, testY).mse_curves(uniform_variables, on_signal=False)


Out[45]:
<reports.Predictions at 0x67a0050>

In [ ]: