With each boosting iteration (step), the classifier is trained on the same training data, but the learning focus shifts to different examples of the set through adaptive weighting (as described in the adabag paper). When all iterations are complete, the classifiers produced in each iteration are combined into a final classifier based on the training data. The original AdaBoost can only be applied to binary classification problems; however, AdaBoost.M1 and SAMME can be applied to multiclass classification problems.
The concept of the margin is important. The margin of an observation is related to the certainty of its classification: it is the difference between the support for the correct class and the maximum support for an incorrect class.
All misclassified examples will have negative margins, and correctly classified ones positive margins. A correctly classified observation with a high degree of confidence will have a margin close to one. On the other hand, examples with an uncertain classification will have small margins, that is, margins close to zero. Since a small margin is a symptom of instability in the assigned class, the same example could be assigned to different classes by similar classifiers.
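As a minimal self-contained sketch of the idea (not adabag's own code; the support matrix below is made up), the margin can be computed from per-observation class support values:
In [ ]:
# Toy illustration of the margin: support for three classes,
# one row per observation (rows sum to one).
support <- matrix(c(0.90, 0.05, 0.05,   # confident and correct
                    0.40, 0.35, 0.25,   # uncertain but correct
                    0.30, 0.60, 0.10),  # misclassified
                  ncol = 3, byrow = TRUE)
true_class <- c(1, 1, 1)  # class 1 is the correct label for every row here

margin <- sapply(seq_len(nrow(support)), function(i) {
  support[i, true_class[i]] - max(support[i, -true_class[i]])
})
margin  # 0.85, 0.05, -0.30: confident, uncertain, misclassified
For a fitted ensemble, adabag provides a margins() function that computes these values directly.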
A paper on margins that I must read is: Kuncheva LI (2004). Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons. It creates a visualisation by using the margin distribution to generate a cumulative distribution of margins.
The adabag package provides both boosting and bagging capabilities. Its boosting is only suitable for classification tasks (not regression). It implements the AdaBoost.M1 and SAMME algorithms, with classification trees as the base classifiers.
An interesting point: the calculated weights that get applied in the next iteration, based on the output of the previous classifier, always sum to one, and they are updated in such a way that the weight of a wrongly classified observation is increased. What does this mean? It forces the classifier in the next iteration to focus on the hardest examples.
Even better, when the classifier is reasonably accurate, the differences in the weight updates for the next classifier will be greater, as there will be fewer mistakes to focus on, and these take on more importance.
Plus, the alpha constant is used in the final decision rule, giving more importance to the classifiers that made a lower error.
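To make that concrete, here is a rough self-contained sketch of a single AdaBoost.M1-style weight update (illustrative only, not adabag's internal code; the Freund-style alpha formula is an assumption on my part):
In [ ]:
# Hedged sketch of one boosting iteration's weight update.
n <- 10
w <- rep(1 / n, n)                               # weights start uniform and sum to one
misclassified <- c(rep(FALSE, 8), rep(TRUE, 2))  # pretend 2 of the 10 were wrong

err <- sum(w[misclassified])          # weighted error of the current learner
alpha <- log((1 - err) / err)         # lower error => larger alpha

w <- w * exp(alpha * misclassified)   # up-weight only the misclassified rows
w <- w / sum(w)                       # renormalise so the weights sum to one again

round(w, 3)   # the two misclassified observations now carry much more weight
sum(w)        # still 1
The same alpha is then used to weight this classifier's vote in the final combined decision, which is how more accurate classifiers end up counting for more.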
Accuracy is estimated on a separate dataset, or through cross-validation.
adabag provides boosting only for classification tasks; that is, regression is not available. With adabag, the difference between AdaBoost.M1 and SAMME looks to be the way the alpha constant (the weight given to each classifier) is calculated.
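As I read the adabag paper, this maps onto the coeflearn options of boosting(); a small sketch of the formulas (treat the exact constants as my own reading rather than gospel):
In [ ]:
# Hedged sketch: alpha as a function of the weighted error err and the
# number of classes K, for the coeflearn options described in the paper.
alpha_breiman <- function(err)    0.5 * log((1 - err) / err)         # AdaBoost.M1 (Breiman)
alpha_freund  <- function(err)    log((1 - err) / err)               # AdaBoost.M1 (Freund)
alpha_zhu     <- function(err, K) log((1 - err) / err) + log(K - 1)  # SAMME (Zhu)

# With three classes, SAMME still rewards a learner that is only slightly
# better than random guessing, while AdaBoost.M1 does not:
alpha_freund(0.55)       # negative: error above 1/2 is useless for AdaBoost.M1
alpha_zhu(0.55, K = 3)   # positive: still usable under SAMME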
Functions Provided by the package
Note, the boosting object returned has an importance vector. This gives the relative importance/contribution of each variable in the classification task, so it allows quantifying the relative importance of the predictor variables (see the snippet after the training-error cell below).
Note, there is also a comment about enabling the margin of the class prediction for each observation and the error evolution to be calculated. We will show this later.
The gbm package provides a boosting framework through which multiple loss functions can be applied. This enables the creation of regression, binary classification and multiclass classifiers. gbm enables an AdaBoost-style model through the use of its adaboost (exponential) loss function. That is, AdaBoost can be seen as a specialisation of gradient boosting.
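A minimal sketch of that idea, assuming the gbm package is installed; iris is recoded into a two-class 0/1 problem because gbm's adaboost loss expects a binary outcome (the column name is_virginica is just illustrative):
In [ ]:
library(gbm)

# recode iris as a binary problem: virginica vs. versicolor
iris2 <- subset(iris, Species != "setosa")
iris2$is_virginica <- as.numeric(iris2$Species == "virginica")
iris2$Species <- NULL

iris.gbm <- gbm(is_virginica ~ ., data = iris2,
                distribution = "adaboost",   # exponential (AdaBoost) loss
                n.trees = 100, interaction.depth = 1, shrinkage = 0.1)

# predictions on the link (score) scale for the training rows
head(predict(iris.gbm, newdata = iris2, n.trees = 100))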
In [3]:
library("adabag")
# load the data set required, in this case iris
data("iris")
# build a training index by sampling 25 observations from each of the three classes
train <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
# create the classifier, 10 trees and depth of 1 to predict the Species
# (a categorical variable), making this a multi-class classification task
iris.adaboost <- boosting(Species ~ ., data = iris[train, ], mfinal = 10, control = rpart.control(maxdepth = 1))
# lets view the data produced by the boost object
iris.adaboost
In [6]:
# build a confusion matrix
table(iris.adaboost$class, iris$Species[train], dnn = c("Predicted Class", "Observed Class"))
In [7]:
# calculate the error rate of the training sample
1 - sum(iris.adaboost$class == iris$Species[train]) / length(iris$Species[train])
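As noted earlier, the fitted boosting object carries an importance vector; a quick look at it for the model above (importanceplot() is, to my recollection, the adabag helper that plots the same information):
In [ ]:
# relative contribution of each predictor variable to the ensemble
iris.adaboost$importance

# bar plot of the same information
importanceplot(iris.adaboost)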
Making predictions with the adabag object is different to gbm. There are some similarities but also some differences compared to gbm.
Both allow newdata to be supplied, and take the respective ensemble model as a parameter. The new data in both cases contains the values for which the predictions are required, and should contain the predictor features. Finally, there is a newmfinal option fixing the number of trees to be used. It allows pruning the ensemble, but does not provide a recommended number of trees; by default, all the trees in the object are used.
Predictions with Adabag boosting
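A minimal sketch, assuming the iris.adaboost model and train index from the cells above, using the held-out rows as new data:
In [ ]:
# predict on the observations that were not used for training
iris.predboosting <- predict(iris.adaboost, newdata = iris[-train, ])

# the returned object includes, among other things, the predicted classes,
# a confusion matrix and the test error
iris.predboosting$confusion
iris.predboosting$error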
R Packages
R data packages
Alfaro, E., Gámez, M. and García, N., 2013. adabag: An R Package for Classification with Boosting and Bagging. Journal of Statistical Software, 54(2), pp. 1-35.