An iterative algorithm for "ensembling" base learners
Hint 1: You need to figure out the objective function being minimized. For simplicity, assume there are finitely many weak learners in $\mathscr{F}$.
Hint 2: Recall that the exponential loss function is $\ell(h; (x,y)) = \exp(-y h(x))$.
Hint 3: Let's write down the objective function being minimized. For simplicity, assume there are finitely many weak learners in $\mathscr{F}$, indexed by $j=1, \ldots, m$. Given a weight vector $\vec{\alpha}$, the exponential loss over the data is: $$\text{Loss}(\vec{\alpha}) = \sum_{i=1}^n \exp \left( - y_i \left(\sum_{j=1}^m \alpha_j h_j(\vec{x}_i)\right)\right)$$ Coordinate descent chooses the smallest (most negative) coordinate of $\nabla \text{Loss}(\vec{\alpha})$ and updates only that coordinate. Which coordinate is chosen?
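To make the last hint concrete, here is the first step spelled out (a sketch of the calculation, not the full answer). Differentiating the loss with respect to a single coordinate $\alpha_j$ gives $$\frac{\partial \text{Loss}}{\partial \alpha_j}(\vec{\alpha}) = -\sum_{i=1}^n y_i\, h_j(\vec{x}_i) \exp\left(-y_i \sum_{k=1}^m \alpha_k h_k(\vec{x}_i)\right),$$ and interpreting the exponential factors as per-example weights is what connects the chosen coordinate to a weighted training error for the weak learner $h_j$.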
Let's explore how bagging (bootstrap aggregation) works with classifiers to reduce variance, first by evaluating off-the-shelf tools and then by implementing our own basic bagging classifier.
In both examples we'll be working with the dataset from the Forest Cover Type Prediction Kaggle competition, where the aim is to build a multi-class classifier that predicts the forest cover type of a 30x30 meter plot of land from cartographic features. See the competition's notes about the dataset for more background.
In [1]:
import pandas as pd
df = pd.read_csv('forest-cover-type.csv')
df.head()
Out[1]:
Now we extract the feature matrix X and the labels y, then hold out 60% of the data as a test set (training on the remaining 40%) so that we can see how well training-set performance generalizes to held-out data.
In [2]:
X, y = df.iloc[:, 1:-1].values, df.iloc[:, -1].values
In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=0)
Now let's use an off-the-shelf decision tree classifier and compare its train/test performance with that of a bagged decision tree.
In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
models = [
    ('tree', DecisionTreeClassifier(random_state=0)),
    ('bagged tree', BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        random_state=0,
        n_estimators=10))
]

for label, model in models:
    model.fit(X_train, y_train)
    print("{} training|test accuracy: {:.2f} | {:.2f}".format(
        label,
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test))))
Note that both models were able to (nearly) fit the training set perfectly, and that bagging substantially improves test set performance (reduces variance).
Let's look at two hyperparameters associated with the bagging classifier:
The default number of estimators is 10; explore the performance of the bagging classifier across a range of values. How many classifiers do we need to reduce variance? What is the point of diminishing returns for this dataset?
In [5]:
# your code goes here!
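If a starting point helps, here is one possible way to structure the sweep (a sketch; the particular n_estimators values below are just an example grid):
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
# Sketch: sweep n_estimators and watch where test accuracy stops improving.
for n in [1, 2, 5, 10, 20, 50, 100]:
    bagged = BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        n_estimators=n,
        random_state=0)
    bagged.fit(X_train, y_train)
    print("n_estimators={:3d} training|test accuracy: {:.2f} | {:.2f}".format(
        n,
        accuracy_score(y_train, bagged.predict(X_train)),
        accuracy_score(y_test, bagged.predict(X_test))))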
By default, max_samples is set to 1.0, which means each classifier gets a number of samples equal to the size of the training set.
How do you suppose bagging manages to reduce variance while still using the same number of samples?
Explore how the performance varies as you vary max_samples (note: you can use float values between 0.0 and 1.0 to select a fraction of the training set):
In [6]:
# your code goes here!
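Similarly, one possible sketch for the max_samples sweep, mirroring the loop above (the fractions are again just an example grid):
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
# Sketch: sweep max_samples as a fraction of the training set size.
for frac in [0.1, 0.25, 0.5, 0.75, 1.0]:
    bagged = BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        max_samples=frac,
        n_estimators=10,
        random_state=0)
    bagged.fit(X_train, y_train)
    print("max_samples={:.2f} training|test accuracy: {:.2f} | {:.2f}".format(
        frac,
        accuracy_score(y_train, bagged.predict(X_train)),
        accuracy_score(y_test, bagged.predict(X_test))))
With the off-the-shelf classifier explored, let's now implement our own basic bagging classifier, starting from the skeleton below.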
In [7]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.base import BaseEstimator
import numpy as np
class McBaggingClassifier(BaseEstimator):

    def __init__(self, classifier_factory=DecisionTreeClassifier, num_classifiers=10):
        self.classifier_factory = classifier_factory
        self.num_classifiers = num_classifiers

    def fit(self, X, y):
        # create num_classifiers classifiers by calling classifier_factory, each
        # fitted with a different bootstrap sample drawn from X (and the matching rows of y)
        return self

    def predict(self, X):
        # get the prediction from each classifier and take a majority vote
        return np.ones(X.shape[0])
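For reference, here is one possible way to complete the skeleton (a sketch under simple assumptions: bootstrap indices are drawn with np.random.choice without any random_state handling, and the majority vote uses collections.Counter):
from collections import Counter
from sklearn.base import BaseEstimator
from sklearn.tree import DecisionTreeClassifier
import numpy as np

class McBaggingClassifier(BaseEstimator):

    def __init__(self, classifier_factory=DecisionTreeClassifier, num_classifiers=10):
        self.classifier_factory = classifier_factory
        self.num_classifiers = num_classifiers

    def fit(self, X, y):
        # fit each classifier on its own bootstrap sample: n rows drawn with replacement
        n = X.shape[0]
        self.classifiers_ = []
        for _ in range(self.num_classifiers):
            idx = np.random.choice(n, size=n, replace=True)
            clf = self.classifier_factory()
            clf.fit(X[idx], y[idx])
            self.classifiers_.append(clf)
        return self

    def predict(self, X):
        # collect one row of predictions per classifier, then majority-vote each column
        predictions = np.array([clf.predict(X) for clf in self.classifiers_])
        return np.array([Counter(col).most_common(1)[0][0] for col in predictions.T])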
You should be able to achieve similar performance to scikit-learn's implementation:
In [8]:
our_models = [
    ('tree', DecisionTreeClassifier(random_state=0)),
    ('our bagged tree', McBaggingClassifier(
        classifier_factory=lambda: DecisionTreeClassifier(random_state=0)
    ))
]

for label, model in our_models:
    model.fit(X_train, y_train)
    print("{} training|test accuracy: {:.2f} | {:.2f}".format(
        label,
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_test, model.predict(X_test))))