Custom learners in Orange

Orange contains many learners which can be used to fit models for classification, regression and other tasks. But it is also very simple to write your own learner. To start, define a subclass of the Orange.classification.Learner base class and implement one or both of the fit methods: fit works on data matrices represented as numpy arrays, while the more general fit_storage receives the encapsulating Orange.data.Storage object (or a subclass such as Orange.data.Table). After the necessary computations, the learner should return a fitted model object derived from the Orange.classification.Model base class; the model, in turn, needs to implement predict or predict_storage.

Linear regression

Linear regression is of course already available through Orange.regression.LinearRegressionLearner, which uses the implementation from scikit-learn. Here, we show a simpler version using the normal equations to demonstrate how to write your own numpy-based learner from scratch.

We choose to implement the fit method in this example, since the least squares coefficients are easily computed with standard numpy operations on data matrices. Similarly, the model class implements the predict method to predict target values for new data instances.
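In matrix form, the fit method computes the least squares coefficients from the normal equations, and predict multiplies the feature matrix of new instances by them. Here the superscript + denotes the Moore-Penrose pseudo-inverse computed by numpy's pinv, which also works when X^T X is singular:

```latex
\hat{\beta} = (X^\top X)^{+} X^\top Y, \qquad \hat{Y}_{\mathrm{new}} = X_{\mathrm{new}} \, \hat{\beta}
```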


In [1]:
from numpy.linalg import pinv
from Orange.classification import Learner, Model

In [2]:
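class LinearRegression(Learner):
    def fit(self, X, Y, W=None):
        # Least squares via the normal equations: (X^T X)^+ X^T Y.
        # pinv (pseudo-inverse) also copes with a singular X^T X.
        # Instance weights W are ignored in this simple version.
        coef = pinv(X.T.dot(X)).dot(X.T).dot(Y)
        return LinearRegressionModel(coef)


class LinearRegressionModel(Model):
    def __init__(self, coef):
        # Fitted coefficients, one per feature (no intercept term)
        self.coef = coef

    def predict(self, X):
        # Predictions are a linear combination of the features
        return X.dot(self.coef)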
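To convince ourselves that the formula in fit really gives the least squares solution, the sketch below (numpy only, no Orange) compares it with numpy.linalg.lstsq on synthetic data. The data and the chosen coefficients are assumptions made purely for illustration.

```python
import numpy as np
from numpy.linalg import pinv, lstsq

# Synthetic data (assumption): 100 instances, 3 features, known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
Y = X.dot(true_coef) + 0.01 * rng.normal(size=100)

# Coefficients computed exactly as in LinearRegression.fit
coef_normal = pinv(X.T.dot(X)).dot(X.T).dot(Y)

# Reference solution from numpy's least squares solver
coef_lstsq, *_ = lstsq(X, Y, rcond=None)

print(np.allclose(coef_normal, coef_lstsq))  # True
print(coef_normal)  # close to true_coef, since the noise is small
```

For well-conditioned data both approaches agree; lstsq avoids forming X^T X explicitly and is usually the numerically safer choice, while the normal equations keep the learner short.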

Note that the above simplified version of linear regression does not fit the intercept and ignores instance weights.
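If instance weights are needed, one option is weighted least squares, which scales each instance's contribution to X^T X and X^T Y by its weight. The sketch below is not Orange's implementation; the helper name weighted_fit is hypothetical, and it assumes W is a 1-D array with one weight per instance. Inside LinearRegression.fit, its body could replace the unweighted formula.

```python
import numpy as np
from numpy.linalg import pinv


def weighted_fit(X, Y, W=None):
    """Hypothetical helper: weighted least squares via the normal equations.

    Solves (X^T diag(W) X)^+ X^T diag(W) Y and falls back to the
    unweighted solution when W is None.
    """
    if W is None:
        return pinv(X.T.dot(X)).dot(X.T).dot(Y)
    W = np.asarray(W, dtype=float)
    # Broadcasting scales column i of X.T (instance i) by W[i],
    # which equals X.T @ diag(W) without building the diagonal matrix
    XtW = X.T * W
    return pinv(XtW.dot(X)).dot(XtW).dot(Y)


# Usage: instances with zero weight drop out of the fit entirely
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
Y = X.dot(np.array([1.0, 3.0])) + rng.normal(size=50)
W = np.r_[np.ones(25), np.zeros(25)]
print(np.allclose(weighted_fit(X, Y, W), weighted_fit(X[:25], Y[:25])))  # True
```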

We can evaluate its performance with cross-validation on one of the data sets provided in Orange.


In [3]:
import Orange
housing = Orange.data.Table('housing')
learners = [Orange.regression.MeanLearner(), Orange.regression.LinearRegressionLearner(), LinearRegression()]
res = Orange.evaluation.CrossValidation(housing, learners)
Orange.evaluation.RMSE(res)


Out[3]:
array([ 9.20124355,  4.87903715,  5.09814721])
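To make these numbers less of a black box, the sketch below computes a k-fold cross-validated RMSE for the normal-equations learner using numpy only. It is an illustration under stated assumptions (random fold assignment, k=5, synthetic data) and not a reimplementation of Orange.evaluation.CrossValidation, whose fold assignment and defaults may differ; the function names fit and cv_rmse are hypothetical.

```python
import numpy as np
from numpy.linalg import pinv


def fit(X, Y):
    # Same normal-equations formula as LinearRegression.fit
    return pinv(X.T.dot(X)).dot(X.T).dot(Y)


def cv_rmse(X, Y, k=5, seed=0):
    """Hypothetical helper: RMSE of out-of-fold predictions in k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), k)
    predictions = np.empty(len(Y))
    for test_idx in folds:
        # Train on all instances outside the current fold
        train_idx = np.setdiff1d(np.arange(len(Y)), test_idx)
        coef = fit(X[train_idx], Y[train_idx])
        predictions[test_idx] = X[test_idx].dot(coef)
    # Root mean squared error over all held-out predictions
    return np.sqrt(np.mean((Y - predictions) ** 2))


# Synthetic data (assumption) with noise standard deviation 0.5
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
Y = X.dot(np.array([1.0, -2.0, 0.0, 0.5])) + rng.normal(scale=0.5, size=200)
print(cv_rmse(X, Y))  # roughly the noise level, far below np.std(Y)
```

Every instance is predicted exactly once by a model that did not see it during fitting, which is why cross-validated RMSE estimates the error on new data.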

We see that the error is much lower than when predicting the mean value, but slightly higher than that of Orange's built-in LinearRegressionLearner, which uses scikit-learn. Try adding a column of ones to the existing input features so that the model can fit an intercept (bias) term, and check whether the error improves.
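One way to follow that suggestion is sketched below in plain numpy: prepend a column of ones to X both when fitting and when predicting, so that the first coefficient acts as the intercept. The function names are hypothetical; in the Orange version, the add_ones step would go at the start of LinearRegression.fit and LinearRegressionModel.predict.

```python
import numpy as np
from numpy.linalg import pinv


def add_ones(X):
    # Prepend a constant column; its coefficient becomes the intercept
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_with_intercept(X, Y):
    Xb = add_ones(X)
    return pinv(Xb.T.dot(Xb)).dot(Xb.T).dot(Y)


def predict_with_intercept(coef, X):
    # The same constant column must be added at prediction time
    return add_ones(X).dot(coef)


# Noiseless synthetic data (assumption) with intercept 10
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
Y = 10.0 + X.dot(np.array([1.5, -0.5]))
coef = fit_with_intercept(X, Y)
print(coef)  # approximately [10, 1.5, -0.5]
```

Without the extra column, the fitted hyperplane is forced through the origin, which biases the predictions whenever the targets are not centered around zero.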

Wrapper

Sometimes we want to add functionality to an existing learner (or learners), which can easily be done with a wrapper class.

Suppose we wish to know how much time was spent fitting a model. The following wrapper uses an existing learner to fit the model, but measures the time spent and stores it in a special attribute. Because we do not need to manipulate the data matrices (X, Y, W), it is easier to implement the fit_storage method instead of fit; it differs only in its arguments, receiving the data packed in a single Orange.data.Storage object.


In [4]:
from time import time

In [5]:
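class TimedLearner(Learner):
    def __init__(self, learner):
        super().__init__()  # set up attributes defined by the base Learner
        self.learner = learner  # wrapped learner that does the actual fitting
        self.time = 0  # cumulative fitting time (seconds) over all models

    def fit_storage(self, data):
        t = time()
        model = self.learner(data)  # delegate fitting to the wrapped learner
        model.time = time() - t  # fitting time of this model, in seconds
        self.time += model.time
        return model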

This time we did not need to write a Model class, since we return the model produced by the base learner. We add an attribute time to that model, containing the time in seconds spent fitting it. The same value is also added to the cumulative time spent fitting all models, which is stored in the time attribute of the learner.
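The wrapper idea itself does not depend on Orange. As a framework-agnostic sketch (not Orange code), the class below wraps any callable that maps data to a model object and records per-call and cumulative fitting time. It swaps in time.perf_counter, a clock intended for measuring short intervals, instead of time.time; the names TimedFit and fake_learner are hypothetical.

```python
import time
from types import SimpleNamespace


class TimedFit:
    """Hypothetical wrapper: times each call to a learner-like callable."""

    def __init__(self, learner):
        self.learner = learner
        self.time = 0.0  # cumulative seconds over all fits

    def __call__(self, data):
        start = time.perf_counter()
        model = self.learner(data)
        model.time = time.perf_counter() - start  # seconds for this fit
        self.time += model.time
        return model


def fake_learner(data):
    # Stand-in "learner": sleeps briefly, then returns the data mean as a model
    time.sleep(0.01)
    return SimpleNamespace(mean=sum(data) / len(data))


timed = TimedFit(fake_learner)
m1 = timed([1, 2, 3])
m2 = timed([4, 5, 6])
print(m1.mean, m2.mean)  # 2.0 5.0
print(timed.time == m1.time + m2.time)  # True
```

As in TimedLearner, the wrapped model is returned unchanged apart from the added time attribute, so callers can use it exactly as they would the unwrapped one.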


In [6]:
tl = TimedLearner(Orange.regression.LinearRegressionLearner())
m1 = tl(housing)
print(m1.time)
m2 = tl(housing)
print(m2.time)
print(tl.time)


0.0021028518676757812
0.0014808177947998047
0.003583669662475586

Wrappers for scikit-learn methods

Coming soon...