Orange contains many learners that can be used to fit models for classification, regression and other tasks, but it is also very simple to write your own learner. To start, define a subclass of the Orange.classification.Learner base class and implement either one or both of the fit methods: fit works on data matrices represented as numpy arrays, while the more general fit_storage uses the encapsulating Orange.data.Storage object (or a subclass such as Orange.data.Table).
After the necessary computations, the learner should produce a fitted model object, derived from the Orange.classification.Model base class, which needs to implement predict or predict_storage.
We choose to implement the fit method in this example, since the least squares coefficients are easily computed with standard numpy operations on data matrices. Similarly, the model class implements the predict method to make predictions of target values for new data instances.
In [1]:
from numpy.linalg import pinv
from Orange.classification import Learner, Model
In [2]:
class LinearRegression(Learner):
    def fit(self, X, Y, W=None):
        # Ordinary least squares via the pseudoinverse of X'X
        coef = pinv(X.T.dot(X)).dot(X.T).dot(Y)
        return LinearRegressionModel(coef)

class LinearRegressionModel(Model):
    def __init__(self, coef):
        self.coef = coef

    def predict(self, X):
        # Predictions are linear combinations of the input features
        return X.dot(self.coef)
Note that the above simplified version of linear regression does not fit the intercept and ignores instance weights.
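Calling a learner on a data table fits a model, and calling the resulting model on data returns its predictions. A minimal sketch of such direct use (not part of the evaluation below), assuming the housing data set bundled with Orange:
import Orange

housing = Orange.data.Table('housing')
model = LinearRegression()(housing)  # Learner.__call__ fits and returns a model
print(model(housing[:5]))            # Model.__call__ predicts target values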
We can evaluate its performance with cross-validation on one of the data sets provided in Orange.
In [3]:
import Orange

housing = Orange.data.Table('housing')
learners = [Orange.regression.MeanLearner(),              # baseline: always predicts the mean
            Orange.regression.LinearRegressionLearner(),  # Orange's scikit-learn based learner
            LinearRegression()]                           # our learner defined above
res = Orange.evaluation.CrossValidation(housing, learners)
Orange.evaluation.RMSE(res)
Out[3]:
We see that the error is much lower than that of predicting the mean value, but slightly higher than that of Orange's built-in LinearRegressionLearner, which uses scikit-learn. Try adding a column of ones to the input features to allow for a model bias (intercept) and check the improvement; one possible approach is sketched below.
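As a sketch of this exercise (the class names LinearRegressionWithBias and LinearRegressionBiasModel are our own, not part of Orange), a column of ones can be prepended to the data matrix both when fitting and when predicting, so that the least squares solution also estimates an intercept:
import numpy as np
from numpy.linalg import pinv
from Orange.classification import Learner, Model

class LinearRegressionWithBias(Learner):
    def fit(self, X, Y, W=None):
        # Prepend a column of ones so the solution includes an intercept term
        X1 = np.hstack((np.ones((X.shape[0], 1)), X))
        coef = pinv(X1.T.dot(X1)).dot(X1.T).dot(Y)
        return LinearRegressionBiasModel(coef)

class LinearRegressionBiasModel(Model):
    def __init__(self, coef):
        self.coef = coef

    def predict(self, X):
        # The same column of ones must be added at prediction time
        X1 = np.hstack((np.ones((X.shape[0], 1)), X))
        return X1.dot(self.coef)
With the intercept included, the cross-validated error of this learner should come close to that of the scikit-learn based learner.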
Suppose we wish to know how much time was spent fitting a model. The following wrapper uses an existing learner to fit the model, but measures the time spent and stores it in a special attribute.
Because we do not need to manipulate the data matrices (X, Y, W), it is easier to implement the fit_storage method instead of fit; it differs only in its arguments and receives the data packed in a single Orange.data.Storage object.
In [4]:
from time import time
In [5]:
class TimedLearner(Learner):
    def __init__(self, learner):
        self.learner = learner
        self.time = 0

    def fit_storage(self, data):
        t = time()
        model = self.learner(data)  # delegate fitting to the wrapped learner
        model.time = time() - t     # time spent fitting this model
        self.time += model.time     # cumulative time over all fitted models
        return model
This time we did not need to write a separate Model class, since we return the same model instance produced by the wrapped learner. An additional attribute time is added to the model, containing the time in seconds used to fit it. This time is also added to the cumulative time used to fit all models, stored as an attribute of the learner.
In [6]:
tl = TimedLearner(Orange.regression.LinearRegressionLearner())
m1 = tl(housing)
print(m1.time)  # time to fit the first model
m2 = tl(housing)
print(m2.time)  # time to fit the second model
print(tl.time)  # cumulative time spent by the wrapped learner
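Since the wrapper behaves like any other learner, it can also be passed to the cross-validation used earlier. A sketch, assuming the housing table and TimedLearner defined above, and that the same learner instance is reused for every fold so the fitting times accumulate:
timed = TimedLearner(Orange.regression.LinearRegressionLearner())
res = Orange.evaluation.CrossValidation(housing, [timed])
print(Orange.evaluation.RMSE(res))
print(timed.time)  # cumulative time spent fitting models across the folds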