Playing with Regression

This notebook walks through testing different regression techniques, and demonstrates the Diagnostics class, which can be used to find optimal parameters for COBRA.

We will generate a random data-set, load it into COBRA, and try some popular regression techniques on it.

Imports


In [1]:
from pycobra.cobra import Cobra
from pycobra.diagnostics import Diagnostics
import numpy as np
%matplotlib inline

Setting up the data set


In [2]:
# setting up our random data-set
rng = np.random.RandomState(1)

# D1 = train machines; D2 = create COBRA; D3 = calibrate epsilon, alpha; D4 = testing
n_features = 20
D1, D2, D3, D4 = 200, 200, 200, 200
D = D1 + D2 + D3 + D4
X = rng.uniform(-1, 1, D * n_features).reshape(D, n_features)
Y = np.power(X[:,1], 2) + np.power(X[:,3], 3) + np.exp(X[:,10]) 
# Y = np.power(X[:,0], 2) + np.power(X[:,1], 3)

# training data-set
X_train = X[:D1 + D2]
X_test = X[D1 + D2 + D3:D1 + D2 + D3 + D4]
X_eps = X[D1 + D2:D1 + D2 + D3]
# for testing
Y_train = Y[:D1 + D2]
Y_test = Y[D1 + D2 + D3:D1 + D2 + D3 + D4]
Y_eps = Y[D1 + D2:D1 + D2 + D3]

Setting up COBRA

Let's set up our COBRA machine with the data.


In [3]:
cobra = Cobra(random_state=0, epsilon=0.5)
cobra.fit(X_train, Y_train, default=False)


Out[3]:
Cobra(epsilon=0.5, machines=None, random_state=0)

When fitting, we initialise COBRA with an epsilon value of $0.5$. Since we know the distribution, and the data varies from $-1$ to $1$, $0.5$ is a fair guess at a "good" epsilon value.

If we do not pass the $\epsilon$ parameter, a cross-validation is performed on the training data to find an optimised epsilon.

Notice that the default parameter is set to False: this lets us walk through what happens when COBRA is set up, instead of using the default settings.
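
As a rough sketch of the default path (assuming, as described above, that fit() with default=True performs the split and loads the default machines, and that omitting epsilon triggers the cross-validation):

# sketch of the default set-up; both assumptions follow the description above
cobra_auto = Cobra(random_state=0)              # no epsilon: CV picks one
cobra_auto.fit(X_train, Y_train, default=True)  # split + default machines + predictions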

We're now going to split our training data into two parts, $D_k$ and $D_l$, and shuffle the data points.


In [4]:
cobra.split_data(D1, D1 + D2, shuffle_data=True)


Out[4]:
Cobra(epsilon=0.5, machines=None, random_state=0)

Let's load the default machines to COBRA.


In [5]:
cobra.load_default()


Out[5]:
Cobra(epsilon=0.5, machines=None, random_state=0)

Note that further machines can be loaded using either the loadMachine() or the loadSKMachine() method. The only prerequisite is that the machine has a valid predict() method.
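
For example, an extra machine could be loaded like this (a sketch; we assume loadSKMachine(name, estimator) registers a fitted scikit-learn estimator - check the pycobra documentation for the exact signature):

from sklearn.svm import SVR

# hypothetical extra machine, trained on the first half (D_k) of the training data
svr = SVR()
svr.fit(X_train[:D1], Y_train[:D1])
cobra.loadSKMachine('svr', svr)  # assumed signature: (name, fitted estimator)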

Using COBRA's machines

We've created our random dataset and now we're going to use the default scikit-learn machines to see what the results look like.


In [6]:
query = X_test[9].reshape(1, -1)

In [7]:
cobra.machines_


Out[7]:
{'lasso': LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
     max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
     precompute='auto', random_state=0, selection='cyclic', tol=0.0001,
     verbose=False),
 'random_forest': RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
            max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False),
 'ridge': RidgeCV(alphas=(0.1, 1.0, 10.0), cv=None, fit_intercept=True, gcv_mode=None,
     normalize=False, scoring=None, store_cv_values=False),
 'tree': DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
            max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=0, splitter='best')}

In [8]:
cobra.machines_['lasso'].predict(query)


Out[8]:
array([ 0.03458938])

In [9]:
cobra.machines_['tree'].predict(query)


Out[9]:
array([ 0.22769628])

In [10]:
cobra.machines_['ridge'].predict(query)


Out[10]:
array([ 0.06747291])

In [11]:
cobra.machines_['random_forest'].predict(query)


Out[11]:
array([ 0.3382969])

Aggregate!

By using the aggregate function we can combine our predictors. You can read about the aggregation procedure in the original COBRA paper, or look at the source code for the algorithm.
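
For intuition, here is a minimal sketch of the rule from the paper (illustrative only, not pycobra's internals): the prediction at a query is the average response of the $D_l$ points on which at least alpha machines agree, within epsilon, with their own prediction at the query.

# illustrative COBRA rule, not pycobra's implementation
def cobra_aggregate(query_preds, dl_preds, dl_labels, epsilon, alpha):
    # query_preds: (M,) machine predictions at the query point
    # dl_preds:    (n, M) machine predictions on the n points of D_l
    # dl_labels:   (n,) true responses of the D_l points
    agree = np.abs(dl_preds - query_preds) <= epsilon  # (n, M) agreement flags
    mask = agree.sum(axis=1) >= alpha                  # enough machines agree
    return dl_labels[mask].mean() if mask.any() else np.nan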

We start by loading each machine's predictions now.


In [12]:
cobra.load_machine_predictions()


Out[12]:
Cobra(epsilon=0.5, machines=None, random_state=0)

In [13]:
cobra.predict(query)


Out[13]:
array([ 0.17416263])

In [14]:
Y_test[9]


Out[14]:
0.0095390633892067367

Optimizing COBRA

To squeeze the best out of COBRA we make use of the Diagnostics class. With a grid-based approach to optimising hyperparameters, we can find the best epsilon value, number of machines (alpha value), and combination of machines.

Let's check the MSE for each of COBRA's machines:


In [15]:
cobra_diagnostics = Diagnostics(cobra, X_test, Y_test, load_MSE=True)

In [16]:
cobra_diagnostics.machine_MSE


Out[16]:
{'COBRA': 0.11676229232677286,
 'lasso': 0.14418311058520605,
 'random_forest': 0.1087898042447215,
 'ridge': 0.17014683933749455,
 'tree': 0.16080252584960136}

This error is bounded, up to a problem-dependent constant $C$, by $C\,\ell^{-\frac{2}{M + 2}}$, where $M$ is the number of machines and $\ell$ is the size of $D_l$. For more details, we refer the reader to the original paper.
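
As a quick illustration of the rate (a back-of-the-envelope sketch, not how error_bound is computed internally):

# with M = 4 machines and l = D2 = 200 points in D_l:
M, l = 4, D2
rate = l ** (-2 / (M + 2))  # 200**(-1/3), roughly 0.17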


In [17]:
cobra_diagnostics.error_bound


Out[17]:
0.005

Playing with Data-Splitting

When we initially set up COBRA, we split our training data into two further parts, $D_k$ and $D_l$. This split was done 50-50, but it is up to us how we wish to do it. The following section compares 20-80, 40-60, 50-50, 60-40 and 80-20 splits and checks which gives the best MSE, for a fixed epsilon (or over a grid).
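
For example, a 60-40 split could be set by hand with split_data, as used earlier (a sketch):

# a 60-40 split of the training data into D_k and D_l
k = int(0.6 * len(X_train))
cobra.split_data(k, len(X_train), shuffle_data=True)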


In [18]:
cobra_diagnostics.optimal_split(X_eps, Y_eps)


Out[18]:
((0.6, 0.4), 0.094492508707899853)

What we see above is the default result: the optimal split ratio and the corresponding MSE. We can go further by enabling the info and graph options, and passing more split values.


In [19]:
split = [(0.05, 0.95), (0.10, 0.90), (0.20, 0.80), (0.40, 0.60), (0.50, 0.50), (0.60, 0.40), (0.80, 0.20), (0.90, 0.10), (0.95, 0.05)]

In [20]:
cobra_diagnostics.optimal_split(X_eps, Y_eps, split=split, info=True, graph=True)


Out[20]:
{(0.05, 0.95): 0.22007647014229548,
 (0.1, 0.9): 0.14938196212577773,
 (0.2, 0.8): 0.14173976690625464,
 (0.4, 0.6): 0.12201183033504397,
 (0.5, 0.5): 0.10572881591328129,
 (0.6, 0.4): 0.094492508707899853,
 (0.8, 0.2): 0.12038399389925082,
 (0.9, 0.1): 0.15058892872027313,
 (0.95, 0.05): 0.31485718698333626}

Alpha, Epsilon and Machines

The following methods identify the optimal epsilon values, alpha values, and combinations of machines. The grid methods let us find, for a single point, the optimal alpha/machines and epsilon combination.

Epsilon

The epsilon parameter controls how "strict" COBRA is while choosing the points to aggregate.
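
To get a feel for this, one could sweep epsilon by hand (a sketch; smaller epsilon means fewer $D_l$ points pass the consensus test):

# sketch: refit COBRA for a few epsilon values and compare predictions
for eps in (0.1, 0.5, 1.0):
    c = Cobra(random_state=0, epsilon=eps)
    c.fit(X_train, Y_train, default=True)
    print(eps, c.predict(query))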


In [21]:
cobra_diagnostics.optimal_epsilon(X_eps, Y_eps, line_points=100)


Out[21]:
(0.49012865501307668, 0.10577705316919428)

Alpha

The alpha parameter decides by how many machines a point must be picked up before it is added to the aggregate. The default value is 4.


In [22]:
cobra_diagnostics.optimal_alpha(X_eps, Y_eps, info=True)


Out[22]:
{1: 0.41889483894680962,
 2: 0.23664915543154075,
 3: 0.14780647200325731,
 4: 0.10572881591328129}

In this particular case, the best performance is obtained by seeking consensus over all 4 machines.

Machines

Decide which subset of machines to select for the aggregate.


In [23]:
cobra_diagnostics.optimal_machines(X_eps, Y_eps, info=True)


Out[23]:
{('lasso',): 0.1487363755470355,
 ('random_forest',): 0.12176465358802234,
 ('random_forest', 'lasso'): 0.10807224096632587,
 ('ridge',): 0.16226802562230169,
 ('ridge', 'lasso'): 0.14830358127933152,
 ('ridge', 'random_forest'): 0.10999522304007556,
 ('ridge', 'random_forest', 'lasso'): 0.10870759942102363,
 ('ridge', 'tree'): 0.12607929212145341,
 ('ridge', 'tree', 'lasso'): 0.12106638669612195,
 ('ridge', 'tree', 'random_forest'): 0.10644604534573869,
 ('ridge', 'tree', 'random_forest', 'lasso'): 0.10572881591328129,
 ('tree',): 0.17175698225945865,
 ('tree', 'lasso'): 0.11994542013758974,
 ('tree', 'random_forest'): 0.11402675943319736,
 ('tree', 'random_forest', 'lasso'): 0.10504926738092327}

In [24]:
cobra_diagnostics.optimal_alpha_grid(X_eps[0], Y_eps[0], line_points=100)


Out[24]:
((4, 0.52513784465686786), array([  8.91114164e-08]))

In [25]:
cobra_diagnostics.optimal_machines_grid(X_eps[0], Y_eps[0], line_points=100)


Out[25]:
((('random_forest', 'lasso'), 0.35009189643791189), array([  1.72540771e-08]))

Increasing the number of line points helps find a better optimum. Below are the results for the same point (the MSE is the second value of each tuple), followed by a sketch of the sweep.

With 10: ((('ridge', 'random_forest', 'lasso'), 1.1063905961135443), 0.96254542159345469)

With 20: ((('tree', 'random_forest'), 0.87346626008964035), 0.53850941611803993)

With 50: ((('ridge', 'tree'), 0.94833479666875231), 0.48256303899450931)

With 100: ((('ridge', 'tree', 'random_forest'), 0.10058096328304948), 0.30285776885759158)

With 200: ((('ridge', 'tree', 'lasso'), 0.10007553130675276), 0.30285776885759158)
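
A sweep like the one above could be produced with something like this (a sketch, using optimal_machines_grid as in In [25]):

# sketch: sweep line_points and watch the optimum tighten
for lp in (10, 20, 50, 100, 200):
    print(lp, cobra_diagnostics.optimal_machines_grid(X_eps[0], Y_eps[0], line_points=lp))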

pycobra is licensed under the MIT License - https://opensource.org/licenses/MIT