In the previous examples, we only used scikit-learn algorithms. In this example, we will learn how to use another Python machine learning library. You need to install ParsimonY to run this example: https://github.com/neurospin/pylearn-parsimony
For Pipeline, r2_score, and StratifiedKFold we rely on scikit-learn objects.
In [1]:
from sklearn.cross_validation import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score
In [2]:
from mempamal.configuration import JSONify_estimator, JSONify_cv, build_dataset
from mempamal.workflow import create_wf, save_wf
from mempamal.datasets import iris
In [3]:
# iris dataset as usual but with linear regression (Why not! :p).
X, y = iris.get_data()
To ensure that parsimony.estimators.ElasticNet is compliant with the scikit-learn interface, we create a wrapper which inherits from sklearn.base.BaseEstimator. Note that this wrapper must be accessible in your PYTHONPATH for the future tasks. In the MEmPaMaL examples we provide the ElasticNet wrapper for the sake of the example.
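The wrapper pattern can be sketched as follows. This is not the actual EnetWrap from mempamal.examples.elasticnet_parsimony: a plain ridge-like closed-form fit stands in for parsimony's ElasticNet solver so the sketch is self-contained and runnable. What matters is the scikit-learn contract: hyper-parameters stored under the same names as the __init__ arguments, a fit method that returns self, and a predict method.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

class EnetWrapSketch(BaseEstimator, RegressorMixin):
    """Sketch of a scikit-learn-compatible wrapper.

    The real wrapper delegates fit/predict to
    parsimony.estimators.ElasticNet; here a ridge-like
    closed-form fit stands in so the sketch is runnable.
    """
    def __init__(self, alpha=1.0, l=0.5):
        # Hyper-parameters must be stored under the same names as the
        # __init__ arguments so that get_params/set_params (inherited
        # from BaseEstimator) work with grid search.
        self.alpha = alpha
        self.l = l

    def fit(self, X, y):
        # Stand-in solver: (X'X + alpha*I) w = X'y
        n_features = X.shape[1]
        A = X.T @ X + self.alpha * np.eye(n_features)
        self.coef_ = np.linalg.solve(A, X.T @ y)
        return self  # fit must return self

    def predict(self, X):
        return X @ self.coef_

est = EnetWrapSketch(alpha=0.1, l=0.3)
print(est.get_params())  # {'alpha': 0.1, 'l': 0.3}
```

Because the hyper-parameters follow the naming convention, get_params and set_params come for free from BaseEstimator, which is exactly what grid search and Pipeline need.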
In [4]:
import inspect
from mempamal.examples.elasticnet_parsimony import EnetWrap
print(inspect.getsource(inspect.getmodule(EnetWrap)))
The estimator is only the EnetWrap and we create a multi-parameter grid over enet__l and enet__alpha.
In [5]:
est = Pipeline([("enet", EnetWrap())])
In [6]:
alphas = [1e-4, 1e-3, 1e-2, 0.1, 1., 10., 100., 1e3]
ls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
grid = []
for a in alphas:
    for l in ls:
        grid.append({"enet__l": l,
                     "enet__alpha": a})
print("The grid contains {} sets of parameters:".format(len(grid)))
grid
Out[6]:
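The nested loops above are equivalent to a Cartesian product of the two parameter lists, which can also be written with itertools.product:

```python
import itertools

alphas = [1e-4, 1e-3, 1e-2, 0.1, 1., 10., 100., 1e3]
ls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# One dict per (alpha, l) combination: 8 * 9 = 72 parameter sets
grid = [{"enet__l": l, "enet__alpha": a}
        for a, l in itertools.product(alphas, ls)]
print(len(grid))  # 72
```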
We JSONify the estimator and the cross-validation configuration.
We build the dataset in the current directory; this creates a dataset.joblib file.
Then we create the workflow in our internal format (create_wf).
With verbose=True, it prints the commands on stdout.
Finally, we output the workflow (save_wf) in the soma-workflow format and write it to workflow.json (this requires soma-workflow).
In [7]:
method_conf = JSONify_estimator(est, out="./est.json")
cv_conf = JSONify_cv(StratifiedKFold, cv_kwargs={"n_folds": 5},
                     score_func=r2_score,
                     stratified=True,
                     inner_cv=StratifiedKFold,
                     inner_cv_kwargs={"n_folds": 5},
                     inner_score_func=r2_score,
                     out="./cv.json")
dataset = build_dataset(X, y, method_conf, cv_conf, grid=grid, outputdir=".")
wfi = create_wf(dataset['folds'], cv_conf, method_conf, ".",
                verbose=True)
wf = save_wf(wfi, "./workflow.json", mode="soma-workflow")
Now, we create a WorkflowController and submit the workflow.
We wait for workflow completion, then we read the final results.
In [8]:
from soma_workflow.client import WorkflowController
import time
import json
import sklearn.externals.joblib as joblib
controller = WorkflowController()
wf_id = controller.submit_workflow(workflow=wf, name="third example")
while controller.workflow_status(wf_id) != 'workflow_done':
    time.sleep(2)
print(joblib.load('./final_res.pkl'))
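The polling loop above waits forever if the workflow never finishes. A more defensive variant adds a timeout; the sketch below replaces the controller with a stub status callable, since the real call would be lambda: controller.workflow_status(wf_id):

```python
import time

def wait_for_completion(get_status, poll_interval=2.0, timeout=600.0):
    """Poll get_status() until it returns 'workflow_done' or timeout expires.

    get_status is any zero-argument callable; with soma-workflow it would
    be lambda: controller.workflow_status(wf_id).  Returns True on
    completion, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == 'workflow_done':
            return True
        time.sleep(poll_interval)
    return False

# Stub status source: reports 'running' twice, then done.
statuses = iter(['running', 'running', 'workflow_done'])
print(wait_for_completion(lambda: next(statuses), poll_interval=0.01))  # True
```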