In [1]:
# Set up the import path when running from the source distribution
import os, sys
sys.path.append(os.path.join(os.getcwd(), '../..'))

In [2]:
import traceback

VisTrails API example

This notebook showcases the new API, with comments and explanations given inline.


In [3]:
import vistrails as vt

The new API is exposed under the top-level vistrails package. The moment you use one of the API functions, like load_vistrail(), it will create an application and load the same configuration that the VisTrails application uses (although it will automatically enable packages the moment you need them).


In [4]:
vt.ipython_mode(True)

This explicitly requests that IPythonMode be enabled on output modules, so that pipeline executions put their results in the notebook (similar to %matplotlib inline for matplotlib plots).

Vistrails and Pipelines

You can get a Vistrail through load_vistrail().


In [5]:
vistrail = vt.load_vistrail('simplemath.vt')

A Vistrail is a whole version tree, where each version is a different pipeline. We can get Pipelines from it, but it is also stateful (i.e. it has a current version); this is useful for editing (creating new versions from the current one). It also provides the same interface as Pipeline, implicitly acting on the current_pipeline.

If GraphViz is available, Vistrail and Pipeline will be rendered in the IPython notebook.
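To make the version-tree idea concrete, here is a minimal pure-Python sketch. It is not how VisTrails stores or replays versions: `ToyVistrail`, `add_version()` and the change labels are hypothetical names invented for illustration. It only shows a tree whose nodes each yield a pipeline, plus a "current version" cursor that new versions branch from.

```python
class ToyVistrail(object):
    """Hypothetical stand-in for a version tree; not the VisTrails implementation."""

    def __init__(self):
        # Version 0 is the empty root. Every other version records its
        # parent and a label for the change that produced it.
        self.parents = {0: None}
        self.changes = {0: None}
        self.current_version = 0

    def add_version(self, version, change):
        """Create `version` as a child of the current version and select it."""
        self.parents[version] = self.current_version
        self.changes[version] = change
        self.current_version = version

    def select_version(self, version):
        if version not in self.parents:
            raise KeyError(version)
        self.current_version = version

    def get_pipeline(self, version):
        """Collect the changes from the root down to `version`."""
        chain = []
        while version != 0:
            chain.append(self.changes[version])
            version = self.parents[version]
        return list(reversed(chain))

    @property
    def current_pipeline(self):
        return self.get_pipeline(self.current_version)


tree = ToyVistrail()
tree.add_version(6, 'read')
tree.add_version(14, 'blur')
tree.select_version(6)          # go back to an older version...
tree.add_version(21, 'edges')   # ...and branch off from it

print(tree.get_pipeline(14))    # ['read', 'blur']
print(tree.current_pipeline)    # ['read', 'edges']
```

Selecting a version never destroys anything: version 14 is still reachable after branching to 21, which is the property that makes a stateful cursor safe to use for editing.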


In [6]:
vistrail


Out[6]:
[Version tree graph: 0 -> 28 (Added annotation)]
<Vistrail: simplemath.vt, version -1, not changed>

In [7]:
vistrail.select_latest_version()

In [8]:
vistrail


Out[8]:
[Version tree graph: 0 -> 28 (Added annotation)]
<Vistrail: simplemath.vt, version 28, not changed>

In [9]:
vistrail.get_pipeline(2)


Out[9]:
[Pipeline graph: a single PythonCalc module]
<Pipeline: 1 modules, 0 connections>

Packages

Only basic_modules (and abstractions?) are loaded on initialization, so that starting to use the API stays fast. Other packages are enabled automatically the first time they are requested.

Note that load_package() only accepts package identifiers.
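The load-on-first-request behaviour described above can be sketched in a few lines. This is a toy model under stated assumptions, not the VisTrails package manager: `ToyRegistry`, `make_loader()` and the `toy.*` identifiers are hypothetical, and each "package" is just a dictionary.

```python
class ToyRegistry(object):
    """Hypothetical lazy package registry; not the VisTrails implementation."""

    def __init__(self, loaders, preload=()):
        self._loaders = loaders      # identifier -> zero-argument loader
        self._enabled = {}           # identifier -> loaded package
        for identifier in preload:
            self.load_package(identifier)

    def load_package(self, identifier):
        """Return the package, running its loader only on the first request."""
        if identifier not in self._enabled:
            if identifier not in self._loaders:
                raise KeyError("unknown package identifier: %r" % (identifier,))
            self._enabled[identifier] = self._loaders[identifier]()
        return self._enabled[identifier]

    def is_enabled(self, identifier):
        return identifier in self._enabled


load_log = []


def make_loader(identifier):
    """Build a loader that records when it actually runs."""
    def loader():
        load_log.append(identifier)
        return {'identifier': identifier}
    return loader


registry = ToyRegistry(
    {name: make_loader(name) for name in ('toy.basic', 'toy.tabledata')},
    preload=['toy.basic'])

print(registry.is_enabled('toy.tabledata'))   # False
first = registry.load_package('toy.tabledata')
second = registry.load_package('toy.tabledata')
print(first is second)                        # True
print(load_log)                               # ['toy.basic', 'toy.tabledata']
```

The second request returns the cached object without running the loader again, so the cost of enabling a package is paid at most once.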


In [10]:
tabledata = vt.load_package('org.vistrails.vistrails.tabledata')
tabledata


Out[10]:
<Package: org.vistrails.vistrails.tabledata, 23 modules>

You can get Modules from the package using the dot or bracket syntax. These modules are "dangling" modules, not yet instantiated in a specific pipeline/vistrail.

These will be useful once editing pipelines is added to the API.
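As a rough illustration of how dot and bracket access can coexist, the sketch below uses `__getattr__` for namespaces and `__getitem__` for `|`-separated module names. All class names here (`ToyModuleType`, `ToyNamespace`, `ToyPackage`) are hypothetical, and the toy raises `KeyError` where the real API raises `MissingModule`; it is not the VisTrails implementation.

```python
class ToyModuleType(object):
    """Hypothetical 'dangling' module: a type not yet placed in a pipeline."""

    def __init__(self, name):
        self.name = name


class ToyNamespace(object):
    def __init__(self, name, members):
        self.name = name
        self._members = members   # maps names to modules or sub-namespaces

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Private names are
        # rejected early so a missing _members cannot cause recursion.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._members[name]
        except KeyError:
            raise AttributeError(name)


class ToyPackage(ToyNamespace):
    def __getitem__(self, key):
        # Walk 'namespace|Module' one segment at a time.
        target = self
        for part in key.split('|'):
            if not isinstance(target, ToyNamespace) or part not in target._members:
                raise KeyError(key)
            target = target._members[part]
        # Brackets return modules only, never namespaces.
        if isinstance(target, ToyNamespace):
            raise KeyError(key)
        return target


package = ToyPackage('toy.tabledata', {
    'BuildTable': ToyModuleType('BuildTable'),
    'read': ToyNamespace('read', {'CSVFile': ToyModuleType('CSVFile')}),
})

print(package.BuildTable is package['BuildTable'])        # True
print(package.read.CSVFile is package['read|CSVFile'])    # True
try:
    package['read']            # a namespace, so bracket access refuses it
except KeyError:
    print('namespaces need the dot syntax')
```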


In [11]:
tabledata.convert


Out[11]:
<Namespace convert of package org.vistrails.vistrails.tabledata>

In [12]:
from vistrails.core.modules.module_registry import MissingModule
try:
    tabledata['convert']  # can't get namespaces this way, use a dot
except MissingModule:
    pass
else:
    assert False

In [13]:
tabledata.BuildTable, tabledata['BuildTable']


Out[13]:
(vistrails.core.api.BuildTable, vistrails.core.api.BuildTable)

In [14]:
tabledata.read.CSVFile, tabledata['read|CSVFile']


Out[14]:
(vistrails.core.api.CSVFile, vistrails.core.api.CSVFile)

(note: IPython bug 6709 causes the 'vistrails.core.api.' prefixes above)

Pipeline manipulation

Unfortunately this is not yet available, stay tuned!

Execution

In addition to executing a Pipeline or Vistrail, you can easily pass values in on InputPort modules (to use subworkflows as Python functions) and get results out (either on OutputPort modules or any port of any module).

Execution returns a Results object from which you can get all of this. In addition, output modules (such as matplotlib's MplFigureOutput) will output to the IPython notebook if possible.

Getting outputs


In [15]:
outputs = vt.load_vistrail('outputs.vt')
outputs.select_version(1)
outputs


Out[15]:
[Version tree graph: 0 -> 1 (Added module) -> 5 (Added parameter)]
<Vistrail: outputs.vt, version 1, not changed>

In [16]:
# Errors
try:
    result = outputs.execute()
except vt.ExecutionErrors:
    traceback.print_exc()
else:
    assert False


Traceback (most recent call last):
  File "<ipython-input-16-979bf6416e43>", line 3, in <module>
    result = outputs.execute()
  File "/home/remram/Documents/programming/dat/vistrails/examples/api/../../vistrails/core/api.py", line 259, in execute
    return self.current_pipeline.execute(*args, **kwargs)
  File "/home/remram/Documents/programming/dat/vistrails/examples/api/../../vistrails/core/api.py", line 482, in execute
    raise ExecutionErrors(self, result)
ExecutionErrors: Pipeline execution failed: 1 error:
0: Missing value from port value

In [17]:
# Results
outputs.select_latest_version()
result = outputs.execute()
result


Out[17]:
<ExecutionResult: 2 modules>

In [18]:
outputs


Out[18]:
[Version tree graph: 0 -> 5 (Added parameter)]
<Vistrail: outputs.vt, version 5, changed>

In [19]:
outputs.current_pipeline


Out[19]:
[Pipeline graph: String (port "value") -> OutputPort]
<Pipeline: 2 modules, 1 connections; outputs: msg>

This gets the values on all output ports of any module, given its id (there is no need to insert OutputPort or GenericOutput modules if you know how to find the module):


In [20]:
result.module_output(0)


Out[20]:
{'self': <vistrails.core.modules.basic_modules.String at 0x5bd7bb0>,
 'value': 'Hello, world',
 'value_as_string': 'Hello, world'}

This gets the value passed to an OutputPort module, using the OutputPort's name:


In [21]:
result.output_port('msg')


Out[21]:
'Hello, world'

Setting inputs


In [22]:
pipeline = vistrail.current_pipeline
pipeline


Out[22]:
[Pipeline graph: "First input" and "Second input" each feed a "+" module and a "*" module; each of those two modules feeds its own OutputPort]
<Pipeline: 6 modules, 6 connections; inputs: in_a, in_b; outputs: out_times, out_plus>

In [23]:
in_a = pipeline.get_input('in_a')
assert (in_a == pipeline.get_module('First input')) is True
in_a


Out[23]:
<Module 'InputPort' from org.vistrails.vistrails.basic, id 1, label "First input">

We need to provide values for this workflow's two InputPort modules. Inputs can be supplied to execute() in two ways:

  • either by using module_obj == value, where module_obj is a module obtained from the pipeline using get_input() or get_module();
  • or by using module_name=value, where module_name is the name set on an InputPort module.

Note that, to Python, module_obj is a variable and must be bound to a value (of type Module), whereas module_name is a keyword-parameter name.
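The `module_obj == value` form works because Python lets a class decide what `==` returns. The sketch below shows one way such a syntax could be built. It is an assumption-laden toy, not the actual VisTrails code: `ToyInputModule`, `ToyBinding` and `toy_execute()` are hypothetical names.

```python
class ToyBinding(object):
    """Pairs a module with the value that should be fed to it."""

    def __init__(self, module, value):
        self.module = module
        self.value = value


class ToyInputModule(object):
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Module vs. module: an ordinary comparison that yields a bool.
        if isinstance(other, ToyInputModule):
            return self is other
        # Module vs. anything else: record the value instead of comparing.
        return ToyBinding(self, other)

    # Defining __eq__ disables hashing in Python 3; restore identity hashing.
    __hash__ = object.__hash__


def toy_execute(*bindings, **named):
    """Merge 'module == value' bindings and 'name=value' keywords."""
    values = {}
    for binding in bindings:
        if not isinstance(binding, ToyBinding):
            raise TypeError("expected 'module == value', got %r" % (binding,))
        values[binding.module.name] = binding.value
    values.update(named)
    return values


in_a = ToyInputModule('in_a')
inputs = toy_execute(in_a == 2, in_b=4)

print(sorted(inputs.items()))    # [('in_a', 2), ('in_b', 4)]
print((in_a == in_a) is True)    # True
```

In this toy, comparing two modules still returns a plain boolean, which mirrors the `(in_a == pipeline.get_module('First input')) is True` assertion in cell 23. One visible trade-off of the design is that a module object cannot itself be passed as an input value through `==`.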


In [24]:
result = pipeline.execute(in_a == 2, in_b=4)

In [25]:
result.output_port('out_times'), result.output_port('out_plus')


Out[25]:
(8.0, 6.0)

Another example


In [26]:
im = vt.load_vistrail('imagemagick.vt')

In [27]:
im.select_version('read')
im


Out[27]:
[Version tree graph: 0 -> 6 (read); 6 -> 14 (blur); 6 -> 21 (edges)]
<Vistrail: imagemagick.vt, version 6 (tag read), not changed>

Note that if you print a File value, IPython will try to render it.


In [28]:
im.execute().output_port('result')


Out[28]:

In [29]:
im.select_version('blur')
im


Out[29]:
[Version tree graph: 0 -> 6 (read); 6 -> 14 (blur); 6 -> 21 (edges)]
<Vistrail: imagemagick.vt, version 14 (tag blur), changed>

In [30]:
im.execute().output_port('result')


Out[30]:

In [31]:
im.select_version('edges')
im.execute().output_port('result')


Out[31]:

Output mode


In [32]:
mpl = vt.load_vistrail('../matplotlib/pie_ex1.vt')
mpl.select_latest_version()

This workflow uses MplFigureOutput, which outputs to the IPython notebook when one is available and the spreadsheet is not running.


In [33]:
mpl.execute()


WARNING:vistrails.logger:/home/remram/Documents/programming/dat/vistrails/examples/api/../../vistrails/core/modules/vistrails_module.py, line 1724
UserWarning: A Module instance was used as data: module=MplFigure, port=self, object=<vistrails.packages.matplotlib.bases.MplFigure object at 0x7f8b4028ef10>
  UserWarning)

Out[33]:
<ExecutionResult: 3 modules>

In [34]:
richtext = vt.load_vistrail('out_html.xml')
richtext.select_latest_version()

This one uses RichTextOutput:


In [35]:
richtext.execute()


this is a test hehe
Out[35]:
<ExecutionResult: 2 modules>

In [36]:
tbl = vt.load_vistrail('table.xml')
tbl.select_latest_version()

TableOutput:


In [37]:
tbl.execute()


Exported table
a b
1 4
2 5
3 6
Out[37]:
<ExecutionResult: 2 modules>

In [38]:
render = vt.load_vistrail('brain_output.xml')
render.select_latest_version()

And vtkRendererOutput:


In [39]:
render.execute()


Out[39]:
<ExecutionResult: 16 modules>