Getting Started with Boutiques

As you've seen from our documentation, Boutiques is a flexible way to represent command-line executables and distribute them consistently across compute ecosystems. A Boutiques tool descriptor is a JSON file that fully describes the input and output parameters and files for a given command-line call (or calls, since you can include pipes (|) and ampersands (&)). Boutiques offers several aids for building a descriptor for your tool:

The Boutiques command-line utility (bosh) includes a validator, a simulator, and other tools that can help you either find an existing descriptor to model yours after or build and test your own.
The examples provide useful references for development.

To guide you through this process, we will walk through building a tool descriptor for FSL's BET (the finished product can be found here).

Step 1: Describing the command line

The first step in creating a tool descriptor for your command-line call is to write out a complete list of your command-line options. If your tool was written in Python with the argparse library, much of this is already done for you. For many tools (bash, Python, or otherwise), the list can be obtained by executing the tool with the -h flag. In the case of FSL's BET, we get the following:

bet -h

Usage:    bet <input> <output> [options]

Main bet2 options:
  -o          generate brain surface outline overlaid onto original image
  -m          generate binary brain mask
  -s          generate approximate skull image
  -n          don't generate segmented brain image output
  -f <f>      fractional intensity threshold (0->1); default=0.5; smaller values give larger brain outline estimates
  -g <g>      vertical gradient in fractional intensity threshold (-1->1); default=0; positive values give larger brain outline at bottom, smaller at top
  -r <r>      head radius (mm not voxels); initial surface sphere is set to half of this
  -c <x y z>  centre-of-gravity (voxels not mm) of initial mesh surface.
  -t          apply thresholding to segmented brain image and mask
  -e          generates brain surface as mesh in .vtk format

Variations on default bet2 functionality (mutually exclusive options):
  (default)   just run bet2
  -R          robust brain centre estimation (iterates BET several times)
  -S          eye & optic nerve cleanup (can be useful in SIENA)
  -B          bias field & neck cleanup (can be useful in SIENA)
  -Z          improve BET if FOV is very small in Z (by temporarily padding end slices)
  -F          apply to 4D FMRI data (uses -f 0.3 and dilates brain mask slightly)
  -A          run bet2 and then betsurf to get additional skull and scalp surfaces (includes registrations)
  -A2 <T2>    as with -A, when also feeding in non-brain-extracted T2 (includes registrations)

Miscellaneous options:
  -v          verbose (switch on diagnostic messages)
  -h          display this help, then exits
  -d          debug (don't delete temporary intermediate images)

Looking at all of these flags, we can summarize the options as:

bet [INPUT_FILE] [MASK] [FRACTIONAL_INTENSITY] [VERTICAL_GRADIENT] [CENTER_OF_GRAVITY] [OVERLAY_FLAG] [BINARY_MASK_FLAG] [APPROX_SKULL_FLAG] [NO_SEG_OUTPUT_FLAG] [VTK_VIEW_FLAG] [HEAD_RADIUS] [THRESHOLDING_FLAG] [ROBUST_ITERS_FLAG] [RES_OPTIC_CLEANUP_FLAG] [REDUCE_BIAS_FLAG] [SLICE_PADDING_FLAG] [MASK_WHOLE_SET_FLAG] [ADD_SURFACES_FLAG] [ADD_SURFACES_T2] [VERBOSE_FLAG] [DEBUG_FLAG]

Now that we have summarized all of our tool's command-line options (some describe inputs, others outputs), we can begin to craft our JSON Boutiques tool descriptor.

Step 2: Understanding Boutiques + JSON

For those unfamiliar with JSON, we recommend this 3-minute JSON tutorial to get you up to speed. In short, a JSON file here is a dictionary object containing keys and associated values. A key says what is being described, and its value is the description (which, importantly, can be of any type: a string, a number, a boolean, a list, or a nested dictionary). A Boutiques tool descriptor is a JSON file that requires the following keys, or properties:

name
description
schema-version
command-line
inputs
output-files
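
To make the key/value idea concrete, here is a minimal Python sketch (standard library only) that parses a partial descriptor and reports which of the required properties listed above are still missing. The helper missing_required_keys is hypothetical and only checks that keys are present; the real validator checks far more, and other schema versions may require a different set of keys.

```python
import json

# Required top-level properties as listed in this tutorial (schema 0.4 era);
# other schema versions may require a different set.
REQUIRED_KEYS = ["name", "description", "schema-version",
                 "command-line", "inputs", "output-files"]


def missing_required_keys(descriptor: dict) -> list:
    """Hypothetical helper: list required properties absent from a descriptor."""
    return [key for key in REQUIRED_KEYS if key not in descriptor]


# A JSON object parses into a Python dict whose values can have different types.
text = '{"name": "fsl_bet", "schema-version": "0.4", "inputs": []}'
descriptor = json.loads(text)

print(type(descriptor["name"]).__name__, type(descriptor["inputs"]).__name__)  # str list
print(missing_required_keys(descriptor))  # ['description', 'command-line', 'output-files']
```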

Some additional, optional properties that Boutiques will recognize are:

groups
tool-version
suggested-resources
container-image:
- type
- image
- index

In the case of BET, we will of course populate the required elements, but will also include tool-version and groups.

Step 3: Populating the tool descriptor

We will split populating the tool descriptor into two parts: adding meta-parameters (such as name, description, schema-version, command-line, tool-version, and container-image, if we were to include it) and adding i/o parameters (such as inputs, output-files, and groups).

Currently, before adding any details, our tool descriptor should look like this:

{
    "name" : TODO,
    "tool-version": TODO,
    "description": TODO,
    "command-line": TODO,
    "schema-version": TODO,
    "inputs": TODO,
    "output-files": TODO
}

Step 3.1: Adding meta-parameters

Many of the meta-parameters will be obvious if you're familiar with the tool, or can be extracted from the help message printed earlier when you passed the -h flag to your program. We can update our JSON as follows:

{
    "name" : "fsl_bet",
    "tool-version" : "1.0.0",
    "description" : "Automated brain extraction tool for FSL",
    "command-line" : "bet [INPUT_FILE] [MASK] [FRACTIONAL_INTENSITY] [VERTICAL_GRADIENT] [CENTER_OF_GRAVITY] [OVERLAY_FLAG] [BINARY_MASK_FLAG] [APPROX_SKULL_FLAG] [NO_SEG_OUTPUT_FLAG] [VTK_VIEW_FLAG] [HEAD_RADIUS] [THRESHOLDING_FLAG] [ROBUST_ITERS_FLAG] [RES_OPTIC_CLEANUP_FLAG] [REDUCE_BIAS_FLAG] [SLICE_PADDING_FLAG] [MASK_WHOLE_SET_FLAG] [ADD_SURFACES_FLAG] [ADD_SURFACES_T2] [VERBOSE_FLAG] [DEBUG_FLAG]",
    "schema-version" : "0.4",
    "inputs": TODO,
    "output-files": TODO,
    "groups": TODO
}
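
Every placeholder in the command-line template will later need an input entry with a matching value-key, so it can help to list the placeholders programmatically. The Python sketch below assumes the upper-case-in-square-brackets convention used throughout this tutorial (a naming convention, not something the pattern can know in general) and uses a shortened version of the BET command line.

```python
import re

# Shortened version of the BET command-line template above.
command_line = ("bet [INPUT_FILE] [MASK] [FRACTIONAL_INTENSITY] "
                "[CENTER_OF_GRAVITY] [ADD_SURFACES_T2] [VERBOSE_FLAG]")

# This tutorial writes value-keys as upper-case names in square brackets;
# the pattern below assumes that convention.
VALUE_KEY_PATTERN = re.compile(r"\[[A-Z0-9_]+\]")


def value_keys(template: str) -> list:
    """Return the placeholders in a command-line template, in order of appearance."""
    return VALUE_KEY_PATTERN.findall(template)


print(value_keys(command_line))
```

Comparing this list against the value-key fields of your inputs is a quick way to spot a placeholder you forgot to describe.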

Step 3.2: Adding i/o parameters

Inputs and outputs of many applications are complicated: outputs can depend on input flags, and flags can be mutually exclusive or require that at least one option be given. Boutiques handles this with a detailed schema for inputs and outputs, and optionally lets you specify groups of inputs that capture these additional constraints.

As you have surely noted, a tool has only a single name and version, but may have many input and output parameters. Inputs, outputs, and groups are therefore each described as a list, and each element of these lists is a dictionary following the input, output, or group schema, respectively. Our JSON therefore looks more like this:

{
    "name" : "fsl_bet",
    "tool-version" : "1.0.0",
    "description" : "Automated brain extraction tool for FSL",
    "command-line" : "bet [INPUT_FILE] [MASK] [FRACTIONAL_INTENSITY] [VERTICAL_GRADIENT] [CENTER_OF_GRAVITY] [OVERLAY_FLAG] [BINARY_MASK_FLAG] [APPROX_SKULL_FLAG] [NO_SEG_OUTPUT_FLAG] [VTK_VIEW_FLAG] [HEAD_RADIUS] [THRESHOLDING_FLAG] [ROBUST_ITERS_FLAG] [RES_OPTIC_CLEANUP_FLAG] [REDUCE_BIAS_FLAG] [SLICE_PADDING_FLAG] [MASK_WHOLE_SET_FLAG] [ADD_SURFACES_FLAG] [ADD_SURFACES_T2] [VERBOSE_FLAG] [DEBUG_FLAG]",
    "schema-version" : "0.4",
    "inputs": [
        {TODO},
        {TODO},
        ...
    ],
    "output-files": [
        {TODO},
        {TODO},
        ...
    ],
    "groups": [
        {TODO},
        ...
    ]
}

Since the file is growing considerably, we will no longer show the full JSON at each step; instead, we will show only the dictionaries for individual input, output, and group entries.

Step 3.2.1: Specifying inputs

The input schema contains several properties, most of which can be ignored in this first example; the exceptions are id, name, and type. BET offers many inputs we could use to demonstrate this, and we have chosen three whose functionality, and therefore schemas, differ considerably:

[INPUT_FILE]
[FRACTIONAL_INTENSITY]
[CENTER_OF_GRAVITY]

[INPUT_FILE] The simplest of these is [INPUT_FILE], a required parameter that expects a qualified path to a file. The dictionary entry is:

{
    "id" : "infile",
    "name" : "Input file",
    "type" : "File",
    "description" : "Input image (e.g. img.nii.gz)",
    "optional": false,
    "value-key" : "[INPUT_FILE]"
}
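
Conceptually, when a job is launched, each value-key in the command-line template is replaced by the value supplied for that input. The sketch below imitates this for the File input above; fill_value_key is a hypothetical helper, not part of Boutiques, and Boutiques' own handling of quoting and unset optional inputs may differ.

```python
import shlex


def fill_value_key(template: str, value_key: str, value: str) -> str:
    """Hypothetical helper: replace one value-key with a shell-quoted value."""
    return template.replace(value_key, shlex.quote(value))


template = "bet [INPUT_FILE] [MASK]"
print(fill_value_key(template, "[INPUT_FILE]", "img.nii.gz"))      # bet img.nii.gz [MASK]
print(fill_value_key(template, "[INPUT_FILE]", "my scan.nii.gz"))  # bet 'my scan.nii.gz' [MASK]
```

Quoting the value is our own choice here; it keeps a path containing spaces from being split into two arguments.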

[FRACTIONAL_INTENSITY] This parameter documents an optional flag that can be passed to the executable. When the flag is passed, it is followed by a floating-point value between 0 and 1. Boutiques can check during validation whether a valid value was supplied, so invalid inputs are flagged before the job is ever submitted to an execution engine, where it would fail. This dictionary is:

{
    "id" : "fractional_intensity",
    "name" : "Fractional intensity threshold",
    "type" : "Number",
    "description" : "Fractional intensity threshold (0->1); default=0.5; smaller values give larger brain outline estimates",
    "command-line-flag": "-f",
    "optional": true,
    "value-key" : "[FRACTIONAL_INTENSITY]",
    "integer" : false,
    "minimum" : 0,
    "maximum" : 1
}
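
The kind of check this entry enables can be sketched in a few lines of Python. The helper render_number is hypothetical: it mirrors the "integer", "minimum", and "maximum" properties above, treats both bounds as inclusive, and returns the flag plus value that would be placed at [FRACTIONAL_INTENSITY]. It is an illustration, not the validator's actual code.

```python
def render_number(value, flag="-f", minimum=0, maximum=1, integer=False):
    """Hypothetical helper: check a Number input, then build its flag + value fragment.

    minimum and maximum are treated as inclusive bounds here.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{value!r} is not a number")
    if integer and not float(value).is_integer():
        raise ValueError(f"{value!r} is not an integer")
    if not minimum <= value <= maximum:
        raise ValueError(f"{value!r} is outside [{minimum}, {maximum}]")
    return f"{flag} {value}"


print(render_number(0.3))  # -f 0.3
try:
    render_number(1.5)
except ValueError as err:
    print("rejected:", err)  # rejected: 1.5 is outside [0, 1]
```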

[CENTER_OF_GRAVITY] The center-of-gravity input expects a triple (i.e., an [X, Y, Z] position) when the flag is specified. We require exactly three values after the flag by declaring the input as a list whose minimum and maximum number of entries are both 3:

{
    "id" : "center_of_gravity",
    "name" : "Center of gravity vector",
    "type" : "Number",
    "description" : "The xyz coordinates of the center of gravity (voxels, not mm) of initial mesh surface. Must have exactly three numerical entries in the list (3-vector).",
    "command-line-flag": "-c",
    "optional": true,
    "value-key" : "[CENTER_OF_GRAVITY]",
    "list" : true,
    "min-list-entries" : 3,
    "max-list-entries" : 3
}
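
A list input adds one more check: the number of entries. The hypothetical helper below enforces "min-list-entries" and "max-list-entries" (both 3 here) and joins the flag and values with spaces; the space separator is an assumption made for this sketch.

```python
def render_number_list(values, flag="-c", min_entries=3, max_entries=3):
    """Hypothetical helper: enforce list length, then join flag and values with spaces."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("expected a list of numbers")
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        raise TypeError("every entry must be a number")
    if not min_entries <= len(values) <= max_entries:
        raise ValueError(
            f"expected {min_entries} to {max_entries} entries, got {len(values)}")
    return " ".join([flag] + [str(v) for v in values])


print(render_number_list([90, 110, 75]))  # -c 90 110 75
```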

For other types of inputs, feel free to explore the other example descriptors.

Step 3.2.2: Specifying outputs

The output schema also contains several properties, of which only id, name, and path-template are mandatory. We again demonstrate with an example from BET:

outfile

outfile All of BET's output parameters are structured similarly and rely on the same core feature: the output file's location, described by path-template, is a function of an input value on the command line, here [MASK]. The optional property indicates whether the output is always produced; when it is false, Boutiques reports an error if the file is not found. The output descriptor is thus:

{
    "id" : "outfile",
    "name" : "Output mask file",
    "description" : "Main default mask output of BET",
    "path-template" : "[MASK].nii.gz",
    "optional" : true
}
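
The two ideas in this entry, deriving an output path from input values and checking whether a required output exists, can be sketched as follows. Both helpers are hypothetical illustrations of the behavior described above, not Boutiques functions, and the value "subj01_brain" is an arbitrary example for [MASK].

```python
from pathlib import Path


def resolve_path_template(path_template: str, values: dict) -> str:
    """Hypothetical helper: substitute input values for value-keys in a path-template."""
    path = path_template
    for value_key, value in values.items():
        path = path.replace(value_key, str(value))
    return path


def missing_required_output(path: str, optional: bool) -> bool:
    """True when a non-optional output was not produced, i.e. an error should be reported."""
    return not optional and not Path(path).exists()


out = resolve_path_template("[MASK].nii.gz", {"[MASK]": "subj01_brain"})
print(out)  # subj01_brain.nii.gz
print(missing_required_output(out, optional=True))  # False
```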

Newer versions of the schema than the one this example was originally developed for extend this naming feature: they can also strip extensions from the input values used in a path template. An example of this can be seen here.
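
In recent schemas this is, to our knowledge, the output property path-template-stripped-extensions, which lists extensions to remove from input values before they are substituted. The sketch below only illustrates the idea; strip_extensions is hypothetical, and preferring the longest matching extension is our own choice, not documented Boutiques behavior.

```python
def strip_extensions(value: str, extensions: list) -> str:
    """Hypothetical helper: remove one matching extension from an input value."""
    # Try longer extensions first so ".nii.gz" is preferred over a shorter
    # suffix such as ".gz"; this ordering is a choice made for this sketch.
    for ext in sorted(extensions, key=len, reverse=True):
        if ext and value.endswith(ext):
            return value[: -len(ext)]
    return value


stem = strip_extensions("sub-01_T1w.nii.gz", [".nii", ".nii.gz"])
print(stem)                    # sub-01_T1w
print(f"{stem}_brain.nii.gz")  # sub-01_T1w_brain.nii.gz
```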

Step 3.2.3: Specifying groups

The group schema provides an additional layer for describing the relationships between inputs. For instance, if the inputs within a set are mutually exclusive, they can be grouped and a flag set indicating that only one can be selected. Alternatively, if at least one input within a group must be specified, a flag can be set to indicate that instead. The following group from the BET implementation illustrates this:

variational_params_group

variational_params_group Many flags exist in BET, and each of them is represented in the command line we specified earlier. However, as you may have noticed when reading the output of bet -h, several of these options are mutually exclusive. To again prevent jobs from being submitted to a scheduler only to fail there, Boutiques lets you group inputs and enforce mutual exclusivity, so that invalid combinations are flagged at the validation stage. This group dictionary is:

{
    "id" : "variational_params_group",
    "name" : "Variations on Default Functionality",
    "description" : "Mutually exclusive options that specify variations on how BET should be run.",
    "members" : ["robust_iters_flag", "residual_optic_cleanup_flag", "reduce_bias_flag", "slice_padding_flag", "whole_set_mask_flag", "additional_surfaces_flag", "additional_surfaces_t2"],
    "mutually-exclusive" : true
}
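
The mutual-exclusivity rule can be sketched as a small check over the values a user supplies. Here those values are modeled as a dict from input id to value, similar in spirit to a Boutiques invocation; the helper names are hypothetical and the group is trimmed to three of the members above.

```python
def is_set(value) -> bool:
    """An input counts as set unless it is absent (None) or an unset flag (False)."""
    return value is not None and value is not False


def check_mutually_exclusive(group: dict, invocation: dict) -> None:
    """Hypothetical helper: raise if more than one member of the group is set."""
    if not group.get("mutually-exclusive", False):
        return
    chosen = [m for m in group["members"] if is_set(invocation.get(m))]
    if len(chosen) > 1:
        raise ValueError(f"{group['id']}: at most one of {chosen} may be set")


group = {
    "id": "variational_params_group",
    "members": ["robust_iters_flag", "reduce_bias_flag", "additional_surfaces_t2"],
    "mutually-exclusive": True,
}
check_mutually_exclusive(group, {"robust_iters_flag": True})  # passes
try:
    check_mutually_exclusive(group, {"robust_iters_flag": True, "reduce_bias_flag": True})
except ValueError as err:
    print(err)
```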

Although our BET example contains no one-is-required input group, you can examine a validated tool descriptor here to see how one is implemented.
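
For comparison with the mutual-exclusivity check, the one-is-required rule, expressed in a group by the "one-is-required" property, can be sketched the same way. The group below is hypothetical (BET has no such group), and the helpers are illustrations rather than Boutiques code.

```python
def is_set(value) -> bool:
    """An input counts as set unless it is absent (None) or an unset flag (False)."""
    return value is not None and value is not False


def check_one_is_required(group: dict, invocation: dict) -> None:
    """Hypothetical helper: raise if a one-is-required group has no member set."""
    if group.get("one-is-required", False) and not any(
            is_set(invocation.get(member)) for member in group["members"]):
        raise ValueError(f"{group['id']}: set at least one of {group['members']}")


# Hypothetical group for illustration only; BET does not define it.
group = {
    "id": "output_choice_group",
    "members": ["binary_mask_flag", "approx_skull_flag"],
    "one-is-required": True,
}
check_one_is_required(group, {"binary_mask_flag": True})  # passes
```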

Step 3.3: (optional) Extending the tool descriptor

Now that the basic descriptor for this tool is complete, you can explore the schema to discover deeper functionality of Boutiques. For example, if you have created a Docker or Singularity container, you can associate its image with your tool descriptor, and any compute resource with Docker or Singularity installed will launch the executable through it (an example using Docker can be found here).

Step 4: Validating the tool descriptor

Once you've completed your Boutiques tool descriptor, run the validator to ensure that it is correct. The README.md here describes how to install and use the validator and the rest of the Boutiques shell (bosh) tools on your tool descriptor.
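
If you build your descriptor from Python, a minimal workflow is to write it to disk and pass the file to bosh validate. The sketch below assumes the boutiques package (which provides the bosh command) may or may not be installed, so it only calls the validator when bosh is on the PATH; the truncated descriptor is illustrative and will be reported as incomplete.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path


def save_descriptor(descriptor: dict, path: Path) -> None:
    """Write the descriptor as indented JSON."""
    path.write_text(json.dumps(descriptor, indent=4) + "\n")


def validate_command(path: Path) -> list:
    """Command line for the bosh validator."""
    return ["bosh", "validate", str(path)]


# Truncated descriptor for illustration; the validator will report what is missing.
descriptor = {"name": "fsl_bet", "tool-version": "1.0.0", "schema-version": "0.4"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fsl_bet.json"
    save_descriptor(descriptor, path)
    # Round-trip check: the file is valid JSON and matches what we wrote.
    assert json.loads(path.read_text()) == descriptor
    if shutil.which("bosh"):
        result = subprocess.run(validate_command(path), capture_output=True, text=True)
        print("bosh exit code:", result.returncode)
        print(result.stdout or result.stderr)
    else:
        print("bosh not found; install the boutiques package to validate")
```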

Step 5: Using the tool descriptor

Once the tool descriptor has been validated, your tool is ready to be integrated into a platform that supports Boutiques. You can use the localExec.py tool described here to launch your container locally for preliminary testing. Once you feel comfortable with your tool, you can contact your system administrator and have them integrate it into their compute resources so that you can test it and use it to process your data.