This notebook shows, through a series of examples, how to set up a metabolomics workflow using the Chronos REST API. As a benchmark case we use an R-based pipeline by the Kultima lab. The aim of this pipeline is to preprocess mass spectrometry data: filter out contaminants detected in blank samples, remove batch-specific features, log2-transform the intensities, compute the coefficient of variation (CV) of each feature, and select the relevant features.
The code snippets in this notebook use Python to consume the Chronos REST API, in order to set up a Directed Acyclic Graph (DAG) that defines the workflow. Each node in the DAG represents a microservice that performs a specific task. Once the DAG is properly set up, Chronos figures out the dependencies between the various microservices, runs them in the correct order, and keeps them alive only for the time they are needed. Furthermore, independent microservices are run in parallel.
Chronos REST API calls define the nodes in the DAG. REST calls are performed through HTTP requests to well-defined URLs, with arguments passed in JSON format. The Chronos REST API is documented on this page.
In [ ]:
control = input()  # address of the control node hosting the Chronos API
In [ ]:
import getpass
password = getpass.getpass()  # admin password for the Chronos API (not echoed)
In [ ]:
import urllib.request
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/phnmnl/workflow-demo/master/data/inputdata_workshop.xls",  # download URL
    "inputdata_workshop.xls"  # local path
)
In [ ]:
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)  # suppress warnings about unverified HTTPS requests
When performing a mass spectrometry (MS) study, plastic and other contaminants often end up in your samples. When you then analyze the samples with MS, these contaminants are recorded together with the metabolites. To be able to detect and filter them out, blank samples (samples containing only DMSO) are usually added in between the normal samples in the run order.
In this step we aim to remove the contaminants detected in the blanks from the rest of our samples. The idea is to remove every feature whose intensity in the blanks exceeds a given fraction of its intensity in the samples: for example, every feature whose intensity in the blanks is 1% or more of its intensity in the other samples.
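Before submitting the actual job, it may help to see the filtering rule in isolation. The following is a toy sketch with made-up intensities; the real filtering is done by the BlankFilter.r script that the microservice below runs.
In [ ]:
# Toy illustration of the blank-filter rule (hypothetical intensities;
# the real logic lives in BlankFilter.r)
blank_intensity = {"feat1": 0.5, "feat2": 120.0}      # mean intensity in the blanks
sample_intensity = {"feat1": 900.0, "feat2": 1000.0}  # mean intensity in the samples
threshold = 0.01  # remove features whose blank intensity is >= 1% of the sample intensity
kept = [f for f in sample_intensity if blank_intensity[f] < threshold * sample_intensity[f]]
print(kept)  # ['feat1'] -- feat2 is flagged as a likely contaminant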
The input data from the prerequisites will be used as input here. Hence, we need to mount the Jupyter working directory (/mnt/container-volumes/jupyter) as a volume in the Docker container. Please go through the following code snippet, and use the Chronos REST API documentation to figure out the meaning of the JSON data that is sent with the HTTP request. Once you are done with that, please run the snippet and check the Chronos interface (which you can access through the MANTL UI).
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/iso8601"
json="""
{
"schedule" : "R0/2030-01-01T12:00:00Z/PT1H",
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "blank-filter",
"container": {
"type": "DOCKER",
"image": "farmbio/blankfilter",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript BlankFilter.r /data/inputdata_workshop.xls /data/output_BlankFilter.xls",
"owner" : "user@example.com"
}
"""
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))
N.B. You can ignore any warning about unverified HTTPS requests. Response code 204 means that the REST call succeeded.
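If you want to double-check from the notebook, in addition to using the Chronos UI, you can list the jobs that are currently registered. This is a sketch that assumes the deployment exposes the standard Chronos job-listing endpoint (/scheduler/jobs) under the same /chronos base path used above.
In [ ]:
# Optional check: list the names of the jobs registered in Chronos
# (assumes the standard /scheduler/jobs endpoint is reachable at this path)
jobs = requests.get("https://admin:" + password + "@" + control + "/chronos/scheduler/jobs", verify=False).json()
print([job["name"] for job in jobs])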
In MS studies you may, if you have many samples, prepare the samples in batches. By doing so you introduce the risk of features that are unique to a single batch, which is not desirable.
In this step we remove features that reach a coverage of 80% (i.e., are detected in at least 80% of the samples) within one batch, but in no other batch. In other words, here we remove batch-specific features.
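To make the rule concrete, here is a toy sketch with made-up coverage values; the real logic lives in the BatchfeatureRemoval.r script run by the microservice below.
In [ ]:
# Toy illustration of the batch-coverage rule (hypothetical data):
# a feature is batch-specific if it reaches 80% coverage in exactly one batch
coverage = {
    "featA": {"batch1": 0.90, "batch2": 0.85},  # covered in every batch -> keep
    "featB": {"batch1": 0.95, "batch2": 0.10},  # only covered in batch1 -> remove
}
kept = [f for f, cov in coverage.items() if sum(c >= 0.8 for c in cov.values()) != 1]
print(kept)  # ['featA']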
The input data of this step comes from the blank filter; hence, in the JSON parameters we set the previous step as parent. In this way Chronos makes sure to run the microservices in the correct order. Please go through the following code snippet, run it and check the Chronos interface.
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/dependency"
json="""
{
"parents" : ["blank-filter"],
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "batchfeature-removal",
"container": {
"type": "DOCKER",
"image": "farmbio/batchfeatureremoval",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript BatchfeatureRemoval.r /data/output_BlankFilter.xls /data/output_BatchfeatureRemoval.xls",
"owner" : "user@example.com"
}
"""
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/dependency"
json="""
{
"parents" : ["batchfeature-removal"],
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "log2-transformation",
"container": {
"type": "DOCKER",
"image": "farmbio/log2transformation",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript log2transformation.r /data/output_BatchfeatureRemoval.xls /data/output_log2transformation.xls",
"owner" : "user@example.com"
}
"""
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))
In order to parallelize microservices, we need to split the data into the subsets that will be processed simultaneously. In this step we divide the data based on the first five letters of the sample names.
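The sketch below illustrates the grouping criterion on a few hypothetical sample names; the actual splitting is performed by the Splitter.r script run by the microservice below.
In [ ]:
# Toy illustration: group sample names by their first five letters
names = ["AAA01_r1", "AAA01_r2", "BBB02_r1"]  # hypothetical sample names
groups = {}
for n in names:
    groups.setdefault(n[:5], []).append(n)
print(groups)  # {'AAA01': ['AAA01_r1', 'AAA01_r2'], 'BBB02': ['BBB02_r1']}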
Please go through the following code snippet, run it and check the Chronos interface.
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/dependency"
json="""
{
"parents" : ["log2-transformation"],
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "splitter",
"container": {
"type": "DOCKER",
"image": "farmbio/splitter",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript Splitter.r /data/output_log2transformation.xls /data/output_splitter",
"owner" : "user@example.com"
}
"""
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))
In this step we calculate the coefficient of variation (CV, the standard deviation divided by the mean) for each feature present within the samples.
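As a quick reminder of the quantity being computed, the sketch below calculates the CV of a single hypothetical feature; the pipeline itself computes it in the CV.r script.
In [ ]:
# CV = standard deviation / mean (illustration with made-up intensities)
from statistics import mean, stdev
intensities = [10.0, 12.0, 11.0, 13.0]  # hypothetical intensities of one feature
print(stdev(intensities) / mean(intensities))  # ~0.112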
In the previous step we divided the working set by sample group. It is very convenient now to run multiple instances of the CV microservice in parallel, in order to reduce the total running time. In the code snippet we use the header of the inputdata_workshop.xls file to figure out the file names that will come out of the previous step. Then, for each file, we submit a new job to Chronos.
Please go through the following code snippet, run it and check the Chronos interface.
In [ ]:
import os
os.makedirs("output_cv", exist_ok=True)  # create a folder for the CV output
# Figure out the sample groups from the header of the input file
with open("inputdata_workshop.xls", "r") as f:
    header = f.readline()
# Strip the surrounding quotes from each column name and keep its first five letters
samples = list(set(map(lambda s: s[1:-1][:5], header.split("\t"))))
samples.remove("BLANK")  # the blanks don't need to be processed
# Create a microservice for each sample group
url = "https://admin:" + password + "@" + control + "/chronos/scheduler/dependency"
for s in samples:
    json = """
    {
        "parents": ["splitter"],
        "cpus": "0.25",
        "mem": "128",
        "epsilon": "PT10M",
        "name": "cv-%(jobname)s",
        "container": {
            "type": "DOCKER",
            "image": "farmbio/cv",
            "volumes": [{
                "hostPath": "/mnt/container-volumes/jupyter",
                "containerPath": "/data",
                "mode": "RW"
            }]
        },
        "command": "Rscript CV.r /data/output_splitter/%(sample)s.xls /data/output_cv/%(sample)s_cv.xls",
        "owner": "user@example.com"
    }
    """ % {"sample": s, "jobname": s.replace(".", "_")}
    response = requests.post(url, headers={'content-type': 'application/json'}, data=json, verify=False)
    print(s + " HTTP response code: " + str(response.status_code))
In step 5 we processed many samples in parallel, and before proceeding to the feature selection we need to merge them back into a single file. This step can run only after all of the jobs generated by the previous step have finished. Hence, in the JSON we specify all of the jobs from step 5 in the parents field.
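To make the formatting concrete, the snippet below shows how the parents field is rendered as a JSON array for two hypothetical sample prefixes.
In [ ]:
# Illustration: rendering the parents field as a JSON array (hypothetical prefixes)
example = "[" + ",".join('"cv-' + s + '"' for s in ["AAA01", "BBB02"]) + "]"
print(example)  # ["cv-AAA01","cv-BBB02"]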
Please go through the following code snippet, run it and check the Chronos interface.
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/dependency"
# Format job names from step 5 in JSON array format
jobNames = "[" + ",".join(map(lambda s: '"cv-'+s.replace(".", "_")+'"',samples)) + "]"
json="""
{
"parents" : %(parents)s,
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "merger",
"container": {
"type": "DOCKER",
"image": "farmbio/merger",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript Merger.r /data/output_cv /data/output_Merger.xls",
"owner" : "user@example.com"
}
""" % {"parents" : jobNames}
#print("HTTP response code: " + json)
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))
In [ ]:
url="https://admin:"+password+"@"+control+"/chronos/scheduler/dependency"
json="""
{
"parents" : ["merger"],
"cpus": "0.25",
"mem": "128",
"epsilon" : "PT10M",
"name" : "feature-selection",
"container": {
"type": "DOCKER",
"image": "farmbio/featureselection",
"volumes": [{
"hostPath": "/mnt/container-volumes/jupyter",
"containerPath": "/data",
"mode": "RW"
}]
},
"command" : "Rscript FeatureSelection.r /data/output_log2transformation.xls /data/output_Merger.xls /data/output_FeatureSelection.xls",
"owner" : "user@example.com"
}
"""
response=requests.post(url, headers = {'content-type' : 'application/json'}, data=json, verify=False)
print("HTTP response code: " + str(response.status_code))