Data input for BIDS datasets

DataGrabber and SelectFiles are great if you are dealing with generic datasets with arbitrary organization. However, if you have decided to use the Brain Imaging Data Structure (BIDS) to organize your data (or got your hands on a BIDS dataset), you can take advantage of the formal structure BIDS imposes. In this short tutorial, you will learn how to do this.

pybids - a Python API for working with BIDS datasets

pybids is a lightweight Python API for querying a BIDS folder structure for specific files and metadata. You can install it from PyPI:

pip install pybids

Note that it should already be installed in the tutorial Docker image.

The layout object and simple queries

To begin working with pybids, we need to initialize a layout object, which we will use for all of our queries.


In [ ]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds000114/")

In [ ]:
!tree -L 4 /data/ds000114/
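
The file names in this listing follow the pattern that makes BIDS queries possible: a chain of `key-value` entities, then a suffix, then an extension. As a rough illustration only (this is not how pybids is implemented, and `parse_bids_name` is a hypothetical helper), the parts can be separated with plain Python:

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix and extension.

    Hypothetical helper for illustration; pybids handles many more cases.
    """
    # Split at the first dot so that double extensions like ".nii.gz" stay whole.
    stem, _, ext = filename.partition(".")
    # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w");
    # everything before it is a "key-value" entity.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = "." + ext if ext else ""
    return entities


info = parse_bids_name("sub-01_ses-test_task-fingerfootlips_bold.nii.gz")
print(info)
```

The keys `sub`, `ses` and `task` correspond to the subject, session and task labels used in the queries that follow.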

Let's find out which subject labels this dataset contains.


In [ ]:
layout.get_subjects()

What datatypes are included in this dataset?


In [ ]:
layout.get_datatypes()

Which suffixes are available for the func datatype in this dataset?


In [ ]:
layout.get_suffixes(datatype='func')

What are the different tasks included in this dataset?


In [ ]:
layout.get_tasks()

We can also ask for all of the data for a particular subject, datatype, and session.


In [ ]:
layout.get(subject='01', datatype="anat", session="test")

We can also ask for a specific subset of data. Note that we are using the extension filter to get just the imaging data (BIDS allows both .nii and .nii.gz, so we need to include both).


In [ ]:
layout.get(subject='01', suffix='bold', extension=['nii', 'nii.gz'])

You probably noticed that this method returns not just file paths but objects that carry the relevant query fields. We can easily extract just the file paths.


In [ ]:
layout.get(subject='01', suffix='bold', extension=['nii', 'nii.gz'], return_type='file')
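
Conceptually, each keyword in such a query is a filter on one entity: a file is returned only if it satisfies every filter, and a list value means "any of these". The sketch below mimics that behaviour in plain Python. The `matches` helper and the hand-written records are assumptions made for illustration; pybids builds its index from the dataset itself.

```python
def matches(entities, **filters):
    """Return True if every filter matches; a list filter means 'any of these'."""
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if entities.get(key) not in allowed:
            return False
    return True


# Hypothetical, hand-written records standing in for an indexed dataset.
files = [
    {"path": "sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz",
     "subject": "01", "suffix": "bold", "extension": ".nii.gz"},
    {"path": "sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz",
     "subject": "01", "suffix": "T1w", "extension": ".nii.gz"},
    {"path": "sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz",
     "subject": "02", "suffix": "bold", "extension": ".nii.gz"},
]

# Keep only bold images of subject 01, accepting either NIfTI extension.
selected = [f["path"] for f in files
            if matches(f, subject="01", suffix="bold", extension=[".nii", ".nii.gz"])]
print(selected)
```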

Exercise 1:

List all files for the "linebisection" task for subject 02.


In [ ]:
# write your solution here

In [ ]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds000114/")

layout.get(subject='02', return_type='file', task="linebisection")

BIDSDataGrabber: Including pybids in your nipype workflow

This is great, but what we really want is to include this in our nipype workflows. To do this, we can import BIDSDataGrabber, which provides an interface for BIDSLayout.get.


In [ ]:
from nipype.interfaces.io import BIDSDataGrabber
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.utility import Function

bg = Node(BIDSDataGrabber(), name='bids-grabber')
bg.inputs.base_dir = '/data/ds000114'

You can define static filters that apply to all queries by setting the corresponding input.


In [ ]:
bg.inputs.subject = '01'
res = bg.run()
res.outputs

Note that by default BIDSDataGrabber will fetch NIfTI files matching datatype func and anat, and return them as two output fields.

To define custom fields, pass the arguments for BIDSLayout.get as a dictionary, like so:


In [ ]:
bg.inputs.output_query = {'bolds': dict(suffix='bold')}
res = bg.run()
res.outputs

This results in a single output field, bolds, which returns all files with suffix bold for subject "01".
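
One way to picture how the static filter and the query dictionary interact: every key of output_query becomes an output field, and the static inputs are added to each field's query before it is run. The snippet below is a plain-Python sketch of that idea, not nipype's code. The second field, `T1w`, is a hypothetical addition to show several fields side by side.

```python
# Static filter set on the node (applies to every query).
static_filters = {"subject": "01"}

# One entry per desired output field; "T1w" is added here for illustration only.
output_query = {
    "bolds": {"suffix": "bold"},
    "T1w": {"datatype": "anat", "suffix": "T1w"},
}

# Build the effective arguments for each field's query.
effective = {}
for field, query in output_query.items():
    args = dict(query)            # copy so the original query stays untouched
    args.update(static_filters)   # static filters are applied to every field
    effective[field] = args

for field, args in effective.items():
    print(field, args)
```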

Now, let's put it in a workflow. We are not going to analyze any data, but for demonstration purposes, we will add a node that pretends to analyze its inputs.


In [ ]:
def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeBOLD")

In [ ]:
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD, "paths")
wf.run()

Exercise 2:

Modify the BIDSDataGrabber and the workflow to collect T1w images for subject 10.


In [ ]:
# write your solution here

In [ ]:
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber

ex2_BIDSDataGrabber = BIDSDataGrabber()
ex2_BIDSDataGrabber.inputs.base_dir = '/data/ds000114'
ex2_BIDSDataGrabber.inputs.subject = '10'
ex2_BIDSDataGrabber.inputs.output_query = {'T1w': dict(datatype='anat', suffix='T1w')}

ex2_res = ex2_BIDSDataGrabber.run()
ex2_res.outputs

Iterating over subject labels

In the previous example, we demonstrated how to use pybids to "analyze" one subject. How can we scale this to all subjects? Easy: by using iterables (more in Iteration/Iterables).


In [ ]:
bg_all = Node(BIDSDataGrabber(), name='bids-grabber')
bg_all.inputs.base_dir = '/data/ds000114'
bg_all.inputs.output_query = {'bolds': dict(suffix='bold')}
bg_all.iterables = ('subject', layout.get_subjects()[:2])
wf = Workflow(name="bids_demo")
wf.connect(bg_all, "bolds", analyzeBOLD, "paths")
wf.run()
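
The effect of the iterable can be thought of as a loop: the grabber, and everything connected downstream of it, runs once for each value of subject. The following plain-Python sketch imitates that expansion. `run_grabber` and `analyze` are hypothetical stand-ins for the two nodes, and the file names are made up.

```python
def run_grabber(subject):
    """Stand-in for one BIDSDataGrabber run; returns a made-up file list."""
    return [f"sub-{subject}_ses-test_task-fingerfootlips_bold.nii.gz"]


def analyze(paths):
    """Stand-in for the analyzeBOLD node."""
    return "analyzing " + str(paths)


subjects = ["01", "02"]

# One independent pass through the pipeline per iterable value.
results = {subject: analyze(run_grabber(subject)) for subject in subjects}
for subject, message in results.items():
    print(subject, message)
```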

Accessing additional metadata

Querying different files is nice, but sometimes you want to access more metadata, for example RepetitionTime. pybids can help with that as well.


In [ ]:
layout.get_metadata('/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz')
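
In BIDS, such metadata lives in JSON sidecar files, and a sidecar placed higher in the directory tree applies to the files below it unless a more specific sidecar overrides a value. The sketch below illustrates this merging on a throwaway directory. It is a simplification under stated assumptions: `collect_metadata` is a hypothetical helper, the task name `demo` and all values are made up, and pybids itself decides which sidecars apply.

```python
import json
import tempfile
from pathlib import Path


def collect_metadata(root, sidecars):
    """Merge JSON sidecars, ordered from most general to most specific.

    Later files override earlier ones. Hypothetical helper for illustration.
    """
    merged = {}
    for rel in sidecars:
        path = Path(root) / rel
        if path.exists():
            merged.update(json.loads(path.read_text()))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    func_dir = root / "sub-01" / "ses-test" / "func"
    func_dir.mkdir(parents=True)

    # Dataset-level sidecar: applies to every run of the task (made-up values).
    (root / "task-demo_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.0, "EchoTime": 0.03}))
    # File-level sidecar: overrides the dataset-level EchoTime.
    (func_dir / "sub-01_ses-test_task-demo_bold.json").write_text(
        json.dumps({"EchoTime": 0.05}))

    metadata = collect_metadata(root, [
        "task-demo_bold.json",
        "sub-01/ses-test/func/sub-01_ses-test_task-demo_bold.json",
    ])

print(metadata)
```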

Can we incorporate this into our pipeline? Yes, we can! To do so, let's use a Function interface that calls BIDSLayout in a custom way, wrapped in a MapNode so that it runs once per file. (More about MapNode in MapNode.)


In [ ]:
def printMetadata(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    print("\n\nanalyzing " + path + "\nTR: "+ str(layout.get_metadata(path)["RepetitionTime"]) + "\n\n")
    
analyzeBOLD2 = MapNode(Function(function=printMetadata, input_names=["path", "data_dir"],
                             output_names=[]), name="analyzeBOLD2", iterfield="path")
analyzeBOLD2.inputs.data_dir = "/data/ds000114/"

In [ ]:
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD2, "path")
wf.run()
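
The role of iterfield can be pictured as a map over a list: the function is called once for each element of path, while the other input, data_dir, stays the same for every call. Here is a plain-Python sketch of that pattern. `describe` is a hypothetical stand-in for printMetadata, and the second path is made up for illustration.

```python
from functools import partial


def describe(path, data_dir):
    """Stand-in for printMetadata; returns the message instead of printing it."""
    return f"analyzing {path} (dataset: {data_dir})"


# The list that would arrive on the "path" input (second entry is made up).
paths = [
    "sub-01_ses-test_task-fingerfootlips_bold.nii.gz",
    "sub-01_ses-retest_task-fingerfootlips_bold.nii.gz",
]

# Fix the constant input once, then apply the function to each path in turn.
per_file = partial(describe, data_dir="/data/ds000114/")
messages = [per_file(path) for path in paths]

for message in messages:
    print(message)
```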

Exercise 3:

Modify the printMetadata function to also print EchoTime.


In [ ]:
# write your solution here

In [ ]:
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber

ex3_BIDSDataGrabber = Node(BIDSDataGrabber(), name='bids-grabber')
ex3_BIDSDataGrabber.inputs.base_dir = '/data/ds000114'
ex3_BIDSDataGrabber.inputs.subject = '01'
ex3_BIDSDataGrabber.inputs.output_query = {'bolds': dict(suffix='bold')}

In [ ]:
# and now modify analyzeBOLD2
def printMetadata_et(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    # look the metadata up once and reuse it for both fields
    metadata = layout.get_metadata(path)
    print("\n\nanalyzing " + path + "\nTR: " +
          str(metadata["RepetitionTime"]) +
          "\nTE: " + str(metadata["EchoTime"]) + "\n\n")
    
ex3_analyzeBOLD2 = MapNode(Function(function=printMetadata_et, 
                                    input_names=["path", "data_dir"],
                                    output_names=[]), 
                           name="ex3", iterfield="path")
ex3_analyzeBOLD2.inputs.data_dir = "/data/ds000114/"

# and create a new workflow
ex3_wf = Workflow(name="ex3")
ex3_wf.connect(ex3_BIDSDataGrabber, "bolds", ex3_analyzeBOLD2, "path")
ex3_wf.run()
