Data input for BIDS datasets

DataGrabber and SelectFiles are great if you are dealing with generic datasets with arbitrary organization. However, if you have decided to use the Brain Imaging Data Structure (BIDS) to organize your data (or got your hands on a BIDS dataset), you can take advantage of the formal structure BIDS imposes. In this short tutorial you will learn how to do this.

pybids - a Python API for working with BIDS datasets

pybids is a lightweight Python API for querying a BIDS folder structure for specific files and metadata. You can install it from PyPI:

pip install pybids

Note that it should already be installed in the tutorial Docker image.

The layout object and simple queries

To begin working with pybids we need to initialize a layout object. We will need it for all of our queries.


In [3]:
from bids.grabbids import BIDSLayout
layout = BIDSLayout("/data/ds000114/")

In [4]:
!tree -L 4 /data/ds000114/


/data/ds000114/ [error opening dir]

0 directories, 0 files

Let's find out what the subject labels in this dataset are.


In [ ]:
layout.get_subjects()


Out[ ]:
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

What modalities are included in this dataset?


In [ ]:
layout.get_modalities()


Out[ ]:
['anat', 'dwi', 'func']

What different data types are included in the func modality of this dataset?


In [ ]:
layout.get_types(modality='func')


Out[ ]:
['bold', 'brainmask', 'confounds', 'events', 'fsaverage5', 'preproc']

What are the different tasks included in this dataset?


In [ ]:
layout.get_tasks()


Out[ ]:
['covertverbgeneration',
 'fingerfootlips',
 'linebisection',
 'overtverbgeneration',
 'overtwordrepetition']

We can also ask for all of the data for a particular subject, modality, and session.


In [ ]:
layout.get(subject='01', modality="anat", session="test")


Out[ ]:
[File(filename='/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz', subject='01', session='test', type='T1w', modality='anat')]

We can also ask for a specific subset of data. Note that we are using the extensions filter to get just the imaging data (BIDS allows both .nii and .nii.gz, so we need to include both).


In [ ]:
layout.get(subject='01', type='bold', extensions=['nii', 'nii.gz'])


Out[ ]:
[File(filename='/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz', subject='01', session='retest', type='bold', task='covertverbgeneration', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz', subject='01', session='retest', type='bold', task='fingerfootlips', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', subject='01', session='retest', type='bold', task='linebisection', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz', subject='01', session='retest', type='bold', task='overtverbgeneration', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz', subject='01', session='retest', type='bold', task='overtwordrepetition', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz', subject='01', session='test', type='bold', task='covertverbgeneration', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', subject='01', session='test', type='bold', task='fingerfootlips', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz', subject='01', session='test', type='bold', task='linebisection', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz', subject='01', session='test', type='bold', task='overtverbgeneration', modality='func'),
 File(filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz', subject='01', session='test', type='bold', task='overtwordrepetition', modality='func')]
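
To make the role of the extensions filter concrete, here is a minimal sketch in plain Python that mimics it on a list of file names. It does not use pybids at all. The helper keep_images and the candidate file names are made up for illustration, and we are assuming that a query on type alone may also match non-image files such as JSON sidecars.

```python
def keep_images(paths, extensions=("nii", "nii.gz")):
    """Return only the paths that end with one of the given extensions."""
    suffixes = tuple("." + ext for ext in extensions)
    return [p for p in paths if p.endswith(suffixes)]


# Hypothetical file names, not taken from the dataset listing above.
candidates = [
    "sub-01_ses-test_task-fingerfootlips_bold.nii.gz",
    "sub-01_ses-test_task-fingerfootlips_bold.nii",
    "task-fingerfootlips_bold.json",
    "sub-01_ses-test_task-fingerfootlips_events.tsv",
]

images = keep_images(candidates)
print(images)
```

Only the two NIfTI names survive the filter; the JSON and TSV files are dropped.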

You probably noticed that layout.get() does not return just file paths, but objects that also carry the relevant query fields. We can easily extract just the file paths.


In [ ]:
[f.filename for f in layout.get(subject='01', type='bold', extensions=['nii', 'nii.gz'])]


Out[ ]:
['/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz']
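
The query fields we saw on the File objects mirror the key-value pairs that BIDS encodes in every file name (sub-01, ses-test, task-fingerfootlips, and so on). As a rough illustration of that naming scheme, and not of how pybids works internally, the sketch below splits a file name into those pieces. The helper parse_bids_name is hypothetical.

```python
import os


def parse_bids_name(path):
    """Split a BIDS file name into key-value entities, suffix and extension."""
    name = os.path.basename(path)
    # Everything after the first dot is the extension ("nii.gz", "json", ...).
    stem, _, extension = name.partition(".")
    # The last underscore-separated chunk is the suffix (e.g. "bold").
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = extension
    return entities


parsed = parse_bids_name(
    "/data/ds000114/sub-01/ses-test/func/"
    "sub-01_ses-test_task-fingerfootlips_bold.nii.gz"
)
print(parsed)
```

Note that this sketch uses the raw BIDS keys (sub, ses, task), whereas the File objects above expose friendlier names such as subject and session.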

Exercise 1:

List all files for the "linebisection" task for subject 02.

Including pybids in your nipype workflow

This is great, but what we really want is to include this in our nipype workflows. How do we do this? We can create our own custom BIDSDataGrabber using a Function Interface. First we need a plain Python function that, for a given subject label and dataset location, returns a list of BOLD files.


In [ ]:
def get_niftis(subject_id, data_dir):
    # Remember that all the necessary imports need to be INSIDE the function for the Function Interface to work!
    from bids.grabbids import BIDSLayout
    
    layout = BIDSLayout(data_dir)
    
    bolds = [f.filename for f in layout.get(subject=subject_id, type="bold", extensions=['nii', 'nii.gz'])]
    
    return bolds

In [ ]:
get_niftis('01', '/data/ds000114')


Out[ ]:
['/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz',
 '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz']
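
Why do the imports have to live inside the function? As far as we understand it, the Function Interface stores the source code of the function and rebuilds it later in a fresh namespace, so names imported in the notebook are not visible there. The sketch below imitates that behaviour with plain exec. It is an illustration under that assumption, not nipype's actual code, and run_in_fresh_namespace is a hypothetical helper.

```python
import os  # imported at module level, as one would in a notebook cell

SOURCE_WITHOUT_IMPORT = '''
def basename(path):
    return os.path.basename(path)
'''

SOURCE_WITH_IMPORT = '''
def basename(path):
    import os
    return os.path.basename(path)
'''


def run_in_fresh_namespace(source, *args):
    """Rebuild the function from its source in an empty namespace, then call it."""
    namespace = {}
    exec(source, namespace)
    return namespace["basename"](*args)


try:
    run_in_fresh_namespace(SOURCE_WITHOUT_IMPORT, "/data/ds000114/sub-01")
    outcome = "ok"
except NameError:
    # The module-level "import os" above is not visible in the fresh namespace.
    outcome = "NameError"

result = run_in_fresh_namespace(SOURCE_WITH_IMPORT, "/data/ds000114/sub-01")
print(outcome, result)
```

The version without the inner import fails with a NameError, while the version that imports os inside the function body works.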

OK, we have our function. Now we need to wrap it inside a Node object.


In [ ]:
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.utility import IdentityInterface, Function

In [ ]:
BIDSDataGrabber = Node(Function(function=get_niftis,
                                input_names=["subject_id", "data_dir"],
                                output_names=["bolds"]),
                       name="BIDSDataGrabber")
BIDSDataGrabber.inputs.data_dir = "/data/ds000114"

In [ ]:
BIDSDataGrabber.inputs.subject_id = '01'
res = BIDSDataGrabber.run()
res.outputs


170904-05:41:32,143 workflow INFO:
	 Executing node BIDSDataGrabber in dir: /tmp/tmppp9obk8m/BIDSDataGrabber
Out[ ]:
bolds = ['/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz']

Works like a charm (hopefully)! Let's put it in a workflow. We are not going to analyze any data, but for demonstration purposes we will add a couple of nodes that pretend to analyze their inputs.


In [ ]:
def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeBOLD")

In [ ]:
wf = Workflow(name="bids_demo")
wf.connect(BIDSDataGrabber, "bolds", analyzeBOLD, "paths")
wf.run()


170904-05:41:37,713 workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging']
170904-05:41:37,719 workflow INFO:
	 Running serially.
170904-05:41:37,721 workflow INFO:
	 Executing node BIDSDataGrabber in dir: /tmp/tmpaf1g0sbc/bids_demo/BIDSDataGrabber
170904-05:41:38,119 workflow INFO:
	 Executing node analyzeBOLD in dir: /tmp/tmpk3f9hfaf/bids_demo/analyzeBOLD


analyzing ['/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz']


Out[ ]:
<networkx.classes.digraph.DiGraph at 0x7f986846e438>

Exercise 2:

Modify the BIDSDataGrabber and the workflow to include T1ws.

Iterating over subject labels

In the previous example we demonstrated how to use pybids to "analyze" one subject. How can we scale it to all subjects? Easy: by using iterables (more in Iteration/Iterables). To keep the run short, we only iterate over the first two subject labels here.


In [ ]:
BIDSDataGrabber.iterables = ('subject_id', layout.get_subjects()[:2])
wf.run()


170904-05:41:39,742 workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging']
170904-05:41:39,750 workflow INFO:
	 Running serially.
170904-05:41:39,751 workflow INFO:
	 Executing node BIDSDataGrabber.aI.a1 in dir: /tmp/tmp1zylbx6z/bids_demo/_subject_id_02/BIDSDataGrabber
170904-05:41:40,141 workflow INFO:
	 Executing node analyzeBOLD.a1 in dir: /tmp/tmp59wngf32/bids_demo/_subject_id_02/analyzeBOLD


analyzing ['/data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-linebisection_bold.nii.gz', '/data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-overtwordrepetition_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-linebisection_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-overtwordrepetition_bold.nii.gz']


170904-05:41:40,147 workflow INFO:
	 Executing node BIDSDataGrabber.aI.a0 in dir: /tmp/tmp_0qamd24/bids_demo/_subject_id_01/BIDSDataGrabber
170904-05:41:40,523 workflow INFO:
	 Executing node analyzeBOLD.a0 in dir: /tmp/tmp4b4yl0lk/bids_demo/_subject_id_01/analyzeBOLD


analyzing ['/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz', '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz']


Out[ ]:
<networkx.classes.digraph.DiGraph at 0x7f986846eef0>
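
Conceptually, setting iterables on a node makes the workflow repeat that node, and everything downstream of it, once per value. The plain Python sketch below imitates this idea without nipype. The functions grab, analyze and run_with_iterable, as well as the file names, are stand-ins invented for illustration.

```python
def grab(subject_id):
    # Stand-in for BIDSDataGrabber: returns a made-up file name per subject.
    return ["sub-%s_task-demo_bold.nii.gz" % subject_id]


def analyze(paths):
    # Stand-in for analyzeBOLD: returns the message instead of printing it.
    return "analyzing " + str(paths)


def run_with_iterable(values):
    """Run the grab -> analyze chain once for each value."""
    return {value: analyze(grab(value)) for value in values}


results = run_with_iterable(["01", "02"])
for subject_id, message in results.items():
    print(subject_id, message)
```

Unlike this loop, nipype does not promise any particular execution order: in the log above subject 02 was processed before subject 01.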

Accessing additional metadata

Querying different files is nice, but sometimes you want to access more metadata, for example RepetitionTime. pybids can help with that as well.


In [ ]:
layout.get_metadata('/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz')


Out[ ]:
{'EchoTime': 0.05,
 'FlipAngle': 90,
 'RepetitionTime': 2.5,
 'SliceTiming': [0.0,
  1.2499999999999998,
  0.08333333333333333,
  1.333333333333333,
  0.16666666666666666,
  1.4166666666666663,
  0.25,
  1.4999999999999996,
  0.3333333333333333,
  1.5833333333333328,
  0.41666666666666663,
  1.666666666666666,
  0.5,
  1.7499999999999993,
  0.5833333333333333,
  1.8333333333333326,
  0.6666666666666666,
  1.9166666666666659,
  0.75,
  1.9999999999999991,
  0.8333333333333333,
  2.083333333333332,
  0.9166666666666666,
  2.1666666666666656,
  1.0,
  2.249999999999999,
  1.0833333333333333,
  2.333333333333332,
  1.1666666666666665,
  2.416666666666665],
 'TaskName': 'finger_foot_lips'}
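
In BIDS, metadata like this live in JSON sidecar files, and a sidecar placed higher up in the directory tree applies to all matching files below it unless a more specific sidecar overrides a field (the BIDS inheritance principle). We assume get_metadata resolves this for us. The sketch below shows the merging idea on plain dictionaries. The helper merge_sidecars and all values in it are hypothetical and not read from the dataset.

```python
def merge_sidecars(sidecars):
    """Merge metadata dicts ordered from most general to most specific."""
    merged = {}
    for sidecar in sidecars:
        # Later (more specific) sidecars override earlier (more general) ones.
        merged.update(sidecar)
    return merged


# Hypothetical sidecar contents, for illustration only.
dataset_level = {"RepetitionTime": 2.5, "FlipAngle": 90}
file_level = {"RepetitionTime": 5.0}

metadata = merge_sidecars([dataset_level, file_level])
print(metadata)
```

The more specific sidecar wins for RepetitionTime, while FlipAngle is inherited from the general one.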

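Can we incorporate metadata queries into our pipeline? Yes we can! Because we want to look up the metadata of each file separately, we use a MapNode (more about MapNode in the MapNode tutorial).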


In [ ]:
def printMetadata(path, data_dir):
    # Imports must live inside the function for the Function Interface to work
    from bids.grabbids import BIDSLayout
    layout = BIDSLayout(data_dir)
    # Look up the repetition time (TR) of this particular file
    tr = layout.get_metadata(path)["RepetitionTime"]
    print("\n\nanalyzing " + path + "\nTR: " + str(tr) + "\n\n")

analyzeBOLD2 = MapNode(Function(function=printMetadata,
                                input_names=["path", "data_dir"],
                                output_names=[]),
                       name="analyzeBOLD2", iterfield="path")
analyzeBOLD2.inputs.data_dir = "/data/ds000114/"

In [ ]:
wf = Workflow(name="bids_demo")
wf.connect(BIDSDataGrabber, "bolds", analyzeBOLD2, "path")
wf.run()


170904-05:41:52,762 workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging']
170904-05:41:52,770 workflow INFO:
	 Running serially.
170904-05:41:52,771 workflow INFO:
	 Executing node BIDSDataGrabber.aI.a1 in dir: /tmp/tmpo7ymsmt7/bids_demo/_subject_id_02/BIDSDataGrabber
170904-05:41:53,149 workflow INFO:
	 Executing node analyzeBOLD2.a1 in dir: /tmp/tmp6lfywdk6/bids_demo/_subject_id_02/analyzeBOLD2
170904-05:41:53,154 workflow INFO:
	 Executing node _analyzeBOLD20 in dir: /tmp/tmp6lfywdk6/bids_demo/_subject_id_02/analyzeBOLD2/mapflow/_analyzeBOLD20


analyzing /data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz
TR: 2.5


170904-05:41:53,532 workflow INFO:
	 Executing node _analyzeBOLD21 in dir: /tmp/tmp6lfywdk6/bids_demo/_subject_id_02/analyzeBOLD2/mapflow/_analyzeBOLD21


analyzing /data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-fingerfootlips_bold.nii.gz
TR: 2.5


...


170904-05:41:57,282 workflow INFO:
	 Executing node BIDSDataGrabber.aI.a0 in dir: /tmp/tmph_gu82j_/bids_demo/_subject_id_01/BIDSDataGrabber
170904-05:41:57,678 workflow INFO:
	 Executing node analyzeBOLD2.a0 in dir: /tmp/tmpfqjrvc05/bids_demo/_subject_id_01/analyzeBOLD2
170904-05:41:57,682 workflow INFO:
	 Executing node _analyzeBOLD20 in dir: /tmp/tmpfqjrvc05/bids_demo/_subject_id_01/analyzeBOLD2/mapflow/_analyzeBOLD20


analyzing /data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz
TR: 2.5


...


170904-05:42:01,372 workflow INFO:
	 Executing node _analyzeBOLD29 in dir: /tmp/tmpfqjrvc05/bids_demo/_subject_id_01/analyzeBOLD2/mapflow/_analyzeBOLD29


analyzing /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz
TR: 5.0


Out[ ]:
<networkx.classes.digraph.DiGraph at 0x7f9895239630>
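
The log above shows the MapNode running one sub-node per BOLD file while data_dir stays the same for all of them. A rough plain Python picture of that behaviour, without nipype, might look like the sketch below. The helpers map_over and describe and the file names are made up for illustration.

```python
def map_over(function, iterfield, **inputs):
    """Call function once per element of inputs[iterfield]; other inputs stay fixed."""
    values = inputs.pop(iterfield)
    return [function(**dict(inputs, **{iterfield: value})) for value in values]


def describe(path, data_dir):
    # Stand-in for printMetadata that returns a string instead of printing.
    return data_dir + " -> " + path


outputs = map_over(describe, "path",
                   path=["a_bold.nii.gz", "b_bold.nii.gz"],  # hypothetical names
                   data_dir="/data/ds000114/")
print(outputs)
```

Each element of the list given for path produces one call, which corresponds to the _analyzeBOLD20, _analyzeBOLD21, ... sub-nodes in the log.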

Exercise 3:

Modify the printMetadata function to also print EchoTime