Data Input

To do any computation, you need data. Getting the data into the framework of a workflow is therefore the first step of every analysis. Nipype provides many different modules to grab or select the data:

DataFinder
DataGrabber
FreeSurferSource
JSONFileGrabber
S3DataGrabber
SSHDataGrabber
SelectFiles
XNATSource

This tutorial covers only some of them. For the rest, see the section interfaces.io on the official homepage.

Dataset structure

To be able to import data, you first need to know the structure of your dataset. The dataset for this tutorial follows the BIDS standard and looks as follows:

ds000114
├── CHANGES
├── dataset_description.json
├── derivatives
│   ├── fmriprep
│   │   └── sub-01...sub-10
│   │       └── ...
│   └── freesurfer
│       ├── fsaverage
│       ├── fsaverage5
│       └── sub-01...sub-10
│           └── ...
├── dwi.bval
├── dwi.bvec
├── sub-01
│   ├── ses-retest
│   │   ├── anat
│   │   │   └── sub-01_ses-retest_T1w.nii.gz
│   │   ├── func
│   │   │   ├── sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-fingerfootlips_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-linebisection_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-linebisection_events.tsv
│   │   │   ├── sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz
│   │   │   └── sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz
│   │   └── dwi
│   │       └── sub-01_ses-retest_dwi.nii.gz
│   └── ses-test
│       ├── anat
│       │   └── sub-01_ses-test_T1w.nii.gz
│       ├── func
│       │   ├── sub-01_ses-test_task-covertverbgeneration_bold.nii.gz
│       │   ├── sub-01_ses-test_task-fingerfootlips_bold.nii.gz
│       │   ├── sub-01_ses-test_task-linebisection_bold.nii.gz
│       │   ├── sub-01_ses-test_task-linebisection_events.tsv
│       │   ├── sub-01_ses-test_task-overtverbgeneration_bold.nii.gz
│       │   └── sub-01_ses-test_task-overtwordrepetition_bold.nii.gz
│       └── dwi
│           └── sub-01_ses-test_dwi.nii.gz
├── sub-02..sub-10
│   └── ...
├── task-covertverbgeneration_bold.json
├── task-covertverbgeneration_events.tsv
├── task-fingerfootlips_bold.json
├── task-fingerfootlips_events.tsv
├── task-linebisection_bold.json
├── task-overtverbgeneration_bold.json
├── task-overtverbgeneration_events.tsv
├── task-overtwordrepetition_bold.json
└── task-overtwordrepetition_events.tsv
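
The file names in this tree are regular enough to be assembled from a handful of entities (subject, session, optional task, suffix). The following is only a minimal sketch of that naming scheme. The helper `bids_path` and its parameters are made up for illustration and are not part of Nipype or of any BIDS library.

```python
from pathlib import PurePosixPath


def bids_path(subject, session, datatype, suffix, task=None, ext='.nii.gz'):
    """Build a dataset-relative path for one file (hypothetical helper)."""
    # Entities appear in the file name in a fixed order: sub, ses, (task)
    entities = ['sub-%s' % subject, 'ses-%s' % session]
    if task is not None:
        entities.append('task-%s' % task)
    filename = '_'.join(entities + [suffix]) + ext
    # Folder layout: sub-XX / ses-YY / datatype / file
    return PurePosixPath('sub-%s' % subject, 'ses-%s' % session, datatype, filename)


anat = bids_path('01', 'test', 'anat', 'T1w')
func = bids_path('01', 'test', 'func', 'bold', task='fingerfootlips')
print(anat)
print(func)
```

Seeing the paths as a combination of a few entities is what makes it possible to describe them with templates in the sections that follow.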

DataGrabber

DataGrabber is a generic data grabber module that wraps around glob to select your neuroimaging data in an intelligent way. As an example, let's assume we want to grab the anatomical and functional images of a certain subject.

First, we need to create the DataGrabber node. This node needs to have some input fields for all dynamic parameters (e.g. subject identifier, task identifier), as well as the two desired output fields anat and func.


In [4]:
from nipype import DataGrabber, Node

# Create DataGrabber node
dg = Node(DataGrabber(infields=['subject_id', 'ses_name', 'task_name'],
                      outfields=['anat', 'func']),
          name='datagrabber')

# Location of the dataset folder
dg.inputs.base_directory = '/data/ds000114'

# Necessary default parameters
dg.inputs.template = '*'
dg.inputs.sort_filelist = True


Second, we know that the two files we want are at the following locations:

anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

We see that the two file paths contain only three dynamic parameters, which vary between subjects, sessions and tasks:

subject_id: in this case 01
task_name: in this case fingerfootlips
ses_name: in this case test

This means that we can rewrite the paths as follows:

anat = /data/ds000114/sub-[subject_id]/ses-[ses_name]/anat/sub-[subject_id]_ses-[ses_name]_T1w.nii.gz
func = /data/ds000114/sub-[subject_id]/ses-[ses_name]/func/sub-[subject_id]_ses-[ses_name]_task-[task_name]_bold.nii.gz

Therefore, we need the parameters subject_id and ses_name for the anatomical image and the parameters subject_id, ses_name and task_name for the functional image. In the context of DataGrabber, this is specified as follows:


In [ ]:
dg.inputs.template_args = {'anat': [['subject_id', 'ses_name']],
                           'func': [['subject_id', 'ses_name', 'task_name']]}

Now comes the most important part of DataGrabber: we need to specify the template structure used to find the specific data. This can be done as follows.


In [ ]:
dg.inputs.field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz',
                            'func': 'sub-%02d/ses-%s/func/*task-%s_bold.nii.gz'}

You'll notice that we use %s, %02d and * as placeholders in the data paths. %s is a placeholder for a string and is filled in by task_name or ses_name. %02d is a placeholder for an integer, zero-padded to two digits, and is filled in by subject_id. * is used as a wild card, i.e. a placeholder for any possible string. This is all that is needed to set up the DataGrabber node.
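
To make the placeholder mechanics concrete, here is a minimal sketch using plain Python string formatting. It only mimics how the values listed in template_args are substituted into field_template and is not Nipype's actual implementation. The function name `expand_templates` and the `inputs` dictionary are made up for illustration.

```python
# Same templates and argument lists as specified on the DataGrabber node
field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz',
                  'func': 'sub-%02d/ses-%s/func/*task-%s_bold.nii.gz'}
template_args = {'anat': [['subject_id', 'ses_name']],
                 'func': [['subject_id', 'ses_name', 'task_name']]}

# Hypothetical input values for one subject
inputs = {'subject_id': 1, 'ses_name': 'test', 'task_name': 'fingerfootlips'}


def expand_templates(field_template, template_args, inputs):
    """Fill the %-placeholders of each template with the named input values."""
    patterns = {}
    for field, template in field_template.items():
        arg_names = template_args[field][0]
        values = tuple(inputs[name] for name in arg_names)
        patterns[field] = template % values
    return patterns


patterns = expand_templates(field_template, template_args, inputs)
for field, pattern in patterns.items():
    print(field, '->', pattern)
# anat -> sub-01/ses-test/anat/*_T1w.nii.gz
# func -> sub-01/ses-test/func/*task-fingerfootlips_bold.nii.gz
```

Note that the integer 1 becomes 01 because of %02d, and that the * survives the substitution so it can later act as a glob wild card.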

Now it is up to you how to feed the dynamic parameters into the node. You can either use another node (e.g. IdentityInterface) and feed subject_id, ses_name and task_name as connections to the DataGrabber node, or specify them directly as node inputs.


In [ ]:
# Using the IdentityInterface
from nipype import IdentityInterface
infosource = Node(IdentityInterface(fields=['subject_id', 'ses_name', 'task_name']),
                  name="infosource")
infosource.inputs.task_name = "fingerfootlips"
infosource.inputs.ses_name = "test"
subject_id_list = [1, 2]
infosource.iterables = [('subject_id', subject_id_list)]

Now you only have to connect infosource with your DataGrabber and run the workflow to iterate over subjects 1 and 2.

You can also provide the inputs to the DataGrabber node directly. For one subject, you can do this as follows:


In [ ]:
# Specifying the input fields of DataGrabber directly
dg.inputs.subject_id = 1
dg.inputs.ses_name = "test"
dg.inputs.task_name = "fingerfootlips"

Now let's run the DataGrabber node and look at the output:


In [ ]:
dg.run().outputs


170904-06:02:02,387 workflow INFO:
	 Executing node datagrabber in dir: /tmp/tmpiwew6ysu/datagrabber
Out[ ]:
anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz
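
Since DataGrabber wraps around glob, the result above can be pictured as globbing the filled-in templates relative to the base directory. The sketch below reproduces that idea with the standard library only, on a throwaway directory of empty files. The directory, the file list and the variable names are assumptions for illustration, and the code does not call Nipype.

```python
import glob
import os
import tempfile

# Build a tiny stand-in for the dataset (empty files in a temporary folder)
base_directory = tempfile.mkdtemp()
relative_files = [
    'sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz',
    'sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz',
    'sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz',
]
for rel in relative_files:
    path = os.path.join(base_directory, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, 'w').close()

# Templates after filling in subject_id=1, ses_name='test', task_name='fingerfootlips'
patterns = {'anat': 'sub-01/ses-test/anat/*_T1w.nii.gz',
            'func': 'sub-01/ses-test/func/*task-fingerfootlips_bold.nii.gz'}

# Glob each pattern below the base directory; sorting mirrors sort_filelist = True
matches = {field: sorted(glob.glob(os.path.join(base_directory, pattern)))
           for field, pattern in patterns.items()}

for field, paths in matches.items():
    print(field, [os.path.basename(p) for p in paths])
```

Only the fingerfootlips run is returned for func, because the task name is part of the pattern while the leading * absorbs the subject and session prefix.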

SelectFiles

SelectFiles is a more flexible alternative to DataGrabber. It uses the {}-based string formatting syntax to plug values into string templates and collect the data. These templates can also be combined with glob wild cards. The field names in the formatting template (i.e. the terms in braces) become input fields on the interface, and the keys in the templates dictionary form the output fields.

Let's focus again on the data we want to import:

anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

Now, we can replace those paths with the corresponding {}-based strings.

anat = /data/ds000114/sub-{subject_id}/ses-{ses_name}/anat/sub-{subject_id}_ses-{ses_name}_T1w.nii.gz
func = /data/ds000114/sub-{subject_id}/ses-{ses_name}/func/ \
        sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz

What would this look like as a SelectFiles node?


In [ ]:
from nipype import SelectFiles, Node

# String template with {}-based strings
templates = {'anat': 'sub-{subject_id}/ses-{ses_name}/anat/sub-{subject_id}_ses-{ses_name}_T1w.nii.gz',
             'func': 'sub-{subject_id}/ses-{ses_name}/func/sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz'}

# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')

# Location of the dataset folder
sf.inputs.base_directory = '/data/ds000114'

# Feed {}-based placeholder strings with values
sf.inputs.subject_id = '01'
sf.inputs.ses_name = "test"
sf.inputs.task_name = 'fingerfootlips'

Let's check if we get what we wanted.


In [ ]:
sf.run().outputs


170904-06:02:02,435 workflow INFO:
	 Executing node selectfiles in dir: /tmp/tmpd3odxyze/selectfiles
Out[ ]:
anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

Perfect! But why is SelectFiles more flexible than DataGrabber? First, you perhaps noticed that with the {}-based string, we can reuse the same input (e.g. subject_id) multiple times in the same string, without feeding it multiple times into the template.
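
The reuse of named fields can be checked with plain Python string formatting. The following sketch uses only the standard library and is an illustration of the {}-syntax, not of how SelectFiles is implemented. The helper `template_fields` is a made-up name.

```python
from string import Formatter

templates = {
    'anat': 'sub-{subject_id}/ses-{ses_name}/anat/'
            'sub-{subject_id}_ses-{ses_name}_T1w.nii.gz',
    'func': 'sub-{subject_id}/ses-{ses_name}/func/'
            'sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz',
}


def template_fields(template):
    """Return the set of field names that appear in braces in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Names in braces correspond to input fields, dictionary keys to output fields
input_fields = sorted(set().union(*(template_fields(t) for t in templates.values())))
output_fields = sorted(templates)
print(input_fields)   # ['ses_name', 'subject_id', 'task_name']
print(output_fields)  # ['anat', 'func']

# Each value is given once, even though subject_id and ses_name occur twice
filled = templates['func'].format(subject_id='01', ses_name='test',
                                  task_name='fingerfootlips')
print(filled)
```

With %-style templates, by contrast, a value that occurs twice in the path has to be listed twice in the argument list.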

Additionally, you can select multiple files without the need for an iterable node. For example, let's assume we want to select the anatomical images of both subjects ('sub-01' and 'sub-02') at once. We can do this by using the following file template:

'sub-0[1,2]/ses-{ses_name}/anat/sub-0[1,2]_ses-{ses_name}_T1w.nii.gz'

Let's see how this works:


In [ ]:
from nipype import SelectFiles, Node

# String template combining a glob character class with a {}-based placeholder
templates = {'anat': 'sub-0[1,2]/ses-{ses_name}/anat/sub-0[1,2]_ses-{ses_name}_T1w.nii.gz'}

# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')

# Location of the dataset folder
sf.inputs.base_directory = '/data/ds000114'

# Feed {}-based placeholder strings with values
sf.inputs.ses_name = 'test'

# Print SelectFiles output
sf.run().outputs


170904-06:02:02,458 workflow INFO:
	 Executing node selectfiles in dir: /tmp/tmp53bxb6rj/selectfiles
Out[ ]:
anat = ['/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz', '/data/ds000114/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz']

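As you can see, anat now contains two file paths, one for the first and one for the second subject. As a side note, you could also use the wild card * in place of the character class, although the following template matches every subject whose label starts with 0, not just the first two: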

'sub-0*/ses-test/anat/sub-0*_ses-test_T1w.nii.gz'
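
The difference between the two patterns is easy to verify with the standard library's fnmatch module, which implements the same shell-style matching rules as glob. This is a small sketch on a made-up list of subject folder names, not on the actual dataset.

```python
from fnmatch import fnmatchcase

# Hypothetical folder names for ten subjects
subjects = ['sub-%02d' % i for i in range(1, 11)]

# [1,2] is a character class: it matches exactly one of '1', ',' or '2'
bracket_matches = [s for s in subjects if fnmatchcase(s, 'sub-0[1,2]')]

# * matches any run of characters, so every label starting with 0 qualifies
star_matches = [s for s in subjects if fnmatchcase(s, 'sub-0*')]

print(bracket_matches)  # ['sub-01', 'sub-02']
print(star_matches)     # 'sub-01' through 'sub-09'
```

Because the comma inside the brackets is treated as just another character, 'sub-0[12]' would select the same two subjects.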

FreeSurferSource

FreeSurferSource is a specific case of a file grabber that facilitates the import of outputs from the FreeSurfer recon-all algorithm. This of course requires that you've already run recon-all on your subject.

For the tutorial dataset ds000114, recon-all was already run. So, let's make sure that you have the anatomy output of one subject on your system:


In [ ]:
!datalad get -r -J4 /data/ds000114/derivatives/freesurfer/sub-01/

Now, before you can run FreeSurferSource, you first have to specify the path to the FreeSurfer output folder, i.e. you have to specify the SUBJECTS_DIR variable. This can be done as follows:


In [ ]:
from nipype.interfaces.freesurfer import FSCommand
from os.path import abspath as opap

# Path to your freesurfer output folder
fs_dir = opap('/data/ds000114/derivatives/freesurfer/')

# Set SUBJECTS_DIR
FSCommand.set_default_subjects_dir(fs_dir)

To create the FreeSurferSource node, do as follows:


In [ ]:
from nipype import Node
from nipype.interfaces.io import FreeSurferSource

# Create FreeSurferSource node
fssource = Node(FreeSurferSource(subjects_dir=fs_dir),
                name='fssource')

Let's now run it for a specific subject.


In [ ]:
fssource.inputs.subject_id = 'sub-01'
result = fssource.run()


170904-06:04:52,28 workflow INFO:
	 Executing node fssource in dir: /tmp/tmpisu77m7r/fssource

Did it work? Let's try to access multiple FreeSurfer outputs:


In [ ]:
print('aparc_aseg: %s\n' % result.outputs.aparc_aseg)
print('brainmask: %s\n' % result.outputs.brainmask)
print('inflated: %s\n' % result.outputs.inflated)


aparc_aseg: ['/data/ds000114/derivatives/freesurfer/sub-01/mri/aparc+aseg.mgz', '/data/ds000114/derivatives/freesurfer/sub-01/mri/aparc.a2009s+aseg.mgz', '/data/ds000114/derivatives/freesurfer/sub-01/mri/aparc.dktatlas+aseg.mgz']

brainmask: /data/ds000114/derivatives/freesurfer/sub-01/mri/brainmask.mgz

inflated: ['/data/ds000114/derivatives/freesurfer/sub-01/surf/rh.inflated', '/data/ds000114/derivatives/freesurfer/sub-01/surf/lh.inflated']

It seems to be working as it should. But as you can see, the inflated output actually contains the file locations for both hemispheres. With FreeSurferSource we can also restrict the file selection to a single hemisphere. To do this, we use the hemi input field:


In [ ]:
fssource.inputs.hemi = 'lh'
result = fssource.run()


170904-06:04:54,346 workflow INFO:
	 Executing node fssource in dir: /tmp/tmpisu77m7r/fssource

Let's take a look again at the inflated output.


In [ ]:
result.outputs.inflated


Out[ ]:
'/data/ds000114/derivatives/freesurfer/sub-01/surf/lh.inflated'

Perfect!