Data Input

To do any computation, you need data. Getting the data into the framework of a workflow is therefore the first step of every analysis. Nipype provides many different modules to grab or select data:

DataFinder
DataGrabber
FreeSurferSource
JSONFileGrabber
S3DataGrabber
SSHDataGrabber
SelectFiles
XNATSource

This tutorial will only cover some of them. For the rest, see the interfaces.io section on the official homepage.

Dataset structure

To be able to import data, you first need to be aware of the structure of your dataset. The dataset for this tutorial is organized according to BIDS and looks as follows:

ds102
├── CHANGES
├── dataset_description.json
├── participants.tsv
├── README
├── sub-01
│   ├── anat
│   │   └── sub-01_T1w.nii.gz
│   └── func
│       ├── sub-01_task-flanker_run-1_bold.nii.gz
│       ├── sub-01_task-flanker_run-1_events.tsv
│       ├── sub-01_task-flanker_run-2_bold.nii.gz
│       └── sub-01_task-flanker_run-2_events.tsv
├── sub-02
│   ├── anat
│   │   └── sub-02_T1w.nii.gz
│   └── func
│       ├── sub-02_task-flanker_run-1_bold.nii.gz
│       ├── sub-02_task-flanker_run-1_events.tsv
│       ├── sub-02_task-flanker_run-2_bold.nii.gz
│       └── sub-02_task-flanker_run-2_events.tsv
├── sub-03
│   ├── anat
│   │   └── sub-03_T1w.nii.gz
│   └── func
│       ├── sub-03_task-flanker_run-1_bold.nii.gz
│       ├── sub-03_task-flanker_run-1_events.tsv
│       ├── sub-03_task-flanker_run-2_bold.nii.gz
│       └── sub-03_task-flanker_run-2_events.tsv
├── ...
└── task-flanker_bold.json

DataGrabber

DataGrabber is a generic data grabber module that wraps around glob to select your neuroimaging data in an intelligent way. As an example, let's assume we want to grab the anatomical and functional images of a certain subject.

First, we need to create the DataGrabber node. This node needs to have some input fields for all dynamic parameters (e.g. subject identifier, task identifier), as well as the two desired output fields anat and func.


In [ ]:
from nipype import DataGrabber, Node

# Create DataGrabber node
dg = Node(DataGrabber(infields=['subject_id', 'task_id'],
                      outfields=['anat', 'func']),
          name='datagrabber')

# Location of the dataset folder
dg.inputs.base_directory = '/data/ds102'

# Necessary default parameters
dg.inputs.template = '*'
dg.inputs.sort_filelist = True

Second, we know that the two files we want are at the following locations:

anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz

We see that the two paths contain only two dynamic parameters that change between subjects and runs:

subject_id: in this case 'sub-01'
task_id: in this case 1

This means that we can rewrite the paths as follows:

anat = /data/ds102/[subject_id]/anat/[subject_id]_T1w.nii.gz
func = /data/ds102/[subject_id]/func/[subject_id]_task-flanker_run-[task_id]_bold.nii.gz

Therefore, we need the parameter subject_id for the anatomical image, and the parameters subject_id and task_id for the functional image. In the context of DataGrabber, this is specified as follows:


In [ ]:
dg.inputs.template_args = {'anat': [['subject_id']],
                           'func': [['subject_id', 'task_id']]}

Now comes the most important part of DataGrabber. We need to specify the template structure used to find the specific data. This can be done as follows:


In [ ]:
dg.inputs.field_template = {'anat': '%s/anat/*_T1w.nii.gz',
                            'func': '%s/func/*run-%d_bold.nii.gz'}

You'll notice that we use %s, %d and * as placeholders in the data paths. %s is a placeholder for a string and is filled in by subject_id. %d is a placeholder for an integer and is filled in by task_id. * is used as a wildcard, i.e. a placeholder for any possible string combination. That's everything needed to set up the DataGrabber node.
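
To make this more concrete, here is a simplified sketch of roughly what happens for the func field: the values from template_args fill the field template in order, and the resulting pattern is matched with glob. This is an illustration only, not DataGrabber's actual implementation:


In [ ]:
import glob
import os

# The field template for 'func' as defined above
template = '%s/func/*run-%d_bold.nii.gz'

# %s and %d are filled in order with the template_args values
pattern = template % ('sub-01', 1)  # -> 'sub-01/func/*run-1_bold.nii.gz'

# The pattern is then matched relative to base_directory with glob
print(sorted(glob.glob(os.path.join('/data/ds102', pattern))))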

Now it is up to you how you want to feed the dynamic parameters into the node. You can either use another node (e.g. IdentityInterface) and feed subject_id and task_id as connections to the DataGrabber node, or specify them directly as node inputs.


In [ ]:
# Using the IdentityInterface
from nipype import IdentityInterface
infosource = Node(IdentityInterface(fields=['subject_id', 'task_id']),
                  name="infosource")
infosource.inputs.task_id = 1
subject_list = ['sub-01',
                'sub-02',
                'sub-03',
                'sub-04',
                'sub-05']
infosource.iterables = [('subject_id', subject_list)]

Now you only have to connect infosource with your DataGrabber node and run the workflow to iterate over all five subjects in subject_list.
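
For completeness, here is a minimal sketch of what that connection could look like. Note that the workflow name and base_dir are illustrative assumptions, not part of the setup above:


In [ ]:
from nipype import Workflow

# Hypothetical workflow that wires infosource into the DataGrabber node
# (name and base_dir are arbitrary choices for this sketch)
wf = Workflow(name='datainput', base_dir='/tmp/workingdir')
wf.connect(infosource, 'subject_id', dg, 'subject_id')
wf.connect(infosource, 'task_id', dg, 'task_id')
wf.run()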

If you specify the inputs to the DataGrabber node directly, you can do this as follows:


In [ ]:
# Specifying the input fields of DataGrabber directly
dg.inputs.subject_id = 'sub-01'
dg.inputs.task_id = 1

Now let's run the DataGrabber node and look at the output:


In [ ]:
print(dg.run().outputs)


170301-21:53:31,59 workflow INFO:
	 Executing node datagrabber in dir: /tmp/tmp6AloiV/datagrabber
170301-21:53:31,84 workflow INFO:
	 Runtime memory and threads stats unavailable

anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz

SelectFiles

SelectFiles is a more flexible alternative to DataGrabber. It uses the {}-based string formatting syntax to plug values into string templates and collect the data. These templates can also be combined with glob wildcards. The field names in the formatting template (i.e. the terms in braces) will become input fields on the interface, and the keys in the templates dictionary will form the output fields.

Let's focus again on the data we want to import:

anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz

Now, we can replace those paths with the corresponding {}-based string templates.

anat = /data/ds102/{subject_id}/anat/{subject_id}_T1w.nii.gz
func = /data/ds102/{subject_id}/func/{subject_id}_task-flanker_run-{task_id}_bold.nii.gz

What would this look like as a SelectFiles node?


In [ ]:
from nipype import SelectFiles, Node

# String template with {}-based strings
templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-{task_id}_bold.nii.gz'}

# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')

# Location of the dataset folder
sf.inputs.base_directory = '/data/ds102'

# Feed {}-based placeholder strings with values
sf.inputs.subject_id = 'sub-01'
sf.inputs.task_id = '1'

Let's check if we get what we wanted.


In [ ]:
print(sf.run().outputs)


170301-21:53:57,750 workflow INFO:
	 Executing node selectfiles in dir: /tmp/tmpejvdlC/selectfiles
170301-21:53:57,763 workflow INFO:
	 Runtime memory and threads stats unavailable

anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz

Perfect! But why is SelectFiles more flexible than DataGrabber? First, you perhaps noticed that with the {}-based string formatting, we can reuse the same input (e.g. subject_id) multiple times in the same string, without feeding it multiple times into the template.

Additionally, you can also select multiple files without the need for an iterable node. For example, let's assume we want to select both functional images ('run-1' and 'run-2') at once. We can do this by using the following file template:

{subject_id}_task-flanker_run-[1,2]_bold.nii.gz

Let's see how this works:


In [ ]:
from nipype import SelectFiles, Node

# String template with {}-based strings
templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-[1,2]_bold.nii.gz'}

# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')

# Location of the dataset folder
sf.inputs.base_directory = '/data/ds102'

# Feed {}-based placeholder strings with values
sf.inputs.subject_id = 'sub-01'

# Print SelectFiles output
print(sf.run().outputs)


170301-21:54:03,222 workflow INFO:
	 Executing node selectfiles in dir: /tmp/tmpjgAYwb/selectfiles
170301-21:54:03,259 workflow INFO:
	 Runtime memory and threads stats unavailable

anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = ['/data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz', '/data/ds102/sub-01/func/sub-01_task-flanker_run-2_bold.nii.gz']

As you can see, func now contains two file paths, one for the first and one for the second run. As a side note, you could have also gotten the same result with the wildcard *:

{subject_id}_task-flanker_run-*_bold.nii.gz
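
For example, the templates dictionary from above could be rewritten with the wildcard as follows. Assuming no other run files match the pattern, the result is identical:


In [ ]:
# Equivalent templates using the * wildcard instead of the [1,2] list
templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-*_bold.nii.gz'}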

FreeSurferSource

Note: FreeSurfer and the recon-all output is not included in this tutorial.

FreeSurferSource is a specific case of a file grabber that facilitates the import of outputs from the FreeSurfer recon-all algorithm. This of course requires that you've already run recon-all on your subject.

Before you can run FreeSurferSource, you first have to specify the path to the FreeSurfer output folder, i.e. you have to specify the SUBJECTS_DIR variable. This can be done as follows:


In [ ]:
from nipype.interfaces.freesurfer import FSCommand
from os.path import abspath as opap

# Path to your freesurfer output folder
fs_dir = opap('/data/ds102/freesurfer')

# Set SUBJECTS_DIR
FSCommand.set_default_subjects_dir(fs_dir)

To create the FreeSurferSource node, do the following:


In [ ]:
from nipype import Node
from nipype.interfaces.io import FreeSurferSource

# Create FreeSurferSource node
fssource = Node(FreeSurferSource(subjects_dir=fs_dir),
                name='fssource')

Let's now run it for a specific subject.


In [ ]:
fssource.inputs.subject_id = 'sub001'
result = fssource.run()


170302-17:50:07,668 workflow INFO:
	 Executing node fssource in dir: /tmp/tmpI0UTIX/fssource

Did it work? Let's try to access multiple FreeSurfer outputs:


In [ ]:
print('aparc_aseg: %s\n' % result.outputs.aparc_aseg)
print('brainmask: %s\n' % result.outputs.brainmask)
print('inflated: %s\n' % result.outputs.inflated)


aparc_aseg: [u'/data/ds102/freesurfer/sub001/mri/aparc.a2009s+aseg.mgz', u'/data/ds102/freesurfer/sub001/mri/aparc+aseg.mgz']

brainmask: /data/ds102/freesurfer/sub001/mri/brainmask.mgz

inflated: [u'/data/ds102/freesurfer/sub001/surf/rh.inflated', u'/data/ds102/freesurfer/sub001/surf/lh.inflated']

It seems to be working as it should. But as you can see, the inflated output actually contains the file locations for both hemispheres. With FreeSurferSource we can also restrict the file selection to a single hemisphere. To do this, we use the hemi input field:


In [ ]:
fssource.inputs.hemi = 'lh'
result = fssource.run()


170302-17:50:13,835 workflow INFO:
	 Executing node fssource in dir: /tmp/tmpI0UTIX/fssource

Let's take a look again at the inflated output.


In [ ]:
result.outputs.inflated


Out[ ]:
u'/data/ds102/freesurfer/sub001/surf/lh.inflated'

Perfect!