To do any computation, you need to have data. Getting the data into the framework of a workflow is therefore the first step of every analysis. Nipype provides many different modules to grab or select the data:
DataFinder
DataGrabber
FreeSurferSource
JSONFileGrabber
S3DataGrabber
SSHDataGrabber
SelectFiles
XNATSource
This tutorial only covers some of them. For the rest, see the interfaces.io section on the official homepage.
To be able to import data, you first need to be aware of the structure of your dataset. The dataset for this tutorial follows the BIDS structure and looks as follows:
ds102
├── CHANGES
├── dataset_description.json
├── participants.tsv
├── README
├── sub-01
│   ├── anat
│   │   └── sub-01_T1w.nii.gz
│   └── func
│       ├── sub-01_task-flanker_run-1_bold.nii.gz
│       ├── sub-01_task-flanker_run-1_events.tsv
│       ├── sub-01_task-flanker_run-2_bold.nii.gz
│       └── sub-01_task-flanker_run-2_events.tsv
├── sub-02
│   ├── anat
│   │   └── sub-02_T1w.nii.gz
│   └── func
│       ├── sub-02_task-flanker_run-1_bold.nii.gz
│       ├── sub-02_task-flanker_run-1_events.tsv
│       ├── sub-02_task-flanker_run-2_bold.nii.gz
│       └── sub-02_task-flanker_run-2_events.tsv
├── sub-03
│   ├── anat
│   │   └── sub-03_T1w.nii.gz
│   └── func
│       ├── sub-03_task-flanker_run-1_bold.nii.gz
│       ├── sub-03_task-flanker_run-1_events.tsv
│       ├── sub-03_task-flanker_run-2_bold.nii.gz
│       └── sub-03_task-flanker_run-2_events.tsv
├── ...
.
└── task-flanker_bold.json
DataGrabber is a generic data grabber module that wraps around glob: it fills path templates with your parameters and matches the result against the files on disk. As an example, let's assume we want to grab the anatomical and functional images of a certain subject.
First, we need to create the DataGrabber node. This node needs input fields for all dynamic parameters (e.g. subject identifier, task identifier), as well as the two desired output fields anat and func.
In [ ]:
from nipype import DataGrabber, Node
# Create DataGrabber node
dg = Node(DataGrabber(infields=['subject_id', 'task_id'],
                      outfields=['anat', 'func']),
          name='datagrabber')
# Location of the dataset folder
dg.inputs.base_directory = '/data/ds102'
# Necessary default parameters
dg.inputs.template = '*'
dg.inputs.sort_filelist = True
Second, we know that the two files we want are at the following locations:
anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz
We see that only two parameters change between subjects and runs:
subject_id: in this case 'sub-01'
task_id: in this case 1
This means that we can rewrite the paths as follows:
anat = /data/ds102/[subject_id]/anat/[subject_id]_T1w.nii.gz
func = /data/ds102/[subject_id]/func/[subject_id]_task-flanker_run-[task_id]_bold.nii.gz
Therefore, we need the parameter subject_id for the anatomical image and the parameters subject_id and task_id for the functional image. In the context of DataGrabber, this is specified as follows:
In [ ]:
dg.inputs.template_args = {'anat': [['subject_id']],
                           'func': [['subject_id', 'task_id']]}
Now comes the most important part of DataGrabber: we need to specify the template structure used to find the data. This can be done as follows:
In [ ]:
dg.inputs.field_template = {'anat': '%s/anat/*_T1w.nii.gz',
                            'func': '%s/func/*run-%d_bold.nii.gz'}
You'll notice that we use %s, %d and * as placeholders in the data paths. %s is a placeholder for a string and is filled in with subject_id. %d is a placeholder for an integer and is filled in with task_id. * is used as a wildcard, i.e. a placeholder for any string. Be careful with the integer format: %02d would zero-pad 1 to 01 and would no longer match run-1. That is all it takes to set up the DataGrabber node.
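To make the mechanism concrete, here is a rough, standard-library-only sketch of what DataGrabber does with these settings: it fills the %-placeholders in the order given by template_args, then globs relative to base_directory. The helper grab and the temporary directory tree are illustrative assumptions, not Nipype's actual implementation.

```python
import glob
import os
import tempfile


def grab(base_directory, field_template, template_args, **values):
    """Hypothetical stand-in for DataGrabber: fill %-placeholders, then glob."""
    outputs = {}
    for field, template in field_template.items():
        # Use the first list of argument names given for this output field
        arg_names = template_args[field][0]
        # Positional %-formatting: values are consumed in the listed order
        pattern = template % tuple(values[name] for name in arg_names)
        # Match the filled pattern on disk, sorted (like sort_filelist=True)
        outputs[field] = sorted(glob.glob(os.path.join(base_directory, pattern)))
    return outputs


# Build a tiny fake dataset of empty files that mimics the BIDS layout above
base = tempfile.mkdtemp()
for rel in ['sub-01/anat/sub-01_T1w.nii.gz',
            'sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz',
            'sub-01/func/sub-01_task-flanker_run-2_bold.nii.gz']:
    path = os.path.join(base, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, 'w').close()

result = grab(base,
              field_template={'anat': '%s/anat/*_T1w.nii.gz',
                              'func': '%s/func/*run-%d_bold.nii.gz'},
              template_args={'anat': [['subject_id']],
                             'func': [['subject_id', 'task_id']]},
              subject_id='sub-01', task_id=1)

for field, files in result.items():
    print(field, [os.path.relpath(f, base) for f in files])
```

Only run-1 is returned for func, because the filled pattern *run-1_bold.nii.gz excludes run-2.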
Now it is up to you how you want to feed the dynamic parameters into the node. You can either use another node (e.g. IdentityInterface) and feed subject_id and task_id to the DataGrabber node as connections, or specify them directly as node inputs.
In [ ]:
# Using the IdentityInterface
from nipype import IdentityInterface
infosource = Node(IdentityInterface(fields=['subject_id', 'task_id']),
                  name="infosource")
infosource.inputs.task_id = 1
subject_list = ['sub-01', 'sub-02', 'sub-03', 'sub-04', 'sub-05']
infosource.iterables = [('subject_id', subject_list)]
Now you only have to connect infosource with your DataGrabber and run the workflow to iterate over the five subjects in subject_list.
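Conceptually, an iterable makes Nipype run the downstream nodes once per value; if a node has several iterables, every combination of their values is used. The following sketch only mimics that expansion with itertools.product; the helper expand_iterables is hypothetical and not part of Nipype.

```python
import itertools


def expand_iterables(iterables, fixed_inputs):
    """Hypothetical helper: list the input sets a node with these iterables would run with."""
    names = [name for name, _ in iterables]
    value_lists = [values for _, values in iterables]
    runs = []
    # Cartesian product over all iterable fields (a single field gives one run per value)
    for combo in itertools.product(*value_lists):
        inputs = dict(fixed_inputs)          # non-iterated inputs, e.g. task_id
        inputs.update(zip(names, combo))     # iterated inputs for this run
        runs.append(inputs)
    return runs


subject_list = ['sub-01', 'sub-02', 'sub-03', 'sub-04', 'sub-05']
runs = expand_iterables([('subject_id', subject_list)], {'task_id': 1})
for inputs in runs:
    print(inputs)
```

With one iterable over five subjects and a fixed task_id, this yields five input sets, one per subject.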
If you specify the inputs to the DataGrabber node directly, you can do this as follows:
In [ ]:
# Specifying the input fields of DataGrabber directly
dg.inputs.subject_id = 'sub-01'
dg.inputs.task_id = 1
Now let's run the DataGrabber node and look at the output:
In [ ]:
print(dg.run().outputs)
SelectFiles is a more flexible alternative to DataGrabber. It uses the {}-based string formatting syntax to plug values into string templates and collect the data. These templates can also be combined with glob wildcards. The field names in the formatting template (i.e. the terms in braces) become input fields on the interface, and the keys in the templates dictionary form the output fields.
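To see which input fields a set of templates implies, Python's string.Formatter can parse out the names in braces. This is a sketch of the idea using the templates from this tutorial, not SelectFiles' own code.

```python
from string import Formatter

templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-{task_id}_bold.nii.gz'}

# Collect every distinct name used inside braces across all templates
input_fields = set()
for template in templates.values():
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name is not None:
            input_fields.add(field_name)

print(sorted(input_fields))   # ['subject_id', 'task_id'] -> input fields
print(sorted(templates))      # ['anat', 'func'] -> output fields (dictionary keys)
```

Note that subject_id appears four times across the templates but yields a single input field.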
Let's focus again on the data we want to import:
anat = /data/ds102/sub-01/anat/sub-01_T1w.nii.gz
func = /data/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz
Now we can replace those paths with the corresponding {}-based strings.
anat = /data/ds102/{subject_id}/anat/{subject_id}_T1w.nii.gz
func = /data/ds102/{subject_id}/func/{subject_id}_task-flanker_run-{task_id}_bold.nii.gz
What would this look like as a SelectFiles node?
In [ ]:
from nipype import SelectFiles, Node
# String template with {}-based strings
templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-{task_id}_bold.nii.gz'}
# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')
# Location of the dataset folder
sf.inputs.base_directory = '/data/ds102'
# Feed {}-based placeholder strings with values
sf.inputs.subject_id = 'sub-01'
sf.inputs.task_id = '1'
Let's check if we get what we wanted.
In [ ]:
print(sf.run().outputs)
Perfect! But why is SelectFiles more flexible than DataGrabber? First, you may have noticed that with {}-based strings, we can reuse the same input (e.g. subject_id) multiple times in the same string, without feeding it into the template multiple times.
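The difference is easy to see with plain Python string formatting. With %-style templates, as DataGrabber uses, every placeholder consumes its own value, so a repeated subject_id would have to be listed twice (in DataGrabber's template_args, e.g. [['subject_id', 'subject_id']]). A named {}-field only needs to be supplied once. The paths below are just the tutorial's anat template, filled by hand.

```python
# %-style (DataGrabber): every placeholder consumes one positional value
percent_template = '%s/anat/%s_T1w.nii.gz'
percent_path = percent_template % ('sub-01', 'sub-01')    # subject_id supplied twice

# {}-style (SelectFiles): a named field can appear any number of times
brace_template = '{subject_id}/anat/{subject_id}_T1w.nii.gz'
brace_path = brace_template.format(subject_id='sub-01')   # subject_id supplied once

print(percent_path)                   # sub-01/anat/sub-01_T1w.nii.gz
print(brace_path)                     # sub-01/anat/sub-01_T1w.nii.gz
print(percent_path == brace_path)     # True
```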
Additionally, you can select multiple files without the need for an iterable node. For example, let's assume we want to select both functional images ('run-1' and 'run-2') at once. We can do this with the following file template:
{subject_id}_task-flanker_run-[1,2]_bold.nii.gz
Let's see how this works:
In [ ]:
from nipype import SelectFiles, Node
# String template with {}-based strings
templates = {'anat': '{subject_id}/anat/{subject_id}_T1w.nii.gz',
             'func': '{subject_id}/func/{subject_id}_task-flanker_run-[1,2]_bold.nii.gz'}
# Create SelectFiles node
sf = Node(SelectFiles(templates),
          name='selectfiles')
# Location of the dataset folder
sf.inputs.base_directory = '/data/ds102'
# Feed {}-based placeholder strings with values
sf.inputs.subject_id = 'sub-01'
# Print SelectFiles output
print(sf.run().outputs)
As you can see, func now contains two file paths, one for the first and one for the second run. As a side note, you could also have gotten the same result with the wildcard *:
{subject_id}_task-flanker_run-*_bold.nii.gz
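The two wildcard patterns can be compared with fnmatch, which implements the shell-style matching that glob relies on. The file list is made up for illustration and includes a hypothetical run-3, which does not exist in ds102, to show where the two patterns differ. Note that [1,2] is a character class matching a single '1', ',' or '2', so [12] would be an equivalent, slightly cleaner spelling.

```python
from fnmatch import fnmatchcase

# Hypothetical file names; run-3 is invented to show where the patterns differ
filenames = ['sub-01_task-flanker_run-1_bold.nii.gz',
             'sub-01_task-flanker_run-2_bold.nii.gz',
             'sub-01_task-flanker_run-1_events.tsv',
             'sub-01_task-flanker_run-3_bold.nii.gz']

bracket = 'sub-01_task-flanker_run-[1,2]_bold.nii.gz'   # only runs 1 and 2
star = 'sub-01_task-flanker_run-*_bold.nii.gz'          # any run

bracket_matches = [f for f in filenames if fnmatchcase(f, bracket)]
star_matches = [f for f in filenames if fnmatchcase(f, star)]

print(bracket_matches)
print(star_matches)
```

For ds102, which only has two runs, both patterns select the same files; the character class is the safer choice when you want exactly those runs.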
Note: FreeSurfer and the recon-all outputs are not included in this tutorial.
FreeSurferSource is a specialized file grabber that facilitates importing the outputs of FreeSurfer's recon-all algorithm. This of course requires that you've already run recon-all on your subject.
Before you can run FreeSurferSource, you first have to specify the path to the FreeSurfer output folder, i.e. the SUBJECTS_DIR variable. This can be done as follows:
In [ ]:
from nipype.interfaces.freesurfer import FSCommand
from os.path import abspath as opap
# Path to your freesurfer output folder
fs_dir = opap('/data/ds102/freesurfer')
# Set SUBJECTS_DIR
FSCommand.set_default_subjects_dir(fs_dir)
To create the FreeSurferSource node, do as follows:
In [ ]:
from nipype import Node
from nipype.interfaces.io import FreeSurferSource
# Create FreeSurferSource node
fssource = Node(FreeSurferSource(subjects_dir=fs_dir),
                name='fssource')
Let's now run it for a specific subject.
In [ ]:
fssource.inputs.subject_id = 'sub001'
result = fssource.run()
Did it work? Let's try to access multiple FreeSurfer outputs:
In [ ]:
print('aparc_aseg: %s\n' % result.outputs.aparc_aseg)
print('brainmask: %s\n' % result.outputs.brainmask)
print('inflated: %s\n' % result.outputs.inflated)
It seems to be working as it should. But as you can see, the inflated output actually contains the file locations for both hemispheres. With FreeSurferSource we can also restrict the file selection to a single hemisphere. To do this, we use the hemi input field:
In [ ]:
fssource.inputs.hemi = 'lh'
result = fssource.run()
Let's take a look at the inflated output again.
In [ ]:
result.outputs.inflated
Perfect!
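FreeSurfer stores surfaces as surf/lh.<name> and surf/rh.<name> inside each subject's folder, which is why restricting to one hemisphere is essentially a filename filter. The sketch below mimics the effect of hemi with glob on a fake subject folder; the helper select_surface and the directory it builds are hypothetical, and FreeSurferSource's actual implementation may differ.

```python
import glob
import os
import tempfile


def select_surface(subjects_dir, subject_id, surface, hemi='both'):
    """Hypothetical helper: pick ?h.<surface> files, optionally for one hemisphere."""
    # '?h' matches both lh and rh; otherwise use the requested prefix literally
    prefix = '?h' if hemi == 'both' else hemi
    pattern = os.path.join(subjects_dir, subject_id, 'surf', '%s.%s' % (prefix, surface))
    return sorted(glob.glob(pattern))


# Build a tiny fake SUBJECTS_DIR with empty files in FreeSurfer's surf/ layout
subjects_dir = tempfile.mkdtemp()
surf_dir = os.path.join(subjects_dir, 'sub001', 'surf')
os.makedirs(surf_dir)
for name in ['lh.inflated', 'rh.inflated', 'lh.white', 'rh.white']:
    open(os.path.join(surf_dir, name), 'w').close()

both = select_surface(subjects_dir, 'sub001', 'inflated')
left = select_surface(subjects_dir, 'sub001', 'inflated', hemi='lh')
print([os.path.basename(p) for p in both])
print([os.path.basename(p) for p in left])
```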