To do any computation, you need to have data. Getting the data in the framework of a workflow is therefore the first step of every analysis. Nipype provides many different modules to grab or select the data:
DataFinder
DataGrabber
FreeSurferSource
JSONFileGrabber
S3DataGrabber
SSHDataGrabber
SelectFiles
XNATSource
This tutorial will only cover some of them. For the rest, see the interfaces.io section on the official homepage.
To be able to import data, you first need to be aware of the structure of your dataset. The dataset for this tutorial follows the BIDS standard and is structured as follows:
ds000114
├── CHANGES
├── dataset_description.json
├── derivatives
│   ├── fmriprep
│   │   └── sub01...sub10
│   │       └── ...
│   ├── freesurfer
│   ├── fsaverage
│   ├── fsaverage5
│   │   └── sub01...sub10
│   │       └── ...
├── dwi.bval
├── dwi.bvec
├── sub-01
│   ├── ses-retest
│   │   ├── anat
│   │   │   └── sub-01_ses-retest_T1w.nii.gz
│   │   ├── func
│   │   │   ├── sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-fingerfootlips_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-linebisection_bold.nii.gz
│   │   │   ├── sub-01_ses-retest_task-linebisection_events.tsv
│   │   │   ├── sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz
│   │   │   └── sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz
│   │   └── dwi
│   │       └── sub-01_ses-retest_dwi.nii.gz
│   └── ses-test
│       ├── anat
│       │   └── sub-01_ses-test_T1w.nii.gz
│       ├── func
│       │   ├── sub-01_ses-test_task-covertverbgeneration_bold.nii.gz
│       │   ├── sub-01_ses-test_task-fingerfootlips_bold.nii.gz
│       │   ├── sub-01_ses-test_task-linebisection_bold.nii.gz
│       │   ├── sub-01_ses-test_task-linebisection_events.tsv
│       │   ├── sub-01_ses-test_task-overtverbgeneration_bold.nii.gz
│       │   └── sub-01_ses-test_task-overtwordrepetition_bold.nii.gz
│       └── dwi
│           └── sub-01_ses-test_dwi.nii.gz
├── sub-02..sub-10
│   └── ...
├── task-covertverbgeneration_bold.json
├── task-covertverbgeneration_events.tsv
├── task-fingerfootlips_bold.json
├── task-fingerfootlips_events.tsv
├── task-linebisection_bold.json
├── task-overtverbgeneration_bold.json
├── task-overtverbgeneration_events.tsv
├── task-overtwordrepetition_bold.json
└── task-overtwordrepetition_events.tsv
DataGrabber is an interface for collecting files from the hard drive. It is very flexible and supports almost any file organization you can imagine.
In its simplest use, it fetches a single fixed file. By default, DataGrabber stores its outputs in a field called outfiles.
In [ ]:
import nipype.interfaces.io as nio
datasource1 = nio.DataGrabber()
datasource1.inputs.base_directory = '/data/ds000114'
datasource1.inputs.template = 'sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'
datasource1.inputs.sort_filelist = True
results = datasource1.run()
results.outputs
Or you can get all NIfTI files containing the word 'fingerfootlips' in all directories starting with the letter 's'.
In [ ]:
import nipype.interfaces.io as nio
datasource2 = nio.DataGrabber()
datasource2.inputs.base_directory = '/data/ds000114'
datasource2.inputs.template = 's*/ses-test/func/*fingerfootlips*.nii.gz'
datasource2.inputs.sort_filelist = True
results = datasource2.run()
results.outputs
Two special inputs were used in these previous cases. The input base_directory indicates in which directory to search, while the input template indicates the string template to match. So in the previous case DataGrabber is looking for path matches of the form /data/ds000114/s*/ses-test/func/*fingerfootlips*.nii.gz.
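The way base_directory and template combine can be illustrated with the standard library alone. The sketch below is not nipype's implementation; it builds a throwaway directory tree with empty placeholder files (the temporary location and file contents are assumptions made for illustration), joins the template onto the base directory and lets glob expand the wildcards.

```python
import glob
import os
import tempfile

# Build a tiny stand-in dataset (empty placeholder files, not real images)
base_directory = tempfile.mkdtemp()
for sub in ("sub-01", "sub-02"):
    func_dir = os.path.join(base_directory, sub, "ses-test", "func")
    os.makedirs(func_dir)
    for task in ("fingerfootlips", "linebisection"):
        name = f"{sub}_ses-test_task-{task}_bold.nii.gz"
        open(os.path.join(func_dir, name), "w").close()

template = "s*/ses-test/func/*fingerfootlips*.nii.gz"

# Join base directory and template, then let glob expand the wildcards
pattern = os.path.join(base_directory, template)
outfiles = sorted(glob.glob(pattern))  # sorted, in the spirit of sort_filelist

for path in outfiles:
    print(os.path.relpath(path, base_directory))
```

Only the two fingerfootlips files are selected; the linebisection files in the same folders do not match the pattern.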
More use cases arise when the template can be filled by other inputs. In the example below, we define an input field for DataGrabber called subject_id. This is then used to fill the template (see %02d in the template).
In [ ]:
datasource3 = nio.DataGrabber(infields=['subject_id'])
datasource3.inputs.base_directory = '/data/ds000114'
datasource3.inputs.template = 'sub-%02d/ses-test/func/*fingerfootlips*.nii.gz'
datasource3.inputs.sort_filelist = True
datasource3.inputs.subject_id = [1, 7]
results = datasource3.run()
results.outputs
This will return the functional images from subjects 1 and 7 for the task fingerfootlips. We can take this a step further and pair subjects with tasks.
In [ ]:
datasource4 = nio.DataGrabber(infields=['subject_id', 'run'])
datasource4.inputs.base_directory = '/data/ds000114'
datasource4.inputs.template = 'sub-%02d/ses-test/func/*%s*.nii.gz'
datasource4.inputs.sort_filelist = True
datasource4.inputs.run = ['fingerfootlips', 'linebisection']
datasource4.inputs.subject_id = [1, 7]
results = datasource4.run()
results.outputs
This will return the functional image of subject 1 for the 'fingerfootlips' task and the functional image of subject 7 for the 'linebisection' task.
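The pairing described here is element-wise, not a full cross product of subjects and tasks. A minimal plain-Python sketch of that behavior, using zip and %-formatting (an illustration of the idea, not nipype's internal code), looks like this:

```python
template = "sub-%02d/ses-test/func/*%s*.nii.gz"
subject_id = [1, 7]
run = ["fingerfootlips", "linebisection"]

# Pair the i-th subject with the i-th task (not every combination)
patterns = [template % (sub, task) for sub, task in zip(subject_id, run)]

for pattern in patterns:
    print(pattern)
```

Two search patterns result, one per subject/task pair, which is why subject 1 is never combined with linebisection.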
DataGrabber is a generic data grabber module that wraps around glob to select your neuroimaging data in a flexible way. As an example, let's assume we want to grab the anatomical and functional images of a certain subject.
First, we need to create the DataGrabber node. This node needs input fields for all dynamic parameters (e.g. subject identifier, session identifier, task identifier), as well as the two desired output fields anat and func.
In [ ]:
from nipype import DataGrabber, Node
# Create DataGrabber node
dg = Node(DataGrabber(infields=['subject_id', 'ses_name', 'task_name'],
outfields=['anat', 'func']),
name='datagrabber')
# Location of the dataset folder
dg.inputs.base_directory = '/data/ds000114'
# Necessary default parameters
dg.inputs.template = '*'
dg.inputs.sort_filelist = True
Second, we know that the two files we want are at the following locations:
anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz
Comparing these paths across subjects, sessions and tasks, only three parameters vary:
subject_id: in this case 1 (zero-padded to 01 in the path)
task_name: in this case fingerfootlips
ses_name: in this case test
This means that we can rewrite the paths as follows:
anat = /data/ds000114/sub-[subject_id]/ses-[ses_name]/anat/sub-[subject_id]_ses-[ses_name]_T1w.nii.gz
func = /data/ds000114/sub-[subject_id]/ses-[ses_name]/func/sub-[subject_id]_ses-[ses_name]_task-[task_name]_bold.nii.gz
Therefore, we need the parameters subject_id and ses_name for the anatomical image and the parameters subject_id, ses_name and task_name for the functional image. In the context of DataGrabber, this is specified as follows:
In [ ]:
dg.inputs.template_args = {'anat': [['subject_id', 'ses_name']],
'func': [['subject_id', 'ses_name', 'task_name']]}
Now comes the most important part of DataGrabber: we need to specify the template structure used to find the specific data. This can be done as follows.
In [ ]:
dg.inputs.field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz',
'func': 'sub-%02d/ses-%s/func/*task-%s_bold.nii.gz'}
You'll notice that we use %s, %02d and * as placeholders in the data paths. %s is a placeholder for a string and is filled by task_name or ses_name. %02d is a placeholder for an integer, zero-padded to two digits, and is filled by subject_id. * is a wildcard, i.e. a placeholder for any possible string. This is all that is needed to set up the DataGrabber node.
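These placeholders follow Python's standard %-formatting rules, so their effect can be checked in isolation. The short sketch below fills the func template by hand; the * is left untouched by the formatting step and is only interpreted later, when the pattern is matched against files.

```python
field_template = "sub-%02d/ses-%s/func/*task-%s_bold.nii.gz"

# %02d pads the integer to two digits; each %s takes a string as given
filled = field_template % (1, "test", "fingerfootlips")
print(filled)

# A two-digit subject number needs no padding
filled_ten = field_template % (10, "retest", "linebisection")
print(filled_ten)
```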
Above, two more fields are introduced: field_template and template_args. Both are dictionaries whose keys correspond to the names given in the outfields keyword. The field_template gives the search path for each output field, while template_args lists the inputs that fill the placeholders of that template. Each entry can either be one of the named inputs specified by the infields keyword argument or a raw string or integer to be inserted into the template. For the func output, the %02d in the field_template is filled by subject_id, and the two %s placeholders are filled by ses_name and task_name, in that order.
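To make the relationship between the two dictionaries concrete, here is a simplified sketch of how named arguments could be looked up and substituted into each template. The function resolve_templates is a hypothetical helper written for this illustration, not part of nipype, and it ignores list-valued inputs.

```python
field_template = {
    "anat": "sub-%02d/ses-%s/anat/*_T1w.nii.gz",
    "func": "sub-%02d/ses-%s/func/*task-%s_bold.nii.gz",
}
template_args = {
    "anat": [["subject_id", "ses_name"]],
    "func": [["subject_id", "ses_name", "task_name"]],
}
inputs = {"subject_id": 1, "ses_name": "test", "task_name": "fingerfootlips"}


def resolve_templates(field_template, template_args, inputs):
    """Hypothetical helper: fill each output's template from named inputs."""
    patterns = {}
    for outfield, template in field_template.items():
        patterns[outfield] = []
        for arg_list in template_args[outfield]:
            # Strings naming a known input are looked up;
            # anything else is passed through as a raw value
            values = tuple(
                inputs.get(arg, arg) if isinstance(arg, str) else arg
                for arg in arg_list
            )
            patterns[outfield].append(template % values)
    return patterns


patterns = resolve_templates(field_template, template_args, inputs)
for outfield, filled in patterns.items():
    print(outfield, filled)
```

The order of names inside each template_args entry matters: values are substituted into the placeholders from left to right.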
Now it is up to you how you want to feed the dynamic parameters into the node. You can either use another node (e.g. IdentityInterface) and feed subject_id, ses_name and task_name as connections to the DataGrabber node, or specify them directly as node inputs.
In [ ]:
# Using the IdentityInterface
from nipype import IdentityInterface
# ses_name must be declared as a field so that it can be set and connected
infosource = Node(IdentityInterface(fields=['subject_id', 'ses_name', 'task_name']),
                  name="infosource")
infosource.inputs.task_name = "fingerfootlips"
infosource.inputs.ses_name = "test"
subject_id_list = [1, 2]
infosource.iterables = [('subject_id', subject_id_list)]
Now you only have to connect infosource with your DataGrabber and run the workflow to iterate over subjects 1 and 2.
You can also provide the inputs to the DataGrabber node directly. For one subject, you can do this as follows:
In [ ]:
# Specifying the input fields of DataGrabber directly
dg.inputs.subject_id = 1
dg.inputs.ses_name = "test"
dg.inputs.task_name = "fingerfootlips"
Now let's run the DataGrabber node and look at the output:
In [ ]:
dg.run().outputs
In [ ]:
# write your solution here
In [ ]:
from nipype import DataGrabber, Node
# Create DataGrabber node
ex1_dg = Node(DataGrabber(infields=['subject_id', 'ses_name'],
outfields=['anat']),
name='datagrabber')
# Location of the dataset folder
ex1_dg.inputs.base_directory = '/data/ds000114'
# Necessary default parameters
ex1_dg.inputs.template = '*'
ex1_dg.inputs.sort_filelist = True
# specify the template
ex1_dg.inputs.template_args = {'anat': [['subject_id', 'ses_name']]}
ex1_dg.inputs.field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz'}
# specify subject_id and ses_name you're interested in
ex1_dg.inputs.subject_id = 1
ex1_dg.inputs.ses_name = ["test", "retest"]
# and run the node
ex1_res = ex1_dg.run()
In [ ]:
# you can now check the output
ex1_res.outputs
SelectFiles is a more flexible alternative to DataGrabber. It is built on Python format strings, which are similar to the Python string interpolation feature you are likely already familiar with, but advantageous in several respects. Format strings allow you to replace named sections of template strings set off by curly braces ({}), possibly filtered through a set of functions that control how the values are rendered into the string. As a very basic example, we could write
In [ ]:
msg = "This workflow uses {package}."
and then format it with keyword arguments:
In [ ]:
print(msg.format(package="FSL"))
SelectFiles uses the {}-based string formatting syntax to plug values into string templates and collect the data. These templates can also be combined with glob wildcards. The field names in the formatting template (i.e. the terms in braces) become input fields on the interface, and the keys in the templates dictionary form the output fields.
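The mapping from templates to input and output fields can be reproduced with the standard library's string.Formatter, which parses a format string into its literal parts and field names. The sketch below only illustrates the idea; the helper field_names is a hypothetical name, and the templates mirror the file layout of this tutorial's dataset.

```python
from string import Formatter

templates = {
    "anat": "sub-{subject_id}/ses-{ses_name}/anat/"
            "sub-{subject_id}_ses-{ses_name}_T1w.nii.gz",
    "func": "sub-{subject_id}/ses-{ses_name}/func/"
            "sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz",
}


def field_names(template):
    """Hypothetical helper: unique field names used in a format string."""
    parsed = Formatter().parse(template)
    return sorted({name for _, name, _, _ in parsed if name})


# Terms in braces become input fields; dictionary keys become output fields
infields = sorted({n for t in templates.values() for n in field_names(t)})
outfields = sorted(templates)

print("inputs: ", infields)
print("outputs:", outfields)
```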
Let's focus again on the data we want to import:
anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz
Now, we can replace those paths with the corresponding {}-based strings.
anat = /data/ds000114/sub-{subject_id}/ses-{ses_name}/anat/sub-{subject_id}_ses-{ses_name}_T1w.nii.gz
func = /data/ds000114/sub-{subject_id}/ses-{ses_name}/func/ \
sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz
What would this look like as a SelectFiles node?
In [ ]:
from nipype import SelectFiles, Node
# String template with {}-based strings
templates = {'anat': 'sub-{subject_id}/ses-{ses_name}/anat/sub-{subject_id}_ses-{ses_name}_T1w.nii.gz',
'func': 'sub-{subject_id}/ses-{ses_name}/func/sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz'}
# Create SelectFiles node
sf = Node(SelectFiles(templates),
name='selectfiles')
# Location of the dataset folder
sf.inputs.base_directory = '/data/ds000114'
# Feed {}-based placeholder strings with values
sf.inputs.subject_id = '01'
sf.inputs.ses_name = "test"
sf.inputs.task_name = 'fingerfootlips'
Let's check if we get what we wanted.
In [ ]:
sf.run().outputs
Perfect! But why is SelectFiles more flexible than DataGrabber? First, you perhaps noticed that with the {}-based string, we can reuse the same input (e.g. subject_id) multiple times in the same string, without feeding it multiple times into the template.
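The difference is easy to see in plain Python. With positional %-formatting, a value that appears twice in the path has to be supplied twice, whereas a named {} field is supplied once and reused wherever it appears. The two template strings below are written for this comparison only.

```python
# %-style: every placeholder consumes its own positional value
percent_template = "sub-%02d/ses-%s/anat/sub-%02d_ses-%s_T1w.nii.gz"
percent_path = percent_template % (1, "test", 1, "test")

# {}-style: each named value is given once and reused
brace_template = ("sub-{subject_id}/ses-{ses_name}/anat/"
                  "sub-{subject_id}_ses-{ses_name}_T1w.nii.gz")
brace_path = brace_template.format(subject_id="01", ses_name="test")

print(percent_path)
print(brace_path)
```

Both calls produce the same path, but the named version cannot get out of sync when the template is edited.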
Additionally, you can also select multiple files without the need for an iterable node. For example, let's assume we want to select the anatomical images of all subjects at once. We can do this by using the wildcard * in a template:
'sub-*/ses-test/anat/sub-*_ses-test_T1w.nii.gz'
Let's see how this works:
In [ ]:
from nipype import SelectFiles, Node
# String template with {}-based strings
templates = {'anat': 'sub-*/ses-{ses_name}/anat/sub-*_ses-{ses_name}_T1w.nii.gz'}
# Create SelectFiles node
sf = Node(SelectFiles(templates),
name='selectfiles')
# Location of the dataset folder
sf.inputs.base_directory = '/data/ds000114'
# Feed {}-based placeholder strings with values
sf.inputs.ses_name = 'test'
# Print SelectFiles output
sf.run().outputs
As you can see, anat now contains ten file paths, the T1w images of all ten subjects.
As a side note, you could also use glob character sets ([]) for some simple cases, e.g. for loading only subjects 1 and 2:
'sub-0[12]/ses-test/anat/sub-0[12]_ses-test_T1w.nii.gz'
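Square brackets in a glob pattern define a character set: the pattern matches any single character listed between them. The standard library's fnmatch module implements the same matching rules, so the behavior can be checked without touching the file system. Note that a comma inside the brackets is itself treated as a character to match, so [12] is the tighter way to say "1 or 2". The folder names below are made up for the example.

```python
from fnmatch import fnmatchcase

names = ["sub-01", "sub-02", "sub-03", "sub-10"]

# [12] matches exactly one character: either '1' or '2'
selected = [name for name in names if fnmatchcase(name, "sub-0[12]")]
print(selected)

# A comma inside the brackets is just another character in the set
comma_matches = fnmatchcase("sub-0,", "sub-0[1,2]")
print(comma_matches)
```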
force_lists
There's an additional parameter, force_lists, which controls how SelectFiles behaves in cases where only a single file matches the template. The default behavior is that when a template matches multiple files they are returned as a list, while a single file is returned as a string. There may be situations where you want to force the outputs to always be returned as a list (for example, you are writing a workflow that expects to operate on several runs of data, but some of your subjects only have a single run). In this case, force_lists can be used to tune the outputs of the interface. You can either use a boolean value, which will be applied to every output the interface has, or you can provide a list of the output fields that should be coerced to a list.
Returning to our previous example, you may want to ensure that the anat files are returned as a list, even though you will only ever have a single T1 file. In this case, you would do
In [ ]:
sf = SelectFiles(templates, force_lists=["anat"])
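The output rule described above can be mimicked in a few lines of plain Python. The function simplify_matches is a hypothetical stand-in written for this illustration; it only reproduces the documented behavior (single match returned as a string unless a list is forced) and is not nipype code.

```python
def simplify_matches(matches, force_list=False):
    """Hypothetical helper: return a lone match as a string unless forced."""
    if len(matches) == 1 and not force_list:
        return matches[0]
    return list(matches)


single = ["sub-01_ses-test_T1w.nii.gz"]
several = ["sub-01_ses-test_T1w.nii.gz", "sub-01_ses-retest_T1w.nii.gz"]

default_single = simplify_matches(single)                   # plain string
forced_single = simplify_matches(single, force_list=True)   # one-element list
default_several = simplify_matches(several)                 # list either way

print(type(default_single).__name__)
print(type(forced_single).__name__)
print(type(default_several).__name__)
```

Forcing a list means downstream code can always iterate over the output, regardless of how many files were found.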
In [ ]:
# write your solution here
In [ ]:
from nipype import SelectFiles, Node
# String template with {}-based strings
templates = {'anat': 'sub-01/ses-*/anat/sub-01_ses-*_T1w.nii.gz'}
# Create SelectFiles node
sf = Node(SelectFiles(templates),
name='selectfiles')
# Location of the dataset folder
sf.inputs.base_directory = '/data/ds000114'
# No inputs need to be set: the template contains only wildcards
sf.run().outputs
For the tutorial dataset ds000114, recon-all was already run. So, let's make sure that you have the anatomy output of one subject on your system:
In [ ]:
!datalad get -r -J 4 -d /data/ds000114 /data/ds000114/derivatives/freesurfer/sub-01
Now, before you can run FreeSurferSource, you first have to specify the path to the FreeSurfer output folder, i.e. you have to specify the SUBJECTS_DIR variable. This can be done as follows:
In [ ]:
from nipype.interfaces.freesurfer import FSCommand
from os.path import abspath as opap
# Path to your freesurfer output folder
fs_dir = opap('/data/ds000114/derivatives/freesurfer/')
# Set SUBJECTS_DIR
FSCommand.set_default_subjects_dir(fs_dir)
To create the FreeSurferSource node, do as follows:
In [ ]:
from nipype import Node
from nipype.interfaces.io import FreeSurferSource
# Create FreeSurferSource node
fssource = Node(FreeSurferSource(subjects_dir=fs_dir),
name='fssource')
Let's now run it for a specific subject.
In [ ]:
fssource.inputs.subject_id = 'sub-01'
result = fssource.run()
Did it work? Let's try to access multiple FreeSurfer outputs:
In [ ]:
print('aparc_aseg: %s\n' % result.outputs.aparc_aseg)
print('inflated: %s\n' % result.outputs.inflated)
It seems to be working as it should. But as you can see, the inflated output actually contains the file locations for both hemispheres. With FreeSurferSource we can also restrict the file selection to a single hemisphere. To do this, we use the hemi input field:
In [ ]:
fssource.inputs.hemi = 'lh'
result = fssource.run()
Let's take another look at the inflated output.
In [ ]:
result.outputs.inflated
Perfect!
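FreeSurfer prefixes hemisphere-specific surface files with lh. or rh., which is what makes this kind of restriction possible. The sketch below imitates the selection with glob on a throwaway folder of empty placeholder files. The helper find_surface and the value 'both' for "no restriction" are assumptions of this sketch, not a description of how FreeSurferSource is implemented.

```python
import glob
import os
import tempfile

# Stand-in for <SUBJECTS_DIR>/sub-01/surf with empty placeholder files
surf_dir = os.path.join(tempfile.mkdtemp(), "sub-01", "surf")
os.makedirs(surf_dir)
for name in ("lh.inflated", "rh.inflated", "lh.pial", "rh.pial"):
    open(os.path.join(surf_dir, name), "w").close()


def find_surface(surf_dir, surface, hemi="both"):
    """Hypothetical helper: list surface files, optionally for one hemisphere."""
    prefix = "*" if hemi == "both" else hemi
    return sorted(glob.glob(os.path.join(surf_dir, f"{prefix}.{surface}")))


both = [os.path.basename(p) for p in find_surface(surf_dir, "inflated")]
left = [os.path.basename(p) for p in find_surface(surf_dir, "inflated", hemi="lh")]

print(both)
print(left)
```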