In [ ]:
from nipype import Node
from nipype.interfaces.fsl import Smooth
node = Node(Smooth(), name='smooth')
# Declare the expected resource usage for scheduling
node.interface.num_threads = 8
node.interface.estimated_memory_gb = 2
If the resource parameters are never set, they default to 1 thread and 1 GB of RAM.
The MultiProc workflow plugin schedules node execution based on the resources used by the currently running nodes and the total resources available to the workflow. The plugin uses the plugin arguments n_procs and memory_gb to set the maximum resources a workflow can utilize. To limit a workflow to 8 cores and 10 GB of RAM:
args_dict = {'n_procs': 8, 'memory_gb': 10}
workflow.run(plugin='MultiProc', plugin_args=args_dict)
If these values are not explicitly set, the plugin assumes it can use all of the processors and memory on the system. For example, if the machine has 16 cores and 12 GB of RAM, the workflow will internally assume those values for n_procs and memory_gb, respectively.
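As an illustration of where such whole-system defaults can come from (this is a sketch, not MultiProc's actual detection code), the processor count and total memory are available from the standard library and from psutil:

```python
import os

# Sketch: derive whole-system defaults, analogous to what the plugin
# assumes when n_procs / memory_gb are not given. NOT Nipype's own code.
n_procs = os.cpu_count()          # all logical CPUs
try:
    import psutil
    memory_gb = psutil.virtual_memory().total / (1024.0 ** 3)
except ImportError:
    memory_gb = None              # psutil absent: no memory detection

print(n_procs, memory_gb)
```

psutil is the same optional dependency that enables the runtime profiler discussed below.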
The plugin will then queue eligible nodes for execution based on their expected usage, as declared via the num_threads and estimated_memory_gb interface parameters. If the plugin sees that only 3 of its 8 processors and 4 GB of its 10 GB of RAM are being used by running nodes, it will attempt to execute the next available node as long as its num_threads <= 5 and estimated_memory_gb <= 6. If this is not the case, it continues checking every available node in the queue until it finds one that meets these conditions, or it waits for an executing node to finish to earn back the necessary resources. The queue gives highest priority to nodes with the largest estimated_memory_gb, followed by nodes with the most num_threads.
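The admission check described above can be sketched as a small function (an illustration of the logic, not Nipype's actual scheduler code):

```python
def fits(num_threads, estimated_memory_gb,
         busy_threads, busy_memory_gb,
         n_procs=8, memory_gb=10):
    """Sketch of the MultiProc admission check: a queued node may start
    only if its declared estimates fit within the free resources."""
    free_procs = n_procs - busy_threads
    free_memory = memory_gb - busy_memory_gb
    return num_threads <= free_procs and estimated_memory_gb <= free_memory

# With 3 of 8 cores and 4 of 10 GB in use, 5 cores and 6 GB are free:
print(fits(4, 5, busy_threads=3, busy_memory_gb=4))  # True
print(fits(6, 5, busy_threads=3, busy_memory_gb=4))  # False: 6 > 5 free cores
```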
It is not always easy to estimate the amount of resources a particular function or command uses. To help with this, Nipype provides some feedback about the system resources used by every node during workflow execution via the built-in runtime profiler. The runtime profiler is automatically enabled if the psutil Python package is installed and found on the system.
If the package is not found, the workflow will run normally without the runtime profiler.
The runtime profiler records the number of threads and the amount of memory (in GB) used as runtime_threads and runtime_memory_gb in the Node's result.runtime attribute. Since the node object is pickled and written to disk in its working directory, these values remain available for analysis after node or workflow execution by manually parsing the pickle file contents.
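Those result files are gzip-compressed pickles, so they can be read back with the standard library alone (within Nipype, nipype.utils.filemanip.loadpkl offers similar functionality). A minimal sketch, using a stand-in payload in place of a real node result:

```python
import gzip
import os
import pickle
import tempfile

def load_pklz(path):
    """Load a gzip-compressed pickle, the on-disk format Nipype uses for
    node result files (e.g. result_smooth.pklz in the working directory)."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)

# Demonstrated with a stand-in dict mimicking the runtime stats; against a
# real working directory you would point this at the node's result file.
runtime = {'runtime_threads': 3, 'runtime_memory_gb': 1.118469238281}
path = os.path.join(tempfile.mkdtemp(), 'result_smooth.pklz')
with gzip.open(path, 'wb') as f:
    pickle.dump(runtime, f)

print(load_pklz(path)['runtime_threads'])  # 3
```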
Nipype also provides a logging mechanism for saving node runtime statistics to a JSON-style log file via the log_nodes_cb logger function. It is enabled by pointing the status_callback entry of plugin_args to this function when using the MultiProc plugin.
In [ ]:
from nipype.utils.profiler import log_nodes_cb
args_dict = {'n_procs': 8, 'memory_gb': 10, 'status_callback': log_nodes_cb}
To set the filepath for the callback log, the 'callback' logger must be configured:
In [ ]:
# Set path to log file
import logging
callback_log_path = '/home/neuro/run_stats.log'
logger = logging.getLogger('callback')
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler(callback_log_path)
logger.addHandler(handler)
Finally, the workflow can be run. For this, let's first create a simple workflow:
In [ ]:
# Import and initiate the workflow
from nipype.workflows.fmri.fsl import create_featreg_preproc
workflow = create_featreg_preproc()
# Specify input values
workflow.inputs.inputspec.func = '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'
workflow.inputs.inputspec.fwhm = 10
workflow.inputs.inputspec.highpass = 50
In [ ]:
workflow.run(plugin='MultiProc', plugin_args=args_dict)
In [ ]:
node.result.runtime
[Bunch(cmdline='fslmaths /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz /tmp/tmp9102ji29/featpreproc/img2float/mapflow/_img2float0/sub-01_ses-test_task-fingerfootlips_bold_dtype.nii.gz -odt float', command_path='/usr/lib/fsl/5.0/fslmaths', cwd='/tmp/tmp9102ji29/featpreproc/img2float/mapflow/_img2float0', dependencies=b'\tlinux-vdso.so.1 (0x00007ffc53ffb000)\n\tlibnewimage.so => /usr/lib/fsl/5.0/libnewimage.so (0x00007f1064ef7000)\n\tlibmiscmaths.so => /usr/lib/fsl/5.0/libmiscmaths.so (0x00007f1064c6a000)\n\tlibprob.so => /usr/lib/fsl/5.0/libprob.so (0x00007f1064a62000)\n\tlibfslio.so => /usr/lib/fsl/5.0/libfslio.so (0x00007f1064855000)\n\tlibnewmat.so.10 => /usr/lib/libnewmat.so.10 (0x00007f10645ff000)\n\tlibutils.so => /usr/lib/fsl/5.0/libutils.so (0x00007f10643f2000)\n\tlibniftiio.so.2 => /usr/lib/libniftiio.so.2 (0x00007f10641d0000)\n\tlibznz.so.2 => /usr/lib/libznz.so.2 (0x00007f1063fcc000)\n\tlibz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1063db2000)\n\tlibstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1063a30000)\n\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f106372c000)\n\tlibgcc_s.so.1 => /opt/mcr/v92/sys/os/glnxa64/libgcc_s.so.1 (0x00007f1063516000)\n\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1063177000)\n\t/lib64/ld-linux-x86-64.so.2 (0x00007f1065513000)', duration=8.307612, endTime='2018-04-30T14:45:51.031657', environ={'CLICOLOR': 1, 'CONDA_DEFAULT_ENV': neuro, 'CONDA_DIR': /opt/conda, 'CONDA_PATH_BACKUP': /usr/lib/fsl/5.0:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin, 'CONDA_PREFIX': /opt/conda/envs/neuro, 'CONDA_PS1_BACKUP': , 'FORCE_SPMMCR': 1, 'FSLBROWSER': /etc/alternatives/x-www-browser, 'FSLDIR': /usr/share/fsl/5.0, 'FSLLOCKDIR': , 'FSLMACHINELIST': , 'FSLMULTIFILEQUIT': TRUE, 'FSLOUTPUTTYPE': NIFTI_GZ, 'FSLREMOTECALL': , 'FSLTCLSH': /usr/bin/tclsh, 'FSLWISH': /usr/bin/wish, 'GIT_PAGER': cat, 'HOME': /home/neuro, 'HOSTNAME': 
bb97daa6f4d9, 'JPY_PARENT_PID': 50, 'LANG': en_US.UTF-8, 'LC_ALL': C.UTF-8, 'LD_LIBRARY_PATH': /usr/lib/fsl/5.0:/usr/lib/x86_64-linux-gnu:/opt/mcr/v92/runtime/glnxa64:/opt/mcr/v92/bin/glnxa64:/opt/mcr/v92/sys/os/glnxa64, 'MATLABCMD': /opt/mcr/v92/toolbox/matlab, 'MPLBACKEND': module://ipykernel.pylab.backend_inline, 'ND_ENTRYPOINT': /neurodocker/startup.sh, 'PAGER': cat, 'PATH': /opt/conda/envs/neuro/bin:/usr/lib/fsl/5.0:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin, 'POSSUMDIR': /usr/share/fsl/5.0, 'PS1': (neuro) , 'PWD': /home/neuro/nipype_tutorial, 'SHLVL': 1, 'SPMMCRCMD': /opt/spm12/run_spm12.sh /opt/mcr/v92/ script, 'TERM': xterm-color, '_': /opt/conda/envs/neuro/bin/jupyter-notebook}, hostname='bb97daa6f4d9', merged='', platform='Linux-4.13.0-39-generic-x86_64-with-debian-9.4', prevcwd='/home/neuro/nipype_tutorial/notebooks', returncode=0, startTime='2018-04-30T14:45:42.724045', stderr='', stdout='', version='5.0.9')]
After the workflow finishes executing, the log file at /home/neuro/run_stats.log
can be parsed for the runtime statistics. Here is an example of what the contents would look like:
{"name":"resample_node","id":"resample_node",
"start":"2016-03-11 21:43:41.682258",
"estimated_memory_gb":2,"num_threads":1}
{"name":"resample_node","id":"resample_node",
"finish":"2016-03-11 21:44:28.357519",
"estimated_memory_gb":"2","num_threads":"1",
"runtime_threads":"3","runtime_memory_gb":"1.118469238281"}
Here it can be seen that the number of threads was underestimated while the amount of memory needed was overestimated. The next time this workflow is run, the user can adjust the node interface num_threads and estimated_memory_gb parameters accordingly for higher pipeline throughput. Note that the "runtime_threads" value is sometimes higher than expected, particularly for multi-threaded applications. Tools can implement multi-threading in different ways under the hood; the profiler simply traverses the process tree and returns all running threads associated with that process, some of which may include active thread-monitoring daemons or transient processes.
Nipype provides the ability to visualize the workflow execution based on the runtimes and system resources each node takes. It does this using the log file generated from the callback logger after workflow execution - as shown above. The pandas Python package is required to use this feature.
In [ ]:
from nipype.utils.profiler import log_nodes_cb
args_dict = {'n_procs': 8, 'memory_gb': 10, 'status_callback': log_nodes_cb}
workflow.run(plugin='MultiProc', plugin_args=args_dict)
# ...workflow finishes and writes callback log to '/home/user/run_stats.log'
from nipype.utils.draw_gantt_chart import generate_gantt_chart
generate_gantt_chart('/home/neuro/run_stats.log', cores=8)
# ...creates gantt chart in '/home/user/run_stats.log.html'
The generate_gantt_chart function will create an HTML file that can be viewed in a browser. Below is an example of the Gantt chart displayed in a web browser. Note that hovering the cursor over any particular node bubble or resource bubble shows additional information in a pop-up.
<img src="../../static/images/gantt_chart.png" width="720">