Utility Scripts

This notebook contains some useful scripts:

  1. Executing a bash command from Python
  2. Converting a byte string to ASCII
  3. Expand the environment variables in the string
  4. Using Regular Expression to extract subject ID from the file path
  5. Parallel Execution of a loop - Part 1
  6. Parallel Execution of a loop - Part 2 (Passing Shared arrays/objects)
  7. Reading and writing CSV using Pandas
  8. Reading JSON files
  9. Plotting a Histogram and Box Plots

1. Executing a bash command from Python

This cell runs FSL's fslmaths command to invert the mask, i.e. 1 -> 0 and 0 -> 1 (it computes mask * -1 + 1).


In [1]:
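import subprocess
import os
from os.path import join as opj

# Expand $FSLDIR to get the full path of the standard MNI brain mask
mask = os.path.expandvars('$FSLDIR/data/standard/MNI152_T1_2mm_brain_mask.nii.gz')

# Invert the mask: out = mask * -1 + 1, so 1 -> 0 and 0 -> 1
proc = subprocess.Popen(['fslmaths', mask, '-mul', '-1', '-add', '1', 'mask_inverted'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutdata, stderrdata = proc.communicate()  # waits for the command to finish

# To check how the command was executed on the command line
print("The commandline is: {}".format(subprocess.list2cmdline(proc.args)))

if proc.returncode != 0:
    raise RuntimeError('fslmaths failed: {}'.format(stderrdata.decode()))

# fslmaths writes the output to the current working directory
# (with the default NIFTI_GZ output type, '.nii.gz' is appended)
cwd = os.getcwd()
mask_inverted_path = opj(cwd, 'mask_inverted.nii.gz')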


The commandline is: fslmaths /usr/local/fsl/data/standard/MNI152_T1_2mm_brain_mask.nii.gz -mul -1 -add 1 mask_inverted
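On Python 3.7+ the same call can be written more compactly with subprocess.run. The sketch below assumes you want the command's output as text and an exception on a non-zero exit status; the helper name run_command is hypothetical, and the example calls the Python interpreter instead of fslmaths so it runs without FSL installed.

```python
import subprocess
import sys


def run_command(cmd):
    """Hypothetical helper: run cmd (a list of arguments, no shell) and return stdout as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit status.
    """
    print("The commandline is: {}".format(subprocess.list2cmdline(cmd)))
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example: any program works; here the current Python interpreter is used
output = run_command([sys.executable, '-c', 'print("mask inverted")'])
print(output.strip())
```

For the mask inversion above this would be run_command(['fslmaths', mask, '-mul', '-1', '-add', '1', 'mask_inverted']).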

2. Byte string to ASCII


In [ ]:
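byteString = b'sub-0050002'  # example bytes, e.g. stdoutdata returned by proc.communicate()
asciiString = byteString.decode("utf-8")  # use "ascii" if the bytes must be pure ASCII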
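Pipes from subprocess return bytes, so command output usually needs decoding before it can be used as a string. A small sketch, assuming UTF-8 output; the helper name to_text is hypothetical. Strict "ascii" decoding raises UnicodeDecodeError on any byte above 127, while errors="replace" substitutes the U+FFFD replacement character instead.

```python
def to_text(data, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode bytes to str and pass str through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding, errors=errors)
    return data


print(to_text(b"sub-0050002"))   # sub-0050002
print(to_text("already text"))   # already text

# b'\xc2\xb5' is the UTF-8 encoding of 'µ', which is not ASCII:
# with errors="replace" each undecodable byte becomes U+FFFD
print(repr(to_text(b"\xc2\xb5", encoding="ascii", errors="replace")))
```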

3. Expand the environment variables in the string


In [3]:
import os

os.path.expandvars('$FSLDIR')


Out[3]:
'/usr/share/fsl/5.0'
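expandvars also works on longer strings and accepts both $NAME and ${NAME}; the braces are needed when the variable name is directly followed by characters that could be part of a name. References to unset variables are left unchanged rather than raising an error. The variable MY_DATA_DIR below is hypothetical and set only for the example.

```python
import os

# Hypothetical variable, set here only for the example
os.environ['MY_DATA_DIR'] = '/data/ABIDE1'

print(os.path.expandvars('$MY_DATA_DIR/sub-0050002'))  # /data/ABIDE1/sub-0050002
print(os.path.expandvars('${MY_DATA_DIR}_backup'))     # /data/ABIDE1_backup

# References to variables that are not set are left as they are (no error)
os.environ.pop('SURELY_UNSET_VAR', None)
print(os.path.expandvars('$SURELY_UNSET_VAR/file.nii'))  # $SURELY_UNSET_VAR/file.nii
```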

4. Using Regular Expression to extract subject ID from the file path


In [ ]:
import re

anat_path = '/coreg_reg/_subject_id_0050002/anat2std_reg/sub-0050002_T1w_resample_brain_flirt.nii'
# Raw string (r'...') so that \d reaches the regex engine unchanged
sub_id_extracted = re.search(r'.+_subject_id_(\d+)', anat_path).group(1)  # '0050002'
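When extracting IDs from many paths, compiling the pattern once and handling non-matching paths avoids an AttributeError: re.search returns None when nothing matches, and None has no .group(). A sketch with a hypothetical helper extract_subject_id; the second and third paths are made-up examples.

```python
import re

# Compiled once; the capture group (\d+) holds the subject ID digits
SUBJECT_ID_RE = re.compile(r'_subject_id_(\d+)')


def extract_subject_id(path):
    """Hypothetical helper: return the subject ID in path as a string, or None."""
    match = SUBJECT_ID_RE.search(path)
    return match.group(1) if match else None


paths = [
    '/coreg_reg/_subject_id_0050002/anat2std_reg/sub-0050002_T1w_resample_brain_flirt.nii',
    '/coreg_reg/_subject_id_0050003/anat2std_reg/sub-0050003_T1w_resample_brain_flirt.nii',
    '/coreg_reg/README.txt',
]
print([extract_subject_id(p) for p in paths])  # ['0050002', '0050003', None]
```

The ID is returned as a string, which keeps the leading zeros; use int() if a number is needed.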

5. Parallel Execution of a loop - Part 1

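The serial loop below calls function_perform_action() once per input:

import numpy as np

input_list = np.arange(100)

for i in range(100):
    function_perform_action(input_list[i])

The same loop can be run in parallel with a multiprocessing Pool: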

In [ ]:
from multiprocessing import Pool
import numpy as np

input_list = np.arange(100)

# function_perform_action() must be defined at module level so it can be pickled
# Create a pool of 8 workers, i.e. up to 8 processors will be utilized
with Pool(8) as pool:
    # Execute function_perform_action() with the inputs from input_list
    results = pool.map(function_perform_action, input_list)
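A self-contained version that runs as a script. The function square is a hypothetical stand-in for function_perform_action, and parallel_map is a hypothetical helper name. The pool is created under the `if __name__ == "__main__":` guard, which the spawn start method (the default on Windows and macOS) requires; pool.map blocks until all results are ready and returns them in input order.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical stand-in for function_perform_action()."""
    return x * x


def parallel_map(func, inputs, processes=4):
    """Apply func to every element of inputs using a pool of worker processes."""
    with Pool(processes) as pool:
        # map() splits inputs into chunks, sends them to the workers and
        # collects the results in the original order
        return pool.map(func, inputs)


if __name__ == "__main__":
    print(parallel_map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```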

6. Parallel Execution of a loop - Part 2 (Passing Shared arrays/objects)

This approach places a shared object (a numpy ndarray in our case) in a manager process, passes it to the workers, and then waits for them to compute and fill in its entries. The serial version of the loop is:

input_list = np.arange(100)
sharedMatrix = np.zeros((100, 100))

for i in range(100):
    function_compute(sharedMatrix, input_list[i])

Result: sharedMatrix with entries computed by function_compute().
Note: other global arrays can be sent to the workers in a similar way.


In [6]:
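from functools import partial
from multiprocessing import Pool
import multiprocessing.managers
import numpy as np

# A manager that serves a numpy array living in the manager process;
# workers read/write it through an ArrayProxy (__getitem__, __setitem__, __len__)
class MyManager(multiprocessing.managers.BaseManager):
    pass
MyManager.register('np_zeros', np.zeros, multiprocessing.managers.ArrayProxy)

input_list = np.arange(100)

m = MyManager()
m.start()  # start the manager process

# Shared 100x100 matrix of zeros, created inside the manager process
sharedMatrix = m.np_zeros((100, 100))

# Fix the first argument of function_compute() to the shared matrix
func = partial(function_compute, sharedMatrix)

# Create a pool of 8 workers
with Pool(8) as pool:
    pool.map(func, input_list)
    # or, if you expect some returned output:
    # data_outputs = pool.map(func, input_list)

result = sharedMatrix[:]  # copy the filled matrix back into this process
m.shutdown()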
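A self-contained version of the manager pattern that runs as a script. Assumptions: fill_row and compute_shared are hypothetical names, fill_row stands in for function_compute, and the matrix is kept small (n_rows x 4). Every indexing operation on the proxy is a round trip to the manager process, so this pays off only when each task does substantial work between writes.

```python
from functools import partial
from multiprocessing import Pool
import multiprocessing.managers

import numpy as np


class MatrixManager(multiprocessing.managers.BaseManager):
    """Manager process that owns the shared numpy array."""


# Registering np.zeros with ArrayProxy lets workers index the array remotely
MatrixManager.register('np_zeros', np.zeros, multiprocessing.managers.ArrayProxy)


def fill_row(shared, i):
    """Hypothetical stand-in for function_compute(): write row i of the matrix."""
    shared[int(i), :] = np.arange(4) * i  # one __setitem__ call on the manager
    return i


def compute_shared(n_rows=4, processes=2):
    # Entering the context starts the manager; leaving it shuts it down
    with MatrixManager() as manager:
        shared = manager.np_zeros((n_rows, 4))
        with Pool(processes) as pool:
            pool.map(partial(fill_row, shared), range(n_rows))
        # Copy the result out before the manager process goes away
        return shared[:]


if __name__ == "__main__":
    print(compute_shared())
```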

7. Reading and writing CSV using Pandas


In [ ]:
import pandas as pd
import numpy as np


df = pd.read_csv('/home1/varunk/data/ABIDE1/RawDataBIDs/composite_phenotypic_file.csv')  # optionally add index_col='SUB_ID'

df = df.sort_values(['SUB_ID'])  # Sort the table according to the SUB_ID field

# Extract the SUB_ID column as a 1-D numpy array
# (DataFrame.as_matrix() was removed in pandas 1.0; use to_numpy())
df_matrix = df['SUB_ID'].to_numpy()

# or

# df_matrix = df.to_numpy()  # to convert the whole table to a matrix

# to do filtering based on field values

filtered_table = df.loc[(df['DSM_IV_TR'] == 2) & (df['SEX'] == 1) & (df['SITE_ID'] == 'NYU')]


# Save the table

filtered_table.to_csv('table.csv')

# Another way to create a data frame and save it

SITE = 'NYU'        # hypothetical example values
NUM_AUT_DSM_V = 10

# SITE and NUM_AUT_DSM_V are scalar values, which is why index=[0] is needed
df = pd.DataFrame({
    'SITE_NAME': SITE,
    'NUM_AUT_DSM_V': NUM_AUT_DSM_V,
    }, index=[0], columns=['SITE_NAME', 'NUM_AUT_DSM_V'])
# The above creates one row

# ------------ OR ----------------------------------------------------------------------------------------

# Wrapping the scalars in lists also creates one row, without index=[0];
# passing lists of equal length instead (e.g. SITE = ['NYU', 'UM']) creates one row per element
df = pd.DataFrame({
    'SITE_NAME': [SITE],
    'NUM_AUT_DSM_V': [NUM_AUT_DSM_V],
    }, columns=['SITE_NAME', 'NUM_AUT_DSM_V'])


# Save the table

df.to_csv('table.csv')
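A self-contained sketch of the same read, sort, filter and save steps on a small hypothetical table: the values are made up, only the column names come from the phenotypic file above. It uses io.StringIO in place of a file on disk, and to_csv(index=False), which omits the row index that to_csv otherwise writes as an extra unnamed column.

```python
import io

import pandas as pd

# Hypothetical toy table using the column names of the phenotypic file
df = pd.DataFrame({
    'SUB_ID': [50003, 50002, 50004, 50005],
    'SITE_ID': ['NYU', 'NYU', 'UM', 'NYU'],
    'SEX': [1, 1, 2, 2],
    'DSM_IV_TR': [2, 0, 2, 1],
})

df = df.sort_values('SUB_ID')
sub_ids = df['SUB_ID'].to_numpy()  # 1-D array of subject IDs

# Combine conditions with & and wrap each comparison in parentheses
filtered = df.loc[(df['DSM_IV_TR'] == 2) & (df['SEX'] == 1) & (df['SITE_ID'] == 'NYU')]

# Write to an in-memory CSV without the index, then read it back
buffer = io.StringIO()
filtered.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded)
```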

8. Reading JSON files


In [ ]:
import json

json_path = '../scripts/json/paths.json'
with open(json_path, 'rt') as fp:
    task_info = json.load(fp)
    
# Accessing the contents:

path = task_info['atlas_path']
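A round-trip sketch using only the standard library: it writes a hypothetical paths.json (with the same atlas_path key read above; the path value is made up) to a temporary directory and loads it back. JSON objects are loaded as Python dicts, so dict.get() can be used to return None for a missing key instead of raising KeyError.

```python
import json
import os
import tempfile

# Hypothetical config with the same key the notebook reads ('atlas_path')
paths = {'atlas_path': '/path/to/atlas.nii.gz', 'n_subjects': 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, 'paths.json')

    # Write the dictionary as indented JSON
    with open(json_path, 'wt') as fp:
        json.dump(paths, fp, indent=4)

    # Read it back; the JSON object becomes a Python dict
    with open(json_path, 'rt') as fp:
        task_info = json.load(fp)

atlas_path = task_info['atlas_path']
mask_path = task_info.get('mask_path')  # None, because the key is missing
```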


9. Plotting a Histogram and Box Plots


In [ ]:
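import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

bins = np.arange(0, 10000, 10)  # bin edges 0, 10, ..., 9990, i.e. a fixed bin width of 10

# data_to_plot is your 1-D data; plt.hist returns (counts, bin_edges, patches)
res = plt.hist(data_to_plot, bins=bins)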

In [ ]:
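# Plotting 2 histograms together (bins as defined in the previous cell)

from matplotlib import pyplot

# rwidth=0.1 makes each bar 10% of the bin width and align shifts the bars to the
# left/right bin edge, so the bars of the two datasets do not overlap
pyplot.hist(data_1, alpha=0.5, bins=bins, label='Data1', rwidth=0.1, align='left')
pyplot.hist(data_2, alpha=0.5, bins=bins, label='Data2', rwidth=0.1, align='right')
pyplot.legend(loc='upper right')
pyplot.xlabel('X label')
pyplot.show()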

In [ ]:
res = pyplot.boxplot([data_1, data_2]) # Plots 2 box plots together (Can plot multiple as well)
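A self-contained sketch combining the histogram and box plot cells on synthetic data: data_1 and data_2 are randomly generated stand-ins, not real measurements. It uses the non-interactive Agg backend so it also runs outside Jupyter, and the Axes interface, whose ax.hist returns the same (counts, bin_edges, patches) tuple as plt.hist. When building bin edges, note that np.arange takes (start, stop, step): np.arange(10000, 10) treats 10000 as the start and returns an empty array.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for data_1 / data_2
data_1 = rng.normal(loc=50, scale=10, size=1000)
data_2 = rng.normal(loc=60, scale=15, size=1000)

# Bin edges 0, 5, ..., 120: a fixed bin width of 5
bins = np.arange(0, 125, 5)

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Two overlaid, semi-transparent histograms on the same bins
counts_1, edges, _ = ax_hist.hist(data_1, bins=bins, alpha=0.5, label='Data1')
counts_2, _, _ = ax_hist.hist(data_2, bins=bins, alpha=0.5, label='Data2')
ax_hist.set_xlabel('X label')
ax_hist.legend(loc='upper right')

# One box per dataset, drawn side by side at x = 1 and x = 2
ax_box.boxplot([data_1, data_2])
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(['Data1', 'Data2'])

# In a notebook the figure is displayed inline; here it is simply closed
plt.close(fig)
```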


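In [1]:
import numpy as np
# Pitfall: with two arguments np.arange() reads them as (start, stop), so
# np.arange(10000, 10) is empty; use np.arange(0, 10000, 10) for bin edges of width 10
bins = np.arange(10000, 10)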

In [2]:
bins


Out[2]:
array([], dtype=int64)
