``````

In [2]:

import numpy as np
import pandas as pd
import nibabel as nib

``````

``````

In [3]:

``````
``````

In [4]:

``````
``````

In [5]:

``````
``````

In [6]:

``````
``````

Out[6]:

(91, 109, 91, 65)

``````

Test MNI to Voxel Coordinate Conversion

Get the inverse affine transform for the image, so we can go from image-space coordinates to voxel-data coordinates.

``````

In [7]:

``````
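The code for this cell is missing, but the transform is just a matrix inverse. A standalone sketch (the affine shown is the standard 2 mm MNI152 affine, and the small `apply_affine` helper mirrors `nib.affines.apply_affine`):

```python
import numpy as np

# Standard 2 mm MNI152 affine: maps voxel indices -> MNI mm coordinates.
affine = np.array([[-2.,  0.,  0.,   90.],
                   [ 0.,  2.,  0., -126.],
                   [ 0.,  0.,  2.,  -72.],
                   [ 0.,  0.,  0.,    1.]])

# Invert it to go the other way: MNI mm -> voxel indices.
inverse_affine = np.linalg.inv(affine)

def apply_affine(aff, pt):
    """Apply a 4x4 affine to a 3D point (same idea as nib.affines.apply_affine)."""
    return (aff @ np.append(pt, 1.0))[:3]

print(apply_affine(inverse_affine, [28, 36, 2]))  # -> [31. 81. 37.]
```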

Confirm that our coordinate translation works by checking against a known relationship (from FSLView).

MNI coordinate (28, 36, 2) = Voxel coordinate (31, 81, 37)

``````

In [8]:

nib.affines.apply_affine(inverse_affine, [28, 36, 2])

``````
``````

Out[8]:

array([ 31.,  81.,  37.])

``````

To be doubly sure, test our transformed coordinates on real data.

Patient 191 has a lesioned voxel at MNI coordinate (-36, 2, -2), but moving more than 2mm (1 voxel) in any direction puts you in empty space.

``````

In [9]:

``````
``````

In [10]:

``````
``````

In [11]:

``````

There should be damage here...

``````

In [12]:

nib.affines.apply_affine(inverse_affine, [-36, 2, -2])

``````
``````

Out[12]:

array([ 63.,  64.,  35.])

``````
``````

In [13]:

``````
``````

Out[13]:

1.0

``````

But not in any of these places...

``````

In [14]:

nib.affines.apply_affine(inverse_affine, [-40, 2, -2])

``````
``````

Out[14]:

array([ 65.,  64.,  35.])

``````
``````

In [15]:

``````
``````

Out[15]:

0.0

``````
``````

In [16]:

nib.affines.apply_affine(inverse_affine, [-36, -2, -2])

``````
``````

Out[16]:

array([ 63.,  62.,  35.])

``````
``````

In [17]:

``````
``````

Out[17]:

0.0

``````
``````

In [18]:

nib.affines.apply_affine(inverse_affine, [-36, 2, 2])

``````
``````

Out[18]:

array([ 63.,  64.,  37.])

``````
``````

In [19]:

``````
``````

Out[19]:

0.0

``````

Looks like our coordinate conversion is working. These were lazy checks, but lazy is better than nothing.

Test Coordinate to Index Conversion

Next, we need to map 3D voxel-data coordinates to 1D (flattened) coordinates.

First, create a small 3D array to test things on.

``````

In [20]:

test_data = np.random.rand(3, 3, 3)

``````
``````

In [21]:

test_data.shape

``````
``````

Out[21]:

(3, 3, 3)

``````
``````

In [22]:

test_data

``````
``````

Out[22]:

array([[[ 0.96603952,  0.7333165 ,  0.08405807],
        [ 0.16360938,  0.5348787 ,  0.31332082],
        [ 0.16836496,  0.91038218,  0.14604711]],

       [[ 0.11633421,  0.57805041,  0.34473608],
        [ 0.40597046,  0.40150874,  0.9052168 ],
        [ 0.81390026,  0.79014952,  0.67670708]],

       [[ 0.30568692,  0.52940662,  0.44017652],
        [ 0.778479  ,  0.80518042,  0.72679871],
        [ 0.54730178,  0.82862953,  0.36118964]]])

``````

Pull an arbitrary item using 3D indexing.

``````

In [23]:

test_data[0,2,1]

``````
``````

Out[23]:

0.91038217619808059

``````

The `ravel_multi_index` function translates a multi-dimensional index into the equivalent 1D index of a raveled array.

``````

In [24]:

np.ravel_multi_index([0,2,1], (3,3,3))

``````
``````

Out[24]:

7

``````
``````

In [25]:

test_data.ravel()[7]

``````
``````

Out[25]:

0.91038217619808059

``````

Now let's check that this works on our image data.

``````

In [26]:

``````
``````

Out[26]:

1.0

``````
``````

In [27]:

``````
``````

Out[27]:

630756

``````
``````

In [28]:

``````
``````

Out[28]:

1.0

``````
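The cells above are consistent with flattening the voxel coordinate for MNI (-36, 2, -2) into a 1D index over the 3D grid:

```python
import numpy as np

shape_3d = (91, 109, 91)          # spatial dims of the 2 mm MNI grid
voxel = (63, 64, 35)              # voxel coordinate for MNI (-36, 2, -2)
flat_index = np.ravel_multi_index(voxel, shape_3d)
print(flat_index)                 # -> 630756 (matches Out[27])
```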

Now that we've got indexing working using NumPy arrays, let's translate things into Pandas so we can have labeled rows.

Construct a test DF with image vectors from 64 patients. We do it iteratively, adding one patient at a time.

We'll also do some code profiling to see how much of a hit the system will take when doing these operations.

``````

In [29]:

``````
``````

In [45]:

``````
``````

Out[45]:

(91, 109, 91, 65)

``````
``````

In [46]:

%%memit
for i in range(64):
    try:
        ...  # (loop body lost in export; presumably adds one patient's column)
    except:
        print('Created DF')

``````
``````

peak memory: 1164.52 MiB, increment: 344.36 MiB

``````
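The original loop body is missing; here is a self-contained sketch of the pattern described above (build the DF one patient column at a time), using random stand-in data and hypothetical patient IDs:

```python
import numpy as np
import pandas as pd

img_data = np.random.randint(0, 2, size=(4, 4, 4, 3))   # tiny stand-in 4D stack
patient_ids = ['100', '101', '102']                     # hypothetical IDs

df = None
for i, pid in enumerate(patient_ids):
    col = img_data[..., i].ravel()          # flatten one patient's 3D mask
    if df is None:
        df = pd.DataFrame({pid: col})       # first patient creates the DF
    else:
        df[pid] = col                       # later patients append a column
print(df.shape)                             # -> (64, 3)
```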
``````

In [47]:

``````
``````

Out[47]:

     100  101  102  ...  1061  1062  1063
0    0.0  0.0  0.0  ...   0.0   0.0   0.0
1    0.0  0.0  0.0  ...   0.0   0.0   0.0
2    0.0  0.0  0.0  ...   0.0   0.0   0.0
3    0.0  0.0  0.0  ...   0.0   0.0   0.0
4    0.0  0.0  0.0  ...   0.0   0.0   0.0

5 rows × 64 columns

``````

Now that we have the DF, let's search it.

``````

In [48]:

list(test_results[test_results == True].index)

``````
``````

Out[48]:

['102',
'103',
'104',
'105',
'106',
'1011',
'1012',
'1015',
'1018',
'1027',
'1028',
'1041',
'1043']

``````
``````

In [49]:

%%timeit
list(test_results[test_results == True].index)

``````
``````

1000 loops, best of 3: 1.26 ms per loop

``````
``````

In [50]:

%memit
list(test_results[test_results == True].index)

``````
``````

peak memory: 1605.99 MiB, increment: 0.00 MiB

Out[50]:

['102',
'103',
'104',
'105',
'106',
'1011',
'1012',
'1015',
'1018',
'1027',
'1028',
'1041',
'1043']

``````

Sure enough, visual inspection confirms that those patients have lesions at that coordinate.

Let's see if it's much slower when we tweak things so the search outputs a list of patients directly.

``````

In [51]:

%%timeit

``````
``````

10 loops, best of 3: 54.3 ms per loop

``````

Okay, so it looks like we definitely don't want to transpose the DF on the fly. What if we do that beforehand?

``````

In [52]:

``````
``````

In [53]:

%%timeit

``````
``````

10 loops, best of 3: 38.5 ms per loop

``````

Hmm, okay. It seems like the slowdown comes from indexing over columns instead of rows. Back to our original plan.

We want our mask database to be persistent, but fast to read and write, so we're going to use PyTables to store it as HDF5.

First, create an HDFStore object and add our DF to it.

``````

In [54]:

``````
``````

In [55]:

``````
``````

In [56]:

``````
``````

Out[56]:

     100  101  102  ...  1061  1062  1063
0    0.0  0.0  0.0  ...   0.0   0.0   0.0
1    0.0  0.0  0.0  ...   0.0   0.0   0.0
2    0.0  0.0  0.0  ...   0.0   0.0   0.0
3    0.0  0.0  0.0  ...   0.0   0.0   0.0
4    0.0  0.0  0.0  ...   0.0   0.0   0.0

5 rows × 64 columns

``````
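The store-creation cells above lost their code, but the `pandas.HDFStore` pattern they describe looks roughly like this (the store path and toy DF are stand-ins; requires PyTables):

```python
import os, tempfile
import numpy as np
import pandas as pd

# A toy DF standing in for the real mask DF (columns = patient IDs).
data_df = pd.DataFrame(np.zeros((8, 3)), columns=['100', '101', '102'])

store_path = os.path.join(tempfile.mkdtemp(), 'mask_store.h5')
store = pd.HDFStore(store_path)        # created on disk if it doesn't exist
store.put('df', data_df, format='t')   # 'table' format allows partial reads via select()
row = store.select('df', start=5, stop=6)   # read one row without loading the whole DF
store.close()
print(row.shape)                       # -> (1, 3)
```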

We can close the store to free up memory when we're not using the mask data.

``````

In [57]:

store.close()
del store

``````

When working with data stored as HDF5, we can read only the rows/columns we want into memory.

This makes searching really fast.

``````

In [58]:

flat_index = 630756

``````
``````

In [63]:

%%timeit
vox_row = restore.select('df', start=flat_index, stop=flat_index+1)
res = vox_row.T.iloc[:,0] == 1
patients = list(res[res == True].index)
restore.close()

``````
``````

100 loops, best of 3: 10.3 ms per loop

``````
``````

In [60]:

%memit
vox_row = restore.select('df', start=flat_index, stop=flat_index+1)
res = vox_row.T.iloc[:,0] == 1
patients = list(res[res == True].index)
restore.close()

``````
``````

peak memory: 1985.64 MiB, increment: 0.00 MiB

``````

Much faster than reading the entire DF (though it looks like memory use is the same).

``````

In [ ]:

%%timeit
test_results = store.df.loc[630756,:] == 1
patients = list(test_results[test_results == True].index)
store.close()

``````
``````

In [ ]:

%memit
test_results = store.df.loc[630756,:] == 1
patients = list(test_results[test_results == True].index)
store.close()

``````

We can now quickly search existing patients, but how about adding mask data for a new patient?

``````

In [ ]:

``````
``````

In [ ]:

data_df = store.get('df') # We have to get/modify a copy, because you can't add rows to an HDF5 table.

``````
``````

In [ ]:

``````
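The empty cell between the `get` and the `put` presumably assigned the new patient's flattened mask as a column; a toy sketch (names and data are stand-ins):

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame(np.zeros((6, 2)), columns=['100', '101'])  # toy stand-in DF

# Hypothetical new-patient step: the flattened binary mask becomes a column
# keyed by patient number.
new_mask_vector = np.zeros(6)
new_mask_vector[3] = 1.0
data_df['191'] = new_mask_vector

print(list(data_df.columns))   # -> ['100', '101', '191']
```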
``````

In [ ]:

store.put('df', data_df, format='t') # Overwrite the existing DF with the new DF.

``````
``````

In [ ]:

store.close()

``````

Double-check that `store` now contains the new patient data.

``````

In [ ]:

print(store.df.columns)

``````

It's hacky, slow, and will eat up a lot of memory as the DF grows, but it works for now.

Wrap Functions

Now that we've got things working, we'll put everything (plus some new stuff) into functions.

``````

In [ ]:

from os import path
from pandas import HDFStore
from numpy import ravel_multi_index, random, prod

``````

``````

In [ ]:

"""

Parameters
----------
patient_number : str
Path to a standard-space NIfTI image.
store_path : str
Path to a Pandas HDFSstore (e.g. file.h5)
"""
# Check inputs
assert isinstance(patient_number, str)
if path.exists(store_path) is False:

assert path.exists(store_path)
# Load the mask-data store. If no store exists yet, it will be created.
store = HDFStore(store_path)
# Get the full DataFrame.
data_df = store.get('df')
# Add a new column for the new patient.
# Put the updated DataFrame back in the data store.
store.put('df', data_df, format='t')
# Force all our changes to be written to disk.
store.flush()
store.close()

``````
``````

In [ ]:

%%time

``````

It's slow, but acceptable. Adding new patients won't happen all that often. We may have to refactor when the DF gets bigger.

In the future, we can try refactoring things to utilize `store.select` while storing each mask as a separate node in the HDFStore.

Remove Patient

We may need to remove a patient entirely from the data store.

``````

In [ ]:

``````
``````

In [ ]:

data_df = store.get('df') # We have to get/modify a copy, because you can't add rows to an HDF5 table.

``````
``````

In [ ]:

data_df.drop('191', axis=1, inplace=True)

``````
``````

In [ ]:

store.put('df', data_df, format='t') # Overwrite the existing DF with the new DF.

``````
``````

In [ ]:

store.close()

``````

Double-check that patient data for 191 has been removed from `store`.

``````

In [ ]:

print(store.df.columns)

``````
``````

In [ ]:

def remove_patient(patient_number, store_path):
    """
    Parameters
    ----------
    patient_number : str
    store_path : str
        Path to a Pandas HDFStore (e.g. file.h5)
    """
    # Check inputs
    assert isinstance(patient_number, str)
    assert path.exists(store_path)
    store = HDFStore(store_path)
    # Get the full DataFrame.
    data_df = store.get('df')
    # Remove the column for the patient.
    data_df.drop(patient_number, axis=1, inplace=True)
    # Put the updated DataFrame back in the data store.
    store.put('df', data_df, format='t')
    # Force all our changes to be written to disk.
    store.flush()
    store.close()

``````
``````

In [ ]:

%%time

``````
``````

In [ ]:

def coordinate_search(mni_coordinates, img_shape, inverse_affine, store_path):
    """
    Parameters
    ----------
    mni_coordinates : tuple
        MNI coordinates (x,y,z)
    img_shape : tuple
        Voxel dimensions of the 3D mask images to be searched.
    inverse_affine : array-like
        Mapping from MNI space to voxel-space.
    store_path : str
        Path to a Pandas HDFStore (e.g. file.h5)

    Returns
    -------
    patients : list
        A list of patients with damage at the search coordinate.

    Notes
    -----
    Currently only able to handle 2mm resolution MNI coordinates.
    """
    voxel_coordinates = nib.affines.apply_affine(inverse_affine, list(mni_coordinates))
    voxel_coordinates = [int(i) for i in voxel_coordinates]
    voxel_index = np.ravel_multi_index(voxel_coordinates, img_shape)

    restore = pd.HDFStore(store_path)
    vox_row = restore.select('df', start=voxel_index, stop=voxel_index + 1)
    res = vox_row.T.iloc[:, 0] == 1
    patients = list(res[res == True].index)
    restore.close()

    return patients

``````
``````

In [ ]:

``````

This should run immediately after successful completion of `add_replace_patient_mask`.

``````

In [ ]:

from nilearn import image

def generate_overlap_image(overlap_image_path):

    # Get an array containing the mask_paths for all patients in the database.

    # Save the image
    overlap_img.to_filename(overlap_image_path)

``````
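Most of the function body above is missing; the overlap image itself is presumably just a voxelwise sum of the binary masks. A numpy-only sketch with random stand-in masks:

```python
import numpy as np

# Stand-in binary lesion masks for three patients.
masks = [np.random.randint(0, 2, size=(4, 4, 4)) for _ in range(3)]

# Voxelwise sum: each voxel's value is the number of patients lesioned there.
overlap = np.sum(masks, axis=0)
print(overlap.shape, overlap.max() <= 3)   # -> (4, 4, 4) True
```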
``````

In [ ]:

sum_test = generate_overlap_image('foo')

``````
``````

In [ ]:

sum_test.shape

``````
``````

In [ ]:

sum_test.to_filename('sum_test.nii.gz')

``````
``````

In [ ]:

``````
``````

In [ ]:

if df is None:
    ...  # (cell body lost in export)
else:
    return df

``````
``````

In [ ]:

import memory_profiler as mp

``````
``````

In [ ]:

foo = %memit -o

``````
``````

In [ ]:

foo.mem_usage

``````