Tutorial on how to use the Brain_Data() class

Written by Luke Chang

Brain_Data() is a Python class in the nltools codebase that provides a representation of imaging data amenable to easily running statistical models. It is based on the fmri_data() class in Matlab, developed by Tor Wager and distributed in the canlabcore repository. The class relies heavily on nilearn. The basic idea of the class is to represent 3D or 4D data in vector form so that statistical operations can be run more efficiently: 3D data becomes 1D, and 4D data becomes 2D.
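
For intuition, here is a minimal numpy sketch of the flattening idea (illustrative only; this is not how nltools implements it internally):

import numpy as np

vol_4d = np.random.randn(4, 5, 6, 10)            # x, y, z, images
dat_2d = vol_4d.reshape(-1, vol_4d.shape[-1]).T  # images x voxels
print dat_2d.shape                               # (10, 120)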

Initialize Python Modules

Make sure you change the path of the output_dir to where you would like to run this tutorial.


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import glob
import numpy as np
import pandas as pd
import seaborn as sns
import os
import sys
import time
from nltools.simulator import Simulator
from nltools.utils import get_resource_path,get_anatomical
from nltools.analysis import Predict, Roc
from nltools.data import Brain_Data,threshold
from nltools.mask import create_sphere
from nilearn.plotting.img_plotting import plot_epi, plot_roi
import matplotlib.pyplot as plt

from copy import deepcopy

output_dir = '/Users/lukechang/Downloads/Test_Brain_Data'

Create some fake data

For this tutorial we will first create some fake data using the nltools Simulator() class. This creates voxels within a sphere of radius r whose signal is correlated with Y at cor and with each other at cov. Gaussian noise with standard deviation sigma is added to each voxel. We simulate n_sub subjects with reps observations each.


In [86]:
tic = time.time() #Start Timer

sim = Simulator()
r = 10
sigma = .5
cor = .8
cov = .6
reps = 10
n_sub = 10
s1 = create_sphere([41, 64, 55], radius=r)
sim.create_cov_data(cor, cov, sigma, mask=s1, reps = reps, n_sub = n_sub, output_dir = output_dir)
print 'Simulation: Elapsed: %.2f seconds' % (time.time() - tic) #Stop timer


Simulation: Elapsed: 39.26 seconds

Load Data into the Brain_Data Class

Next, we will load the data we created as a Brain_Data() instance. This requires specifying a nibabel instance or a valid file name pointing to NIfTI data. You can optionally specify a list of files. To run a prediction analysis, you must specify training labels as 'Y'. Here we will load the simulated Y data.


In [2]:
tic = time.time() #Start Timer

y=pd.read_csv(os.path.join(output_dir,'y.csv'),header=None,index_col=None).T
dat = Brain_Data(data=os.path.join(output_dir,'maskdata_cor0.8_cov0.6_sigma0.5.nii.gz'),Y=y)
dat.X = pd.DataFrame({'Intercept':np.ones(len(dat.Y)),'X1':np.array(dat.Y).flatten()},index=None)
holdout = pd.read_csv(os.path.join(output_dir,'rep_id.csv'),header=None,index_col=None).T

print 'Elapsed: %.2f seconds' % (time.time() - tic) #Stop timer


Elapsed: 6.20 seconds

Common Brain_Data() Methods

There are many methods on the Brain_Data() class that help with basic manipulation of the data. Many methods can be chained, since they return a Brain_Data() instance; see the quick sketch below.
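
For example, slicing, averaging, and plotting can be combined in a single chained expression (a quick sketch using the dat object loaded above):

# Slice the first 10 images, average over them, and plot the result in one expression
dat[0:10].mean().plot()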


In [5]:
# Find the dimensions of the data.  images x voxels
print 'Shape of dat: ', dat.shape()

# Create a mean for every voxel over images
dat.mean()

# Show the dimensions of the mean image
print 'Shape of mean of dat: ', dat.mean().shape()

# Calculate the standard deviation for each voxel across images
print 'Shape of standard deviation of dat: ', dat.std().shape()

# We can create quick plots to inspect the data.
print 'Plot Standard Deviation of dat: '
dat.std().plot()

# We can convert the Brain_Data() instance into a nibabel instance.
d = dat.to_nifti()

# Brain_Data() instances can be sliced to select specific images
print 'Plot 5th image in dat: '
dat[4].plot()

# We can use any type of indexing to slice the data
print dat[[1,6,2]]

# We can empty an object
new = dat.empty()
print new

# We can also concatenate data using the append method
new = new.append(dat[4])
print new
new = new.append(dat[10])
print new

# Any Brain_Data object can be written out to a nifti file
dat.write(os.path.join(output_dir,'Tmp_Data.nii.gz'))


Shape of dat:  (100, 238955)
Shape of mean of dat:  (238955,)
Shape of standard deviation of dat:  (238955,)
Plot Standard Deviation of dat: 
Plot 5th image in dat: 
nltools.data.Brain_Data(data=(3, 238955), Y=3, X=(3, 2), mask=MNI152_T1_2mm_brain_mask.nii.gz, output_file=[])
nltools.data.Brain_Data(data=(0,), Y=0, X=(0,), mask=MNI152_T1_2mm_brain_mask.nii.gz, output_file=[])
nltools.data.Brain_Data(data=(238955,), Y=1, X=(2,), mask=MNI152_T1_2mm_brain_mask.nii.gz, output_file=[])
nltools.data.Brain_Data(data=(2, 238955), Y=2, X=(2, 2), mask=MNI152_T1_2mm_brain_mask.nii.gz, output_file=[])

Statistical Analyses

There are several methods to conduct statistical analyses such as:

  • One sample t-tests
  • Univariate regression
  • Multivariate prediction

The results of these statistical analyses are returned as dictionaries, which can be indexed to retrieve Brain_Data() instances containing each result (e.g., t and p images). The results can be thresholded using the threshold() method.


In [6]:
# Run a one-sample T-test
out = dat.ttest()
print 'Plot the results of a one sample t-test'
out['t'].plot()

# Threshold t-test results with FDR
print 'Threshold the results using FDR with q<.05 and plot'
tt = threshold(out['t'], out['p'], threshold_dict={'fdr':.05})
tt.plot()

# Run a univariate regression for a design matrix containing an intercept and one independent variable
# for one subject.
# The X matrix must be a pandas DataFrame
dat.X = pd.DataFrame({'Intercept':np.ones(len(dat.Y)),'X1':np.array(dat.Y).flatten()},index=None)
out = dat.regress()
print 'Plot the beta parameters estimated using OLS for a univariate regression'
out['beta'][1].plot()

# Threshold the regression results with FDR
print 'Threshold the results using FDR with q<.05 and plot'
i=1
tt = threshold(out['t'][i], out['p'][i], threshold_dict={'fdr':.05})
tt.plot()


Plot the results of a one sample t-test
Threshold the results using FDR with q<.05 and plot
Plot the beta parameters estimated using OLS for a univariate regression
Threshold the results using FDR with q<.05 and plot

Run a 2-stage multilevel regression akin to SPM

Here we are going to run a separate regression for each subject, concatenate the resulting beta images, and then run a one-sample t-test across subjects.

First, we initialize an empty Brain_Data instance and populate it with a loop that runs a separate regression for each participant.

Next, we run a second level t-test to determine which voxels are significantly different from zero and plot the t-test results and a thresholded t-image with 0.05 FDR correction.


In [12]:
n_trials = 10
n_subs = 10

start = 0
stop = n_trials
dat.X = pd.DataFrame({'Intercept':np.ones(len(dat.Y)),'X1':np.array(dat.Y).flatten()},index=None)
all_betas = dat.empty()
for i in xrange(n_subs):
    sub_out = dat[start:stop].regress()
    start = start + n_trials
    stop = stop + n_trials
    tmp = sub_out['beta'].empty(data=False)[1]  # keep only the beta image for the X1 regressor
    all_betas = all_betas.append(tmp)

l2 = all_betas.ttest()
l2['t'].plot()
thr = threshold(l2['t'],l2['p'],threshold_dict={'fdr':.05})
thr.plot()


Run Multivariate Prediction

Running MVPA-style analyses using multivariate regression is even easier and faster than univariate methods. All you need to do is specify the algorithm and the cross-validation parameters.

The output variable is a dictionary of the most useful output from the prediction analyses. The predict function runs the prediction multiple times: one iteration uses all of the data to calculate the 'weight_map', while the remaining iterations estimate the cross-validated predictive accuracy.


In [11]:
tic = time.time() #Start Timer

output = dat.predict(algorithm='ridge', cv_dict={'type': 'kfolds','n_folds': 5,'stratified':dat.Y})

# Display the available data in the output dictionary
print output.keys()

# Plot the multivariate weight map
output['weight_map'].plot()

# Return the cross-validated predicted data
print output['yfit_xval']

print 'Elapsed: %.2f seconds' % (time.time() - tic) #Stop timer


overall Root Mean Squared Error: 0.00
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.49
overall CV Correlation: 0.90
['weight_map', 'rmse_all', 'yfit_xval', 'intercept_xval', 'r_all', 'weight_map_xval', 'intercept', 'r_xval', 'yfit_all', 'Y', 'cv', 'rmse_xval']
[ 0.04632819  0.41686881 -0.78606106 -0.40392209  0.94478054  0.83679542
 -0.80183435 -0.13487734  0.74462763  1.23734513  0.08044533  0.51321837
 -0.76848681 -0.48602167  0.92156169  0.81328865 -0.82291991 -0.1651295
  0.78246732  1.20070137  0.0926731   0.41886486 -0.77781619 -0.47680881
  0.93999024  0.81560026 -0.8634103  -0.13610012  0.74299828  1.20883652
  0.1065067   0.44383197 -0.80516608 -0.40659454  0.93875631  0.78996263
 -0.82385885 -0.17825509  0.74093832  1.19890048  0.05683944  0.42776243
 -0.78323772 -0.4835654   0.92685838  0.78820283 -0.82391997 -0.16685612
  0.75893818  1.1715892   0.04190071  0.43403152 -0.7959384  -0.48281878
  0.94733353  0.84121896 -0.80768875 -0.13359063  0.79090318  1.220769
  0.09041874  0.4631632  -0.78869891 -0.48326653  0.93636855  0.81694052
 -0.82092671 -0.1829437   0.74105341  1.20076099  0.05704948  0.43303799
 -0.71187052 -0.48746724  0.95648561  0.816202   -0.79261975 -0.16065704
  0.72768275  1.18732874  0.08163816  0.42836808 -0.7773756  -0.47918816
  1.00590315  0.81787409 -0.77043473 -0.13247078  0.74489436  1.21598443
  0.08544442  0.44608331 -0.78702477 -0.47852172  1.00231337  0.81047588
 -0.82398363 -0.1764636   0.76448696  1.22856028]
Elapsed: 2.38 seconds
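
As a quick illustrative sketch (reusing the dat and output objects from above), the cross-validated predictions in output['yfit_xval'] can be plotted against the true labels:

# Compare cross-validated predictions to the true training labels
plt.scatter(np.array(dat.Y).flatten(), output['yfit_xval'])
plt.xlabel('True Y')
plt.ylabel('Cross-validated prediction')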

There are many types of algorithms and cross-validation schemes that one might be interested in using. Here are a few examples of prediction and classification using k-folds, stratified, and leave-one-subject-out cross-validation.


In [6]:
## PREDICTION ALGORITHMS

# Support Vector Regression, with 5 fold cross-validation
stats1 = dat.predict(algorithm='svr', plot=False, cv_dict={'type': 'kfolds','n_folds': 5, 'n':len(dat.Y)},**{'kernel':"linear"})

# Ridge Regression, with 5 fold stratified cross-validation.  
# This means that each fold will have an equal mean on the dat.Y training labels.
# You need to pass in a vector of training labels (dat.Y)
stats2 = dat.predict(algorithm='ridge', cv_dict={'type': 'kfolds','n_folds': 5,'stratified':dat.Y}, plot=False)

# Ridge Regression, with 5 fold between-subject cross-validation, where data for each subject is held out together.
# This means that all images from the same subject are assigned to the same fold.
# You need to pass in a vector indicating the subject id of each image
stats3 = dat.predict(algorithm='ridge', cv_dict={'type': 'kfolds','n_folds': 5,'subject_id':holdout}, plot=False)

# Support Vector Regression, with leave-one-subject-out cross-validation
# This means that each fold leaves out all of one subject's images.
# You need to pass in a vector indicating the subject id of each image
stats4 = dat.predict(algorithm='svr', cv_dict={'type': 'loso','subject_id': holdout}, plot=False,**{'kernel':"linear"})

# Ridge Regression with 5-fold cross-validation, where the alpha shrinkage parameter is estimated in a nested cross-validation loop.
# You need to specify the grid of alphas to search over.
stats5 = dat.predict(algorithm='ridgeCV', cv_dict={'type': 'kfolds','n_folds': 5, 'stratified':dat.Y}, plot=False,**{'alphas':np.linspace(.1, 10, 5)})

## CLASSIFICATION ALGORITHMS

## Support Vector Machine classification, with 5 fold cross-validation
# stats = dat.predict(algorithm='svm', cv_dict={'type': 'kfolds','n_folds': 5, 'n':len(dat.Y)},**{'kernel':"linear"})

## Support Vector Machine classification, with 5 fold cross-validation and Platt scaling
## This will output probabilities of each class
# stats = dat.predict(algorithm='svm', cv_dict={'type': 'kfolds','n_folds': 5, 'n':len(dat.Y)},**{'kernel':'linear','probability':True})

## Logistic classification, with 5 fold stratified cross-validation.
# stats = dat.predict(algorithm='logistic', cv_dict={'type': 'kfolds','n_folds': 5, 'n':len(dat.Y)})

## Ridge classification, with 5 fold between-subject cross-validation, where data for each subject is held out together.
# stats = dat.predict(algorithm='ridgeClassifier', cv_dict={'type': 'kfolds','n_folds': 5, 'subject_id':holdout})


/usr/local/lib/python2.7/site-packages/sklearn/svm/base.py:216: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return column_or_1d(y, warn=True).astype(np.float64)
overall Root Mean Squared Error: 0.09
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.51
overall CV Correlation: 0.90
overall Root Mean Squared Error: 0.00
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.49
overall CV Correlation: 0.90
overall Root Mean Squared Error: 0.00
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.54
overall CV Correlation: 0.90
overall Root Mean Squared Error: 0.09
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.50
overall CV Correlation: 0.90
overall Root Mean Squared Error: 0.00
overall Correlation: 1.00
overall CV Root Mean Squared Error: 0.49
overall CV Correlation: 0.90

Bootstrap Brain_Data Analysis Methods

Sometimes we want to empirically estimate the standard error around an analysis using bootstrapping. Here we can bootstrap each of the main statistical analyses implemented above. These are studentized bootstraps, and run time scales linearly with the number of samples. Eventually, I'll parallelize this to speed it up dramatically.


In [4]:
tic = time.time() #Start Timer

n_samples = 10

# Bootstrap Mean
out1 = dat.bootstrap(analysis_type='mean',n_samples=n_samples)
toc = time.time()
print 'Bootstrap Mean Time: %.2f seconds' % (toc - tic) #Stop timer

# Bootstrap Regress
tic2 = time.time() #Start Timer
out2 = dat.bootstrap(analysis_type='regress',n_samples=n_samples)
toc = time.time()
print 'Bootstrap Regress Time: %.2f seconds' % (toc - tic2) #Stop timer

# Bootstrap Predict
tic2 = time.time() #Start Timer
out3 = dat.bootstrap(analysis_type='predict',n_samples=n_samples, **{'algorithm':'ridge'} )
toc = time.time()
print 'Bootstrap Predict Time: %.2f seconds' % (toc - tic2) #Stop timer

print 'Total Time: %.2f seconds' % (time.time() - tic) #Stop timer


Bootstrap Mean Time: 4.96 seconds
Bootstrap Regress Time: 17.16 seconds
overall Root Mean Squared Error: 0.00
overall Correlation: 1.00
Bootstrap Predict Time: 8.00 seconds
Total Time: 30.12 seconds

Apply Mask to Brain_Data Object

Here we show how easily you can mask a Brain_Data instance using the apply_mask method. First, we create a nibabel mask containing a spherical ROI with a 10 mm radius around a specific point in voxel space. We then create a new masked Brain_Data instance by applying this mask to the dat object.


In [9]:
tic = time.time() #Start Timer

s1 = create_sphere([41, 64, 55], radius=10)
masked_dat = dat.apply_mask(s1)
masked_dat[1].plot()

print 'Total Time: %.2f seconds' % (time.time() - tic) #Stop timer


Total Time: 4.59 seconds

Find Similarity between a Brain_Data object and another image

Often we want to find the spatial similarity between two images. This is easy to do with Brain_Data objects using the similarity method: simply pass in a single Brain_Data or nibabel instance and it will return either a Pearson correlation or a dot product. If the data object contains multiple images, it will return a vector of correlations.

This method also automatically handles discrepancies in mask size between the two objects. We illustrate this by also calculating the similarity between the masked data object and a weight map calculated in the examples above.


In [64]:
tic = time.time() #Start Timer

# Calculate spatial correlation between a bunch of images in the dat Brain_Data object 
# with a single weight_map from the multivariate prediction example
r = dat.similarity(output['weight_map'])
print "Number of voxels going into the correlation: ", dat.shape()[1]
print "Mean Correlation: %.2f" % (np.mean(r))
plt.subplot(1,2,1)
plt.hist(r)

# Now do the same thing with a masked object
r_masked = masked_dat.similarity(output['weight_map'])
print "Number of voxels going into the masked data correlation: ", masked_dat.shape()[1]
print "Mean Correlation: %.2f" % (np.mean(r_masked))
plt.subplot(1,2,2)
plt.hist(r_masked)

print 'Total Time: %.2f seconds' % (time.time() - tic) #Stop timer


Number of voxels going into the correlation:  238955
Mean Correlation: -0.13
Number of voxels going into the masked data correlation:  4169
Mean Correlation: -0.31
Total Time: 3.98 seconds

Decompose Spatial Image into a linear combination of other images

The multivariate_similarity method finds the linear combination of a set of images that best approximates a target image, returning the estimated beta weights.


In [10]:
out = output['weight_map'].multivariate_similarity(dat[0:5])
print out['beta']


[  5.61801517e-05  -1.06049219e-05  -3.81801233e-06  -2.47900498e-05
  -1.83853538e-05   3.86009872e-06]

Calculate Distance matrix for images in Brain_Data() instance

The distance method computes the pairwise distance between every pair of images in the object, returning an images x images matrix.


In [6]:
distance = dat.distance(method='euclidean')
sns.heatmap(distance)


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d157d90>

Extract average activity within ROI

The extract_roi method averages across all voxels within a mask, returning one value per image.


In [7]:
mask = create_sphere([41, 64, 55], radius=10)
out = dat.extract_roi(mask)
plt.plot(out)


Out[7]:
[<matplotlib.lines.Line2D at 0x117d5aa90>]
