Tutorial for fMRI Classification using Python

Load the Python packages necessary for the analysis

Dependencies:

  • nilearn
  • nibabel
  • sklearn (included in Anaconda)
  • matplotlib (included in Anaconda)
  • numpy (included in Anaconda)
  • pandas (included in Anaconda)
  • seaborn

In [ ]:
%matplotlib inline

In [ ]:
# Nilearn for neuroimaging-specific machine learning
from nilearn import datasets
from nilearn.input_data import NiftiMasker
from nilearn import image

# Nibabel for general neuroimaging tools
import nibabel

# Scikit-learn for machine learning
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
# (in scikit-learn >= 0.18, cross_validation became model_selection
#  and LeaveOneLabelOut was renamed LeaveOneGroupOut)
from sklearn.cross_validation import LeaveOneLabelOut, cross_val_score

import numpy as np
import pandas as pd

# Plotting
import matplotlib.pyplot as plt
from nilearn import plotting
import seaborn as sns
sns.set(context="poster", style="ticks", font="Arial")

Load Haxby dataset


In [ ]:
haxby_dataset = datasets.fetch_haxby(n_subjects=1)

Let's take a quick look at what's been imported:


In [ ]:
haxby_dataset

The Haxby dataset includes anatomical (3D; x $\times$ y $\times$ z) and functional timeseries (4D; x $\times$ y $\times$ z $\times$ TR) data, as well as some functionally & anatomically defined masks, and label data (i.e., which TRs correspond to which conditions).
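
As a quick sketch, we can confirm those dimensionalities straight from the image headers (nibabel loads lazily, so no voxel data is read here):


In [ ]:
print nibabel.load(haxby_dataset.anat[0]).shape  # 3D: x, y, z
print nibabel.load(haxby_dataset.func[0]).shape  # 4D: x, y, z, TR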

Check out & Visualize data

We can grab the filepath for the anatomical image like this:


In [ ]:
haxby_dataset.anat[0]

And then use this path to import the anatomical image:


In [ ]:
anat_img = nibabel.load(haxby_dataset.anat[0])
anat_data = anat_img.get_data()

In [ ]:
print 'Anatomical data has shape:', anat_data.shape

Anatomical data


In [ ]:
cut_coords = (0, 0, 0) # MNI coordinates to view
plotting.plot_anat(anat_img, cut_coords)

ROI data


In [ ]:
mask_img = nibabel.load(haxby_dataset.mask_vt[0])
mask_data = mask_img.get_data()
plotting.plot_roi(mask_img, bg_img=anat_img, cmap='autumn')

Functional data (mean data, collapsing across time)


In [ ]:
func_img_mean = image.mean_img(haxby_dataset.func[0])
plotting.plot_epi(func_img_mean, cut_coords, cmap='Reds')

Functional data shape


In [ ]:
func_img = nibabel.load(haxby_dataset.func[0])
func_data = func_img.get_data()
func_img.shape

So a 40 $\times$ 64 $\times$ 64 volume was collected at each TR, and there were 1452 TRs in total.

Define masks of interest for subsequent analyses

Here's a list of some masks we might want to use:


In [12]:
mask_names = ['mask_vt', 'mask_face', 'mask_house']

Import labels

In order to understand what's going on in the functional data over time, we can import the labels (i.e., the condition) for each TR. In your own data, you might assign these labels by labeling the TRs from 4-12 s post-trial onset (e.g., TRs 3-6 if you have a 2 second TR) with the corresponding trial label, and then taking the average value for each voxel over those TRs. As a result, each trial yields one value per voxel, giving a trial $\times$ voxel data matrix, along with a label vector whose length equals the number of trials (see the sketch below).
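
As a minimal sketch of that scheme -- assuming a hypothetical TR $\times$ voxel array ts, a 2 s TR, and a hypothetical list trial_onsets_s of onsets in seconds (none of these come from the Haxby labels file):


In [ ]:
tr_s = 2.0  # hypothetical repetition time, in seconds

def average_trial(ts, onset_s):
    """Average the TRs ~4-12 s after trial onset, separately per voxel."""
    first = int(np.ceil((onset_s + 4) / tr_s))   # first TR ~4 s post-onset
    last = int(np.floor((onset_s + 12) / tr_s))  # last TR ~12 s post-onset
    return ts[first:last + 1].mean(axis=0)       # one value per voxel

# trial_x_voxel = np.array([average_trial(ts, t) for t in trial_onsets_s])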


In [13]:
labels = np.recfromcsv(haxby_dataset.session_target[0], delimiter=" ")
stimuli = labels['labels']

Make sure the number of stimulus labels matches the number of expected TRs (1452 from above):


In [14]:
stimuli.shape


Out[14]:
(1452,)

What are the unique labels?


In [15]:
np.unique(stimuli)


Out[15]:
rec.array(['bottle', 'cat', 'chair', 'face', 'house', 'rest', 'scissors',
       'scrambledpix', 'shoe'], 
      dtype='|S12')

We might want to remove a stimulus category, such as "rest":


In [16]:
resting_state = stimuli == "rest"
categories = np.unique(stimuli[np.logical_not(resting_state)])
print categories


['bottle' 'cat' 'chair' 'face' 'house' 'scissors' 'scrambledpix' 'shoe']

Now identify which run each remaining (non-rest) TR belongs to:


In [17]:
session_labels = labels["chunks"][np.logical_not(resting_state)]
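
The "chunks" column tags each TR with its run number, so the unique values enumerate the runs -- a quick check:


In [ ]:
print np.unique(labels["chunks"])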

Set up classifiers

L2-regularized logistic regression


In [18]:
lr_classifier = LogisticRegression(penalty='l2', C=1.)  # C is the inverse of the regularization strength

Linear support vector classifier


In [19]:
svc_classifier = SVC(C=1., kernel="linear")

A classifier to set the chance level


In [20]:
dummy_classifier = DummyClassifier()  # 'stratified' by default: samples predictions from the training class distribution
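
As a back-of-envelope sketch (assuming the default 'stratified' strategy): the dummy predicts the positive class at roughly its base rate p, so its expected precision and recall are both about p, and hence its expected F1 is about p as well. With one category out of eight, that's roughly 0.125 -- close to the "chance" rows in the results below.


In [ ]:
print 'expected chance F1 ~', 1 / 8.  # one-vs-rest with 8 categories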

Set up a data splitting object for cross validation

Leave one run out


In [21]:
cv = LeaveOneLabelOut(session_labels)
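
As a sanity check (sketch), iterating over the splitter should give one fold per run, each holding out all of that run's non-rest TRs:


In [ ]:
for i, (train, test) in enumerate(cv):
    print 'fold %d: train on %d TRs, test on %d TRs' % (i, len(train), len(test))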

Run CV Classification within each mask


In [23]:
# Initialize storage dataframe
df = pd.DataFrame(columns=['subid', 'mask_name', 'category', 
                           'type', 'mean', 'sd'])

func_filename = haxby_dataset.func[0]

# Choose which classifier to use
classifier = lr_classifier

# Loop through masks & categories for classification
for mask_name in mask_names:
    # Just get timecourses from voxels within mask
    mask_filename = haxby_dataset[mask_name][0]
    masker = NiftiMasker(mask_img=mask_filename, standardize=True)
    masked_timecourses = masker.fit_transform(func_filename)
    masked_timecourses = masked_timecourses[np.logical_not(resting_state)]  # drop rest TRs

    for category in categories:
        print "Processing %s %s" % (mask_name, category)
        # one-vs-rest target: True for TRs of this category, False otherwise
        classification_target = stimuli[np.logical_not(resting_state)] == category
        
        # Score both the real classifier and the dummy (chance) baseline; F1 is
        # used because the one-vs-rest target is unbalanced (~1 positive in 8)
        for acc_type, classify in zip(['accuracy', 'chance'],
                                      [classifier, dummy_classifier]):
            scores = cross_val_score(classify, 
                                     masked_timecourses,
                                     classification_target,
                                     cv=cv, scoring="f1")
            
            df = df.append(pd.DataFrame({'subid': 'subj1', 
                                         'mask_name': mask_name,
                                         'category': category, 
                                         'type': acc_type,
                                         'mean': scores.mean(),
                                         'sd': scores.std()}, index=[0]))


Processing mask_vt bottle
Processing mask_vt cat
Processing mask_vt chair
Processing mask_vt face
Processing mask_vt house
Processing mask_vt scissors
Processing mask_vt scrambledpix
Processing mask_vt shoe
Processing mask_face bottle
Processing mask_face cat
Processing mask_face chair
Processing mask_face face
Processing mask_face house
Processing mask_face scissors
Processing mask_face scrambledpix
Processing mask_face shoe
Processing mask_house bottle
Processing mask_house cat
Processing mask_house chair
Processing mask_house face
Processing mask_house house
Processing mask_house scissors
Processing mask_house scrambledpix
Processing mask_house shoe
/Users/steph-backup/anaconda/lib/python2.7/site-packages/sklearn/metrics/metrics.py:1771: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)

This warning crops up in folds where a classifier (typically the dummy) predicts no positive samples at all, so precision -- and hence F1 -- is undefined and scored as 0.

In [24]:
df.head()


Out[24]:
category mask_name mean sd subid type
0 bottle mask_vt 0.437529 0.185599 subj1 accuracy
0 bottle mask_vt 0.117330 0.100232 subj1 chance
0 cat mask_vt 0.469818 0.171203 subj1 accuracy
0 cat mask_vt 0.087427 0.082015 subj1 chance
0 chair mask_vt 0.533427 0.209404 subj1 accuracy

In [25]:
with sns.plotting_context("notebook"):
    g = sns.FacetGrid(df, col="mask_name", aspect=.66, size=5)
    g.map(sns.stripplot, "mean", "category", "type", orient="h",
          palette="deep", size=10)
    g.add_legend(title="")


2-way Classification (Face vs. House) in a mask


In [26]:
np.logical_or(stimuli == 'face', stimuli == 'house')


Out[26]:
array([False, False, False, ..., False, False, False], dtype=bool)
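
Summing that boolean mask counts the TRs we'll keep -- as a quick check, it should match the number of rows we end up with below:


In [ ]:
print np.logical_or(stimuli == 'face', stimuli == 'house').sum()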

Mask timecourses


In [27]:
mask = 'mask_vt'
smoothing = 4
standardize_ts = True # center & norm timeseries

In [28]:
mask_filename = haxby_dataset[mask][0]
session_labels = labels["chunks"] # run labels, so each run's signals are standardized independently
nifti_masker = NiftiMasker(mask_img=mask_filename, sessions=session_labels,
                           smoothing_fwhm=smoothing, standardize=standardize_ts,
                           memory="nilearn_cache", memory_level=1)
func_filename = haxby_dataset.func[0]
X = nifti_masker.fit_transform(func_filename)

TR $\times$ voxel matrix (577 voxels fall within the VT mask):


In [29]:
X.shape


Out[29]:
(1452, 577)
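
A quick sketch-check that standardize=True did its job: each voxel's timecourse is centered and scaled within each session, so the overall mean should be near 0 and the overall standard deviation near 1.


In [ ]:
print X.mean(), X.std()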

Mask the labels and data to only include 2 categories of interest:


In [30]:
stim_mask = np.logical_or(stimuli == 'face', stimuli == 'house')
stimuli = stimuli[stim_mask]
session_labels = session_labels[stim_mask]
X = X[stim_mask]
print X.shape


(216, 577)

Feature selection

Univariate feature selection based on ANOVA (F-test). Here we choose how many of the best features to keep.


In [31]:
num_feat = 500
anova = SelectKBest(f_classif, k=num_feat)
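
Under the hood, f_classif runs a one-way ANOVA per voxel across the classes and returns the F statistics (and their p-values); SelectKBest then keeps the k voxels with the largest F. A quick sketch:


In [ ]:
F, p = f_classif(X, stimuli)
print 'largest voxel F = %.1f, cutoff F = %.1f' % (np.sort(F)[-1], np.sort(F)[-num_feat])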

Processing pipeline


In [32]:
classifier = lr_classifier
# Chaining feature selection and classification in a single Pipeline means
# that, during cross-validation below, voxels are selected using only the
# training folds (no information leaks from the held-out run)
featsel_classify = Pipeline([('anova', anova), ('classification', classifier)])

In [33]:
featsel_classify.fit(X, stimuli)
stim_pred = featsel_classify.predict(X)
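
Note that these predictions are made on the very TRs the pipeline was fit to, so agreement here is an optimistic, in-sample figure; held-out performance is assessed with cross-validation below. A quick sketch:


In [ ]:
print 'in-sample accuracy:', np.mean(stim_pred == stimuli)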

In [34]:
### Look at the classifier's estimates
coef = classifier.coef_
# reverse feature selection: map the 500 selected features back to all 577 voxels
coef = anova.inverse_transform(coef)
# reverse masking: project the voxel weights back into 3D brain space
weight_img = nifti_masker.inverse_transform(coef)

In [35]:
plotting.plot_stat_map(weight_img, func_img_mean, title='LR weights')


Out[35]:
<nilearn.plotting.displays.OrthoSlicer at 0x114092550>

Cross validation


In [36]:
cv = LeaveOneLabelOut(session_labels)
cv_scores = []

df = pd.DataFrame(columns=['category', 'probability'])

for train, test in cv:
    featsel_classify.fit(X[train], stimuli[train])
    y_pred = featsel_classify.predict(X[test])
    cv_scores.append(np.mean(y_pred == stimuli[test]))

    # predict_proba columns follow classifier.classes_, which is sorted
    # alphabetically: column 0 is 'face', column 1 is 'house'
    probs = featsel_classify.predict_proba(X[test])
    house_prob = probs[stimuli[test] == 'house', 1].mean()
    face_prob = probs[stimuli[test] == 'face', 0].mean()

    df = df.append({'category': 'house', 'probability': house_prob}, ignore_index=True)
    df = df.append({'category': 'face', 'probability': face_prob}, ignore_index=True)

In [37]:
df.head()


Out[37]:
category probability
0 house 0.999566
1 face 0.984899
2 house 0.997781
3 face 0.968549
4 house 0.999330

In [38]:
sns.violinplot(x='category', y='probability', data=df)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x115238ad0>

In [39]:
sns.violinplot(cv_scores)


Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x1152b4d90>

In [41]:
cv_scores


Out[41]:
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

Face vs. house turns out to be an easy discrimination in VT cortex for this subject: all 12 held-out runs are classified perfectly.