Nilearn

If you're working with neuroimaging data, you should check out another Python library, Nilearn, which is designed for fast and easy statistical learning on neuroimaging data. It leverages the scikit-learn Python toolbox for multivariate statistics, with applications such as predictive modeling, classification, decoding, and connectivity analysis. See their website to find out more.

As an example of how to use Nilearn, we will use the Haxby 2001 study on a face vs cat discrimination task in a mask of the ventral stream. This part is based on a Nilearn tutorial.

Note that the first time you fetch the data, it can take a few minutes.

Downloading data


In [1]:
from nilearn import datasets

# By default, the 2nd subject's data will be fetched
haxby_dataset = datasets.fetch_haxby()

We can access the anatomical, functional, and mask data, and in addition we have the true labels.


In [2]:
func_file = haxby_dataset.func[0]
mask_file = haxby_dataset.mask_vt[0]
anat_file = haxby_dataset.anat[0]
labels_file = haxby_dataset.session_target[0]

Let's get some info about the BOLD file:


In [3]:
!nib-ls $func_file


/home/neuro/nilearn_data/haxby2001/subj2/bold.nii.gz int16 [ 40,  64,  64, 1452] 3.50x3.75x3.75x2.50   sform

Converting the fMRI volumes to a data matrix using masks

We need our functional data in a 2D, sample-by-voxel matrix. To get that, we'll select the set of voxels in ventral temporal (VT) cortex defined by the mask from the Haxby study:
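Conceptually, masking selects the voxels where a 3D boolean mask is True, at every time point of a 4D series. A minimal NumPy sketch with toy data (not the actual Haxby files) shows the shape bookkeeping:

```python
import numpy as np

# Toy illustration of masking: a 4D "BOLD" array (x, y, z, time) plus a 3D
# boolean mask yields a 2D (time points, voxels) matrix.
bold = np.random.rand(4, 4, 4, 10)   # tiny fake volume, 10 time points
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True           # 8 "voxels" inside the mask

data_2d = bold[mask].T               # boolean indexing gives (voxels, time); transpose
print(data_2d.shape)                 # (10, 8)
```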


In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from nilearn import plotting
plotting.plot_roi(mask_file, anat_file, cmap='Paired', dim=-.5)


Out[4]:
<nilearn.plotting.displays.OrthoSlicer at 0x7f5447097128>

Now we will create a masker using NiftiMasker. NiftiMasker is an object that applies a mask to a dataset and returns the masked voxels as a vector at each time point. Here we use standardize=True so that the time series are centered and normalized.
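What standardize=True does can be sketched in plain NumPy: each voxel's time series (each column) is centered and scaled to unit variance. (Toy data here; Nilearn's implementation offers more options.)

```python
import numpy as np

# Column-wise z-scoring: each voxel's time series gets zero mean, unit variance
X = np.random.rand(100, 5)               # 100 time points, 5 voxels
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_std.mean(axis=0), 0))   # True
print(np.allclose(X_std.std(axis=0), 1))    # True
```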


In [5]:
from nilearn.input_data import NiftiMasker
masker = NiftiMasker(mask_img=mask_file, standardize=True)

# We give the masker a filename and retrieve a 2D array ready
# for machine learning with scikit-learn
fmri_masked = masker.fit_transform(func_file)

fmri_masked is a NumPy array whose shape corresponds to the number of time points times the number of voxels in the mask.


In [6]:
print(fmri_masked.shape)


(1452, 464)

To recover the original data shape (giving us a masked and z-scored BOLD series), you can use masker.inverse_transform.
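The idea behind inverse_transform can be sketched with toy NumPy data: masked values are written back into their voxel positions in the original grid, with zeros elsewhere (Nilearn additionally handles the affine and the NIfTI wrapping):

```python
import numpy as np

# Sketch of "inverting" a mask: put masked values back into the 3D grid
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True

row = np.arange(mask.sum(), dtype=float)   # one "time point" of masked data
volume = np.zeros(mask.shape)
volume[mask] = row                          # back to the original 3D shape
print(volume.shape)                         # (4, 4, 4)
```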

Loading the behavioral labels and choosing conditions

The labels_file is a CSV file; we can read it using NumPy:


In [7]:
labels =  np.recfromcsv(labels_file, delimiter=" ")
labels


Out[7]:
rec.array([(b'rest',  0), (b'rest',  0), (b'rest',  0), ..., (b'rest', 11),
           (b'rest', 11), (b'rest', 11)], 
          dtype=[('labels', 'S12'), ('chunks', '<i8')])
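Note that np.recfromcsv is deprecated in recent NumPy releases; np.genfromtxt with names=True reads the same kind of space-delimited file into a structured array. A sketch with inline toy data instead of the real labels_file:

```python
import io
import numpy as np

# Toy stand-in for the space-delimited labels file
csv_text = io.StringIO("labels chunks\nrest 0\nface 1\ncat 1\n")

# names=True uses the header row for field names; dtype=None infers types
labels = np.genfromtxt(csv_text, delimiter=" ", names=True,
                       dtype=None, encoding="utf-8")

print(labels['labels'])   # ['rest' 'face' 'cat']
print(labels['chunks'])   # [0 1 1]
```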

It's an array in which the labels field gives information about the condition, while chunks represents the run number. We will use the conditions:


In [8]:
conditions = labels['labels']
np.unique(conditions)


Out[8]:
array([b'bottle', b'cat', b'chair', b'face', b'house', b'rest',
       b'scissors', b'scrambledpix', b'shoe'],
      dtype='|S12')

We see that there are 9 different conditions, but we will use faces and cats only. Let's create another mask (this time masking the time points) that we will apply to fmri_masked.


In [9]:
condition_mask = np.logical_or(conditions == b'face', conditions == b'cat')
conditions_2lb = conditions[condition_mask]
fmri_masked_2lb = fmri_masked[condition_mask]
print(fmri_masked_2lb.shape)


(216, 464)

Decoding with an SVM

Now we will apply a learning algorithm from scikit-learn to our neuroimaging data. We will use Support Vector Classification (SVC) with a linear kernel.


In [10]:
from sklearn.svm import SVC
svc = SVC(kernel='linear')

Let's split our data and fit the model using the training set:


In [11]:
from sklearn.model_selection import train_test_split
fmri_tr, fmri_ts, cond_tr, cond_ts = train_test_split(fmri_masked_2lb, conditions_2lb)
svc.fit(fmri_tr, cond_tr)


Out[11]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

And we can check the score for the testing set:


In [12]:
svc.score(fmri_ts, cond_ts)


Out[12]:
1.0

Exercise 1

Validate the model using cross_val_score and try different kernels for the SVM.

Run cross_val_score for SVC with a linear kernel:


In [13]:
from sklearn.model_selection import cross_val_score, LeaveOneOut

svc_ln = SVC(kernel='linear')
scores = cross_val_score(svc_ln, fmri_masked_2lb, conditions_2lb, cv=LeaveOneOut())
print("Linear kernel: scores = {}, mean score = {:03.2f}".format(scores, scores.mean()))


Linear kernel: scores = [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.
  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.], mean score = 0.93

And now, let's try the default (RBF) kernel:


In [14]:
svc_rbf = SVC()
scores = cross_val_score(svc_rbf, fmri_masked_2lb, conditions_2lb, cv=LeaveOneOut())
print("Default rbf kernel: scores = {}, mean score = {:03.2f}".format(scores, scores.mean()))


Default rbf kernel: scores = [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  0.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  0.  1.  0.], mean score = 0.93

In [15]:
# write your solution here

# 1. read about available kernels http://scikit-learn.org/stable/modules/svm.html and initialize models with two different kernels

# 2. run cross_val_score for both models and compare the results
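A possible solution sketch on synthetic data (the kernel names 'poly' and 'sigmoid' below are just two of the options listed in the scikit-learn docs; the toy blobs stand in for the fMRI samples):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Two well-separated Gaussian blobs as a stand-in dataset
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(2, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)

# 1. initialize models with two different kernels
# 2. run cross_val_score for both and compare the results
for kernel in ('poly', 'sigmoid'):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print("{} kernel: mean score = {:03.2f}".format(kernel, scores.mean()))
```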

Exercise 2

Check whether KNeighborsClassifier works for this dataset. Validate the model in the same way as the SVC.


In [16]:
from sklearn.neighbors import KNeighborsClassifier
clf_kn = KNeighborsClassifier()

scores = cross_val_score(clf_kn, fmri_masked_2lb, conditions_2lb, cv=LeaveOneOut())
print("Scores: {}, mean score = {:03.2f}".format(scores, scores.mean()))


Scores: [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  0.  0.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.], mean score = 0.95

As we can see, this model also works pretty well for the dataset.


In [17]:
# write your solution here

# 1. initialize  KNeighborsClassifier (you can change the default arguments)

# 2. run cross_val_score

We can check the weights assigned to the features by the model:


In [18]:
coef = svc.coef_
print(coef)


[[ -6.66092940e-02  -1.14664454e-02  -1.97769182e-02  -3.04724624e-02
    1.53187515e-02   6.28550982e-03   7.83290001e-03  -8.12416516e-02
    1.57173128e-02  -5.20420111e-02  -7.35369434e-03  -1.00199015e-02
   -2.30872973e-02  -4.37551427e-02  -3.14713848e-02  -3.82188263e-04
    4.13592361e-03  -1.20209509e-02  -7.98476207e-04  -2.89294506e-02
    4.50636049e-02  -4.95792799e-02  -6.37967127e-02   1.27540192e-02
    3.08607109e-02   8.38984690e-03  -2.84729729e-02  -9.08582416e-03
   -3.25462769e-02   4.84650435e-02   2.65477129e-02  -6.03446888e-02
    2.21949168e-02  -2.66166702e-02   4.37225208e-03  -3.68251458e-02
    2.36926896e-02  -8.63238170e-03  -1.12463147e-02  -2.66835631e-02
   -1.12645053e-01   4.32792123e-02   1.91639841e-02  -4.16300764e-04
   -2.09639050e-02   1.74558139e-02  -4.66711437e-03   4.43542561e-02
   -2.92823793e-02   2.80123631e-02   3.92462613e-02  -1.07184601e-02
    2.85507711e-02   2.55060524e-02  -5.14979160e-02  -8.37271812e-03
   -8.99568377e-03  -4.22588254e-02   3.21002014e-02  -1.52894764e-01
   -4.11951769e-02  -2.46584622e-02   6.68832252e-02  -7.37358093e-02
   -3.30621776e-02   4.73652435e-03   4.64159249e-03  -2.94682841e-02
    4.65739482e-02  -3.17542844e-02  -2.31947118e-03   1.58858552e-03
    7.55793850e-02  -3.16880731e-02   6.66573721e-02   2.17932772e-02
    1.83931246e-02   3.07360067e-03   7.73173971e-03  -7.75594974e-03
    5.87865393e-03  -5.11462419e-03  -1.04721737e-02  -1.81089863e-02
    1.68354203e-02   7.59414511e-03  -2.64620107e-02  -4.54163266e-02
   -7.11829388e-02   2.95527206e-02  -9.70854606e-03  -4.14534959e-02
   -4.52257850e-02  -4.94848016e-02  -5.75299337e-02  -1.19304552e-02
    5.90080879e-03  -2.72396836e-02   2.83738141e-02   1.04419041e-02
   -3.72566510e-03   6.86058610e-02   5.74612554e-03  -3.69863182e-02
    2.36124310e-03  -2.33668068e-02   1.54836416e-02   3.48280204e-02
   -8.61549756e-03  -1.96733233e-03  -5.74018316e-03  -2.01088854e-02
   -6.50593458e-02  -3.82073611e-02   3.98823938e-02  -5.49746249e-03
   -4.43856260e-02   3.94352299e-02  -1.40294888e-02   1.46129643e-02
   -3.90352878e-02  -2.77302524e-02  -1.66721299e-03  -4.26999751e-02
    1.17293211e-02   4.70720380e-02  -7.44029351e-03  -1.38260748e-02
   -2.84424136e-02  -4.98894500e-03   3.04211411e-02   2.69175099e-02
   -6.04877298e-02  -1.97436501e-02   1.61024571e-02  -3.86152897e-03
    2.22219038e-03   1.24064608e-02  -1.64149711e-02   3.26857594e-02
   -1.82706827e-02  -3.82791102e-03   1.13735633e-02  -4.07283474e-02
    1.09018526e-02  -5.45755383e-03  -3.57280130e-02   1.31790318e-02
   -4.32526614e-02  -6.58191158e-02   1.65992875e-02  -2.98589705e-02
    8.93553713e-03   8.83694914e-03   5.21681602e-02   1.11310968e-03
    4.61273155e-02   5.74548058e-02   4.14349762e-02   2.85437439e-02
    4.83021168e-03   3.53883776e-02   7.16804281e-02   3.55828699e-02
    4.55933856e-02   4.69393862e-02  -3.68196306e-02   1.22546987e-03
    5.85629680e-02  -4.45043790e-02   2.90818222e-02   8.41173435e-03
   -4.70950903e-02  -2.30175177e-02   6.49553400e-02   1.08345583e-02
    3.83955918e-02   2.58440647e-02   1.95741270e-02   5.99291541e-03
    2.96122732e-02   1.52747797e-02   4.35949947e-02  -3.73579463e-03
   -4.69235714e-02  -4.38696298e-02  -1.46204861e-03   4.67766772e-02
   -4.39521319e-02  -1.59827437e-02  -9.22092560e-03  -4.99227347e-02
    3.92395342e-02   1.58457380e-02  -1.04210462e-02   6.13159700e-03
   -5.15512669e-02  -3.42626979e-02   2.92864431e-02   8.37738998e-02
   -6.06123793e-02   2.70997904e-02  -4.00892010e-02   5.04710083e-02
   -1.83007378e-02  -1.05758417e-02   1.56170463e-02   2.78732756e-02
    6.00915622e-02  -1.56519144e-02   1.17189396e-02  -3.46413327e-03
   -2.06240809e-02  -2.05392876e-03  -3.81039595e-02  -4.95142191e-02
   -8.54838580e-02  -1.54473537e-02  -2.14555933e-03  -1.45801516e-02
   -2.52599246e-02  -5.72166954e-02  -4.77737248e-02   2.18746363e-02
   -5.72837891e-02  -3.70067502e-02   1.29876222e-02  -1.58538093e-02
   -1.20641971e-03  -9.89828864e-02  -1.89482211e-02  -3.35726383e-02
   -2.39971385e-02   1.86673431e-02  -3.54086108e-02   3.86412333e-03
   -1.39883033e-02  -1.20868514e-02  -1.66487172e-02   1.04211722e-02
   -3.10496328e-02  -1.38396838e-02  -1.15385203e-02  -4.52620710e-02
   -2.43641601e-02   1.23421811e-02  -5.30799618e-02   7.89912869e-02
    5.51104194e-03  -1.07508822e-02  -2.62954100e-02   6.47003122e-03
    3.80202062e-02  -2.63646110e-02  -5.12589489e-02  -3.29960439e-02
   -3.95669174e-02  -2.44723850e-02   1.66446589e-02  -3.89787437e-02
    3.80163611e-02   1.44357169e-02  -2.03398202e-03   1.19386287e-02
   -2.07430326e-03   4.83116082e-02  -5.87081051e-03   9.96981409e-03
    9.24369969e-02  -1.89856365e-03   3.38525447e-02  -2.53167682e-02
   -4.42243816e-02   3.80863139e-02  -4.36789694e-02  -4.08668375e-02
    4.52121418e-02   4.58237690e-02   6.16037823e-02   7.94447537e-03
    5.42738259e-02  -2.32439022e-02   1.38974057e-03  -1.96421704e-03
   -3.94666169e-02  -4.92139757e-02  -4.91442567e-03   2.22656526e-02
    6.17238038e-03  -5.98887488e-03   1.11552668e-02   6.61433206e-02
    3.28566740e-02  -2.40463645e-02   6.59969172e-03   2.31143432e-02
   -2.87909036e-05  -1.30701691e-02  -1.31642254e-02  -1.23429249e-02
   -4.49072949e-02   3.99666986e-02   1.05340543e-02  -1.15014554e-02
    4.50894965e-03   3.94318953e-02   7.76942204e-03  -5.06452442e-03
    1.73834910e-02  -2.71641078e-02  -2.28030823e-02  -3.21435327e-02
   -2.05233813e-02  -1.16259520e-03   3.40695021e-02   7.50503810e-02
    1.55515488e-02   4.39980193e-03   8.39727542e-02   6.11097245e-02
   -1.86473653e-02   4.03223066e-03   2.00553200e-01   3.07612765e-02
   -4.71878579e-02   3.75039856e-02   1.18929521e-02  -1.55597072e-03
   -3.38733350e-02   2.18463424e-02  -3.81421968e-02  -6.02352386e-02
   -4.03418061e-02   3.17157727e-03   3.65261805e-03   1.39883829e-02
   -3.13805362e-02  -4.66404147e-02   7.68205116e-02  -7.59416413e-03
    5.84626818e-02  -5.31430746e-02   4.16912873e-02   2.21404862e-02
    3.42767178e-02  -9.28268803e-03   4.28308190e-02   7.22953408e-03
   -8.80853458e-03   5.74802367e-02  -5.69906713e-03   1.07659517e-01
    7.36889536e-02  -7.34572056e-03   3.22236630e-02  -9.34539758e-03
    4.64880145e-02  -2.64203979e-02  -1.74993686e-03  -1.20223795e-03
   -1.36016762e-02  -2.68100684e-02  -5.23593375e-02  -2.78367917e-02
   -7.50371958e-03   2.52751717e-02  -5.34968904e-03  -3.12923501e-03
    6.19767586e-02   6.81726197e-02  -1.10387375e-02  -1.41285406e-02
    7.62092732e-03   5.36694406e-02   8.23659048e-02   1.95401016e-02
    1.37919240e-02   5.74915876e-03  -3.77069451e-03   1.14118183e-02
    1.65866120e-02   2.31981441e-02   1.85833728e-02  -5.57346424e-02
    3.50370604e-02   5.05710567e-02   3.18320266e-02  -3.85107901e-02
   -2.24476042e-02  -5.17532164e-02   6.36872025e-02  -1.65945677e-02
    1.22840565e-03   1.96216943e-02   2.82291528e-02   3.28765767e-02
    2.32925130e-02   1.44691967e-02  -2.75843378e-02   1.36150647e-02
    3.64741076e-02  -6.63773051e-03  -1.13488074e-02  -4.88108806e-03
   -1.47694316e-02  -2.39176219e-02   2.28872207e-02  -8.31679843e-03
   -2.08321473e-02   2.03498373e-02  -1.87606409e-03  -4.95671856e-03
   -7.05464469e-03   4.81112328e-02   2.05175323e-02  -3.91128763e-03
   -2.81307677e-02  -1.26098641e-02  -6.26384758e-02   2.08362114e-02
   -4.42558506e-02  -2.65804991e-02  -1.59164433e-02  -1.85420985e-02
   -4.96276114e-02   5.96242908e-03   2.72774825e-03   4.22640623e-02
   -3.69243433e-03   5.38563831e-02   3.65670438e-02  -1.87276649e-02
    5.28732493e-02  -5.77530170e-02   2.59937561e-02  -1.60316422e-02
    6.03029717e-02  -8.37686982e-02  -4.63741393e-02   1.41207520e-02
   -4.68709875e-02  -4.22585335e-02   2.61570487e-02  -4.23731220e-03
   -1.03053275e-02  -3.32727082e-02  -1.68188513e-02  -5.32478621e-04
   -4.60568235e-02   1.57072056e-02   1.39228413e-03  -1.76918043e-02
    9.76744152e-03   1.93952109e-02   4.12846781e-02  -1.77910086e-02
    2.26442949e-02   2.56596896e-02   1.98036330e-03  -4.01846248e-02
    3.53867463e-03  -1.47935130e-02  -5.22775046e-02   3.13660600e-03]]

Our array should have one weight per voxel in the VT mask:


In [19]:
coef.shape


Out[19]:
(1, 464)

We need to turn it back into a Nifti image, in essence “inverting” what the NiftiMasker has done. For this, we can call inverse_transform on the NiftiMasker:


In [20]:
coef_img = masker.inverse_transform(coef)
print(coef_img)


<class 'nibabel.nifti1.Nifti1Image'>
data shape (40, 64, 64, 1)
affine: 
[[  -3.5      0.       0.      68.25 ]
 [   0.       3.75     0.    -118.125]
 [   0.       0.       3.75  -118.125]
 [   0.       0.       0.       1.   ]]
metadata:
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr      : 348
data_type       : b''
db_name         : b''
extents         : 0
session_error   : 0
regular         : b''
dim_info        : 0
dim             : [ 4 40 64 64  1  1  1  1]
intent_p1       : 0.0
intent_p2       : 0.0
intent_p3       : 0.0
intent_code     : none
datatype        : float64
bitpix          : 64
slice_start     : 0
pixdim          : [-1.    3.5   3.75  3.75  1.    1.    1.    1.  ]
vox_offset      : 0.0
scl_slope       : nan
scl_inter       : nan
slice_end       : 0
slice_code      : unknown
xyzt_units      : 0
cal_max         : 0.0
cal_min         : 0.0
slice_duration  : 0.0
toffset         : 0.0
glmax           : 0
glmin           : 0
descrip         : b''
aux_file        : b''
qform_code      : unknown
sform_code      : aligned
quatern_b       : 0.0
quatern_c       : 1.0
quatern_d       : 0.0
qoffset_x       : 68.25
qoffset_y       : -118.125
qoffset_z       : -118.125
srow_x          : [ -3.5    0.     0.    68.25]
srow_y          : [   0.       3.75     0.    -118.125]
srow_z          : [   0.       0.       3.75  -118.125]
intent_name     : b''
magic           : b'n+1'

If needed, we can save the image:


In [21]:
coef_img.to_filename('haxby_svc_weights.nii.gz')

Plotting the SVM weights

Now, let's plot our weights on top of the anatomical image:


In [22]:
from nilearn.plotting import plot_stat_map

plot_stat_map(coef_img, anat_file, vmax=0.1, dim=-1,
              title="SVM weights", display_mode="yx")


Out[22]:
<nilearn.plotting.displays.YXSlicer at 0x7f543f0dc0b8>

Now you can see which areas in VT cortex are important for distinguishing between the two conditions according to our model.

Exercise 3

Try to run the model using all conditions (except the rest state). This is multiclass classification; try the one-vs-all and one-vs-one strategies (you can read more here). Which one should be faster? Does the new model have as high a score as the one with two conditions only?
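As background for the speed question (a plain-Python sketch, not Nilearn code): with k classes, one-vs-rest fits k binary classifiers, while one-vs-one fits one per pair of classes. Note, though, that scikit-learn's SVC always trains one-vs-one internally, so with SVC the decision_function_shape setting mainly affects the shape of the decision function rather than the fitting time:

```python
# Number of binary problems each multiclass strategy has to solve
k = 8                        # conditions left after removing 'rest'
ovr_fits = k                 # one-vs-rest: one classifier per class
ovo_fits = k * (k - 1) // 2  # one-vs-one: one classifier per pair of classes
print(ovr_fits, ovo_fits)    # 8 28
```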

We will start by creating a new mask that removes only the rest state.


In [23]:
conditions_new = conditions[conditions != b'rest']
fmri_masked_new = fmri_masked[conditions != b'rest']
fmri_masked_new.shape


Out[23]:
(864, 464)

Running one-vs-one multiclass classification. Note that this will take longer than last time; see the shape of our data.


In [24]:
from sklearn.model_selection import cross_val_score, LeaveOneOut

svc_new_ovo = SVC(kernel='linear', decision_function_shape="ovo")
scores = cross_val_score(svc_new_ovo, fmri_masked_new, conditions_new, cv=LeaveOneOut())
print("Scores: {}, mean score = {:03.2f}".format(scores, scores.mean()))


Scores: [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.
  0.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  0.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  0.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  0.  0.  1.  1.  0.  1.  1.  1.  1.  1.  0.  0.  1.  1.
  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  0.  0.  1.  1.  0.  1.
  0.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.], mean score = 0.90

Now let's try one-vs-all.


In [25]:
svc_new_ovr = SVC(kernel='linear', decision_function_shape="ovr")
scores = cross_val_score(svc_new_ovr, fmri_masked_new, conditions_new, cv=LeaveOneOut())
print("Scores: {}, mean score = {:03.2f}".format(scores, scores.mean()))


Scores: [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.
  0.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  0.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  0.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  0.  0.  1.  1.  0.  1.  1.  1.  1.  1.  0.  0.  1.  1.
  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  0.  0.  1.  1.  0.  1.
  0.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.
  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.], mean score = 0.90

Both settings give the same result. (In fact, scikit-learn's SVC always trains a one-vs-one scheme internally; decision_function_shape only changes the shape of the decision function.)


In [26]:
# write your solution here

# 1. create a new mask and apply to conditions and fmri_masked

# 2. initialize SVC model with two different decision_function_shape; run cross_val_score and compare results

Exercise 4

Using one of the models from the previous exercise, check which condition is the easiest for the model to identify and which one is the hardest.

Let's split the data manually into two sets and use the one-vs-rest model from the previous exercise.


In [27]:
fmri_new_tr, fmri_new_ts, cond_new_tr, cond_new_ts = train_test_split(fmri_masked_new, conditions_new)
svc_new_ovr.fit(fmri_new_tr, cond_new_tr)
cond_new_pred = svc_new_ovr.predict(fmri_new_ts)

Now we will create a dictionary and, for every condition, check how often the model identified it correctly:


In [28]:
acc_cond = {}
for cn in np.unique(cond_new_ts):
    acc_cond[cn] = cond_new_pred[(cond_new_pred==cn) & (cond_new_ts==cn)].shape[0] / cond_new_ts[cond_new_ts==cn].shape[0]

print(acc_cond)


{b'bottle': 0.8636363636363636, b'cat': 0.8928571428571429, b'chair': 0.9583333333333334, b'face': 0.9642857142857143, b'house': 1.0, b'scissors': 0.8333333333333334, b'scrambledpix': 0.896551724137931, b'shoe': 0.8636363636363636}

Looks like house was the easiest for our model.
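The per-condition accuracy computed by hand above is the per-class recall, which sklearn.metrics can compute directly. A toy sketch (hypothetical labels, not the actual Haxby predictions):

```python
import numpy as np
from sklearn.metrics import recall_score

# Per-class recall answers "how often was each condition identified correctly?"
y_true = np.array(['face', 'face', 'cat', 'cat', 'house', 'house'])
y_pred = np.array(['face', 'cat',  'cat', 'cat', 'house', 'house'])

classes = ['cat', 'face', 'house']
recalls = recall_score(y_true, y_pred, average=None, labels=classes)
print(dict(zip(classes, recalls)))   # {'cat': 1.0, 'face': 0.5, 'house': 1.0}
```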


In [29]:
# write your code here

# 1. split the data using train_test_split and fit the model using training data

# 2. run predict on testing data

# 3. for every condition calculate how often model identified it correctly