This notebook illustrates how to define a cross-validation and a classifier, classify a set of features, and visualize the results. Documentation: https://etiennecmb.github.io/classification.html

Import libraries


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# You can use %matplotlib notebook instead, but it has some bugs with xticks and titles

from brainpipe.classification import *
from brainpipe.visual import *

Create a random dataset

We are going to create a random dataset for a 2-class problem, with n_trials per class and n_features features. Decoding quality increases with feature rank: the first feature is a bad one, the second a little better, and so on up to the last one, the best.

Dataset settings


In [2]:
n_trials = 100    # Number of trials per class
n_features = 15   # Number of features

Create datasets


In [3]:
spread = np.linspace(0, 0.3, n_features)  # Class separation grows with feature rank
class1 = np.random.uniform(size=(n_trials, n_features)) + spread
class2 = np.random.uniform(size=(n_trials, n_features)) - spread
x = np.concatenate((class1, class2), axis=0)        # Shape: (2 * n_trials, n_features)
y = np.ravel([[k]*n_trials for k in np.arange(2)])  # Labels: n_trials 0's, then n_trials 1's

plt.figure(0, figsize=(12,6))
plt.boxplot(x);
rmaxis(plt.gca(), ['top', 'right']);
plt.xlabel('Features'), plt.ylabel('Values');


Classification

Define a classifier

We are going to create a classifier: a Support Vector Machine (SVM) with an RBF kernel.


In [4]:
model = 'svm'
kern = 'rbf'
clf_obj = defClf(y, clf=model, kern=kern)
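
For readers who know scikit-learn: as far as we can tell, defClf wraps its estimators, so the bare equivalent would be the following sketch (sk_clf is our own name, reused in the sketches below):

from sklearn.svm import SVC

# Hypothetical scikit-learn analogue of defClf(y, clf='svm', kern='rbf')
sk_clf = SVC(kernel='rbf')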

Define a cross-validation

Here, we create a 10-fold stratified cross-validation, repeated five times.


In [5]:
rep = 5
cvmodel = 'skfold'
cv_obj = defCv(y, cvtype=cvmodel, rep=rep)
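
Again in scikit-learn terms, and assuming brainpipe's 'skfold' behaves like stratified k-folds (an assumption, not confirmed from its source), this cross-validation would look like the sketch below (sk_cv is our own name, reused later):

from sklearn.model_selection import RepeatedStratifiedKFold

# 10 stratified folds (class proportions preserved in each fold), repeated 5 times
sk_cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)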

Create the classification object

Basically, this object links the cross-validation and the classifier together and provides a bundle of functions.


In [6]:
cla_obj = classify(y, clf=clf_obj, cvtype=cv_obj)
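
Conceptually, for a single feature this bundle behaves like the following plain scikit-learn sketch (our reading of what classify does, reusing the hypothetical sk_clf and sk_cv defined above):

from sklearn.model_selection import cross_val_score

# Decoding accuracy (DA) of the last feature, averaged over all folds and repetitions
scores = cross_val_score(sk_clf, x[:, [-1]], y, cv=sk_cv)
print('DA = %.1f%% +/- %.1f%%' % (100 * scores.mean(), 100 * scores.std()))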

Be sure of your settings

If you want to be sure of your current settings, you can display the embedded tables.

Classifier and cross-validation


In [7]:
cla_obj.info.clfinfo


Out[7]:
   Chance (theoretical, %)   Class    Classifier                             Cross-validation                 Repetition
0  50.0                      [0, 1]   Support Vector Machine (kernel=rbf)    5-times, 10 Stratified k-folds   5

Statistics


In [8]:
cla_obj.info.statinfo


Out[8]:
   Chance (binomial, %)                                 Chance (theoretical, %)   Class    N-class
0  {'p_0.05': 56.0, 'p_0.01': 58.0, 'p_0.001': 61.0}   50.0                      [0, 1]   2
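
These binomial thresholds can be checked by hand. Assuming they come from the binomial inverse CDF under chance level p = 0.5 with 2 * n_trials = 200 trials (our assumption about how brainpipe derives them), the sketch below reproduces 56.0, 58.0 and 61.0:

from scipy.stats import binom

n = 2 * n_trials          # 200 trials in total (n_trials per class, 2 classes)
for alpha in (0.05, 0.01, 0.001):
    # Accuracy that pure chance (p = 0.5) only reaches with probability ~alpha
    da_thresh = 100 * binom.ppf(1 - alpha, n, 0.5) / n
    print('p < %g  ->  DA >= %.1f%%' % (alpha, da_thresh))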

Run the classification

Here, we classify each feature separately and estimate the significance of each one using 20 permutations (the label vector is shuffled 20 times, which limits the smallest achievable p-value to 1/20 = 0.05).


In [9]:
da, pvalue, daperm = cla_obj.fit(x, n_perm=20, method='label_rnd')
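
For reference, here is what label permutations amount to, as a minimal sketch for a single feature reusing the hypothetical sk_clf and sk_cv from above (the p-value convention is our assumption, not brainpipe's actual code):

from sklearn.model_selection import cross_val_score

n_perm = 20
feat = x[:, [-1]]                              # test the last (best) feature
da_obs = cross_val_score(sk_clf, feat, y, cv=sk_cv).mean()

da_perm = np.empty(n_perm)
for p in range(n_perm):
    y_shuffled = np.random.permutation(y)      # break the feature/label link
    da_perm[p] = cross_val_score(sk_clf, feat, y_shuffled, cv=sk_cv).mean()

# One common convention: never report p below 1 / n_perm (= 0.05 here)
pval = max(np.sum(da_perm >= da_obs) / float(n_perm), 1. / n_perm)
print('DA = %.1f%%, p = %.2f' % (100 * da_obs, pval))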

Display feature information and save it to an Excel file


In [10]:
# Export all the information to an Excel file :
filename = 'classification_demo.xlsx'
cla_obj.info.to_excel(filename)

# Display information about each feature :
cla_obj.info.featinfo


Out[10]:
Settings: SVM-rbf / 5-rep_10-skfold

    DA (%)   STD (+/-)   p-values (Binomial)   p-values (Permutations)   Group
0   47.9     2.22        7.38e-01              0.45                      0
1   55.6     0.49        5.18e-02              0.15                      1
2   55.8     0.60        5.18e-02              0.05                      2
3   60.5     1.00        1.14e-03              0.05                      3
4   60.3     0.68        1.82e-03              0.05                      4
5   57.0     0.84        2.00e-02              0.05                      5
6   63.2     0.24        8.21e-05              0.05                      6
7   65.2     0.93        6.93e-06              0.05                      7
8   64.4     0.73        2.49e-05              0.05                      8
9   66.1     0.58        1.77e-06              0.05                      9
10  71.2     0.40        5.15e-10              0.05                      10
11  71.7     0.40        2.01e-10              0.05                      11
12  71.7     0.40        2.01e-10              0.05                      12
13  77.2     0.51        1.22e-15              0.05                      13
14  80.9     0.58        1.11e-16              0.05                      14

Plot your results

Plot decoding


In [11]:
plt.figure(1, figsize=(12,8))

cla_obj.daplot(da, daperm=daperm, chance_method='perm', rmax=['top', 'right'],
               dpax=['bottom', 'left'], cmap='viridis')


Out[11]:
[Figure: decoding accuracy (DA, %) of each feature, with the permutation-based chance level]

Plot confusion matrix of the last feature only


In [25]:
# Get the confusion matrix of each feature :
cm = cla_obj.cm()

# Plot the confusion matrix of the last feature (-1) :
fig2 = plt.figure(2, figsize=(6, 6))
cla_obj.cmplot(fig2, cm[-1, ...], fignum=2, figtitle='My figure', subspace={'top':0.85},
               title='Example of a confusion matrix', vmin=16, vmax=83);

# Save the plot :
fig2.savefig('My_confusion_matrix.png', dpi=300, bbox_inches='tight')


[Figure: 2 x 2 confusion matrix of the last feature, saved to My_confusion_matrix.png]
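
For comparison, a cross-validated confusion matrix can also be computed directly with scikit-learn. A sketch under the same assumptions as above; note that cross_val_predict needs a plain, non-repeated CV so that each trial is predicted exactly once:

from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import confusion_matrix

# Predict each trial once with a 10-fold stratified CV, then count hits and misses
y_pred = cross_val_predict(sk_clf, x[:, [-1]], y, cv=StratifiedKFold(n_splits=10))
print(confusion_matrix(y, y_pred))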

Introduction to grouping

The group parameter and multi-features

Previously, we saw how to classify each feature separately. Now we are going to see how to group features. In this example we defined 15 features, so we are going to split them into 3 groups:

  • 'Group1: the bad one': the first 5 features
  • 'Group2: the middle one': the next 3 features
  • 'Group3: the best one': the last 7 features

In [13]:
# Define the group parameter :
grp = ['Group1: the bad one']*5 + ['Group2: the middle one']*3 + ['Group3: the best one']*7

# Define a new classification object for this example :
cla_obj2 = classify(y, clf='lda', cvtype='sss', cvArg={'rep':30, 'n_folds':10})

# Run classification, not on each feature, but on each group of features:
da2, pvalue2, daperm2 = cla_obj2.fit(x, grp=grp, method='label_rnd', n_perm=50)

Plot the grouping decoding


In [14]:
plt.figure(3, figsize=(8, 6))
cla_obj2.daplot(da2, cmap='Spectral_r', ylim=[45, 100], chance_method='perm',
                daperm=daperm2, chance_color='darkgreen');
cla_obj2.info.featinfo


Out[14]:
Settings LDA / 30-rep_10-sss
Results DA (%) STD (+/-) p-values (Binomial) p-values (Permutations) Group
0 66.05 2.81 1.7735890870396176e-06 0.02 Group1: the bad one
1 71.61666666666666 3.8 2.0076962314874436e-10 0.02 Group2: the middle one
2 98.66666666666667 0.77 1.1102230246251565e-16 0.02 Group3: the best one

As we can see above, the classification is applied to each group. The mf parameter of the fit() function is just a shortcut to say that all the features have to be considered together (see the sketch after this list). You can use the grp parameter to:

  • Group features together
  • Label each feature/group to get nicer tables and plots
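
For instance, a multi-features call could look like the line below; the exact value expected by mf (we guess a boolean) is our assumption, so check the documentation linked at the top:

# Consider all 15 features together as a single multi-features pattern :
da_mf, pvalue_mf, daperm_mf = cla_obj2.fit(x, mf=True, method='label_rnd', n_perm=50)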