This notebook illustrates how to define a cross-validation scheme and a classifier, how to classify a set of features, and how to visualize the results. Documentation: https://etiennecmb.github.io/classification.html
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# You can use %matplotlib notebook, but there are some bugs with xticks and titles
from brainpipe.classification import *
from brainpipe.visual import *
We are going to create a random dataset for a 2-class problem, with n_trials per class and n_features features. The decoding quality of the features increases with their rank: the first feature is a bad one, the second a little better, and so on, up to the last one, which is the best.
In [2]:
n_trials = 100 # Number of trials per class
n_features = 15 # Number of features
In [3]:
spread = np.linspace(0, 0.3, n_features)  # Offset that grows with the feature index
class1 = np.random.uniform(size=(n_trials, n_features)) + spread  # Class 1, shifted upward
class2 = np.random.uniform(size=(n_trials, n_features)) - spread  # Class 2, shifted downward
x = np.concatenate((class1, class2), axis=0)  # Dataset of shape (2 * n_trials, n_features)
y = np.ravel([[k]*n_trials for k in np.arange(2)])  # Label vector (0 / 1)
plt.figure(0, figsize=(12,6))
plt.boxplot(x);
rmaxis(plt.gca(), ['top', 'right']);
plt.xlabel('Features'), plt.ylabel('Values');
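Before going further, you can sanity-check this dataset with plain scikit-learn: decoding each column on its own should show the accuracy growing with the feature index. The sketch below is only an illustration using the x, y and n_features variables defined above; it is not part of brainpipe.
# Illustrative sanity check with scikit-learn (not part of brainpipe) :
# decode each feature separately; accuracy should grow with the feature index.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

for feat in range(n_features):
    scores = cross_val_score(SVC(kernel='rbf'), x[:, [feat]], y, cv=5)
    print('Feature {:2d} -> {:.1f} % decoding accuracy'.format(feat, 100 * scores.mean()))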
In [4]:
model = 'svm'  # Classifier type
kern = 'rbf'   # Kernel of the SVM
clf_obj = defClf(y, clf=model, kern=kern)
In [5]:
rep = 5             # Number of repetitions of the cross-validation
cvmodel = 'skfold'  # Cross-validation type (stratified k-fold)
cv_obj = defCv(y, cvtype=cvmodel, rep=rep)
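If you are more familiar with scikit-learn, the two objects above broadly map onto an RBF-kernel SVC and a repeated stratified k-fold splitter. The snippet below is only an assumed analogy; the exact defaults used by defClf/defCv, such as the number of folds, may differ.
# Assumed scikit-learn analogue of the classifier and cross-validation above
# (the number of folds is a guess; brainpipe's defaults may differ) :
from sklearn.svm import SVC
from sklearn.model_selection import RepeatedStratifiedKFold

clf_sk = SVC(kernel='rbf')                                   # 'svm' + 'rbf'
cv_sk = RepeatedStratifiedKFold(n_splits=10, n_repeats=rep)  # 'skfold' repeated rep times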
In [6]:
# Combine the classifier and the cross-validation into a classification object :
cla_obj = classify(y, clf=clf_obj, cvtype=cv_obj)
In [7]:
cla_obj.info.clfinfo
Out[7]:
In [8]:
cla_obj.info.statinfo
Out[8]:
In [9]:
# Decode each feature and assess significance with 20 label permutations :
da, pvalue, daperm = cla_obj.fit(x, n_perm=20, method='label_rnd')
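The 'label_rnd' method estimates the chance level by shuffling the class labels n_perm times and re-running the decoding; the p-value reflects how often permuted accuracies reach the observed one. Conceptually this is the same idea as scikit-learn's permutation_test_score, shown below for a single feature as an illustration (not brainpipe's implementation).
# Conceptual equivalent of label permutations with scikit-learn (illustrative only) :
from sklearn.svm import SVC
from sklearn.model_selection import permutation_test_score

score, perm_scores, p_val = permutation_test_score(SVC(kernel='rbf'), x[:, [-1]], y,
                                                   cv=5, n_permutations=20)
print('Observed accuracy: {:.2f} / p-value: {:.3f}'.format(score, p_val))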
In [10]:
# Export all the information to Excel :
filename = 'classification_demo.xlsx'
cla_obj.info.to_excel(filename)
# Display information about the features :
cla_obj.info.featinfo
Out[10]:
In [11]:
plt.figure(1, figsize=(12,8))
cla_obj.daplot(da, daperm=daperm, chance_method='perm', rmax=['top', 'right'],
               dpax=['bottom', 'left'], cmap='viridis')
Out[11]:
In [25]:
# Get the confusion matrix of each feature :
cm = cla_obj.cm()
# Plot the confusion matrix of the last feature (-1) :
fig2 = plt.figure(2, figsize=(6, 6))
cla_obj.cmplot(fig2, cm[-1, ...], fignum=2, figtitle='My figure', subspace={'top':0.85},
               title='Example of a confusion matrix', vmin=16, vmax=83);
# Save the plot :
fig2.savefig('My_confusion_matrix.png', dpi=300, bbox_inches='tight')
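To see where these confusion matrix values come from, the same kind of matrix can be obtained for a single feature with cross-validated predictions in scikit-learn. This is a minimal sketch for illustration, not the cm() call used above.
# Minimal scikit-learn sketch of a per-feature confusion matrix (illustrative only) :
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_pred = cross_val_predict(SVC(kernel='rbf'), x[:, [-1]], y, cv=5)
print(confusion_matrix(y, y_pred))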
Previously, we saw how to classify each feature separately. Now we are going to see how to group features. In this example, we defined 15 features, so we are going to split them into 3 groups of features.
In [13]:
# Define the group parameter :
grp = ['Group1: the bad one']*5 + ['Group2: the middle one']*3 + ['Group3: the best one']*7
# Define a new classification object for this example :
cla_obj2 = classify(y, clf='lda', cvtype='sss', cvArg={'rep':30, 'n_folds':10})
# Run classification, not on each feature, but on each group of features:
da2, pvalue2, daperm2 = cla_obj2.fit(x, grp=grp, method='label_rnd', n_perm=50)
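Grouping simply means that all the columns sharing the same label are given to the classifier together, so each group yields a single decoding accuracy. Below is a minimal, hand-rolled sketch of the same idea with scikit-learn, shown only for illustration; the group names and column slices are the ones defined above, and brainpipe's grp parameter handles all of this internally.
# Hand-rolled version of grouped decoding with scikit-learn (illustrative only) :
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

groups = {'Group1 (bad)': slice(0, 5), 'Group2 (middle)': slice(5, 8), 'Group3 (best)': slice(8, 15)}
for name, cols in groups.items():
    scores = cross_val_score(LinearDiscriminantAnalysis(), x[:, cols], y, cv=5)
    print('{} -> {:.1f} % decoding accuracy'.format(name, 100 * scores.mean()))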
In [14]:
plt.figure(3, figsize=(8, 6))
cla_obj2.daplot(da2, cmap='Spectral_r', ylim=[45, 100], chance_method='perm',
                daperm=daperm2, chance_color='darkgreen');
cla_obj2.info.featinfo
Out[14]:
As we can see above, the classification is applied to each group. The mf parameter of the fit() function is just a shortcut to say that all features have to be considered together. You can use the grp parameter to: