Comparing machine learning approaches to predict SEEG accuracy

Stereoencephalography (SEEG) is a technique used in drug-resistant epilepsy patients that may be a candidate for surgical resection of the epileptogenic zone. Multiple electrodes are placed using a so-called "frame based" stereotactic approach, in our case using the Leksell frame. In our previous paper "Methodology, outcome, safety and in vivo accuracy in traditional frame-based stereoelectroencephalography" by Van der Loo et al (2017) we reported on SEEG electrode implantation accuracy in a cohort of 71 patients who were operated between September 2008 and April 2016, in whom a total of 902 electrodes were implanted. Data for in vivo application accuracy analysis were available for 866 electrodes.

The goal of the current project is to use a public version of this dataset (without any personal identifiers) to predict electrode implantation accuracy by using and comparing different machine learning approaches.

Pieter Kubben, MD, PhD
neurosurgeon @ Maastricht University Medical Center, The Netherlands

For any questions you can reach me by email or on Twitter.

Data description

The public dataset contains these variables:

  • PatientPosition: patient position during surgery (nominal: supine, prone)
  • Contacts: nr of contacts of electrode implanted (ordinal: 5, 8, 10, 12, 15, 18)
  • ElectrodeType: describes trajectory type (nominal: oblique, orthogonal). Oblique refers to implantation using the Leksell arc, and orthogonal using a dedicated L-piece mounted on the frame (mostly implants in temporal lobe) when arc angles become too high (approx > 155°) or too low (approx < 25°)
  • PlanningX: planned Cartesian X coord of target (numeric, in mm)
  • PlanningY: planned Cartesian Y coord of target (numeric, in mm)
  • PlanningZ: planned Cartesian Z coord of target (numeric, in mm)
  • PlanningRing: planned ring coord, the trajectory direction in sagittal plane (numeric, in degrees); defines entry
  • PlanningArc: planned arc coord (trajectory direction in coronal plane (numeric, in degrees); defines entry
  • DuraTipDistancePlanned: distance from dura mater (outer sheet covering the brain surface) to target (numeric, in mm)
  • EntryX: real Cartesian X coord of entry point (numeric, in mm)
  • EntryY: real Cartesian Y coord of entry point (numeric, in mm)
  • EntryZ: real Cartesian Z coord of entry point (numeric, in mm)
  • TipX: real Cartesian X coord of target point (numeric, in mm)
  • TipY: real Cartesian Y coord of target point (numeric, in mm)
  • TipZ: real Cartesian Z coord of target point (numeric, in mm)
  • SkinSkullDistance: distance between skin surfacce and skull surface (numeric, in mm)
  • SkullThickness: skull thickness (numeric, in mm)
  • SkullAngle: insertion angle of electrode relative to skull (numeric, in degrees)
  • ScrewLength: length of bone screw used to guide and fixate electrode (ordinal: 20, 25, 30, 35 mm)

The electrodes are the Microdeep depth electrodes by DIXI Medical.

To the limited extent possible in this case I tried to make these FAIR data and adhere to FAIR guiding principles. In practice this meant I introduced the topic, described my data and created a DOI.

Now let's get started.


In [1]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
%matplotlib inline
import warnings; warnings.simplefilter('ignore')
%xmode plain; # shorter error messages

# global setting whether to save figures or not
# will save as 300 dpi PNG - all filenames start with "fig_"
save_figures = False

In [2]:
# load data
electrodes = pd.read_csv('../data/electrodes_public.csv')

# find missing values
nan_rows = sum([True for idx,row in electrodes.iterrows() if any(row.isnull())])
print('Nr of rows with missing values:', nan_rows)


Nr of rows with missing values: 823

We will calculate target point localization error (TPLE) using the Euclidean distance and remove the entry data as we won't be using them (still wanted to share them in dataset though).


In [3]:
# calculate TPLE and remove entry data from dataframe
electrodes['TPLE'] = np.sqrt(np.square(electrodes['TipX'] - electrodes['PlanningX']) + 
                              np.square(electrodes['TipY'] - electrodes['PlanningY']) + 
                              np.square(electrodes['TipZ'] - electrodes['PlanningZ'])
                             ).round(1)
electrodes.drop(['EntryX', 'EntryY', 'EntryZ'], axis = 1, inplace = True)
electrodes.head()


Out[3]:
PatientPosition Contacts ElectrodeType PlanningX PlanningY PlanningZ PlanningRing PlanningArc DuraTipDistancePlanned TipX TipY TipZ SkinSkullDistance SkullThickness SkullAngle ScrewLength TPLE
0 Supine 18.0 Oblique 125.8 106.5 135.5 154.6 90.2 86.6 126.4 106.8 135.2 7.0 9.7 70.3 25.0 0.7
1 Supine 18.0 Oblique 130.6 131.0 136.4 155.4 96.6 105.9 134.7 132.9 136.0 9.4 7.4 63.8 30.0 4.5
2 Supine 10.0 Oblique 139.1 104.9 124.0 131.1 146.0 35.2 139.1 108.3 124.4 9.8 5.5 66.4 30.0 3.4
3 Supine 10.0 Oblique 137.7 112.2 115.0 96.2 137.1 35.9 136.4 115.2 115.0 8.4 6.5 66.3 25.0 3.3
4 Supine 12.0 Oblique 126.0 76.0 124.1 159.1 156.3 39.5 124.6 75.8 126.8 7.4 4.6 84.7 25.0 3.0

Now we will remove large outliers (difference between planned and real coord) in the Z-axis as electrode insertion length (depth) is influenced also by other factors (calculations regarding depth which could lead to either too superficial or too deep, but also possible malfixation of the screw cap which may cause loosening of the electrode and hence a more superficial position.. it won't migrate into the depth spontaneously). These are very limited numbers and would too much influence further analysis.


In [4]:
# check for outliers in Z axis
large_depth_error = electrodes[np.abs(electrodes['PlanningZ'] - electrodes['TipZ']) > 10]
print('Outliers in Z axis (> 10mm):\n\n', large_depth_error['TPLE'])

# remove outliers
electrodes.drop(large_depth_error.index, inplace = True)
print('\nNew dataframe shape:', electrodes.shape) # removed 6 rows


Outliers in Z axis (> 10mm):

 446    14.8
462    14.5
463    11.2
508    45.8
612    14.2
632    20.0
Name: TPLE, dtype: float64

New dataframe shape: (860, 17)

We need to structure our data properly for further analysis and convert the categorical variables (nominal, ordinal) to the category type.


In [5]:
# convert categorical columns to "category" dtype
catcols = ['PatientPosition', 'Contacts', 'ElectrodeType', 'ScrewLength']
for cat in catcols:
    electrodes[cat] = electrodes[cat].astype('category')

# confirm correct types for all columns now
electrodes.dtypes


Out[5]:
PatientPosition           category
Contacts                  category
ElectrodeType             category
PlanningX                  float64
PlanningY                  float64
PlanningZ                  float64
PlanningRing               float64
PlanningArc                float64
DuraTipDistancePlanned     float64
TipX                       float64
TipY                       float64
TipZ                       float64
SkinSkullDistance          float64
SkullThickness             float64
SkullAngle                 float64
ScrewLength               category
TPLE                       float64
dtype: object

Let's get a short description of our TPLE data.


In [6]:
# get summary data on TPLE
tple = electrodes['TPLE']
tple.describe().round(1)


Out[6]:
count    860.0
mean       3.4
std        2.3
min        0.2
25%        2.0
50%        2.9
75%        4.1
max       19.8
Name: TPLE, dtype: float64

We now have data in the right format, but for classification we need to bin the continuous outcome variables EPLE and TPLE into categories. Alternatively we could approach this as a regression problem, but given the relative limited amount of data classification should lead to a better prediction model that is still relevant for potential clinical use.

Let's create a new variable TPLE category for this purpose.


In [7]:
# create different possible cuts to create categories (and experiment with them)
tple_max = tple.max().round()
electrodes_3cat = pd.cut(tple, bins = [0, 2.5, 5, tple_max], labels = ['0 - 2.5', '2.5 - 5', '> 5'])
electrodes_4cat = pd.cut(tple, bins = [0, 2, 4, 6, tple_max], labels = ['0 - 2', '2 - 4', '4 - 6', '> 6'])
electrodes_4quant = pd.cut(tple, bins = [0, tple.quantile(.25), tple.median(), tple.quantile(.75), tple_max], labels = ['0 - 2', '2 - 3', '3 - 4', '> 4'])
electrodes_5cat = pd.cut(tple, bins = [0, 1, 2, 5, 10, tple_max], labels = ['0 - 1', '1 - 2', '2 - 5', '5 - 10', '> 10'])
electrodes_7cat = pd.cut(tple, bins = [0, 1, 2, 4, 6, 8, 10, tple_max], labels = ['0 - 1', '1 - 2', '2 - 4', '4 - 6', '6 - 8', '8 - 10', '> 10'])

# apply cut to create TPLE category column
electrodes['TPLE category'] = electrodes_5cat
nr_of_y_categories = len(electrodes['TPLE category'].unique()) # needed for confusion matrix and keras MLP

# check correct conversion with first 15 rows
electrodes[['TPLE', 'TPLE category']].head(15).T


Out[7]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
TPLE 0.7 4.5 3.4 3.3 3 1.8 2.3 4.2 4.8 3.4 3.9 3.6 7.4 4.2 4.4
TPLE category 0 - 1 2 - 5 2 - 5 2 - 5 2 - 5 1 - 2 2 - 5 2 - 5 2 - 5 2 - 5 2 - 5 2 - 5 5 - 10 2 - 5 2 - 5

In [8]:
# count nr of items in each category
electrodes[['TPLE', 'TPLE category']].groupby('TPLE category').count().T


Out[8]:
TPLE category 0 - 1 1 - 2 2 - 5 5 - 10 > 10
TPLE 50 184 492 112 22

So we have a lof of electrodes in the 2-5mm TPLE category, which is explained well with an interquartile range of 2.0 - 4.1. Let's create some plots to learn more about our data.

Visualization

Let's first get an impression of variable distributions.


In [9]:
# plot TPLE distribution
tple.plot(kind = 'density', figsize = (15,5), title = 'TPLE density plot');
plt.xlim(0,10);
if save_figures:
    plt.savefig('fig_tple_density_plot.png', dpi = 300)



In [10]:
# plot variable distributions (density)
params = {'subplots': True, 'layout': (3,4), 'sharex': False, 'figsize': (16, 12)}
elec_visual = electrodes.drop(['TPLE'], axis = 1)
elec_visual.plot(kind='density', **params);
if save_figures:
    plt.savefig('fig_variables_density_plot.png', dpi = 300)



In [11]:
# alternatively, make a box plot
elec_visual.plot(kind='box', sharey = False, **params);
if save_figures:
    plt.savefig('fig_variables_boxplot.png', dpi = 300)


Now we have a visual impression of how variables are distributed. As we want to develop a model to predict TPLE category, it would be helpful to see to what extent variable distribution differs per TPLE category.


In [12]:
# first relate numeric variables to TPLE category
numcols = list(electrodes.columns[electrodes.dtypes == float].drop('TPLE'))
fig, axes = plt.subplots(4,3, figsize = (15,20))
fig.subplots_adjust(hspace=.5)
electrodes.boxplot(column = numcols, by = 'TPLE category', ax = axes);
if save_figures:
    plt.savefig('fig_tplecompare_boxplot.png', dpi = 300)


In contrast to numerical variables in which we can plot categories against actual feature values, in categorical columns we can "only" plot categories against the number of items in each category. This is what we will do next.

Note that these are absolute counts, and no ratios (e.g. supine position is used most and this reflects also below). So the point is to inspect the graphs for any remarkable ratio differences between categories and not simply look at the highest bars...


In [13]:
# show absolute TPLE count per category 
elec_count = 'Electrode count'
for cat in catcols:
    cat_count = electrodes[[cat, 'TPLE','TPLE category']].groupby([cat,'TPLE category']).count()
    cat_count.rename_axis({'TPLE': elec_count}, axis = 'columns', inplace = True)
    sns.factorplot(x = cat, y = elec_count, data = cat_count.reset_index(), kind = 'bar', hue = 'TPLE category',
                   size = 5, aspect = 3);
    if save_figures:
        plt.savefig('fig_tplecompare_cat_{}.png'.format(cat), dpi = 300)


Now, how do they correlate with each other? To be able to get correlations we first have to deal with missing data. We will create a temporary soluting here for visualisation and use another (for that purpose recommended) approach for predictive modelling.


In [14]:
# copy dataframe to temporary frame in which we'll impute missing values
elec_pairplot = electrodes

# create a utility function to check if we (still) have missing values present in our data
def print_missing(dataframe):
    '''checks which columns contain missing values and returns count'''
    df_na = dataframe.columns[electrodes.isnull().any()]
    
    if len(df_na) > 0:
        print('Missing data present in {} columns: \n'.format(len(df_na)))
        for c in df_na:
            print('- {} ({})'.format(c, dataframe[c].isnull().sum()))
    else:
        print('No missing data found! :-)')
        
print_missing(elec_pairplot)


Missing data present in 5 columns: 

- Contacts (56)
- PlanningRing (269)
- PlanningArc (269)
- DuraTipDistancePlanned (54)
- ScrewLength (534)

In [15]:
# use median values for numeric columns
for missing in ['PlanningRing', 'PlanningArc', 'DuraTipDistancePlanned']:
    elec_pairplot[missing] = elec_pairplot[missing].fillna(elec_pairplot[missing].median())

# use most frequent value for categorical columns
for missing in ['Contacts', 'ScrewLength']:
    most_frequent_value = elec_pairplot[missing].value_counts().index[0]
    print('Most frequent value in category "{}": {}'.format(missing, most_frequent_value))
    elec_pairplot[missing] = elec_pairplot[missing].fillna(most_frequent_value)

print_missing(elec_pairplot)


Most frequent value in category "Contacts": 18.0
Most frequent value in category "ScrewLength": 25.0
No missing data found! :-)

From practical experience I can confirm that we do use 18 contact points frequently often and it makes sense to use those here (only 56 missing... more advanced may be to correlate with other variables and decide then). Regarding screw length, 25mm is used by far the most, so even despite the fact that most values are missing (534 / 860) I still kept them in (in the original paper I referred to, increasing screw length seems to correspond with increasing TPLE).

Now let's look at some correlations.


In [16]:
# pairplot to correlate variable distributions per category.. busy plot
sns.pairplot(elec_pairplot[['PatientPosition', 'Contacts', 'ElectrodeType', 'DuraTipDistancePlanned', 
                            'SkinSkullDistance', 'SkullThickness', 'SkullAngle', 'ScrewLength', 'TPLE category']], 
             hue = 'TPLE category');
if save_figures:
    plt.savefig('fig_electrodes_corr_pairplot.png', dpi = 300)


Those are not exactly easily separable clusters... looks bad for further predictive modelling.... :-(

Alternatively we will use Peason's correlations to create a "heatmap". To include categorical variables we need to "one hot encode" them first (using pd.get_dummies()).


In [17]:
plt.figure(figsize = (15,12));
electrodes_dummies = pd.get_dummies(electrodes.drop(['TPLE category'], axis = 1))

# correlations for continuous variables only
# sns.heatmap(electrodes.corr(), square = True, annot = True, linewidths = .5, cmap = 'RdBu_r');

# correlations including categorical variables
sns.heatmap(electrodes_dummies.corr(), square = True, annot = False, cmap = 'RdBu_r');

if save_figures:
    plt.tight_layout() # needed to prevent cutting X-axis labels
    plt.savefig('fig_electrodes_corr_heatmap.png', dpi = 300)


TPLE classification

We will now use several machine learning classification approaches (manual, auto ML and a quick glance at deep learning) to predict TPLE category. I tried several feature selection and dimensionality reduction approaches that I left in place as comments as they did not contribute to better results.

Manual approach


In [30]:
from sklearn.preprocessing import LabelEncoder, PolynomialFeatures, Imputer, MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2, VarianceThreshold
elec_features = electrodes.drop(['TPLE', 'TPLE category'], axis = 1)

# encode categorical features (no one hot encoding to avoid creating too much features)
for cat in catcols:
    elec_features[cat] = LabelEncoder().fit_transform(elec_features[cat])

X = elec_features # to start with
# deal with missing values the sklearn way (and do not impute if not needed to)
X = Imputer(strategy = 'most_frequent').fit_transform(X)

# feature selection - none of these actually improved accuracy (not tried for all other outcomes)
# X = elec_features[['ScrewLength', 'PatientPosition', 'ElectrodeType', 'Contacts', 'SkullAngle']] 
# X = elec_features['SkullAngle'][:, np.newaxis]
# X = MinMaxScaler().fit_transform(X) # is it correct to do this on whole dataset?
# X = VarianceThreshold(threshold=(.8 * (1 - .8))).fit_transform(X)
# X = PolynomialFeatures(2, interaction_only = True).fit_transform(X)
# X = SelectKBest(chi2, k = 5).fit_transform(X, y)

# encode outcome (separate encoder, want to make sure to be able to reverse later)
le = LabelEncoder()
y = le.fit_transform(electrodes['TPLE category'])

In [31]:
# dimensionality reduction using PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = X.shape[1])
X_pca = pca.fit_transform(X)

# plot PCA variance ratio
plt.figure(figsize = (10,6));
plt.plot(pca.explained_variance_ratio_.cumsum());
plt.title('Explained variance ratio by PCA components');
plt.xlabel('PCA component');
plt.ylabel('Cumulative variance ratio');
if save_figures:
    plt.savefig('fig_mlmanual_pca.png', dpi = 300)



In [32]:
# use 5 principal components... (explains > 95% of variance) even slightly reduces accuracy...
# X = PCA(5).fit_transform(X)

Now split the data intro train and test set (maybe I should have done this earlier, e.g. using a Pipeline, for future predictions, but for now I'll leave it as is).


In [33]:
# split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 33, stratify = y)

In [39]:
# import modules
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

# prepare models
models = {'LOG': LogisticRegression(),
          'LDA': LinearDiscriminantAnalysis(),
          'KNN': KNeighborsClassifier(),
          'CART': DecisionTreeClassifier(),
          'NB': GaussianNB(),
          'SVM': SVC(),
          'LSVM1': LinearSVC(penalty='l1', dual = False),
          # 'LSVM2': LinearSVC(),
          'RF': RandomForestClassifier(),
          'ADA': AdaBoostClassifier(),
          'XGB': XGBClassifier()
         }

Everything is set up. We will now evalute our models for accuracy and plot the results.


In [40]:
# evaluate each model in turn
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report
names = []; results = []
print('MODEL: ACCURACY (STD)\n')

for name, model in models.items():
    cv_results = cross_val_score(model, X_train, y_train, cv = KFold(10), scoring = 'accuracy')
    names.append(name)
    results.append(cv_results)
    print('{}: {:.2f} ({:.2f})'.format(name, cv_results.mean(), cv_results.std()))
    # print(classification_report(y_test, model.fit(X_train, y_train).predict(X_test), target_names = le.classes_))


MODEL: ACCURACY (STD)

LOG: 0.57 (0.03)
LDA: 0.57 (0.04)
KNN: 0.47 (0.06)
CART: 0.42 (0.05)
NB: 0.48 (0.08)
SVM: 0.57 (0.04)
LSVM1: 0.57 (0.04)
RF: 0.52 (0.05)
ADA: 0.44 (0.08)
XGB: 0.57 (0.04)

In [43]:
# boxplot algorithm comparison
fig = plt.figure(figsize = (12,6))
plt.title('Accuracy comparison for different models')
ax = fig.add_subplot(111)
ax.set_ylim(0.3, 0.7)
plt.boxplot(results)
ax.set_xticklabels(names); # set xtick labels after plotting! (otherwise default values override custom labels)
if save_figures:
    plt.savefig('fig_mlmanual_boxplot.png', dpi = 300)



In [84]:
# create confusion matrix to compare true labels and predicted labels
from sklearn.metrics import confusion_matrix
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
conf_mat_labels = sorted(list(electrodes['TPLE category'].unique()))
conf_mat = confusion_matrix(y_test, y_pred).T
# conf_mat = confusion_matrix(y_test, y_pred, labels = conf_mat_labels)

plt.figure(figsize = (7, 7));
plt.title('Confusion matrix for model predictions')
sns.heatmap(conf_mat, annot = True, fmt = 'd', cbar = False, square = True, cmap = 'Purples', linewidth = .5,
            xticklabels = conf_mat_labels, yticklabels = conf_mat_labels);
plt.xlabel('True TPLE category');
plt.ylabel('Predicted TPLE category');
if save_figures:
    plt.savefig('fig_mlmanual_confusionmatrix.png', dpi = 300)


Hmm... TPLE category 2-5 is overrepresented... how many variables of our test set do we actually have in each category?


In [85]:
print('Nr of values in each category for y_test:\n')
y_test_unique = np.unique(le.inverse_transform(y_test), return_counts=True)
print('Unique values:', y_test_unique[0])
print('Unique values count:', y_test_unique[1])


Nr of values in each category for y_test:

Unique values: ['0 - 1' '1 - 2' '2 - 5' '5 - 10' '> 10']
Unique values count: [ 15  55 148  33   7]