Model Explanation for Classification Models

This document describes how to provide an explanation for the individual predictions of a classification model.

Model explanation provides the ability to interpret the effect of the predictors on the composition of an individual score. These predictors can then be ranked according to their contribution to the final score (leading to a positive or negative decision).

Model explanation has long been used in credit risk applications, where regulatory settings require it. The credit company is expected to give the customer the main (top n) reasons why the credit application was rejected (also known as reason codes).

Model explanation was also recently introduced by the European Union's General Data Protection Regulation (GDPR, https://arxiv.org/pdf/1606.08813.pdf) as a way to control the increasing use of machine learning algorithms in routine decision-making processes.

The law will also effectively create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them.

The process we will use here is similar to LIME. The main difference is that LIME samples data locally around the score value, while here we perform a full cross-statistics computation between the predictors and the score and use a local piecewise-linear approximation.
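
As a rough sketch, the process can be summarized as follows. This is only a simplified illustration (the function name explain_scores is used only for this sketch); the complete implementation used in this document is given in create_score_stats below.


In [ ]:
# Simplified sketch of the explanation process (see create_score_stats below
# for the version actually used in this document).
import pandas as pd
from sklearn.linear_model import Ridge

def explain_scores(df, features, score, feature_bins=4, score_bins=30):
    # 1. Bin the score and each predictor into quantiles.
    score_bin = pd.qcut(score, q=score_bins, labels=False, duplicates='drop')
    binned = pd.concat(
        [pd.qcut(df[col], feature_bins, labels=False, duplicates='drop') for col in features],
        axis=1)
    # 2. Inside each score bin, fit a ridge model of the score on the binned
    #    predictors: a local piecewise-linear approximation of the model.
    local_models = {}
    for b in sorted(pd.unique(score_bin.dropna())):
        in_bin = (score_bin == b)
        local_models[b] = Ridge().fit(binned[in_bin], score[in_bin])
    # 3. The per-bin coefficients describe the local effect of each predictor.
    return local_models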

Sample scikit-learn Classification Model

Here, we will use a scikit-learn classification model on a standard dataset (breast cancer detection).

The dataset used contains 30 predictor variables (numerical features) and one binary target (dependent variable). For practical reasons, we will restrict our study to the first 4 predictors in this document.


In [1]:
from sklearn import datasets
import pandas as pd

%matplotlib inline

ds = datasets.load_breast_cancer()
NC = 4
lFeatures = ds.feature_names[0:NC]

df_orig = pd.DataFrame(ds.data[:,0:NC] , columns=lFeatures)
df_orig['TGT'] = ds.target
df_orig.sample(6, random_state=1960)


Out[1]:
mean radius mean texture mean perimeter mean area TGT
50 11.76 21.60 74.72 427.9 1
457 13.21 25.25 84.10 537.9 1
259 15.53 33.56 103.70 744.9 0
85 18.46 18.52 121.10 1075.0 0
348 11.47 16.03 73.02 402.7 1
462 14.40 26.99 92.25 646.1 1

For the classification task, we will build a random forest model and train it on a part of the full dataset.


In [2]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=120, random_state = 1960)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_orig[lFeatures].values, 
                                                    df_orig['TGT'].values, 
                                                    test_size=0.2, 
                                                    random_state=1960)

df_train = pd.DataFrame(X_train , columns=lFeatures)
df_train['TGT'] = y_train
df_test = pd.DataFrame(X_test , columns=lFeatures)
df_test['TGT'] = y_test

clf.fit(X_train , y_train)


Out[2]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=120, n_jobs=1, oob_score=False, random_state=1960,
            verbose=0, warm_start=False)
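
Before explaining individual predictions, it is worth checking that the fitted model performs reasonably on the held-out data (a quick check; the exact figures depend on the split and on the restriction to the first 4 predictors):


In [ ]:
# Mean accuracy of the random forest on the training and test sets.
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy :", clf.score(X_test, y_test))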

In [3]:
# clf.predict_proba(df[lFeatures])[:,1]

Model Explanation

The goal here is to be able to quantify, for a given individual, the impact of each predictor on the final score.

For our model, we will do this by analyzing cross statistics between (binned) predictors and the (binned) final score.

For each score bin, we fit a linear model locally and use it to explain the score. This is a generalization of the linear case, based on the fact that any model can be approximated well enough locally by a linear function (inside each score bin). The more score bins we use, and the more data we have, the better the approximation is.

For a random forest, the score can be seen as a predicted class probability; here we use the probability of class 0 (the first column of predict_proba) as the score.
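
The mapping between the columns of predict_proba and the class labels can be checked directly on the fitted model:


In [ ]:
# clf.classes_ lists the class labels in the same order as the columns of
# predict_proba; column 0 therefore corresponds to class 0.
print(clf.classes_)
print(clf.predict_proba(X_test[:3, :]))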


In [4]:
from sklearn.linear_model import Ridge
def create_score_stats(df, feature_bins = 4 , score_bins=30):
    df_binned = df.copy()
    df_binned['Score'] = clf.predict_proba(df[lFeatures].values)[:,0]
    df_binned['Score_bin'] = pd.qcut(df_binned['Score'] , q=score_bins, labels=False, duplicates='drop')
    df_binned['Score_bin_labels'] = pd.qcut(df_binned['Score'] , q=score_bins, labels=None, duplicates='drop')

    for col in lFeatures:
        df_binned[col + '_bin'] = pd.qcut(df[col] , feature_bins, labels=False, duplicates='drop')
    
    binned_features = [col + '_bin' for col in lFeatures]
    lInterpolated_Score = pd.Series(index=df_binned.index, dtype='float64')
    bin_classifiers = {}
    coefficients = {}
    intercepts = {}
    for b in range(score_bins):
        bin_clf = Ridge(random_state = 1960)
        bin_indices = (df_binned['Score_bin'] == b)
        # print("PER_BIN_INDICES" , b , bin_indexes)
        bin_data = df_binned[bin_indices]
        bin_X = bin_data[binned_features]
        bin_y = bin_data['Score']
        if(bin_y.shape[0] > 0):
            bin_clf.fit(bin_X , bin_y)
            bin_classifiers[b] = bin_clf
            bin_coefficients = dict(zip(lFeatures, bin_clf.coef_.ravel()))
            # print("PER_BIN_COEFFICIENTS" , b , bin_coefficients)
            coefficients[b] = bin_coefficients
            intercepts[b] = bin_clf.intercept_
            predicted = bin_clf.predict(bin_X)
            lInterpolated_Score[bin_indices] = predicted

    df_binned['Score_interp'] = lInterpolated_Score 
    return (df_binned , bin_classifiers , coefficients, intercepts)

For simplicity, to describe our method, we use 10 score bins and 5 predictor bins (duplicate quantile edges are dropped, so fewer distinct score bins may remain).

We fit our local models on the training dataset; each model is fit on the values inside its score bin.


In [17]:
(df_cross_stats , per_bin_classifiers , per_bin_coefficients, per_bin_intercepts) = create_score_stats(df_train , feature_bins=5 , score_bins=10)


def debrief_score_bin_classifiers(bin_classifiers):
    binned_features = [col + '_bin' for col in lFeatures]
    score_classifiers_df = pd.DataFrame(index=(['intercept'] + list(binned_features)))
    for (b, bin_clf) in bin_classifiers.items():
        score_classifiers_df['score_bin_' + str(b) + "_model"] = [bin_clf.intercept_] + list(bin_clf.coef_.ravel())
    return score_classifiers_df
    
df = debrief_score_bin_classifiers(per_bin_classifiers)
df.head(10)


Out[17]:
score_bin_0_model score_bin_1_model score_bin_2_model score_bin_3_model score_bin_4_model score_bin_5_model
intercept 0.000112 0.015389 0.041015 0.700053 0.675309 1.0
mean radius_bin -0.001893 0.002518 0.005307 -0.115488 -0.015272 0.0
mean texture_bin 0.000337 0.001535 0.002225 -0.000978 0.014017 0.0
mean perimeter_bin 0.001050 -0.002346 0.007146 0.011127 0.063761 0.0
mean area_bin 0.001642 0.002734 0.005307 0.014674 0.022713 0.0

From the table above, we see that the lowest score values (score_bin_0) are all around zero probability and are not impacted by the predictor values, while the highest score values (score_bin_5) are all around 1 and are also not impacted. This is what one expects from a good classification model.

In score bin 3, the score values increase significantly with mean area_bin and decrease with mean radius_bin values.

Predictor Effects

Predictor effects describe the impact of specific predictor values on the final score. For example, some values of a predictor can increase or decrease the score locally by 0.10 or more points and change the negative decision to a positive one.

The predictor effect reflects how a specific predictor increases or decreases the score relative to the mean local contribution of this variable.
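
More precisely, for an individual $i$ falling in score bin $b$, with $c_{b,j}$ and $a_b$ the coefficient and intercept of the ridge model fitted in that bin and $p$ the number of predictors, the contribution and effect of predictor $j$ are computed as (this mirrors the code in the next cell):

$$\mathrm{contrib}_j(i) = c_{b,j}\,\mathrm{bin}_j(i) + \frac{a_b}{p},
\qquad
\mathrm{Effect}_j(i) = \mathrm{contrib}_j(i) - \frac{1}{|B_b|}\sum_{k \in B_b} \mathrm{contrib}_j(k),$$

where $\mathrm{bin}_j(i)$ is the binned value of predictor $j$ for individual $i$ and $B_b$ is the set of individuals whose score falls in bin $b$.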


In [18]:
for col in lFeatures:
    lcoef = df_cross_stats['Score_bin'].apply(lambda x : per_bin_coefficients.get(x).get(col))
    lintercept = df_cross_stats['Score_bin'].apply(lambda x : per_bin_intercepts.get(x))
    lContrib = lcoef * df_cross_stats[col + '_bin'] + lintercept/len(lFeatures)
    df1 = pd.DataFrame()
    df1['contrib'] = lContrib
    df1['Score_bin'] = df_cross_stats['Score_bin']
    lContribMeanDict = df1.groupby(['Score_bin'])['contrib'].mean().to_dict()
    lContribMean =  df1['Score_bin'].apply(lambda x : lContribMeanDict.get(x))
    # print("CONTRIB_MEAN" , col, lContribMean)
    df_cross_stats[col + '_Effect'] = lContrib - lContribMean

df_cross_stats.sample(6, random_state=1960)


Out[18]:
mean radius mean texture mean perimeter mean area TGT Score Score_bin Score_bin_labels mean radius_bin mean texture_bin mean perimeter_bin mean area_bin Score_interp mean radius_Effect mean texture_Effect mean perimeter_Effect mean area_Effect
162 13.21 28.06 84.88 538.4 1 0.058333 2 (0.0333, 0.145] 2 4 2 2 0.085435 0.000000 0.003994 -0.000183 3.469447e-18
137 20.26 23.03 132.40 1264.0 0 1.000000 5 (0.992, 1.0] 4 4 4 4 1.000000 0.000000 0.000000 0.000000 0.000000e+00
304 16.30 15.70 104.70 819.8 1 0.175000 3 (0.145, 0.79] 3 1 3 3 0.430013 -0.066727 0.001304 0.005934 9.130438e-03
81 13.05 19.31 82.61 527.2 1 0.033333 1 (0.00833, 0.0333] 2 2 2 2 0.024270 0.001209 0.000246 -0.001220 1.202913e-03
450 13.20 17.43 84.13 541.6 1 0.000000 0 (-0.001, 0.00833] 2 1 2 2 0.002047 -0.002202 -0.000083 0.001244 1.909459e-03
184 12.43 17.00 78.60 477.3 1 0.000000 0 (-0.001, 0.00833] 1 1 1 1 0.001248 -0.000309 -0.000083 0.000194 2.676812e-04

The sample above shows that, for the first individual (row 162), the effect of the first feature (mean radius) is 0.000000 score points, the effect of the second feature (mean texture) is +0.003994, etc.

Reason Codes

The reason codes are a user-oriented representation of the decision-making process. These are the predictors ranked by their effects.


In [19]:
import numpy as np
reason_codes = np.argsort(df_cross_stats[[col + '_Effect' for col in lFeatures]].values, axis=1)
df_rc = pd.DataFrame(reason_codes, columns=['reason_idx_' + str(NC-c) for c in range(NC)])
df_rc = df_rc[list(reversed(df_rc.columns))]
df_rc = pd.concat([df_cross_stats , df_rc] , axis=1)
for c in range(NC):
    reason = df_rc['reason_idx_' + str(c+1)].apply(lambda x : lFeatures[x])
    df_rc['reason_' + str(c+1)] = reason
    # detailed_reason = df_rc['reason_idx_' + str(c+1)].apply(lambda x : lFeatures[x] + "_bin")
    # df_rc['detailed_reason_' + str(c+1)] = df_rc[['reason_' + str(c+1) , ]]
    
df_rc.sample(6, random_state=1960)


Out[19]:
mean radius mean texture mean perimeter mean area TGT Score Score_bin Score_bin_labels mean radius_bin mean texture_bin ... mean perimeter_Effect mean area_Effect reason_idx_1 reason_idx_2 reason_idx_3 reason_idx_4 reason_1 reason_2 reason_3 reason_4
162 13.21 28.06 84.88 538.4 1 0.058333 2 (0.0333, 0.145] 2 4 ... -0.000183 3.469447e-18 1 3 0 2 mean texture mean area mean radius mean perimeter
137 20.26 23.03 132.40 1264.0 0 1.000000 5 (0.992, 1.0] 4 4 ... 0.000000 0.000000e+00 3 2 1 0 mean area mean perimeter mean texture mean radius
304 16.30 15.70 104.70 819.8 1 0.175000 3 (0.145, 0.79] 3 1 ... 0.005934 9.130438e-03 3 2 1 0 mean area mean perimeter mean texture mean radius
81 13.05 19.31 82.61 527.2 1 0.033333 1 (0.00833, 0.0333] 2 2 ... -0.001220 1.202913e-03 0 3 1 2 mean radius mean area mean texture mean perimeter
450 13.20 17.43 84.13 541.6 1 0.000000 0 (-0.001, 0.00833] 2 1 ... 0.001244 1.909459e-03 3 2 1 0 mean area mean perimeter mean texture mean radius
184 12.43 17.00 78.60 477.3 1 0.000000 0 (-0.001, 0.00833] 1 1 ... 0.000194 2.676812e-04 3 2 1 0 mean area mean perimeter mean texture mean radius

6 rows × 25 columns


In [8]:
df_rc[['reason_' + str(NC-c) for c in range(NC)]].describe()


Out[8]:
reason_4 reason_3 reason_2 reason_1
count 455 455 455 455
unique 4 4 4 4
top mean radius mean texture mean perimeter mean area
freq 202 198 191 216

Going Further

This was an introductory document using a simple classification model (a random forest restricted to the first 4 predictors). A deeper analysis can be performed to extend this study.
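
For example, the quality of the local piecewise-linear approximation can be assessed by comparing the interpolated score with the model score, using the columns computed above:


In [ ]:
# Compare the random forest score with its piecewise-linear approximation.
# Small differences mean the local ridge models reproduce the score well
# inside each score bin.
approx_error = (df_cross_stats['Score'] - df_cross_stats['Score_interp']).abs()
print("mean absolute approximation error:", approx_error.mean())
print("max absolute approximation error :", approx_error.max())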

