Model understanding and interpretability

In this colab, we will:

  • Learn how to interpret model results and reason about the features
  • Visualize the model results

Please complete the exercises and answer the questions tagged ???.


In [ ]:
import time

# We will use some np and pandas for dealing with input data.
import numpy as np
import pandas as pd
# And of course, we need tensorflow.
import tensorflow as tf

from matplotlib import pyplot as plt
from IPython.display import clear_output

In [ ]:
tf.__version__

Below we demonstrate both local and global model interpretability for gradient boosted trees.

Local interpretability refers to an understanding of a model’s predictions at the individual example level, while global interpretability refers to an understanding of the model as a whole.

For local interpretability, we show how to create and visualize per-instance contributions using the technique outlined in Palczewska et al. and by Saabas in Interpreting Random Forests (this method is also available for scikit-learn Random Forests via the treeinterpreter package). To distinguish these values from feature importances, we refer to them as directional feature contributions (DFCs).

For global interpretability, we show how to retrieve and visualize gain-based feature importances and permutation feature importances, and we also show aggregated DFCs.

Setup

Load dataset

We will be using the Titanic dataset, where the goal is to predict passenger survival given characteristics such as gender, age, class, etc.


In [ ]:
tf.logging.set_verbosity(tf.logging.ERROR)
tf.set_random_seed(123)

# Load dataset.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

# Feature columns.
fcol = tf.feature_column
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

def one_hot_cat_column(feature_name, vocab):
  return fcol.indicator_column(
      fcol.categorical_column_with_vocabulary_list(feature_name, vocab))
fc = []
for feature_name in CATEGORICAL_COLUMNS:
  # Need to one-hot encode categorical features.
  vocabulary = dftrain[feature_name].unique()
  fc.append(one_hot_cat_column(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
  fc.append(fcol.numeric_column(feature_name,
                                dtype=tf.float32))

# Input functions.
def make_input_fn(X, y, n_epochs=None):
  def input_fn():
    dataset = tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'), y))
    # For training, cycle through the dataset as many times as needed (n_epochs=None).
    dataset = (dataset
      .repeat(n_epochs)
      .batch(len(y)))  # Use entire dataset since this is such a small dataset.
    return dataset
  return input_fn

# Training and evaluation input functions.
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, n_epochs=1)
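
To get a feel for the data, we can peek at the first few rows:

In [ ]:
dftrain.head()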

Interpret model

Train model

First, we train a boosted trees classifier so that we have a model to interpret. Note that center_bias must be set to True for DFCs to be available later.

In [ ]:
params = {
  'n_trees': 50,
  'max_depth': 3,
  'n_batches_per_layer': 1,
  # You must enable center_bias = True to get DFCs. This will force the model to
  # make an initial prediction before using any features (e.g. use the mean of
  # the training labels for regression or log odds for classification when
  # using cross entropy loss).
  'center_bias': True
}

est = tf.estimator.BoostedTreesClassifier(fc, **params)
# Train model. The estimator stops automatically once n_trees have been built,
# so no max_steps is needed even though the input_fn repeats indefinitely.
est.train(train_input_fn)

# Evaluation.
results = est.evaluate(eval_input_fn)
clear_output()
pd.Series(results).to_frame()

Local interpretability

Next, you will output the directional feature contributions (DFCs) to explain individual predictions using the approach outlined in Palczewska et al. and by Saabas in Interpreting Random Forests (this method is also available for scikit-learn Random Forests via the treeinterpreter package). The DFCs are generated with:

pred_dicts = list(est.experimental_predict_with_explanations(pred_input_fn))

(Note: The method is named experimental as we may modify the API before dropping the experimental prefix.)


In [ ]:
import matplotlib.pyplot as plt
import seaborn as sns
sns_colors = sns.color_palette('colorblind')

In [ ]:
pred_dicts = list(est.experimental_predict_with_explanations(eval_input_fn))
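
Each element of pred_dicts is a dictionary of prediction tensors; in addition to the usual keys such as 'probabilities', it contains the 'dfc' and 'bias' entries used below. You can inspect one directly:

In [ ]:
sorted(pred_dicts[0].keys())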

In [ ]:
def clean_feature_names(df):
    """Boilerplate code to cleans up feature names -- this is unneed in TF 2.0"""
    df.columns = [v.split(':')[0].split('_indi')[0] for v in df.columns.tolist()]
    df = df.T.groupby(level=0).sum().T
    return df

# Create DFC Pandas dataframe.
labels = y_eval.values
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
df_dfc = pd.DataFrame([pred['dfc'] for pred in pred_dicts])
df_dfc.columns = est._names_for_feature_id
df_dfc = clean_feature_names(df_dfc)
df_dfc.describe()

In [ ]:
# Sum of DFCs + bias == probability.
bias = pred_dicts[0]['bias']
dfc_prob = df_dfc.sum(axis=1) + bias
np.testing.assert_almost_equal(dfc_prob.values,
                               probs.values)

Plot results

Exercise: Plot figures for multiple examples (a sample loop follows the plotting cell below). How would you explain each plot in plain English?


In [ ]:
def plot_dfcs(example_id):
    label, prob = labels[example_id], probs[example_id]
    example = df_dfc.iloc[example_id]  # Choose the ith example from the evaluation set.
    TOP_N = 8  # View top 8 features.
    sorted_ix = example.abs().sort_values()[-TOP_N:].index
    ax = example[sorted_ix].plot(kind='barh', color='g', figsize=(10, 5))
    ax.grid(False, axis='y')

    plt.title('Feature contributions for example {}\n pred: {:1.2f}; label: {}'.format(example_id, prob, label))
    plt.xlabel('Contribution to predicted probability')

ID = 102  # Change this to explore other examples.
plot_dfcs(ID)
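
For instance, to plot several examples in one go (these example IDs are arbitrary):

In [ ]:
for example_id in [10, 56, 102]:
  plot_dfcs(example_id)
  plt.show()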

Prettier plotting

The plotting code below color-codes the bars by the direction of each contribution and adds the feature values to the figure. Please do not worry about the details of the plotting code :)


In [ ]:
def plot_example_pretty(example):
    """Boilerplate code for better plotting :)"""
    
    def _get_color(value):
        """To make positive DFCs plot green, negative DFCs plot red."""
        green, red = sns.color_palette()[2:4]
        if value >= 0: return green
        return red


    def _add_feature_values(feature_values, ax):
        """Display feature's values on left of plot."""
        x_coord = ax.get_xlim()[0]
        OFFSET = 0.15
        for y_coord, (feat_name, feat_val) in enumerate(feature_values.items()):
            t = plt.text(x_coord, y_coord - OFFSET, '{}'.format(feat_val), size=12)
            t.set_bbox(dict(facecolor='white', alpha=0.5))
        from matplotlib.font_manager import FontProperties
        font = FontProperties()
        font.set_weight('bold')
        t = plt.text(x_coord, y_coord + 1 - OFFSET, 'feature\nvalue',
                     fontproperties=font, size=12)


    TOP_N = 8 # View top 8 features.
    sorted_ix = example.abs().sort_values()[-TOP_N:].index  # Sort by magnitude.
    example = example[sorted_ix]
    colors = example.map(_get_color).tolist()
    ax = example.to_frame().plot(kind='barh',
                          color=[colors],
                          legend=None,
                          alpha=0.75,
                          figsize=(10,6))
    ax.grid(False, axis='y')
    ax.set_yticklabels(ax.get_yticklabels(), size=14)
    _add_feature_values(dfeval.iloc[ID].loc[sorted_ix], ax)
    ax.set_title('Feature contributions for example {}\n pred: {:1.2f}; label: {}'.format(ID, probs[ID], labels[ID]))
    ax.set_xlabel('Contribution to predicted probability', size=14)
    plt.show()
    return ax

In [ ]:
# Plot results.
ID = 102
example = df_dfc.iloc[ID]  # Choose ith example from evaluation set.
ax = plot_example_pretty(example)

Global feature importances

  1. Gain-based feature importances using est.experimental_feature_importances
  2. Aggregate DFCs using est.experimental_predict_with_explanations
  3. Permutation importances

Gain-based feature importances measure the loss change from splitting on a particular feature, while permutation feature importances are computed by shuffling each feature of the evaluation set one by one, re-evaluating the model, and attributing the change in performance to the shuffled feature.

In general, permutation feature importances are preferred to gain-based feature importances, though both methods can be unreliable when potential predictor variables vary in their scale of measurement or their number of categories, and when features are correlated (source). Check out this article for an in-depth overview and a great discussion of the different feature importance types. A minimal implementation sketch of permutation importances appears in section 3 below.

1. Gain-based feature importances


In [ ]:
features, importances = est.experimental_feature_importances(normalize=True)
df_imp = pd.DataFrame(importances, columns=['importances'], index=features)
# For plotting purposes. This is not needed in TF 2.0.
df_imp = clean_feature_names(df_imp.T).T.sort_values('importances', ascending=False)

# Visualize importances.
N = 8
ax = df_imp.iloc[0:N][::-1]\
    .plot(kind='barh',
          color=sns_colors[0],
          title='Gain feature importances',
          figsize=(10, 6))
ax.grid(False, axis='y')

plt.tight_layout()

??? What does the x axis represent?

??? Can we completely trust these results and the magnitudes?

2. Average absolute DFCs

We can also average the absolute values of DFCs to understand impact at a global level.


In [ ]:
# Plot.
dfc_mean = df_dfc.abs().mean()
sorted_ix = dfc_mean.sort_values()[-8:].index  # Take the top 8 features by mean |DFC|.
ax = dfc_mean[sorted_ix].plot(kind='barh',
                       color=sns_colors[1],
                       title='Mean |directional feature contributions|',
                       figsize=(10, 6))
ax.grid(False, axis='y')

We can also see how DFCs vary as a feature value varies.


In [ ]:
age = pd.Series(df_dfc.age.values, index=dfeval.age.values).sort_index()
sns.jointplot(age.index.values, age.values);
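
3. Permutation feature importances

Below is a minimal sketch of permutation importances for our classifier, using accuracy as the metric. The helpers permutation_importances and accuracy_metric are our own illustrative functions, not part of the TF API: each feature column of the evaluation set is shuffled in turn, and the resulting drop in accuracy is attributed to that feature.

In [ ]:
def accuracy_metric(est, X, y):
  """Accuracy of a TensorFlow estimator on (X, y)."""
  input_fn = make_input_fn(X, y, n_epochs=1)
  return est.evaluate(input_fn=input_fn)['accuracy']

def permutation_importances(est, X_eval, y_eval, metric, features):
  """Column by column, shuffle values and observe the effect on the eval set."""
  baseline = metric(est, X_eval, y_eval)
  imp = []
  for col in features:
    save = X_eval[col].copy()
    X_eval[col] = np.random.permutation(X_eval[col])
    imp.append(baseline - metric(est, X_eval, y_eval))
    X_eval[col] = save  # Restore the original column.
  return np.array(imp)

features = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS
importances = permutation_importances(est, dfeval, y_eval, accuracy_metric,
                                      features)
df_imp = pd.Series(importances, index=features).sort_values()
ax = df_imp.plot(kind='barh',
                 color=sns_colors[2],
                 title='Permutation feature importances',
                 figsize=(10, 6))
ax.grid(False, axis='y')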

Visualizing the model's prediction surface

Let's first simulate/create training data using the following formula:

$z = x e^{-x^2 - y^2}$

where $z$ is the dependent variable we are trying to predict and $x$ and $y$ are the features.


In [ ]:
from numpy.random import uniform, seed
# Note: matplotlib.mlab.griddata was deprecated in Matplotlib 2.2 and removed
# in 3.1; a scipy.interpolate.griddata alternative is sketched further below.
from matplotlib.mlab import griddata

# Create fake data
seed(0)
npts = 5000
x = uniform(-2, 2, npts)
y = uniform(-2, 2, npts)
z = x*np.exp(-x**2 - y**2)

In [ ]:
# Prep data for training.
df = pd.DataFrame({'x': x, 'y': y, 'z': z})

xi = np.linspace(-2.0, 2.0, 200)
yi = np.linspace(-2.1, 2.1, 210)
xi, yi = np.meshgrid(xi, yi)

df_predict = pd.DataFrame({
    'x' : xi.flatten(),
    'y' : yi.flatten(),
})
predict_shape = xi.shape

In [ ]:
def plot_contour(x, y, z, **kwargs):
  # Grid the data.
  plt.figure(figsize=(10, 8))
  # Contour the gridded data, plotting dots at the nonuniform data points.
  CS = plt.contour(x, y, z, 15, linewidths=0.5, colors='k')
  CS = plt.contourf(x, y, z, 15,
                    vmax=abs(z).max(), vmin=-abs(z).max(), cmap='RdBu_r')
  plt.colorbar()  # Draw colorbar.
  # Plot data points.
  plt.xlim(-2, 2)
  plt.ylim(-2, 2)

We can visualize our function:


In [ ]:
zi = griddata(x, y, z, xi, yi, interp='linear')
plot_contour(xi, yi, zi)
plt.scatter(df.x, df.y, marker='.')
plt.title('Contour on training data')
plt.show()
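
Note that matplotlib.mlab.griddata was removed in Matplotlib 3.1. On newer versions, here is a sketch of an equivalent using scipy.interpolate.griddata (assuming SciPy is available):

In [ ]:
from scipy.interpolate import griddata as scipy_griddata

# Interpolate the scattered (x, y, z) samples onto the regular (xi, yi) grid.
zi = scipy_griddata((x, y), z, (xi, yi), method='linear')
plot_contour(xi, yi, zi)
plt.show()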

In [ ]:
def predict(est):
  """Predictions from a given estimator."""
  predict_input_fn = lambda: tf.data.Dataset.from_tensors(dict(df_predict))
  preds = np.array([p['predictions'][0] for p in est.predict(predict_input_fn)])
  return preds.reshape(predict_shape)

First let's try to fit a linear model to the data.


In [ ]:
fc = [tf.feature_column.numeric_column('x'),
      tf.feature_column.numeric_column('y')]

train_input_fn = make_input_fn(df, df.z)
est = tf.estimator.LinearRegressor(fc)
est.train(train_input_fn, max_steps=500);

In [ ]:
plot_contour(xi, yi, predict(est))

Not very good at all...

??? Why is the linear model not performing well for this problem? Can you think of how to improve it just using a linear model?

Next, let's fit a GBDT model to the data and try to understand what the model does.


In [ ]:
for n_trees in [1,2,3,10,30,50,100,200]:
  est = tf.estimator.BoostedTreesRegressor(fc,
                                          n_batches_per_layer=1,
                                          max_depth=4,
                                          n_trees=n_trees)
  est.train(train_input_fn)
  plot_contour(xi, yi, predict(est))
  plt.text(-1.8, 2.1, '# trees: {}'.format(n_trees), color='w', backgroundcolor='black', size=20)

A lot better. The model is learning to approximate the true surface of the function, but I think we can do better.

??? Exercise Can you get a better fit?

??? BONUS: If you have time, try making your own functions to learn and visualize them above.

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.