Cloud AI Platform + What-if Tool: Playground XGBoost Example

This notebook shows how to use the What-if Tool on a deployed Cloud AI Platform model. You don't need your own cloud project to run this notebook.

For instructions on creating a Cloud project, see the Google Cloud documentation.


In [ ]:
import sys
python_version = sys.version_info[0]

In [0]:
# If you're running on Colab, you'll need to install the What-if Tool package and authenticate on the TF instance
def pip_install(module):
    if python_version == 2:
        !pip install {module} --quiet
    else:
        !pip3 install {module} --quiet

try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    pip_install('witwidget')

    from google.colab import auth
    auth.authenticate_user()

In [0]:
import pandas as pd
import numpy as np
import witwidget

from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

Loading the test dataset

The model we'll be exploring here is a binary classification model built with XGBoost and trained on a mortgage dataset. It predicts whether or not a mortgage application will be approved. In this section we'll:

  • Download some test data from Cloud Storage and load it into a numpy array + Pandas DataFrame
  • Preview the features for our model in Pandas

In [0]:
# Download our Pandas dataframe and our test features and labels
!gsutil cp gs://mortgage_dataset_files/data.pkl .
!gsutil cp gs://mortgage_dataset_files/x_test.npy .
!gsutil cp gs://mortgage_dataset_files/y_test.npy .

In [0]:
# Preview the features from our model as a pandas DataFrame
features = pd.read_pickle('data.pkl')
features.head()

In [0]:
# Load the test features and labels into numpy arrays
x_test = np.load('x_test.npy')
y_test = np.load('y_test.npy')

In [0]:
# Combine the features and labels into one array for the What-if Tool
test_examples = np.hstack((x_test, y_test.reshape(-1, 1)))
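
As an optional sanity check (a small addition to the original notebook), you can confirm that the combined array has one row per test example, with the ground truth label appended as the final column.

In [0]:
# Sanity check: one row per test example, label as the last column
assert x_test.shape[0] == y_test.shape[0]
assert test_examples.shape == (x_test.shape[0], x_test.shape[1] + 1)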

Using the What-if Tool to interpret our model

With our test examples ready, we can now connect our model to the What-if Tool using the WitWidget. To use the What-if Tool with Cloud AI Platform, we need to send it:

  • A Python list of our test features + ground truth labels
  • Optionally, the names of our columns
  • Our Cloud project, model, and version name (we've created a public one for you to play around with)

See the cell below the visualization for some exploration ideas in the What-if Tool.


In [0]:
# Create a What-if Tool visualization; it may take a minute to load
# See the cell below this for exploration ideas

# This prediction adjustment function is needed as this xgboost model's
# prediction returns just a score for the positive class of the binary
# classification, whereas the What-If Tool expects a list of scores for each
# class (in this case, both the negative class and the positive class).

def adjust_prediction(pred):
  return [1 - pred, pred]
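# For example (illustrative values): a raw positive-class score of 0.8 becomes
# [0.2, 0.8], i.e. [score for 'denied', score for 'approved'].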

config_builder = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])
  .set_ai_platform_model('wit-caip-demos', 'xgb_mortgage', 'v1', adjust_prediction=adjust_prediction)
  .set_target_feature('mortgage_status')
  .set_label_vocab(['denied', 'approved']))
WitWidget(config_builder, height=800)
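
The project ('wit-caip-demos'), model, and version used above are public demo resources, so the cell runs without any setup on your part. If you have deployed your own XGBoost model to Cloud AI Platform, the same configuration can point at it; the resource names in the sketch below are hypothetical placeholders, so the cell is left commented out.

In [0]:
# Sketch only: replace the hypothetical placeholder names with your own
# deployed project, model, and version before uncommenting.
# config_builder_own = (WitConfigBuilder(test_examples.tolist(), features.columns.tolist() + ['mortgage_status'])
#   .set_ai_platform_model('YOUR_PROJECT_ID', 'YOUR_MODEL_NAME', 'YOUR_VERSION_NAME', adjust_prediction=adjust_prediction)
#   .set_target_feature('mortgage_status')
#   .set_label_vocab(['denied', 'approved']))
# WitWidget(config_builder_own, height=800)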

What-if Tool exploration ideas

  • Individual data points: the default graph shows all data points from the test set, colored by their ground truth label (approved or denied)

    • Try selecting data points close to the middle and tweaking some of their feature values. Then run inference again to see if the model prediction changes
    • Select a data point and then move the "Show nearest counterfactual datapoint" slider to the right. This will highlight a data point with feature values closest to your original one, but with a different prediction
  • Binning data: create separate graphs for individual features

    • From the "Binning - X axis" dropdown, try selecting one of the agency codes, for example "Department of Housing and Urban Development (HUD)". This will create 2 separate graphs, one for loan applications from the HUD (graph labeled 1), and one for all other agencies (graph labeled 0). This shows us that loans from this agency are more likely to be denied
  • Exploring overall performance: Click on the "Performance & Fairness" tab to view overall performance statistics on the model's results on the provided dataset, including confusion matrices, PR curves, and ROC curves.

    • Experiment with the threshold slider, raising and lowering the positive classification score the model needs to return before it decides to predict "approved" for the loan, and see how that changes accuracy, false positives, and false negatives (a small offline sketch of this threshold sweep follows after the list).
    • On the left side "Slice by" menu, select "loan_purpose_Home purchase". You'll now see performance on the two subsets of your data: the "0" slice shows when the loan is not for a home purchase, and the "1" slice is for when the loan is for a home purchase. Notice that the model's false positive rate is much higher on loans for home purchases. If you expand the rows to look at the confusion matrices, you can see that the model predicts "approved" more often for home purchase loans.
    • You can use the optimization buttons on the left side to have the tool auto-select different positive classification thresholds for each slice in order to achieve different goals. If you select the "Demographic parity" button, then the two thresholds will be adjusted so that the model predicts "approved" for a similar percentage of applicants in both slices. What does this do to the accuracy, false positives and false negatives for each slice?
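
The threshold exploration above can also be reproduced offline. The sketch below rests on an assumption: it uses a hypothetical `scores` array holding the model's positive-class score for each test example (in this notebook those scores come from the deployed model through the What-if Tool rather than locally). It sweeps the classification threshold and reports accuracy, false positive rate, and false negative rate at each step.

In [0]:
# Sketch: sweep the positive-classification threshold offline.
# `scores` is a hypothetical numpy array of positive-class scores for x_test.
def threshold_metrics(scores, labels, threshold):
    preds = (scores >= threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    tn = np.sum((preds == 0) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr

# Example usage once `scores` is available:
# for t in np.arange(0.1, 1.0, 0.1):
#     print(round(t, 1), threshold_metrics(scores, y_test, t))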