In [ ]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

AI Explanations: Explaining a tabular data model

Overview

This tutorial shows how to train a Keras model on tabular data and deploy it to the AI Explanations service to get feature attributions on your deployed model.

If you've already got a trained model and want to deploy it to AI Explanations, skip to the Export the model as a TF 1 SavedModel section.

Dataset

The dataset used for this tutorial was created by combining two BigQuery Public Datasets: London Bikeshare data and NOAA weather data.

Objective

The goal is to train a model using the Keras Sequential API that predicts how long a bike trip took based on the trip start time, distance, day of week, and various weather data during that day.

This tutorial focuses more on deploying the model to AI Explanations than on the design of the model itself.

Costs

This tutorial uses billable components of Google Cloud Platform (GCP):

  • AI Platform for:
    • Prediction
    • Explanation: AI Explanations is offered at no extra charge on top of prediction prices. However, explanation requests take longer to process than normal predictions, so heavy use of AI Explanations together with auto-scaling may result in more nodes being started and thus more charges
  • Cloud Storage for:
    • Storing model files for deploying to Cloud AI Platform

Learn about AI Platform pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select Runtime --> Change runtime type and choose GPU as the hardware accelerator.
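
If you want to confirm that a GPU is attached, you can run nvidia-smi from the notebook (optional; on a CPU-only runtime the command simply reports that no GPU is available):


In [ ]:
# Optional: list attached GPUs; fails harmlessly on CPU-only runtimes
!nvidia-smi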

This tutorial assumes you are running the notebook either in Colab or Cloud AI Platform Notebooks.

Set up your GCP project

The following steps are required, regardless of your notebook environment.

  1. Select or create a GCP project.

  2. Make sure that billing is enabled for your project.

  3. Enable the AI Platform Training & Prediction and Compute Engine APIs.

  4. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

Note: Jupyter runs lines prefixed with ! as shell commands, and it interpolates Python variables prefixed with $ into these commands.


In [ ]:
PROJECT_ID = "<your-project-id>"
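
If you're running outside Colab (the Colab branch of the authentication cell below sets the project for you), you can point the Cloud SDK at your project from the notebook. The second command is an optional sketch of enabling the APIs from step 3; it assumes the usual service names ml.googleapis.com (AI Platform Training & Prediction) and compute.googleapis.com (Compute Engine):


In [ ]:
# Point the Cloud SDK at your project
!gcloud config set project $PROJECT_ID

# Optional: enable the required APIs (you can also do this in the Cloud Console)
!gcloud services enable ml.googleapis.com compute.googleapis.com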

Authenticate your GCP account

If you are using AI Platform Notebooks, your environment is already authenticated. Skip this step.

If you are using Colab, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.


In [ ]:
import sys, os
import warnings
import googleapiclient.discovery

warnings.filterwarnings('ignore')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
# If you are running this notebook in Colab, follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

def install_dlvm_packages():
  !pip install tabulate

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()
  !pip install witwidget --quiet
  !pip install tensorflow==1.15.0 --quiet
  !gcloud config set project $PROJECT_ID

elif "DL_PATH" in os.environ:
  install_dlvm_packages()

Create a Cloud Storage bucket

The following steps are required, regardless of your notebook environment.

In this tutorial, you train the model in the notebook itself and export the resulting SavedModel to a Cloud Storage bucket. AI Platform reads the model files from this bucket when you create a model version, which you then use to serve online predictions and explanations.

Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to choose a region where Cloud AI Platform services are available. You may not use a Multi-Regional Storage bucket for training with AI Platform.


In [ ]:
BUCKET_NAME = "<your-bucket-name>"
REGION = "us-central1"

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [ ]:
!gsutil mb -l $REGION gs://$BUCKET_NAME
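
To verify that the bucket is accessible, list its contents (the listing will be empty if you just created the bucket):


In [ ]:
# Sanity-check access to the bucket
!gsutil ls -al gs://$BUCKET_NAME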

Import libraries

Import the libraries we'll be using in this tutorial. This tutorial has been tested with TensorFlow versions 1.14 and 1.15.


In [ ]:
import tensorflow as tf 
import pandas as pd
import numpy as np 
import json
import time

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from tabulate import tabulate

# Should be 1.15.0
print(tf.__version__)

Downloading and preprocessing data

In this section you'll download the data used to train the model from a public Cloud Storage bucket. The original data is from the BigQuery datasets listed above. For your convenience, we've joined the London bike and NOAA weather tables, done some preprocessing, and provided a subset of that dataset here.


In [ ]:
# Copy the data to your notebook instance
!gsutil cp 'gs://explanations_sample_data/bike-data.csv' ./

Read the data with Pandas

We'll use Pandas to read the data into a DataFrame and then do some additional pre-processing.


In [ ]:
data = pd.read_csv('bike-data.csv')

# Shuffle the data
data = data.sample(frac=1, random_state=2)

# Drop rows where weather values are missing (encoded as sentinel values)
data = data[data['wdsp'] != 999.9]
data = data[data['dewp'] != 9999.9]

# Rename some columns for readability
data=data.rename(columns = {'day_of_week':'weekday'})
data=data.rename(columns = {'max':'max_temp'})
data=data.rename(columns = {'dewp': 'dew_point'})

# Drop columns we won't use to train this model
data = data.drop(columns=['start_station_name', 'end_station_name', 'bike_id', 'snow_ice_pellets'])

# Convert trip duration from seconds to minutes so it's easier to understand
data['duration'] = data['duration'].apply(lambda x:float(x / 60))

In [ ]:
# Preview the first 5 rows
data.head()

In [ ]:
# Save duration to its own DataFrame and remove it from the original DataFrame
labels = data['duration']
data = data.drop(columns=['duration'])

Split data into train and test sets

We'll split our data into train and test sets using an 80 / 20 train / test split.


In [ ]:
# Use 80/20 train/test split
train_size = int(len(data) * .8)
print ("Train size: %d" % train_size)
print ("Test size: %d" % (len(data) - train_size))

# Split our data into train and test sets
train_data = data[:train_size]
train_labels = labels[:train_size]

test_data = data[train_size:]
test_labels = labels[train_size:]

Build, train, and evaluate our model with Keras

We'll use tf.keras to build a simple Sequential model that takes our 10 features as input and predicts trip duration in minutes (numerical value).


In [ ]:
# Build our model
model = tf.keras.Sequential(name="bike_predict")
model.add(tf.keras.layers.Dense(64, input_dim=len(train_data.iloc[0]), activation='relu'))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(1))

In [ ]:
# Compile the model and see a summary
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(loss='mean_squared_logarithmic_error', optimizer=optimizer)
model.summary()

Create an input data pipeline with tf.data

Next, we'll wrap the training features and labels in a single tf.data.Dataset that batches and repeats the data so it can be passed directly to model.fit().

In [ ]:
batch_size = 256
epochs = 3

input_train = tf.data.Dataset.from_tensor_slices(train_data)
output_train = tf.data.Dataset.from_tensor_slices(train_labels)
input_train = input_train.batch(batch_size).repeat()
output_train = output_train.batch(batch_size).repeat()
train_dataset = tf.data.Dataset.zip((input_train, output_train))

Train the model


In [ ]:
# This will take about a minute to run
# To keep training time short, we're not using the full dataset
model.fit(train_dataset, steps_per_epoch=train_size // batch_size, epochs=epochs)

Evaluate the trained model locally


In [ ]:
# Run evaluation
results = model.evaluate(test_data, test_labels)
print(results)

In [ ]:
# Send test instances to model for prediction
predict = model.predict(test_data[:5])

In [ ]:
# Preview predictions on the first 5 examples from our test dataset
for i, val in enumerate(predict):
  print('Predicted duration: {}'.format(round(val[0])))
  print('Actual duration: {} \n'.format(test_labels.iloc[i]))

Export the model as a TF 1 SavedModel

AI Explanations currently supports TensorFlow 1.x. In order to deploy our model in a format compatible with AI Explanations, we'll follow the steps below to convert our Keras model to a TF Estimator, and then use the export_saved_model method to generate the SavedModel and save it in GCS.


In [ ]:
## Convert our Keras model to an estimator
keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir='savedmodel_export')

In [ ]:
# We need this serving input function to export our model in the next cell
serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'dense_input': model.input}
)

In [ ]:
export_path = keras_estimator.export_saved_model(
  'gs://' + BUCKET_NAME + '/explanations',
  serving_input_receiver_fn=serving_fn
).decode('utf-8')
print(export_path)

Use TensorFlow's saved_model_cli to inspect the model's SignatureDef. We'll use this information when we deploy our model to AI Explanations in the next section.


In [ ]:
!saved_model_cli show --dir $export_path --all

Deploy the model to AI Explanations

In order to deploy the model to AI Explanations, we need to generate an explanation_metadata.json file and upload it to the Cloud Storage bucket alongside our SavedModel. Then we'll deploy the model using gcloud.

Prepare explanation metadata

We need to tell AI Explanations the names of the input and output tensors our model is expecting, which we print below.

The value for input_baselines tells the explanations service what the baseline input should be for our model. Here we're using the median for all of our input features. That means the baseline prediction for this model will be the trip duration our model predicts for the median of each feature in our dataset.

Since this model accepts a single numpy array with all numerical features, we can optionally pass an index_feature_mapping list to AI Explanations to make the API response easier to parse. When we provide a list of feature names via this parameter, the service returns a key / value mapping of each feature with its corresponding attribution value.


In [ ]:
# Print the names of our tensors
print('Model input tensor: ', model.input.name)
print('Model output tensor: ', model.output.name)

In [ ]:
explanation_metadata = {
    "inputs": {
      "data": {
        "input_tensor_name": model.input.name,
        "input_baselines": [train_data.median().values.tolist()],
        "encoding": "bag_of_features", 
        "index_feature_mapping": train_data.columns.tolist()
      }
    },
    "outputs": {
      "duration": {
        "output_tensor_name": model.output.name
      }
    },
  "framework": "tensorflow"
  }

Since this is a regression model (predicting a numerical value), the baseline prediction will be the same for every example we send to the model. If this were instead a classification model, each class would have a different baseline prediction.


In [ ]:
# Write the json to a local file
with open('explanation_metadata.json', 'w') as output_file:
  json.dump(explanation_metadata, output_file)

In [ ]:
!gsutil cp explanation_metadata.json $export_path

Create the model


In [ ]:
MODEL = 'bike'

In [ ]:
# Create the model if it doesn't exist yet (you only need to run this once)
!gcloud ai-platform models create $MODEL --enable-logging --regions=$REGION

Create the model version

Creating the version will take ~5-10 minutes. Note that your first deploy may take longer.


In [ ]:
# Each time you create a version the name should be unique
VERSION = 'v1'

In [ ]:
# Create the version with gcloud
explain_method = 'integrated-gradients'
!gcloud beta ai-platform versions create $VERSION \
--model $MODEL \
--origin $export_path \
--runtime-version 1.15 \
--framework TENSORFLOW \
--python-version 3.7 \
--machine-type n1-standard-4 \
--explanation-method $explain_method \
--num-integral-steps 25

In [ ]:
# Make sure the model deployed correctly. State should be `READY` in the following log
!gcloud ai-platform versions describe $VERSION --model $MODEL

Getting predictions and explanations on deployed model

Now that your model is deployed, you can use the AI Platform Prediction API to get both predictions and feature attributions. We'll pass it a single test example here and see which features were most important in the model's prediction. You can also use gcloud; a sketch follows the next cell.

Format our explanation request

To make our AI Explanations request, we need to create a JSON object with our test data for prediction.


In [ ]:
# Format data for prediction to our model
prediction_json = {'dense_input': test_data.iloc[0].values.tolist()}
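
As noted above, you can also request the explanation with gcloud rather than the Python client. This is a sketch that assumes the gcloud beta ai-platform explain command and its --json-instances flag, which expects newline-delimited JSON instances in a local file:


In [ ]:
# Write the instance to a newline-delimited JSON file
with open('prediction.json', 'w') as f:
  f.write(json.dumps(prediction_json) + '\n')

# Ask the deployed version for an explanation via gcloud
!gcloud beta ai-platform explain --model $MODEL --version $VERSION --json-instances prediction.json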

Making the explain request

The following predict_json function will make an explain request to the AI Platform Prediction API.


In [ ]:
# This is adapted from a sample in the docs
# Find it here: https://cloud.google.com/ai-platform/prediction/docs/online-predict#python

def predict_json(project, model, instances, version=None):
    """Send json data to a deployed model for prediction.

    Args:
        project (str): project where the AI Platform Model is deployed.
        model (str): model name.
        instances ([Mapping[str: Any]]): Keys should be the names of Tensors
            your deployed model expects as inputs. Values should be datatypes
            convertible to Tensors, or (potentially nested) lists of datatypes
            convertible to tensors.
        version: str, version of the model to target.
    Returns:
        Mapping[str: any]: dictionary of prediction results defined by the
            model.
    """

    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}'.format(project, model)

    if version is not None:
        name += '/versions/{}'.format(version)

    response = service.projects().explain(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    return response

In [ ]:
response = predict_json(PROJECT_ID, MODEL, prediction_json, VERSION)
print(response)

Understanding the explanations response

First, let's look at the trip duration our model predicted and compare it to the actual value.


In [ ]:
explanations = response['explanations'][0]['attributions_by_label'][0]

predicted = round(explanations['example_score'], 2)
print('Predicted duration: ' + str(predicted) + ' minutes')
print('Actual duration: ' + str(test_labels.iloc[0]) + ' minutes')

Next let's look at the feature attributions for this particular example. Positive attribution values mean a particular feature pushed our model prediction up by that amount, and vice versa for negative attribution values.


In [ ]:
feature_names = test_data.columns.tolist()
attributions = explanations['attributions']
rows = []
for i, val in enumerate(feature_names):
  rows.append([val, test_data.iloc[0].tolist()[i], attributions[val][0]])
print(tabulate(rows, headers=['Feature name', 'Feature value', 'Attribution value']))

Sanity check our explanations

To better make sense of the feature attributions we're getting, we should compare them with our model's baseline. In most cases, the sum of your attribution values + the baseline should be very close to your model's predicted value for each input. Also note that for regression models, the baseline_score returned from AI Explanations will be the same for each example sent to your model. For classification models, each class will have its own baseline.

In this section we'll send 10 test examples to our model for prediction in order to compare the feature attributions with the baseline. Then we'll run each test example's attributions through two sanity checks in the sanity_check_explanations method.


In [ ]:
# Prepare 10 test examples to send to our model for prediction
pred_batch = []
for i in range(10):
  pred_batch.append({'dense_input': test_data.iloc[i].values.tolist()})

In [ ]:
# Make the request using the method we defined above
batch_explain = predict_json(PROJECT_ID, MODEL, pred_batch, VERSION)

In the function below we perform two sanity checks for models using Integrated Gradients (IG) explanations and one sanity check for models using Sampled Shapley.


In [ ]:
def sanity_check_explanations(example, mean_tgt_value=None, variance_tgt_value=None):
  passed_test = 0
  total_test = 1
  # `attributions` is a dict where keys are the feature names
  # and values are the feature attributions for each feature
  attribution_vals = [x[0] for x in example['attributions_by_label'][0]['attributions'].values()]
  baseline_score = example['attributions_by_label'][0]['baseline_score']
  sum_with_baseline = np.sum(attribution_vals) + baseline_score
  predicted_val = example['attributions_by_label'][0]['example_score']
  # Sanity check 1
  # If the prediction at the input is (nearly) equal to the prediction at the
  # baseline, the attributions may be uninformative. In that case, try a
  # different baseline, e.g. a random input or the training-set mean.
  if abs(predicted_val - baseline_score) <= 0.05:
    print('Warning: example score and baseline score are too close.')
    print('You might not get attributions.')
  else:
    passed_test += 1
 
  # Sanity check 2 (only for models using Integrated Gradients explanations)
  # Ideally, the sum of the integrated gradients should equal the difference
  # between the prediction at the input and the prediction at the baseline.
  # Any discrepancy between these two values is due to errors in approximating
  # the integral.
  if explain_method == 'integrated-gradients':
    total_test += 1
    want_integral = predicted_val - baseline_score
    got_integral = sum(attribution_vals)
    if abs(want_integral-got_integral)/abs(want_integral) > 0.05:  
        print('Warning: Integral approximation error exceeds 5%.') 
        print('Please try increasing the number of integrated gradient steps.')
    else:
        passed_test += 1
 
  print(passed_test, ' out of ', total_test, ' sanity checks passed.')

In [ ]:
for i in batch_explain['explanations']:
  sanity_check_explanations(i)


2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.
2  out of  2  sanity checks passed.

Understanding AI Explanations with the What-If Tool

In this section we'll use the What-If Tool to better understand how our model is making predictions. See the section below the widget for visualization ideas.

The What-If Tool expects each example as a dict keyed by feature name, but our model expects a flat list of values. The functions below convert each What-If Tool example into the input format our model expects.


In [ ]:
# This is the number of data points we'll send to the What-if Tool
WHAT_IF_TOOL_SIZE = 500

from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

def create_list(ex_dict):
  new_list = []
  for i in feature_names:
    new_list.append(ex_dict[i])
  return new_list

def example_dict_to_input(example_dict):
  return { 'dense_input': create_list(example_dict) }

from collections import OrderedDict
wit_data = test_data.iloc[:WHAT_IF_TOOL_SIZE].copy()
wit_data['duration'] = test_labels[:WHAT_IF_TOOL_SIZE]
wit_data_dict = wit_data.to_dict(orient='records', into=OrderedDict)

In [ ]:
config_builder = WitConfigBuilder(
    wit_data_dict
  ).set_ai_platform_model(
      PROJECT_ID,
      MODEL,
      VERSION,
      adjust_example=example_dict_to_input
  ).set_target_feature('duration').set_model_type('regression')
WitWidget(config_builder)

What-If Tool visualization ideas

On the x-axis, you'll see the predicted trip duration for the test inputs you passed to the What-If Tool. Each circle represents one of your test examples. If you click on a circle, you'll be able to see the feature values for that example along with the attribution values for each feature.

  • You can edit individual feature values and re-run prediction directly within the What-If Tool. Try changing distance, click Run inference, and see how that affects the model's prediction.
  • You can sort the features for an individual example by their attribution value; try changing the sort from the attributions dropdown.
  • The What-If Tool also lets you create custom visualizations. You can do this by changing the values in the dropdown menus above the scatter plot visualization. For example, you can sort data points by inference error, or by their similarity to a single datapoint.

Cleaning up

To clean up all GCP resources used in this project, you can delete the GCP project you used for the tutorial.

Alternatively, you can clean up individual resources by running the following commands:


In [ ]:
# Delete model version resource
!gcloud ai-platform versions delete $VERSION --quiet --model $MODEL

# Delete model resource
!gcloud ai-platform models delete $MODEL --quiet

# Delete the Cloud Storage objects that were created
!gsutil -m rm -r gs://$BUCKET_NAME/explanations

If your Cloud Storage bucket doesn't contain any other objects and you would like to delete it, run gsutil rm -r gs://$BUCKET_NAME.
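
For convenience, here is that command as a cell. Only run it if you're sure you no longer need the bucket or anything in it:


In [ ]:
# Delete the bucket and all of its contents
!gsutil rm -r gs://$BUCKET_NAME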