Comparing Keras and scikit-learn models deployed on Cloud AI Platform with the What-if Tool

In this notebook we'll use the UCI wine quality dataset to train both tf.keras and scikit-learn regression models that predict the quality rating of a wine from 11 numerical features. You'll learn how to:

  • Build, train, and deploy tf.keras and scikit-learn models to Cloud AI Platform (CAIP)
  • Use the What-if Tool to compare two different models deployed on CAIP

You will need a Google Cloud Platform account and project to run this notebook. Instructions for creating a project can be found in the Google Cloud documentation.

Installing dependencies


In [ ]:
import sys
python_version = sys.version_info[0]

In [0]:
# If you're running on Colab, you'll need to install the What-if Tool package and authenticate
def pip_install(module):
    # sys.version_info[0] is an int, so compare against 2, not '2'
    if python_version == 2:
        !pip install {module} --quiet
    else:
        !pip3 install {module} --quiet

try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    pip_install('witwidget')

    from google.colab import auth
    auth.authenticate_user()

In [0]:
import pandas as pd
import numpy as np
import tensorflow as tf
import witwidget
import os
import pickle

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

from sklearn.utils import shuffle
from sklearn.linear_model import LinearRegression
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

# This has been tested on TF 1.14
print(tf.__version__)

Download and process data

In this section we'll:

  • Download the wine quality data directly from UCI Machine Learning
  • Read it into a Pandas dataframe and preview it
  • Split the data and labels into train and test sets

In [0]:
!wget 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'

In [0]:
data = pd.read_csv('winequality-white.csv', index_col=False, delimiter=';')
data = shuffle(data, random_state=4)

In [0]:
data.head()

In [0]:
labels = data['quality']

In [0]:
print(labels.value_counts())

In [0]:
data = data.drop(columns=['quality'])

In [0]:
train_size = int(len(data) * 0.8)
train_data = data[:train_size]
train_labels = labels[:train_size]

test_data = data[train_size:]
test_labels = labels[train_size:]
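As a minimal sketch of the same 80/20 slicing logic on a toy list (the numbers here are made up):

```python
# Toy illustration of the 80/20 train/test split used above
rows = list(range(10))             # stand-in for 10 shuffled examples
train_size = int(len(rows) * 0.8)  # 80% of 10 -> 8
train_rows = rows[:train_size]     # first 80% for training
test_rows = rows[train_size:]      # remaining 20% held out for testing
print(len(train_rows), len(test_rows))
```

Because the data was shuffled earlier, this simple positional split still yields a reasonably representative test set.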

In [0]:
train_data.head()

Train tf.keras model

In this section we'll:

  • Build a regression model using tf.keras to predict a wine's quality score
  • Train the model
  • Add a layer to the model to prepare it for serving

In [0]:
# This is the size of the array we'll be feeding into our model for each wine example
input_size = len(train_data.iloc[0])
print(input_size)

In [0]:
model = Sequential()
model.add(Dense(200, input_shape=(input_size,), activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')

In [0]:
model.summary()

In [0]:
model.fit(train_data.values, train_labels.values, epochs=4, batch_size=32, validation_split=0.1)

Deploy keras model to Cloud AI Platform

In this section we'll:

  • Set up some global variables for our GCP project
  • Add a serving layer to our model so we can deploy it on Cloud AI Platform
  • Run the deploy command to deploy our model
  • Generate a test prediction on our deployed model

In [0]:
# Update these to your own GCP project, storage bucket, and version names
GCP_PROJECT = 'your_gcp_project'
KERAS_MODEL_BUCKET = 'gs://your_storage_bucket'
KERAS_VERSION_NAME = 'v1'

In [0]:
# Add the serving input layer below in order to serve our model on AI Platform
class ServingInput(tf.keras.layers.Layer):
  # the important detail in this boilerplate code is "trainable=False"
  def __init__(self, name, dtype, batch_input_shape=None):
    super(ServingInput, self).__init__(trainable=False, name=name, dtype=dtype, batch_input_shape=batch_input_shape)
  def get_config(self):
    return {'batch_input_shape': self._batch_input_shape, 'dtype': self.dtype, 'name': self.name }

restored_model = model

serving_model = tf.keras.Sequential()
serving_model.add(ServingInput('serving', tf.float32, (None, input_size)))
serving_model.add(restored_model)
tf.contrib.saved_model.save_keras_model(serving_model, os.path.join(KERAS_MODEL_BUCKET, 'keras_export'))  # export the model to your GCS bucket
export_path = KERAS_MODEL_BUCKET + '/keras_export'

In [0]:
# Configure gcloud to use your project
!gcloud config set project $GCP_PROJECT

In [0]:
# Create a new model in our project; you only need to run this once
!gcloud ai-platform models create keras_wine

In [0]:
# Deploy the model to Cloud AI Platform
!gcloud beta ai-platform versions create $KERAS_VERSION_NAME --model keras_wine \
--origin=$export_path \
--python-version=3.5 \
--runtime-version=1.14 \
--framework='TENSORFLOW'

In [0]:
%%writefile predictions.json
[7.8, 0.21, 0.49, 1.2, 0.036, 20.0, 99.0, 0.99, 3.05, 0.28, 12.1]
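Each line of predictions.json above is a single JSON instance. As a quick sanity check (a sketch using only the standard library), the line should parse to the 11 numeric features the model expects:

```python
import json

# One instance line from predictions.json (copied from the cell above)
line = '[7.8, 0.21, 0.49, 1.2, 0.036, 20.0, 99.0, 0.99, 3.05, 0.28, 12.1]'
instance = json.loads(line)
print(len(instance))  # should match input_size (11)
```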

In [0]:
# Test the deployed model on an example from our test set
# The correct score for this prediction is 7
prediction = !gcloud ai-platform predict --model=keras_wine --json-instances=predictions.json --version=$KERAS_VERSION_NAME
print(prediction[1])

Build and train scikit-learn model

In this section we'll:

  • Train a regression model using scikit-learn
  • Save the model to a local file using pickle

In [0]:
# Update these to your own version + storage bucket names
SKLEARN_VERSION_NAME = 'v1'
SKLEARN_MODEL_BUCKET = 'gs://sklearn_model_bucket'

In [0]:
scikit_model = LinearRegression().fit(train_data.values, train_labels.values)

In [0]:
# Export the model to a local file using pickle
pickle.dump(scikit_model, open('model.pkl', 'wb'))
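Before copying model.pkl to Cloud Storage, it can be worth verifying that the pickled model round-trips cleanly. This sketch uses a tiny synthetic dataset instead of the wine data:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a trivial model on synthetic data following y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Round-trip through pickle and confirm predictions are unchanged
restored = pickle.loads(pickle.dumps(model))
assert np.allclose(model.predict(X), restored.predict(X))
```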

Deploy scikit-learn model to CAIP

In this section we'll:

  • Copy our saved model file to Cloud Storage
  • Run the gcloud command to deploy our model
  • Generate a prediction on our deployed model

In [0]:
# Copy the saved model to the Cloud Storage bucket configured above
!gsutil cp ./model.pkl $SKLEARN_MODEL_BUCKET/model.pkl

In [0]:
# Create a new model in our project; you only need to run this once
!gcloud ai-platform models create sklearn_wine

In [0]:
!gcloud beta ai-platform versions create $SKLEARN_VERSION_NAME --model=sklearn_wine \
--origin=$SKLEARN_MODEL_BUCKET \
--runtime-version=1.14 \
--python-version=3.5 \
--framework='SCIKIT_LEARN'

In [0]:
# Test the model using the same example instance from above
!gcloud ai-platform predict --model=sklearn_wine --json-instances=predictions.json --version=$SKLEARN_VERSION_NAME

Compare tf.keras and scikit-learn models with the What-if Tool

Now we're ready for the What-if Tool! In this section we'll:

  • Create an array of our test examples with their ground truth values. The What-if Tool works best when we send the actual label for each example along with its inputs.
  • Instantiate the What-if Tool using the set_compare_ai_platform_model method, which lets us compare two models deployed on Cloud AI Platform.

In [0]:
# Create np array of test examples + their ground truth labels
test_examples = np.hstack((test_data[:200].values, test_labels[:200].values.reshape(-1, 1)))
print(test_examples.shape)
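The hstack call above appends each example's label as an extra final column. A toy numpy example of the same shape manipulation (the feature values here are made up):

```python
import numpy as np

# 3 examples with 2 features each, plus a label column on the right
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = np.array([7.0, 8.0, 9.0])

# reshape(-1, 1) turns the 1-D labels into a column vector for hstack
examples = np.hstack((features, labels.reshape(-1, 1)))
print(examples.shape)  # (3, 3): each row ends with its label
```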

In [0]:
# Create a What-if Tool visualization; it may take a minute to load
# See the cell below this for exploration ideas

# We use `set_predict_output_tensor` here because our tf.keras model returns a dict with a 'sequential' key

config_builder = (WitConfigBuilder(test_examples.tolist(), data.columns.tolist() + ['quality'])
  .set_ai_platform_model(GCP_PROJECT, 'keras_wine', KERAS_VERSION_NAME)
  .set_predict_output_tensor('sequential')
  .set_uses_predict_api(True)
  .set_target_feature('quality')
  .set_model_type('regression')
  .set_compare_ai_platform_model(GCP_PROJECT, 'sklearn_wine', SKLEARN_VERSION_NAME))
WitWidget(config_builder, height=800)

What-if Tool Exploration ideas

  • Look at the scatter plot of "Inference value scikit_wine" vs "Inference value keras_wine"

    • Examples far off the diagonal represent wines on which the two models strongly disagree about the quality score. Click on some of these and explore their features.
    • You can also click on an individual example, change some of its feature values, and compare the impact of that change on each model's prediction.
    • Check out the partial dependence plots to see which features cause the largest skew between the two models.
  • Go to the Performance tab and see the overall performance of each model. Is one more accurate over the test data than the other?

    • In this tab, use the "Slice by" dropdown to slice the data into subgroups and see how both models perform across those subgroups. Try slicing by alcohol. Which model has more consistent performance across the slices?