By deploying or using this software you agree to comply with the AI Hub Terms of Service and the Google APIs Terms of Service. To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.

Overview

This notebook walks through an example workflow that uses the Distributed XGBoost ML container to train a regression model.

Dataset

The notebook uses the Boston housing price regression dataset. It contains 506 observations, each with 13 features describing a house in Boston and the corresponding house price, stored in a 506x14 table.

Objective

The goal of this notebook is to go through a common training workflow:

  • Create a dataset
  • Train an ML model using the AI Platform Training service
  • Determine whether the model was trained successfully by inspecting the generated "Run Report"
  • Deploy the model for serving using the AI Platform Prediction service
  • Use the endpoint for online predictions
  • Interactively inspect the deployed ML model with the What-If Tool

Costs

This tutorial uses billable components of Google Cloud Platform (GCP):

  • Cloud AI Platform
  • Cloud Storage

Learn about Cloud AI Platform pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Set up your local development environment

If you are using Colab or AI Platform Notebooks, your environment already meets all the requirements to run this notebook. You can skip this step.

Otherwise, make sure your environment meets this notebook's requirements. You need the following:

  • The Google Cloud SDK
  • Git
  • Python 3
  • virtualenv
  • Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to Setting up a Python development environment and the Jupyter installation guide provide detailed instructions for meeting these requirements. The following steps provide a condensed set of instructions:

  1. Install and initialize the Cloud SDK.

  2. Install Python 3.

  3. Install virtualenv and create a virtual environment that uses Python 3.

  4. Activate that environment and run pip install jupyter in a shell to install Jupyter.

  5. Run jupyter notebook in a shell to launch Jupyter.

  6. Open this notebook in the Jupyter Notebook Dashboard.

Set up your GCP project

The following steps are required, regardless of your notebook environment.

  1. Select or create a GCP project. When you first create an account, you get a $300 free credit towards your compute/storage costs.

  2. Make sure that billing is enabled for your project.

  3. Enable the AI Platform APIs and Compute Engine APIs.

  4. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

Note: Jupyter runs lines prefixed with ! as shell commands, and it interpolates Python variables prefixed with $ into these commands.


In [ ]:
PROJECT_ID = "[your-project-id]" #@param {type:"string"}
! gcloud config set project $PROJECT_ID
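
The `!` and `$` shorthand described in the note above only works inside Jupyter. As a rough sketch, the same command can be assembled in plain Python. The project ID below is a placeholder, and the `subprocess.run` call is left commented out so that the snippet does not touch your Cloud SDK configuration unless you opt in.

```python
import shlex
import subprocess

PROJECT_ID = "my-example-project"  # placeholder, not a real project

# Jupyter expands `! gcloud config set project $PROJECT_ID` into this command.
command = ["gcloud", "config", "set", "project", PROJECT_ID]
print(shlex.join(command))

# Uncomment to actually run it (requires the Cloud SDK on your PATH):
# subprocess.run(command, check=True)
```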

Authenticate your GCP account

If you are using AI Platform Notebooks, your environment is already authenticated. Skip this step.

If you are using Colab, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

Otherwise, follow these steps:

  1. In the GCP Console, go to the Create service account key page.

  2. From the Service account drop-down list, select New service account.

  3. In the Service account name field, enter a name.

  4. From the Role drop-down list, select Machine Learning Engine > AI Platform Admin and Storage > Storage Object Admin.

  5. Click Create. A JSON file that contains your key downloads to your local environment.

  6. Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.


In [ ]:
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
  %env GOOGLE_APPLICATION_CREDENTIALS ''
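
When running outside a notebook, the `%env` magic is not available. The following is a minimal sketch of a plain-Python equivalent that also fails early if the key file is missing. The helper name `use_service_account_key` and the example path are hypothetical and not part of the original notebook.

```python
import os


def use_service_account_key(key_path):
    """Point Google client libraries at a service account key file."""
    key_path = os.path.abspath(os.path.expanduser(key_path))
    if not os.path.isfile(key_path):
        # Failing here is easier to debug than a later authentication error.
        raise FileNotFoundError("No key file at {}".format(key_path))
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    return key_path


# A path that does not exist is rejected before any API call is made.
try:
    use_service_account_key("/path/that/does/not/exist.json")
except FileNotFoundError as err:
    print(err)
```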

Create a Cloud Storage bucket

The following steps are required, regardless of your notebook environment.

You need to have a "workspace" bucket that will hold the dataset and the output from the ML Container. Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to choose a region where Cloud AI Platform services are available. You may not use a Multi-Regional Storage bucket for training with AI Platform.


In [ ]:
BUCKET_NAME = "[your-bucket-name]" #@param {type:"string"}
REGION = 'us-central1' #@param {type:"string"}
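
A common mistake is to leave the `[your-bucket-name]` placeholder in place. The sketch below checks a name against a simplified approximation of the Cloud Storage naming rules (lowercase letters, digits, dashes, underscores and dots, 3 to 63 characters, starting and ending with a letter or digit). The helper name is hypothetical, and the check does not cover every rule or guarantee that the name is globally unique.

```python
import re

# Simplified pattern: it does not cover dotted names longer than 63 characters
# or reserved prefixes.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_bucket_name(name):
    """Return True if `name` passes the simplified naming check."""
    return _BUCKET_NAME.fullmatch(name) is not None


for candidate in ["[your-bucket-name]", "my-xgboost-workspace"]:
    print(candidate, looks_like_bucket_name(candidate))
```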

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [ ]:
! gsutil mb -l $REGION gs://$BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:


In [ ]:
! gsutil ls -al gs://$BUCKET_NAME

Install packages and dependencies with pip


In [ ]:
! pip install witwidget

Import libraries and define constants


In [ ]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
import pandas as pd
import tensorflow as tf
from IPython.display import HTML, display
from googleapiclient import discovery
import numpy as np
from sklearn.metrics import r2_score

Create a dataset


In [4]:
bh = tf.keras.datasets.boston_housing
(X_train, y_train), (X_eval, y_eval) = bh.load_data()

data_mean = X_train.mean(axis=0)
data_std = X_train.std(axis=0)

X_train = (X_train - data_mean) / data_std
X_eval = (X_eval - data_mean) / data_std

training = pd.DataFrame(X_train)
training.columns = ["f{}".format(c) for c in training.columns]
training['target'] = y_train

validation = pd.DataFrame(X_eval)
validation.columns = ["f{}".format(c) for c in validation.columns]
validation['target'] = y_eval

print('Training data head')
display(training.head())

training_data = os.path.join('gs://', BUCKET_NAME, 'data/train.csv')
validation_data = os.path.join('gs://', BUCKET_NAME, 'data/valid.csv')

print('Copy the data in bucket ...')
with tf.io.gfile.GFile(training_data, 'w') as f:
  training.to_csv(f, index=False)
with tf.io.gfile.GFile(validation_data, 'w') as f:
  validation.to_csv(f, index=False)


Training data head
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 target
0 -0.272246 -0.483615 -0.435762 -0.256833 -0.165227 -0.176443 0.813062 0.116698 -0.626249 -0.595170 1.148500 0.448077 0.825220 15.2
1 -0.403427 2.991784 -1.333912 -0.256833 -1.215182 1.894346 -1.910361 1.247585 -0.856463 -0.348433 -1.718189 0.431906 -1.329202 42.3
2 0.124940 -0.483615 1.028326 -0.256833 0.628642 -1.829688 1.110488 -1.187439 1.675886 1.565287 0.784476 0.220617 -1.308500 50.0
3 -0.401494 -0.483615 -0.869402 -0.256833 -0.361560 -0.324558 -1.236672 1.107180 -0.511142 -1.094663 0.784476 0.448077 -0.652926 21.1
4 -0.005634 -0.483615 1.028326 -0.256833 1.328612 0.153642 0.694808 -0.578572 1.675886 1.565287 0.784476 0.389882 0.263497 17.7
Copy the data in bucket ...
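
The cell above standardizes both splits with the mean and standard deviation of the training split only, so no information from the evaluation data leaks into the features. The sketch below illustrates the same idea on synthetic data. The array shapes and random values are assumptions chosen for illustration and are not the Boston housing data.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
evaluation = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Statistics come from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
eval_scaled = (evaluation - mean) / std

# The training columns now have zero mean and unit variance; the evaluation
# columns are only approximately standardized because they reuse the
# training statistics.
print(np.allclose(train_scaled.mean(axis=0), 0.0))
print(np.allclose(train_scaled.std(axis=0), 1.0))
print(eval_scaled.mean(axis=0))
```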

Cloud training

Accelerator and distribution support

GPU | Multi-GPU Node | TPU | Workers | Parameter Server
Yes | No             | No  | Yes     | No

To add distribution and/or accelerators to your AI Platform training call, use parameters similar to the example shown below.

--master-machine-type standard_gpu \
    --worker-machine-type standard_gpu \
    --worker-count 2 \

AI Platform training


In [ ]:
output_location = os.path.join('gs://', BUCKET_NAME, 'output')

job_name = "xgboost_regression_{}".format(time.strftime("%Y%m%d%H%M%S"))
!gcloud ai-platform jobs submit training $job_name \
    --master-image-uri gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest \
    --region $REGION \
    --scale-tier CUSTOM \
    --master-machine-type standard \
    -- \
    --output-location {output_location} \
    --training-data {training_data} \
    --validation-data {validation_data} \
    --target-column target \
    --data-type csv \
    --objective reg:tweedie
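
As a sketch of how the worker flags from the previous section combine with the training call above, the command can be assembled as a Python list. The helper `build_training_command` and its parameter names are hypothetical; only the `gcloud` flags themselves are taken from this notebook. Everything after the bare `--` is passed to the container.

```python
def build_training_command(job_name, region, image_uri, container_args,
                           master_machine_type="standard",
                           worker_machine_type=None, worker_count=0):
    """Assemble the gcloud arguments for a custom-container training job."""
    command = [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        "--master-image-uri", image_uri,
        "--region", region,
        "--scale-tier", "CUSTOM",
        "--master-machine-type", master_machine_type,
    ]
    if worker_count > 0:
        # Workers default to the master machine type unless one is given.
        command += [
            "--worker-machine-type", worker_machine_type or master_machine_type,
            "--worker-count", str(worker_count),
        ]
    # Arguments after "--" go to the container, not to gcloud.
    return command + ["--"] + list(container_args)


command = build_training_command(
    job_name="xgboost_regression_example",
    region="us-central1",
    image_uri="gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest",
    container_args=["--target-column", "target", "--objective", "reg:tweedie"],
    master_machine_type="standard_gpu",
    worker_count=2,
)
print(" ".join(command))
```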

Local training snippet

Note that the training can also be run locally with Docker:

docker run \
    -v /tmp:/tmp \
    -it gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest \
    --training-data /tmp/train.csv \
    --validation-data /tmp/valid.csv \
    --output-location /tmp/output \
    --target-column target \
    --data-type csv \
    --objective reg:tweedie

Inspect the Run Report

The "Run Report" will help you identify if the model was successfully trained.
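
The report is only written once the training job has finished, so it can help to wait for a terminal job state before looking for it. The sketch below shows a generic polling loop. The helper `wait_for_job` is hypothetical, and the state getter is injected so that the loop can be tried without any cloud call. In the notebook, the getter could wrap the `projects().jobs().get(...)` request of the `googleapiclient` service and return its `state` field, which is an assumption about how you wire it up.

```python
import time

# Job states after which no further progress is expected.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_state, poll_seconds=30, timeout_seconds=3600,
                 sleep=time.sleep):
    """Poll `get_state()` until it returns a terminal state."""
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError("Job still in state {}".format(state))
        sleep(poll_seconds)
        waited += poll_seconds


# Dry run with a fake sequence of states instead of a real job.
fake_states = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
result = wait_for_job(lambda: next(fake_states), poll_seconds=0,
                      sleep=lambda seconds: None)
print(result)
```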


In [6]:
if not tf.io.gfile.exists(os.path.join(output_location, 'report.html')):
  raise RuntimeError('The file report.html was not found. Did the training job finish?')

with tf.io.gfile.GFile(os.path.join(output_location, 'report.html')) as f:
  display(HTML(f.read()))



Runtime arguments

argument value
training_data gs://aihub-content-test/xgboost_regression/data/train.csv
target_column target
validation_data gs://aihub-content-test/xgboost_regression/data/valid.csv
job_dir None
output_location gs://aihub-content-test/xgboost_regression/output
data_type csv
fresh_start False
weight_column None
number_of_classes 1
num_round 10
early_stopping_rounds -1
verbosity 1
eta 0.3
gamma 0.01
max_depth 6
min_child_weight 1
max_delta_step 0
subsample 1
colsample_bytree 1
colsample_bylevel 1
colsample_bynode 1
reg_lambda 1
alpha 0
scale_pos_weight 1
objective reg:tweedie
tree_method auto
remainder None

Tensorboard snippet

To see the training progress, install the latest TensorBoard with the command pip install -U tensorboard, and then run one of the following commands.

Local tensorboard

tensorboard --logdir gs://aihub-content-test/xgboost_regression/output

Publicly shared tensorboard

tensorboard dev upload --logdir gs://aihub-content-test/xgboost_regression/output

Datasets

Data reading snippet

import tensorflow as tf
import pandas as pd

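sample_size = 10000  # maximum number of rows to read (example value)

# Read each matching file until enough rows have been collected.
frames, rows = [], 0
for filename in tf.io.gfile.glob('gs://aihub-content-test/xgboost_regression/data/valid.csv'):
  with tf.io.gfile.GFile(filename, 'r') as f:
    frames.append(pd.read_csv(f, nrows=sample_size - rows))
  rows += len(frames[-1])
  if rows >= sample_size:
    break
# DataFrame.append was removed in pandas 2.0; concatenate once at the end.
sample = pd.concat(frames) if frames else pd.DataFrame()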

Training set sample

f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 target
0 -0.3699 -0.4836 -0.7209 -0.2568 -0.4298 2.7929 0.0498 -0.0436 -0.1658 -0.5952 -0.4896 0.2571 -1.2133 48.3
1 -0.3524 -0.4836 -0.1770 -0.2568 -0.1140 0.5190 0.4834 -0.2085 -0.6262 -0.6132 -0.0346 0.4481 -1.1318 22.8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
402 -0.3744 -0.4836 -0.2079 -0.2568 0.2360 -1.2372 0.1394 -0.4644 -0.3960 -0.0897 0.3294 0.4481 1.1592 19.7
403 0.0878 -0.4836 1.0283 -0.2568 1.3713 -3.8173 0.6769 -1.0490 1.6759 1.5653 0.7845 -0.0009 -0.7758 27.5

404 rows × 14 columns

Validation set sample

f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 target
0 -0.3677 -0.4836 -0.5475 -0.2568 -0.5494 -0.3189 -0.6848 0.4837 -0.5111 -0.7155 0.5115 0.4481 -0.6957 20.4
1 0.1838 -0.4836 1.0283 -0.2568 1.3286 0.5472 1.0460 -0.6831 1.6759 1.5653 0.7845 0.0054 0.6886 15.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
402 2.4049 -0.4836 1.0283 -0.2568 1.0384 -1.3585 0.7199 -1.0321 1.6759 1.5653 0.7845 -2.4195 1.9183 10.4
403 0.6250 -0.4836 1.0283 -0.2568 1.3286 0.6502 0.8991 -0.6136 1.6759 1.5653 0.7845 -3.7034 0.8238 14.9

404 rows × 14 columns

Dataset inspection

You can use AI Platform to create a detailed inspection report for your dataset with the following console snippet:

DATA=gs://aihub-content-test/xgboost_regression/data/valid.csv
#DATA=gs://aihub-content-test/xgboost_regression/data/train.csv
OUTPUT_LOCATION=gs://aihub-content-test/xgboost_regression/output
# can be one of: tfrecord, parquet, avro, csv, json, bigquery
DATA_TYPE=csv
MAX_SAMPLE_SIZE=10000
JOB_NAME=tabular_data_inspection_$(date '+%Y%m%d_%H%M%S')

gcloud ai-platform jobs submit training $JOB_NAME \
  --stream-logs \
  --master-image-uri gcr.io/kf-pipeline-contrib/kfp-components/oob_algorithm/tabular_data_inspection:latest \
  -- \
  --output-location $OUTPUT_LOCATION \
  --data $DATA \
  --data-type $DATA_TYPE \
  --max-sample-size $MAX_SAMPLE_SIZE

Predictions

Local predictions snippet

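Use the following Python snippet to load the trained model and compute predictions locally. Here, data is the feature table without the target column, for example a sample read with the data reading snippet.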

import xgboost as xgb
import tensorflow as tf


tf.io.gfile.copy('gs://aihub-content-test/xgboost_regression/output/model.bst', '/tmp/model.bst')
bst = xgb.Booster({'nthread': 4})
bst.load_model('/tmp/model.bst')

predictions = bst.predict(xgb.DMatrix(data))

Training predictions

0
0 31.2152
1 19.8102
... ...
402 15.6326
403 18.0093

404 rows × 1 columns

Validation predictions

0
0 17.2700
1 14.0588
... ...
402 8.8163
403 11.4939

404 rows × 1 columns

Metrics

Actual VS Prediction

Distribution of the predictions

Distribution of the residuals

Absolute value of the residuals

Absolute value of the residuals as a percent of the target
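
The residual columns behind these plots and the prediction tables can be reproduced with a few lines of pandas. The sketch below uses three rows with rounded values taken from the training tables and follows the sign convention visible there (residual = target minus prediction). The column name `residual_pct` is a hypothetical label for the percentage plot.

```python
import pandas as pd

report = pd.DataFrame(
    {"target": [10.2, 50.0, 13.6],
     "predicted-target": [10.27, 30.01, 13.37]},
    index=pd.Index([68, 25, 193], name="row"),
)

# Signed residual, its absolute value, and the absolute value as a
# percentage of the target.
report["residual"] = report["target"] - report["predicted-target"]
report["residual_abs"] = report["residual"].abs()
report["residual_pct"] = 100 * report["residual_abs"] / report["target"].abs()

# Best predictions have the smallest absolute residual, worst the largest.
best_first = report.sort_values("residual_abs")
worst_first = report.sort_values("residual_abs", ascending=False)
print(best_first)
```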

Prediction tables

Training data and prediction

Best predictions

target predicted-target residual_abs residual f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12
row
68 10.2 10.27 0.06698 -0.06698 1.53 -0.4836 1.028 -0.2568 0.9701 -0.06218 1.11 -1.161 1.676 1.565 0.7845 0.4145 1.248
386 7.2 7.27 0.07043 -0.07043 1.416 -0.4836 1.028 -0.2568 1.218 -1.397 1.042 -1.141 1.676 1.565 0.7845 0.4481 2.494
164 11.9 12.01 0.108 -0.108 1.839 -0.4836 1.028 -0.2568 0.8677 -3.003 1.11 -1.264 1.676 1.565 0.7845 0.1642 1.463
193 13.6 13.37 0.2275 0.2275 -0.3943 -0.4836 2.445 -0.2568 0.4408 -0.4007 1.067 -0.9233 -0.6262 1.836 0.739 0.3758 0.7355
253 16 15.65 0.3511 0.3511 -0.3872 0.5695 -0.8782 -0.2568 -0.8908 -0.4247 0.874 1.518 -0.1658 -0.7336 0.557 0.2479 0.2345
100 7.5 7.869 0.3691 -0.3691 0.7681 -0.4836 1.028 -0.2568 1.038 0.7264 0.7808 -0.9473 1.676 1.565 0.7845 -3.545 1.801
343 8.3 7.869 0.4309 0.4309 1.313 -0.4836 1.028 -0.2568 1.038 -0.5235 0.9456 -0.9028 1.676 1.565 0.7845 -3.693 1.608
291 5 5.465 0.4645 -0.4645 3.75 -0.4836 1.028 -0.2568 1.158 -1.148 1.11 -1.11 1.676 1.565 0.7845 0.4481 2.463
381 8.4 7.901 0.4985 0.4985 1.076 -0.4836 1.028 -0.2568 1.559 -0.4684 0.6769 -0.9467 1.676 1.565 0.7845 -3.041 2.937
188 7 6.493 0.5074 0.5074 4.551 -0.4836 1.028 -0.2568 1.158 -2.466 1.11 -1.027 1.676 1.565 0.7845 -2.835 3.345
209 8.3 8.816 0.5163 -0.5163 2.281 -0.4836 1.028 -0.2568 1.158 -1.295 0.9672 -1.005 1.676 1.565 0.7845 0.4481 0.9701
329 8.4 7.827 0.5728 0.5728 0.8741 -0.4836 1.028 -0.2568 1.371 0.7856 0.2684 -0.9598 1.676 1.565 0.7845 -3.259 1.38
367 5 5.61 0.6103 -0.6103 6.953 -0.4836 1.028 -0.2568 1.158 -0.8239 1.11 -1.142 1.676 1.565 0.7845 0.3212 1.413
146 12.7 12.01 0.6919 0.6919 0.1001 -0.4836 1.028 -0.2568 1.329 -0.4106 0.6769 -0.5719 1.676 1.565 0.7845 -3.663 0.8652
57 12.6 11.91 0.6933 0.6933 0.6696 -0.4836 1.028 -0.2568 1.559 -0.02269 0.9887 -0.7606 1.676 1.565 0.7845 0.3589 0.5105
248 6.3 7.015 0.715 -0.715 0.6687 -0.4836 1.028 -0.2568 1.158 -0.5855 0.315 -1.105 1.676 1.565 0.7845 -0.1769 2.378
341 16.8 16.07 0.7293 0.7293 -0.3815 -0.4836 -0.2079 -0.2568 0.236 -0.3387 0.383 -0.6126 -0.396 -0.08966 0.3294 0.4481 0.2193
124 11.8 11.05 0.7473 0.7473 -0.1046 -0.4836 1.246 -0.2568 2.677 -1.924 1.032 -1.181 -0.5111 -0.01744 -1.718 0.4481 2.284
195 9.6 8.788 0.8121 0.8121 1.157 -0.4836 1.028 -0.2568 1.559 0.2735 0.8704 -0.857 1.676 1.565 0.7845 -3.482 0.7327
392 13.8 12.99 0.8133 0.8133 0.4671 -0.4836 1.028 -0.2568 0.2274 -1.185 0.9456 -0.6463 1.676 1.565 0.7845 -0.02344 0.7452
121 10.9 10.07 0.825 0.825 3.675 -0.4836 1.028 -0.2568 1.038 -0.09181 0.3472 -0.9259 1.676 1.565 0.7845 -3.574 0.2456
350 8.7 7.869 0.8309 0.8309 1.239 -0.4836 1.028 -0.2568 1.559 -0.1623 1.11 -0.9006 1.676 1.565 0.7845 -3.675 1.892
400 8.8 7.901 0.8985 0.8985 1.77 -0.4836 1.028 -0.2568 1.218 -2.679 0.7951 -1.135 1.676 1.565 0.7845 -0.7336 2.469
203 10.9 9.932 0.9679 0.9679 1.314 -0.4836 1.028 -0.2568 0.9701 0.392 1.078 -1.095 1.676 1.565 0.7845 0.4481 1.151
373 13.1 12.04 1.057 1.057 -0.1407 -0.4836 1.246 -0.2568 2.677 -1.404 0.8955 -0.9882 -0.5111 -0.01744 -1.718 -2.832 0.4691
216 8.5 9.562 1.062 -1.062 0.4255 -0.4836 1.028 -0.2568 1.158 -0.7336 1.071 -1.039 1.676 1.565 0.7845 0.4076 0.9908
245 14.6 13.54 1.062 1.062 0.703 -0.4836 1.028 -0.2568 0.4835 -0.1158 0.9922 -0.7742 1.676 1.565 0.7845 0.2651 0.73
293 11.3 10.18 1.118 1.118 0.5896 -0.4836 1.028 -0.2568 1.218 -1.031 1.11 -1.065 1.676 1.565 0.7845 0.4481 1.499
149 15.2 14.06 1.141 1.141 0.1838 -0.4836 1.028 -0.2568 1.329 0.5472 1.046 -0.6831 1.676 1.565 0.7845 0.005392 0.6886
157 13.4 12.24 1.156 1.156 0.3221 -0.4836 1.028 -0.2568 1.329 0.6798 0.8453 -0.6987 1.676 1.565 0.7845 -3.771 0.6486

Worst predictions

target predicted-target residual_abs residual f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12
row
25 50 30.01 19.99 19.99 0.3027 -0.4836 1.028 3.894 0.6286 1.056 1.021 -1.252 1.676 1.565 0.7845 0.3965 -1.35
60 50 30.47 19.53 19.53 0.1249 -0.4836 1.028 -0.2568 0.6286 -1.83 1.11 -1.187 1.676 1.565 0.7845 0.2206 -1.309
138 50 30.97 19.03 19.03 0.49 -0.4836 1.028 3.894 0.9445 -0.5531 0.7378 -1.288 1.676 1.565 0.7845 -0.07344 -0.5329
110 50 30.97 19.03 19.03 0.5945 -0.4836 1.028 -0.2568 0.6286 -0.07206 1.11 -1.268 1.676 1.565 0.7845 0.1209 -0.4431
224 48.8 29.77 19.03 19.03 -0.3494 0.3589 -1.049 -0.2568 0.7652 3.006 0.8059 -0.716 -0.5111 -0.8539 -2.492 0.3413 -0.9428
244 50 31.22 18.78 18.78 -0.4042 3.308 -1.454 3.894 -1.335 2.336 -1.584 1.058 -0.9716 -1.251 -2.219 0.4334 -1.322
170 50 31.22 18.78 18.78 -0.4043 2.886 -1.565 -0.2568 -1.155 2.268 -1.326 0.941 -0.6262 -0.9081 -1.855 0.4197 -1.349
83 50 31.22 18.78 18.78 -0.4036 3.518 -1.238 -0.2568 -1.206 2.492 -1.33 0.6795 -0.6262 -1.095 -1.718 0.3805 -1.361
269 50 31.3 18.7 18.7 0.2086 -0.4836 1.028 3.894 0.6286 0.5867 0.9958 -1.176 1.676 1.565 0.7845 0.2186 -1.244
388 50 31.82 18.18 18.18 -0.3487 -0.4836 -0.7209 -0.2568 -0.4555 3.467 0.5013 -0.4172 -0.1658 -0.5952 -0.4896 0.2896 -1.119
154 50 32.4 17.6 17.6 -0.188 -0.4836 1.246 -0.2568 0.4067 2.344 0.9743 -0.8356 -0.5111 -0.01744 -1.718 0.1544 -1.248
278 50 32.4 17.6 17.6 -0.2472 -0.4836 1.246 -0.2568 0.4067 1.724 0.7808 -0.8726 -0.5111 -0.01744 -1.718 0.209 -1.52
58 50 32.4 17.6 17.6 -0.2412 -0.4836 1.246 3.894 0.4067 2.973 0.8919 -0.7784 -0.5111 -0.01744 -1.718 0.3582 -1.3
115 50 32.4 17.6 17.6 -0.3395 0.3589 -1.049 -0.2568 0.7652 3.438 0.6411 -0.9564 -0.5111 -0.8539 -2.492 0.3715 -1.052
134 48.5 31.22 17.28 17.28 -0.402 3.518 -1.238 -0.2568 -1.206 2.237 -1.283 0.6795 -0.6262 -1.095 -1.718 0.4042 -1.233
0 48.3 31.22 17.08 17.08 -0.3699 -0.4836 -0.7209 -0.2568 -0.4298 2.793 0.04979 -0.04358 -0.1658 -0.5952 -0.4896 0.2571 -1.213
299 36.2 20.89 15.31 15.31 -0.3983 -0.4836 -1.271 -0.2568 -0.592 -0.1736 -0.2441 -0.5634 -0.7414 -1.281 -0.3076 0.4481 -0.4542
252 46 31.22 14.78 14.78 -0.3991 0.3589 -1.143 3.894 -0.977 1.944 -0.692 0.7258 -0.5111 -1.143 -1.627 0.2371 -1.343
232 45.4 31.22 14.18 14.18 -0.4019 0.3589 -1.143 -0.2568 -0.977 2.191 -0.1616 0.4707 -0.5111 -1.143 -1.627 0.346 -1.239
114 44.8 31.22 13.58 13.58 -0.3716 -0.4836 -0.7209 -0.2568 -0.4555 2.82 0.3329 -0.4172 -0.1658 -0.5952 -0.4896 0.322 -1.187
336 44 31.22 12.78 12.78 -0.4041 3.308 -1.081 -0.2568 -1.394 1.674 -1.247 1.28 -0.7414 -0.9743 -1.172 0.3357 -1.329
3 43.8 31.22 12.58 12.58 -0.3969 -0.4836 -1.207 -0.2568 -0.9591 2.191 -1.151 -0.1209 -0.8565 -0.7817 -0.2166 0.4122 -1.266
234 37 24.72 12.28 12.28 -0.396 1.412 -1.127 -0.2568 -1.027 0.9647 -1.703 1.351 -0.5111 -0.04753 -1.491 0.2436 -1.055
315 37.3 25.49 11.81 11.81 -0.3972 2.886 -0.9047 -0.2568 -1.249 1.243 -1.48 0.6788 -0.6262 -0.9683 0.3294 0.4481 -1.267
15 43.5 31.82 11.68 11.68 -0.3472 0.3589 -1.049 -0.2568 0.1506 1.697 -0.5881 -0.4282 -0.5111 -0.8539 -2.492 0.3779 -1.322
21 39.8 28.17 11.63 11.63 -0.3986 -0.4836 -1.271 -0.2568 -0.592 2.113 0.5121 -0.4928 -0.7414 -1.281 -0.3076 0.4338 -0.715
95 37.9 26.35 11.55 11.55 -0.3959 -0.4836 -1.271 -0.2568 -0.592 1.253 0.831 -0.5127 -0.7414 -1.281 -0.3076 0.4185 -1.093
166 36.2 24.72 11.48 11.48 -0.3983 -0.4836 -1.312 -0.2568 -0.8481 1.241 -0.5307 1.145 -0.7414 -1.107 0.1019 0.4481 -1.023
143 41.3 30.01 11.29 11.29 -0.2732 -0.4836 1.246 -0.2568 0.4067 0.9535 1.017 -0.9188 -0.5111 -0.01744 -1.718 0.09199 -1.125
187 42.3 31.22 11.08 11.08 -0.4034 2.992 -1.334 -0.2568 -1.215 1.894 -1.91 1.248 -0.8565 -0.3484 -1.718 0.4319 -1.329

Validation data and prediction

Best predictions

target predicted-target residual_abs residual f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12
row
85 10.2 10.27 0.06698 -0.06698 1.53 -0.4836 1.028 -0.2568 0.9701 -0.06218 1.11 -1.161 1.676 1.565 0.7845 0.4145 1.248
79 7.2 7.27 0.07043 -0.07043 1.416 -0.4836 1.028 -0.2568 1.218 -1.397 1.042 -1.141 1.676 1.565 0.7845 0.4481 2.494
329 11.9 12.01 0.108 -0.108 1.839 -0.4836 1.028 -0.2568 0.8677 -3.003 1.11 -1.264 1.676 1.565 0.7845 0.1642 1.463
212 13.6 13.37 0.2275 0.2275 -0.3943 -0.4836 2.445 -0.2568 0.4408 -0.4007 1.067 -0.9233 -0.6262 1.836 0.739 0.3758 0.7355
12 16 15.65 0.3511 0.3511 -0.3872 0.5695 -0.8782 -0.2568 -0.8908 -0.4247 0.874 1.518 -0.1658 -0.7336 0.557 0.2479 0.2345
40 7.5 7.869 0.3691 -0.3691 0.7681 -0.4836 1.028 -0.2568 1.038 0.7264 0.7808 -0.9473 1.676 1.565 0.7845 -3.545 1.801
133 8.3 7.869 0.4309 0.4309 1.313 -0.4836 1.028 -0.2568 1.038 -0.5235 0.9456 -0.9028 1.676 1.565 0.7845 -3.693 1.608
283 5 5.465 0.4645 -0.4645 3.75 -0.4836 1.028 -0.2568 1.158 -1.148 1.11 -1.11 1.676 1.565 0.7845 0.4481 2.463
139 8.4 7.901 0.4985 0.4985 1.076 -0.4836 1.028 -0.2568 1.559 -0.4684 0.6769 -0.9467 1.676 1.565 0.7845 -3.041 2.937
81 7 6.493 0.5074 0.5074 4.551 -0.4836 1.028 -0.2568 1.158 -2.466 1.11 -1.027 1.676 1.565 0.7845 -2.835 3.345
16 8.3 8.816 0.5163 -0.5163 2.281 -0.4836 1.028 -0.2568 1.158 -1.295 0.9672 -1.005 1.676 1.565 0.7845 0.4481 0.9701
325 8.4 7.827 0.5728 0.5728 0.8741 -0.4836 1.028 -0.2568 1.371 0.7856 0.2684 -0.9598 1.676 1.565 0.7845 -3.259 1.38
390 5 5.61 0.6103 -0.6103 6.953 -0.4836 1.028 -0.2568 1.158 -0.8239 1.11 -1.142 1.676 1.565 0.7845 0.3212 1.413
136 12.7 12.01 0.6919 0.6919 0.1001 -0.4836 1.028 -0.2568 1.329 -0.4106 0.6769 -0.5719 1.676 1.565 0.7845 -3.663 0.8652
364 12.6 11.91 0.6933 0.6933 0.6696 -0.4836 1.028 -0.2568 1.559 -0.02269 0.9887 -0.7606 1.676 1.565 0.7845 0.3589 0.5105
352 6.3 7.015 0.715 -0.715 0.6687 -0.4836 1.028 -0.2568 1.158 -0.5855 0.315 -1.105 1.676 1.565 0.7845 -0.1769 2.378
86 16.8 16.07 0.7293 0.7293 -0.3815 -0.4836 -0.2079 -0.2568 0.236 -0.3387 0.383 -0.6126 -0.396 -0.08966 0.3294 0.4481 0.2193
304 11.8 11.05 0.7473 0.7473 -0.1046 -0.4836 1.246 -0.2568 2.677 -1.924 1.032 -1.181 -0.5111 -0.01744 -1.718 0.4481 2.284
219 9.6 8.788 0.8121 0.8121 1.157 -0.4836 1.028 -0.2568 1.559 0.2735 0.8704 -0.857 1.676 1.565 0.7845 -3.482 0.7327
239 13.8 12.99 0.8133 0.8133 0.4671 -0.4836 1.028 -0.2568 0.2274 -1.185 0.9456 -0.6463 1.676 1.565 0.7845 -0.02344 0.7452
284 10.9 10.07 0.825 0.825 3.675 -0.4836 1.028 -0.2568 1.038 -0.09181 0.3472 -0.9259 1.676 1.565 0.7845 -3.574 0.2456
129 8.7 7.869 0.8309 0.8309 1.239 -0.4836 1.028 -0.2568 1.559 -0.1623 1.11 -0.9006 1.676 1.565 0.7845 -3.675 1.892
391 8.8 7.901 0.8985 0.8985 1.77 -0.4836 1.028 -0.2568 1.218 -2.679 0.7951 -1.135 1.676 1.565 0.7845 -0.7336 2.469
67 10.9 9.932 0.9679 0.9679 1.314 -0.4836 1.028 -0.2568 0.9701 0.392 1.078 -1.095 1.676 1.565 0.7845 0.4481 1.151
78 13.1 12.04 1.057 1.057 -0.1407 -0.4836 1.246 -0.2568 2.677 -1.404 0.8955 -0.9882 -0.5111 -0.01744 -1.718 -2.832 0.4691
61 8.5 9.562 1.062 -1.062 0.4255 -0.4836 1.028 -0.2568 1.158 -0.7336 1.071 -1.039 1.676 1.565 0.7845 0.4076 0.9908
244 14.6 13.54 1.062 1.062 0.703 -0.4836 1.028 -0.2568 0.4835 -0.1158 0.9922 -0.7742 1.676 1.565 0.7845 0.2651 0.73
292 11.3 10.18 1.118 1.118 0.5896 -0.4836 1.028 -0.2568 1.218 -1.031 1.11 -1.065 1.676 1.565 0.7845 0.4481 1.499
1 15.2 14.06 1.141 1.141 0.1838 -0.4836 1.028 -0.2568 1.329 0.5472 1.046 -0.6831 1.676 1.565 0.7845 0.005392 0.6886
169 13.4 12.24 1.156 1.156 0.3221 -0.4836 1.028 -0.2568 1.329 0.6798 0.8453 -0.6987 1.676 1.565 0.7845 -3.771 0.6486

Worst predictions

target predicted-target residual_abs residual f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12
row
34 50 30.01 19.99 19.99 0.3027 -0.4836 1.028 3.894 0.6286 1.056 1.021 -1.252 1.676 1.565 0.7845 0.3965 -1.35
46 50 30.47 19.53 19.53 0.1249 -0.4836 1.028 -0.2568 0.6286 -1.83 1.11 -1.187 1.676 1.565 0.7845 0.2206 -1.309
296 50 30.97 19.03 19.03 0.49 -0.4836 1.028 3.894 0.9445 -0.5531 0.7378 -1.288 1.676 1.565 0.7845 -0.07344 -0.5329
328 50 30.97 19.03 19.03 0.5945 -0.4836 1.028 -0.2568 0.6286 -0.07206 1.11 -1.268 1.676 1.565 0.7845 0.1209 -0.4431
229 48.8 29.77 19.03 19.03 -0.3494 0.3589 -1.049 -0.2568 0.7652 3.006 0.8059 -0.716 -0.5111 -0.8539 -2.492 0.3413 -0.9428
164 50 31.22 18.78 18.78 -0.4036 3.518 -1.238 -0.2568 -1.206 2.492 -1.33 0.6795 -0.6262 -1.095 -1.718 0.3805 -1.361
228 50 31.22 18.78 18.78 -0.4043 2.886 -1.565 -0.2568 -1.155 2.268 -1.326 0.941 -0.6262 -0.9081 -1.855 0.4197 -1.349
230 50 31.22 18.78 18.78 -0.4042 3.308 -1.454 3.894 -1.335 2.336 -1.584 1.058 -0.9716 -1.251 -2.219 0.4334 -1.322
103 50 31.3 18.7 18.7 0.2086 -0.4836 1.028 3.894 0.6286 0.5867 0.9958 -1.176 1.676 1.565 0.7845 0.2186 -1.244
380 50 31.82 18.18 18.18 -0.3487 -0.4836 -0.7209 -0.2568 -0.4555 3.467 0.5013 -0.4172 -0.1658 -0.5952 -0.4896 0.2896 -1.119
310 50 32.4 17.6 17.6 -0.188 -0.4836 1.246 -0.2568 0.4067 2.344 0.9743 -0.8356 -0.5111 -0.01744 -1.718 0.1544 -1.248
76 50 32.4 17.6 17.6 -0.2472 -0.4836 1.246 -0.2568 0.4067 1.724 0.7808 -0.8726 -0.5111 -0.01744 -1.718 0.209 -1.52
279 50 32.4 17.6 17.6 -0.3395 0.3589 -1.049 -0.2568 0.7652 3.438 0.6411 -0.9564 -0.5111 -0.8539 -2.492 0.3715 -1.052
401 50 32.4 17.6 17.6 -0.2412 -0.4836 1.246 3.894 0.4067 2.973 0.8919 -0.7784 -0.5111 -0.01744 -1.718 0.3582 -1.3
211 48.5 31.22 17.28 17.28 -0.402 3.518 -1.238 -0.2568 -1.206 2.237 -1.283 0.6795 -0.6262 -1.095 -1.718 0.4042 -1.233
52 48.3 31.22 17.08 17.08 -0.3699 -0.4836 -0.7209 -0.2568 -0.4298 2.793 0.04979 -0.04358 -0.1658 -0.5952 -0.4896 0.2571 -1.213
47 36.2 20.89 15.31 15.31 -0.3983 -0.4836 -1.271 -0.2568 -0.592 -0.1736 -0.2441 -0.5634 -0.7414 -1.281 -0.3076 0.4481 -0.4542
266 46 31.22 14.78 14.78 -0.3991 0.3589 -1.143 3.894 -0.977 1.944 -0.692 0.7258 -0.5111 -1.143 -1.627 0.2371 -1.343
335 45.4 31.22 14.18 14.18 -0.4019 0.3589 -1.143 -0.2568 -0.977 2.191 -0.1616 0.4707 -0.5111 -1.143 -1.627 0.346 -1.239
208 44.8 31.22 13.58 13.58 -0.3716 -0.4836 -0.7209 -0.2568 -0.4555 2.82 0.3329 -0.4172 -0.1658 -0.5952 -0.4896 0.322 -1.187
101 44 31.22 12.78 12.78 -0.4041 3.308 -1.081 -0.2568 -1.394 1.674 -1.247 1.28 -0.7414 -0.9743 -1.172 0.3357 -1.329
218 43.8 31.22 12.58 12.58 -0.3969 -0.4836 -1.207 -0.2568 -0.9591 2.191 -1.151 -0.1209 -0.8565 -0.7817 -0.2166 0.4122 -1.266
180 37 24.72 12.28 12.28 -0.396 1.412 -1.127 -0.2568 -1.027 0.9647 -1.703 1.351 -0.5111 -0.04753 -1.491 0.2436 -1.055
15 37.3 25.49 11.81 11.81 -0.3972 2.886 -0.9047 -0.2568 -1.249 1.243 -1.48 0.6788 -0.6262 -0.9683 0.3294 0.4481 -1.267
334 43.5 31.82 11.68 11.68 -0.3472 0.3589 -1.049 -0.2568 0.1506 1.697 -0.5881 -0.4282 -0.5111 -0.8539 -2.492 0.3779 -1.322
188 39.8 28.17 11.63 11.63 -0.3986 -0.4836 -1.271 -0.2568 -0.592 2.113 0.5121 -0.4928 -0.7414 -1.281 -0.3076 0.4338 -0.715
306 37.9 26.35 11.55 11.55 -0.3959 -0.4836 -1.271 -0.2568 -0.592 1.253 0.831 -0.5127 -0.7414 -1.281 -0.3076 0.4185 -1.093
137 36.2 24.72 11.48 11.48 -0.3983 -0.4836 -1.312 -0.2568 -0.8481 1.241 -0.5307 1.145 -0.7414 -1.107 0.1019 0.4481 -1.023
273 41.3 30.01 11.29 11.29 -0.2732 -0.4836 1.246 -0.2568 0.4067 0.9535 1.017 -0.9188 -0.5111 -0.01744 -1.718 0.09199 -1.125
263 42.3 31.22 11.08 11.08 -0.4034 2.992 -1.334 -0.2568 -1.215 1.894 -1.91 1.248 -0.8565 -0.3484 -1.718 0.4319 -1.329

Deployment parameters


In [12]:
#@markdown ---
model = 'xgboost_boston_housing' #@param {type:"string"}
version = 'v1' #@param {type:"string"}
#@markdown ---

In [ ]:
# the exact location of the model is in model_uri.txt
with tf.io.gfile.GFile(os.path.join(output_location, 'model_uri.txt')) as f:
  model_uri = f.read().replace('/model.bst', '')

# create a model
! gcloud ai-platform models create $model --regions $REGION

# create a version
! gcloud ai-platform versions create $version \
  --model $model \
  --runtime-version 1.15 \
  --origin $model_uri \
  --framework XGBOOST \
  --project $PROJECT_ID
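
The `--origin` flag expects the directory that contains the model file, which is why the cell above strips `/model.bst` from the contents of `model_uri.txt`. If that file ends with a newline or other whitespace, the plain `replace` keeps it. The sketch below is a slightly more defensive variant. The helper name `model_directory` and the example URIs are hypothetical.

```python
import posixpath


def model_directory(model_uri_text, model_filename="model.bst"):
    """Return the directory part of a model URI read from a text file."""
    uri = model_uri_text.strip()  # drop trailing newline or spaces
    if posixpath.basename(uri) == model_filename:
        return posixpath.dirname(uri)
    # Already a directory: just normalize a trailing slash.
    return uri.rstrip("/")


print(model_directory("gs://my-bucket/output/model.bst\n"))
```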

Use the endpoint for online predictions


In [14]:
# format the data for serving
instances = validation.drop(columns='target').values.tolist()
validation_targets = validation['target']
display(instances[:2])

service = discovery.build('ml', 'v1')
name = 'projects/{project}/models/{model}/versions/{version}'.format(project=PROJECT_ID,
                                                                     model=model,
                                                                     version=version)
body = {'instances': instances}

response = service.projects().predict(name=name, body=body).execute()
if 'error' in response:
    raise RuntimeError(response['error'])

predictions = [row for row in response['predictions']]

R2 = r2_score(validation_targets, predictions)
print('Coefficient of determination for the predictions: {}'.format(R2))


[[1.5536935453162368,
  -0.4836154708652843,
  1.0283257954396188,
  -0.2568327484687563,
  1.0383806679462964,
  0.23545815425123368,
  1.1104882815291357,
  -0.9397693559689599,
  1.6758857724016463,
  1.5652874992218142,
  0.7844763709927688,
  -3.484595532746983,
  2.2509207364548414],
 [-0.39242675047976094,
  -0.4836154708652843,
  -0.1608777304070192,
  -0.2568327484687563,
  -0.0884006055354238,
  -0.49947436060626255,
  0.8560632883792862,
  -0.6839623468736831,
  -0.3960355701527182,
  0.15707841264637773,
  -0.30759583189964385,
  0.4273312622567081,
  0.4788011914229041]]
/Users/evo/Library/Python/3.7/lib/python/site-packages/google/auth/_default.py:66: UserWarning:

Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/

Coefficient of determination for the predictions: 0.3435016260520021
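
For reference, the coefficient of determination compares the squared prediction errors with the variance of the targets: R² = 1 − SS_res / SS_tot. A value of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean target. The sketch below computes it with NumPy on a small made-up example; the helper name `r_squared` and the numbers are illustrative assumptions, not values from this notebook.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


score = r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(score, 4))
```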

Inspect the ML model


In [15]:
import witwidget
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

config_builder = WitConfigBuilder(examples=validation.values.tolist(),
                                  feature_names=validation.columns.tolist())
config_builder.set_ai_platform_model(project=PROJECT_ID,
                                     model=model,
                                     version=version)
config_builder.set_model_type('regression')
config_builder.set_target_feature('target')
WitWidget(config_builder)


Cleaning up

To clean up all GCP resources used in this project, you can delete the GCP project you used for the tutorial. Alternatively, run the cell below to delete the individual resources.


In [ ]:
# Delete model version resource
! gcloud ai-platform versions delete $version --quiet --model $model 

# Delete model resource
! gcloud ai-platform models delete $model --quiet

# If training job is still running, cancel it
! gcloud ai-platform jobs cancel $job_name --quiet

# Delete Cloud Storage objects that were created
! gsutil -m rm -r gs://$BUCKET_NAME