By deploying or using this software you agree to comply with the AI Hub Terms of Service and the Google APIs Terms of Service. To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.

Overview

This notebook provides an example workflow of using the Distributed XGBoost ML container for training a classification ML model.

Dataset

The notebook uses the Iris dataset. It contains petal and sepal measurements for 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x5 table.

Objective

The goal of this notebook is to go through a common training workflow:

  • Create a dataset
  • Train an ML model using the AI Platform Training service
  • Monitor the training job with TensorBoard
  • Identify if the model was trained successfully by looking at the generated "Run Report"
  • Deploy the model for serving using the AI Platform Prediction service
  • Use the endpoint for online predictions
  • Interactively inspect the deployed ML model with the What-If Tool

Costs

This tutorial uses billable components of Google Cloud Platform (GCP):

  • Cloud AI Platform
  • Cloud Storage

Learn about Cloud AI Platform pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Set up your local development environment

If you are using Colab or AI Platform Notebooks, your environment already meets all the requirements to run this notebook. You can skip this step.

Otherwise, make sure your environment meets this notebook's requirements. You need the following:

  • The Google Cloud SDK
  • Git
  • Python 3
  • virtualenv
  • Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to Setting up a Python development environment and the Jupyter installation guide provide detailed instructions for meeting these requirements. The following steps provide a condensed set of instructions:

  1. Install and initialize the Cloud SDK.

  2. Install Python 3.

  3. Install virtualenv and create a virtual environment that uses Python 3.

  4. Activate that environment and run pip install jupyter in a shell to install Jupyter.

  5. Run jupyter notebook in a shell to launch Jupyter.

  6. Open this notebook in the Jupyter Notebook Dashboard.
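
For reference, on a typical Linux or macOS shell, steps 1-5 above might look roughly like the following. This is only a sketch: package managers, paths, and versions vary by platform, and it assumes the Cloud SDK and Python 3 are already installed.

gcloud init
python3 -m pip install --user virtualenv
python3 -m virtualenv venv
source venv/bin/activate
pip install jupyter
jupyter notebook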

Set up your GCP project

The following steps are required, regardless of your notebook environment.

  1. Select or create a GCP project. When you first create an account, you get a $300 free credit towards your compute/storage costs.

  2. Make sure that billing is enabled for your project.

  3. Enable the AI Platform APIs and Compute Engine APIs.

  4. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

Note: Jupyter runs lines prefixed with ! as shell commands, and it interpolates Python variables prefixed with $ into these commands.


In [ ]:
PROJECT_ID = "[your-project-id]" #@param {type:"string"}
! gcloud config set project $PROJECT_ID
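
To confirm that the Cloud SDK is now pointing at the intended project, you can optionally run:


In [ ]:
! gcloud config get-value project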

Authenticate your GCP account

If you are using AI Platform Notebooks, your environment is already authenticated. Skip this step.

If you are using Colab, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

Otherwise, follow these steps:

  1. In the GCP Console, go to the Create service account key page.

  2. From the Service account drop-down list, select New service account.

  3. In the Service account name field, enter a name.

  4. From the Role drop-down list, select Machine Learning Engine > AI Platform Admin and Storage > Storage Object Admin.

  5. Click Create. A JSON file that contains your key downloads to your local environment.

  6. Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.


In [ ]:
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
  %env GOOGLE_APPLICATION_CREDENTIALS ''

Create a Cloud Storage bucket

The following steps are required, regardless of your notebook environment.

You need to have a "workspace" bucket that will hold the dataset and the output from the ML Container. Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to choose a region where Cloud AI Platform services are available. You may not use a Multi-Regional Storage bucket for training with AI Platform.


In [ ]:
BUCKET_NAME = "[your-bucket-name]" #@param {type:"string"}
REGION = 'us-central1' #@param {type:"string"}

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [ ]:
! gsutil mb -l $REGION gs://$BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:


In [ ]:
! gsutil ls -al gs://$BUCKET_NAME

Install packages and dependencies with pip


In [ ]:
! pip install witwidget

Import libraries and define constants


In [ ]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from IPython.core.display import HTML
from googleapiclient import discovery

Create a dataset


In [4]:
# load Iris dataset
iris = datasets.load_iris()
names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
data = pd.DataFrame(iris.data, columns=names)

# add target
data['target'] = iris.target

# split
training, validation = train_test_split(data, test_size=50, stratify=data['target'])

# standardization
training_targets = training.pop('target')
validation_targets = validation.pop('target')

data_mean = training.mean(axis=0)
data_std = training.std(axis=0)
training = (training - data_mean) / data_std
training['target'] = training_targets

validation = (validation - data_mean) / data_std
validation['target'] = validation_targets

print('Training data head')
display(training.head())

training_data = os.path.join('gs://', BUCKET_NAME, 'data/train.csv')
validation_data = os.path.join('gs://', BUCKET_NAME, 'data/valid.csv')

print('Copying the data to the bucket ...')
with tf.io.gfile.GFile(training_data, 'w') as f:
  training.to_csv(f, index=False)
with tf.io.gfile.GFile(validation_data, 'w') as f:
  validation.to_csv(f, index=False)


Training data head
sepal_length sepal_width petal_length petal_width target
17 -0.882999 0.962660 -1.356548 -1.186407 0
107 1.749248 -0.341168 1.432829 0.782189 2
99 -0.165114 -0.558473 0.180456 0.125990 1
133 0.552772 -0.558473 0.749716 0.388470 2
21 -0.882999 1.397269 -1.299622 -1.055167 0
Copying the data to the bucket ...
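
As a quick sanity check, you can list the uploaded CSV files before starting the training job:


In [ ]:
! gsutil ls -l gs://$BUCKET_NAME/data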

Cloud training

Accelerator and distribution support

GPU   Multi-GPU Node   TPU   Workers   Parameter Server
Yes   No               No    Yes       No

To add distribution and/or accelerators to your AI Platform training call, use parameters similar to the example shown below.

    --master-machine-type standard_gpu \
    --worker-machine-type standard_gpu \
    --worker-count 2 \
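
For example, a hypothetical submission with a GPU master and two GPU workers would slot these flags into the same command used in the next cell (the container arguments after the bare -- are unchanged):

!gcloud ai-platform jobs submit training $job_name \
    --master-image-uri gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest \
    --region $REGION \
    --scale-tier CUSTOM \
    --master-machine-type standard_gpu \
    --worker-machine-type standard_gpu \
    --worker-count 2 \
    -- \
    ...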

AI Platform training


In [ ]:
output_location = os.path.join('gs://', BUCKET_NAME, 'output')

job_name = "xgboost_classification_{}".format(time.strftime("%Y%m%d%H%M%S"))
!gcloud ai-platform jobs submit training $job_name \
    --master-image-uri gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest \
    --region $REGION \
    --scale-tier CUSTOM \
    --master-machine-type standard \
    -- \
    --output-location {output_location} \
    --training-data {training_data} \
    --validation-data {validation_data} \
    --target-column target \
    --data-type csv \
    --number-of-classes 3 \
    --fresh-start True \
    --objective multi:softprob
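
The job runs asynchronously. You can check its state, or stream its logs until it finishes, before moving on to the report and deployment steps (a sketch using the job_name defined above):


In [ ]:
# Show the current state of the job (QUEUED, RUNNING, SUCCEEDED, ...)
! gcloud ai-platform jobs describe $job_name

# Optionally block and stream the training logs until the job completes
! gcloud ai-platform jobs stream-logs $job_name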

Local training snippet

Note that training can also be run locally with Docker:

docker run \
    -v /tmp:/tmp \
    -it gcr.io/aihub-c2t-containers/kfp-components/trainer/dist_xgboost:latest \
    --output-location /tmp/fm_classification \
    --training-data /tmp/iris_train.csv \
    --validation-data /tmp/iris_valid.csv \
    --target-column target \
    --data-type csv \
    --number-of-classes 3 \
    --objective multi:softprob
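
A minimal sketch for producing the local input files referenced above, assuming the training and validation DataFrames from the "Create a dataset" cell are still in memory:

# Write the prepared splits to /tmp so the Docker container can read them
training.to_csv('/tmp/iris_train.csv', index=False)
validation.to_csv('/tmp/iris_valid.csv', index=False)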

Inspect the Run Report

The "Run Report" will help you identify if the model was successfully trained.


In [6]:
if not tf.io.gfile.exists(os.path.join(output_location, 'report.html')):
  raise RuntimeError('The file report.html was not found. Did the training job finish?')

with tf.io.gfile.GFile(os.path.join(output_location, 'report.html')) as f:
  display(HTML(f.read()))



Runtime arguments

argument value
training_data gs://aihub-content-test/xgboost_classification/data/train.csv
target_column target
validation_data gs://aihub-content-test/xgboost_classification/data/valid.csv
job_dir None
output_location gs://aihub-content-test/xgboost_classification/output
data_type csv
fresh_start True
weight_column None
number_of_classes 3
num_round 10
early_stopping_rounds -1
verbosity 1
eta 0.3
gamma 0.01
max_depth 6
min_child_weight 1
max_delta_step 0
subsample 1
colsample_bytree 1
colsample_bylevel 1
colsample_bynode 1
reg_lambda 1
alpha 0
scale_pos_weight 1
objective multi:softprob
tree_method auto
remainder None

Tensorboard snippet

To see the training progress, you need to install the latest TensorBoard with the command pip install -U tensorboard and then run one of the following commands.

Local tensorboard

tensorboard --logdir gs://aihub-content-test/xgboost_classification/output

Publicly shared tensorboard

tensorboard dev upload --logdir gs://aihub-content-test/xgboost_classification/output

Datasets

Data reading snippet

import tensorflow as tf
import pandas as pd

sample_size = 100  # assumed value; the original snippet does not define sample_size

sample = pd.DataFrame()
for filename in tf.io.gfile.glob('gs://aihub-content-test/xgboost_classification/data/valid.csv'):
  with tf.io.gfile.GFile(filename, 'r') as f:
    sample = sample.append(
        pd.read_csv(f, nrows=sample_size - len(sample)))
  if len(sample) >= sample_size:
    break

Training set sample

sepal_length sepal_width petal_length petal_width target
0 -1.0026 0.5281 -1.3565 -1.3176 0.0
1 -1.1223 0.0934 -1.2996 -1.4489 0.0
... ... ... ... ... ...
98 2.4671 1.6146 1.4898 1.0447 2.0
99 0.4331 -1.8623 0.4082 0.3885 1.0

100 rows × 5 columns

Validation set sample

sepal_length sepal_width petal_length petal_width target
0 -1.1223 1.1800 -1.3565 -1.4489 0.0
1 2.2278 -0.1239 1.3190 1.4384 2.0
... ... ... ... ... ...
98 0.1938 -1.8623 0.1235 -0.2677 1.0
99 0.9117 -0.1239 0.3512 0.2572 1.0

100 rows × 5 columns

Dataset inspection

You can use AI Platform to create a detailed inspection report for your dataset with the following shell snippet:

DATA=gs://aihub-content-test/xgboost_classification/data/valid.csv
#DATA=gs://aihub-content-test/xgboost_classification/data/train.csv
OUTPUT_LOCATION=gs://aihub-content-test/xgboost_classification/output
# can be one of: tfrecord, parquet, avro, csv, json, bigquery
DATA_TYPE=csv
MAX_SAMPLE_SIZE=10000
JOB_NAME=tabular_data_inspection_$(date '+%Y%m%d_%H%M%S')

gcloud ai-platform jobs submit training $JOB_NAME \
  --stream-logs \
  --master-image-uri gcr.io/kf-pipeline-contrib/kfp-components/oob_algorithm/tabular_data_inspection:latest \
  -- \
  --output-location $OUTPUT_LOCATION \
  --data $DATA \
  --data-type $DATA_TYPE \
  --max-sample-size $MAX_SAMPLE_SIZE

Predictions

Local predictions snippet

Model loading snippet

Use the following Python snippet to load the trained model and generate predictions locally.

import xgboost as xgb
import tensorflow as tf


tf.io.gfile.copy('gs://aihub-content-test/xgboost_classification/output/model.bst', '/tmp/model.bst')
bst = xgb.Booster({'nthread': 4})
bst.load_model('/tmp/model.bst')

# `data` is expected to hold the four feature columns (no target column),
# e.g. the sample read with the data reading snippet above.
predictions = bst.predict(xgb.DMatrix(data))
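
Note that predictions contains one row of class probabilities per input example; a minimal sketch for turning those probabilities into class labels:

import numpy as np

# Pick the most probable class for each row
predicted_classes = np.argmax(predictions, axis=1)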

Training predictions

0 1 2
0 0.6043 0.1990 0.1967
1 0.6043 0.1990 0.1967
... ... ... ...
98 0.1982 0.2023 0.5995
99 0.1854 0.5682 0.2464

100 rows × 3 columns

Validation predictions

0 1 2
0 0.6043 0.1990 0.1967
1 0.1982 0.2023 0.5995
... ... ... ...
98 0.1973 0.6048 0.1979
99 0.1973 0.6048 0.1979

100 rows × 3 columns

Metrics

Training dataset

Confusion matrix

Count
              Predicted
               0     1     2
Actual   0    33     0     0
         1     0    32     1
         2     0     2    32

Relative
              Predicted
               0     1     2
Actual   0     1     0     0
         1     0     1     0.03
         2     0     0.06  0.9

Aggregated metrics
               accuracy  f1-score  precision  recall
weighted value     0.97      0.97     0.9703    0.97

Classification metrics

Per class metrics
          precision  recall  f1-score  support
Label 0      1.0000  1.0000    1.0000       33
      1      0.9412  0.9697    0.9552       33
      2      0.9697  0.9412    0.9552       34

Validation dataset

Confusion matrix

Count
              Predicted
               0     1     2
Actual   0    33     0     0
         1     0    32     1
         2     0     2    32

Relative
              Predicted
               0     1     2
Actual   0     1     0     0
         1     0     1     0.03
         2     0     0.06  0.9

Aggregated metrics
               accuracy  f1-score  precision  recall
weighted value     0.97      0.97     0.9703    0.97

Classification metrics

Per class metrics
          precision  recall  f1-score  support
Label 0      1.0000  1.0000    1.0000       33
      1      0.9412  0.9697    0.9552       33
      2      0.9697  0.9412    0.9552       34

ROC Curve

Training ROC Curve

0 1 2 Mean AUC
Area Under Curve 1.0 0.9982 0.9982 0.9988

Validation ROC Curve

0 1 2 Mean AUC
Area Under Curve 1.0 0.9982 0.9982 0.9988

Prediction tables

Training data and prediction

Best predictions

target predicted-target log_loss sepal_length sepal_width petal_length petal_width
row
64 1 1 0.3144 0.9117 -0.1239 0.3512 0.2572
62 1 1 0.3144 0.9117 -0.3412 0.4651 0.126
24 1 1 0.3144 -0.4044 -0.9931 0.3512 -0.00525
63 1 1 0.3144 -0.4044 -1.21 0.1235 0.126
22 1 1 0.3144 0.3135 -0.5585 0.1235 0.126
21 1 1 0.3144 -0.2848 -0.7758 0.2374 0.126
30 1 1 0.3144 0.4331 -0.3412 0.2943 0.126
72 1 1 0.3144 -0.1651 -0.9931 -0.1611 -0.2677
27 1 1 0.3144 -0.4044 -1.428 -0.04725 -0.2677
13 1 1 0.3144 0.1938 -1.862 0.1235 -0.2677
12 1 1 0.3144 -0.1651 -0.3412 0.2374 0.126
35 1 1 0.3144 -0.1651 -0.5585 0.1805 0.126
82 1 1 0.3144 1.39 0.3107 0.522 0.2572
96 1 1 0.3144 -0.4044 -1.645 0.1235 0.126
2 1 1 0.3144 -0.2848 -0.1239 0.1805 0.126
95 1 1 0.3144 -0.2848 -1.21 0.0666 -0.1365
38 1 1 0.3144 -0.04547 -0.7758 0.0666 -0.00525
11 1 1 0.3144 -0.4044 -1.428 0.009677 -0.1365
87 1 1 0.3144 -0.1651 -0.5585 0.4082 0.126
8 1 1 0.3144 0.5528 -1.645 0.3512 0.126
36 1 1 0.3144 0.3135 -0.5585 0.522 -0.00525
28 1 1 0.3144 0.3135 -0.1239 0.4651 0.2572
56 0 0 0.3149 -1.481 1.18 -1.584 -1.318
55 0 0 0.3149 -1.601 -1.645 -1.413 -1.186
54 0 0 0.3149 -1.481 0.7454 -1.357 -1.186
50 0 0 0.3149 -0.6437 1.397 -1.3 -1.318
53 0 0 0.3149 -1.122 0.09344 -1.3 -1.318
58 0 0 0.3149 -0.883 0.9627 -1.357 -1.318
51 0 0 0.3149 -0.883 1.615 -1.072 -1.055
0 0 0 0.3149 -1.003 0.5281 -1.357 -1.318

Worst predictions

target predicted-target log_loss sepal_length sepal_width petal_length petal_width
row
25 1 2 0.6778 0.07418 0.3107 0.5789 0.7822
47 2 1 0.6323 1.63 -0.1239 1.148 0.5197
4 2 1 0.6243 -1.122 -1.21 0.4082 0.6509
69 1 1 0.5047 0.1938 -0.7758 0.7497 0.5197
89 1 1 0.5047 1.031 -0.1239 0.6928 0.6509
9 2 2 0.4526 0.1938 -0.1239 0.5789 0.7822
31 2 2 0.4526 0.4331 -0.5585 0.5789 0.7822
77 1 1 0.4388 -1.122 -1.428 -0.275 -0.2677
83 1 1 0.4388 -0.7634 -0.7758 0.0666 0.2572
71 2 2 0.3968 0.1938 -1.862 0.6928 0.3885
76 2 2 0.3968 0.3135 -0.9931 1.034 0.2572
84 2 2 0.3968 0.5528 -0.5585 0.7497 0.3885
70 1 1 0.3511 0.1938 -0.3412 0.4082 0.3885
99 1 1 0.3511 0.4331 -1.862 0.4082 0.3885
7 1 1 0.3508 -0.5241 -0.1239 0.4082 0.3885
97 1 1 0.3508 0.07418 -0.1239 0.2374 0.3885
39 1 1 0.331 0.6724 0.3107 0.4082 0.3885
66 1 1 0.331 1.031 0.09344 0.522 0.3885
46 2 2 0.3195 0.5528 0.5281 1.262 1.701
40 2 2 0.3195 1.031 -0.1239 0.8066 1.438
98 2 2 0.3195 2.467 1.615 1.49 1.045
44 2 2 0.3195 0.6724 0.09344 0.9774 0.7822
52 2 2 0.3195 0.5528 -0.7758 0.6359 0.7822
67 2 2 0.3195 1.63 1.18 1.319 1.701
33 2 2 0.3195 0.3135 -0.1239 0.6359 0.7822
32 2 2 0.3195 0.5528 -1.21 0.6928 0.9134
57 2 2 0.3195 -0.04547 -0.7758 0.7497 0.9134
60 2 2 0.3195 0.7921 -0.1239 0.8066 1.045
61 2 2 0.3195 0.07418 -0.1239 0.7497 0.7822
3 2 2 0.3195 2.228 -0.9931 1.774 1.438

Validation data and prediction

Best predictions

target predicted-target log_loss sepal_length sepal_width petal_length petal_width
row
99 1 1 0.3144 0.9117 -0.1239 0.3512 0.2572
43 1 1 0.3144 -0.1651 -0.5585 0.4082 0.126
44 1 1 0.3144 -0.1651 -0.3412 0.2374 0.126
47 1 1 0.3144 0.3135 -0.1239 0.4651 0.2572
98 1 1 0.3144 0.1938 -1.862 0.1235 -0.2677
31 1 1 0.3144 -0.1651 -0.5585 0.1805 0.126
59 1 1 0.3144 -0.2848 -0.7758 0.2374 0.126
25 1 1 0.3144 -0.4044 -1.645 0.1235 0.126
63 1 1 0.3144 0.3135 -0.5585 0.522 -0.00525
22 1 1 0.3144 0.9117 -0.3412 0.4651 0.126
21 1 1 0.3144 0.3135 -0.5585 0.1235 0.126
42 1 1 0.3144 -0.1651 -0.9931 -0.1611 -0.2677
18 1 1 0.3144 1.39 0.3107 0.522 0.2572
71 1 1 0.3144 -0.4044 -0.9931 0.3512 -0.00525
19 1 1 0.3144 0.4331 -0.3412 0.2943 0.126
73 1 1 0.3144 -0.4044 -1.21 0.1235 0.126
11 1 1 0.3144 -0.04547 -0.7758 0.0666 -0.00525
3 1 1 0.3144 0.5528 -1.645 0.3512 0.126
90 1 1 0.3144 -0.2848 -0.1239 0.1805 0.126
79 1 1 0.3144 -0.4044 -1.428 0.009677 -0.1365
6 1 1 0.3144 -0.2848 -1.21 0.0666 -0.1365
14 1 1 0.3144 -0.4044 -1.428 -0.04725 -0.2677
96 0 0 0.3149 -0.6437 1.397 -1.3 -1.318
95 0 0 0.3149 -1.242 0.7454 -1.072 -1.318
45 0 0 0.3149 -1.362 0.3107 -1.413 -1.318
50 0 0 0.3149 -1.003 0.5281 -1.357 -1.318
86 0 0 0.3149 -0.5241 0.7454 -1.186 -1.318
51 0 0 0.3149 -1.481 0.09344 -1.3 -1.318
52 0 0 0.3149 -1.481 0.7454 -1.357 -1.186
57 0 0 0.3149 -0.5241 1.832 -1.413 -1.055

Worst predictions

target predicted-target log_loss sepal_length sepal_width petal_length petal_width
row
76 1 2 0.6778 0.07418 0.3107 0.5789 0.7822
61 2 1 0.6323 1.63 -0.1239 1.148 0.5197
10 2 1 0.6243 -1.122 -1.21 0.4082 0.6509
16 1 1 0.5047 1.031 -0.1239 0.6928 0.6509
94 1 1 0.5047 0.1938 -0.7758 0.7497 0.5197
35 2 2 0.4526 0.4331 -0.5585 0.5789 0.7822
4 2 2 0.4526 0.1938 -0.1239 0.5789 0.7822
82 1 1 0.4388 -1.122 -1.428 -0.275 -0.2677
55 1 1 0.4388 -0.7634 -0.7758 0.0666 0.2572
58 2 2 0.3968 0.5528 -0.5585 0.7497 0.3885
37 2 2 0.3968 0.1938 -1.862 0.6928 0.3885
26 2 2 0.3968 0.3135 -0.9931 1.034 0.2572
80 1 1 0.3511 0.1938 -0.3412 0.4082 0.3885
48 1 1 0.3511 0.4331 -1.862 0.4082 0.3885
89 1 1 0.3508 0.07418 -0.1239 0.2374 0.3885
60 1 1 0.3508 -0.5241 -0.1239 0.4082 0.3885
72 1 1 0.331 1.031 0.09344 0.522 0.3885
77 1 1 0.331 0.6724 0.3107 0.4082 0.3885
62 2 2 0.3195 1.031 -0.1239 0.8066 1.438
24 2 2 0.3195 0.6724 -0.5585 1.034 1.307
49 2 2 0.3195 1.749 -0.3412 1.433 0.7822
28 2 2 0.3195 2.228 1.615 1.661 1.307
27 2 2 0.3195 0.6724 0.3107 0.8636 1.438
66 2 2 0.3195 1.031 0.5281 1.091 1.176
54 2 2 0.3195 1.51 -0.1239 1.205 1.176
53 2 2 0.3195 0.07418 -0.1239 0.7497 0.7822
29 2 2 0.3195 0.5528 0.5281 1.262 1.701
34 2 2 0.3195 0.4331 0.7454 0.9205 1.438
46 2 2 0.3195 0.6724 0.09344 0.9774 0.7822
36 2 2 0.3195 0.5528 -1.21 0.6928 0.9134

Deployment parameters


In [ ]:
#@markdown ---
model = 'xgboost_iris' #@param {type:"string"}
version = 'v2' #@param {type:"string"}
#@markdown ---

In [ ]:
# the exact location of the model is in model_uri.txt
with tf.io.gfile.GFile(os.path.join(output_location, 'model_uri.txt')) as f:
  model_uri = f.read().replace('/model.bst', '')

# create a model
! gcloud ai-platform models create $model --regions $REGION

# create a version
! gcloud ai-platform versions create $version \
  --model $model \
  --runtime-version 1.15 \
  --origin $model_uri \
  --framework XGBOOST \
  --project $PROJECT_ID
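
Before sending prediction requests, you can optionally confirm that the new version was created and is ready to serve:


In [ ]:
! gcloud ai-platform versions describe $version --model $model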

Use the endpoint for online predictions


In [8]:
# format the data for serving
instances = validation.drop(columns='target').values.tolist()
validation_targets = validation['target']
display(instances[:2])

service = discovery.build('ml', 'v1')
name = 'projects/{project}/models/{model}/versions/{version}'.format(project=PROJECT_ID,
                                                                     model=model,
                                                                     version=version)
body = {'instances': instances}

response = service.projects().predict(name=name, body=body).execute()
if 'error' in response:
    raise RuntimeError(response['error'])

class_probabilities = [row for row in response['predictions']]
predicted_classes = np.array(class_probabilities).argmax(axis=1)
accuracy = (predicted_classes == validation_targets).mean()
print('Accuracy of the predictions: {}'.format(accuracy))


[[-1.282911366184447,
  0.7644647138392315,
  -1.2387288279409092,
  -1.3596382404799041],
 [0.28014840517429507,
  -0.5349087374455808,
  0.5203455498038544,
  -0.013461764757226784]]
/Users/evo/Library/Python/3.7/lib/python/site-packages/google/auth/_default.py:66: UserWarning:

Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/

Accuracy of the predictions: 0.98
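
As an alternative to the Python client, the same endpoint can be called from the command line by writing a few instances to a newline-delimited JSON file (a sketch; the file name instances.json is arbitrary):


In [ ]:
import json

# Write a handful of validation instances as newline-delimited JSON
with open('instances.json', 'w') as f:
  for instance in instances[:5]:
    f.write(json.dumps(instance) + '\n')

! gcloud ai-platform predict --model $model --version $version --json-instances instances.json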

Inspect the ML model


In [ ]:
import witwidget
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

config_builder = WitConfigBuilder(examples=validation.values.tolist(),
                                  feature_names=validation.columns.tolist())
config_builder.set_ai_platform_model(project=PROJECT_ID,
                                     model=model,
                                     version=version)
config_builder.set_model_type('classification')
config_builder.set_target_feature('target')
WitWidget(config_builder)

Cleaning up

To clean up all GCP resources used in this tutorial, you can delete the GCP project you used, or remove the individual resources created by this notebook with the cell below.


In [ ]:
# Delete model version resource
! gcloud ai-platform versions delete $version --quiet --model $model 

# Delete model resource
! gcloud ai-platform models delete $model --quiet

# If training job is still running, cancel it
! gcloud ai-platform jobs cancel $job_name --quiet

# Delete Cloud Storage objects that were created
! gsutil -m rm -r gs://$BUCKET_NAME
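
Alternatively, if the project was created only for this tutorial, deleting the whole project removes every resource at once (irreversible):


In [ ]:
# Uncomment to delete the entire project instead of the individual resources
# ! gcloud projects delete $PROJECT_ID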