By deploying or using this software you agree to comply with the AI Hub Terms of Service and the Google APIs Terms of Service. To the extent of a direct conflict of terms, the AI Hub Terms of Service will control.
This notebook provides an example workflow for using the ResNet ML container to train an image classification model.
The flower dataset contains 3670 images with 5 classes: daisy, dandelion, roses, sunflowers, and tulips. The target is an integer variable with class IDs from 1 to 5. The images are split into 3300 training and 370 validation images. The dataset is publicly available at gs://cloud-training-demos/tpu/resnet/data.
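If you want a quick look at the dataset before training, you can list its contents in Cloud Storage. A minimal check using gsutil (installed with the Cloud SDK):
In [ ]:
# Optional: list the files that make up the flower dataset.
! gsutil ls gs://cloud-training-demos/tpu/resnet/data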
The goal of this notebook is to go through a common training workflow: submit a training job with the ResNet container, monitor it with TensorBoard, review the generated evaluation report, deploy the trained model to AI Platform, and request online predictions.
This tutorial uses billable components of Google Cloud Platform (GCP): Cloud AI Platform and Cloud Storage.
Learn about Cloud AI Platform pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.
If you are running this notebook locally rather than in Colab, make sure your environment meets this notebook's requirements. You need the following: the Google Cloud SDK, Python 3, virtualenv, and Jupyter running in a virtual environment with Python 3.
The Google Cloud guide to Setting up a Python development environment and the Jupyter installation guide provide detailed instructions for meeting these requirements. The following steps provide a condensed set of instructions (an example shell session is sketched after the steps):
Install virtualenv and create a virtual environment that uses Python 3.
Activate that environment and run pip install jupyter in a shell to install Jupyter.
Run jupyter notebook in a shell to launch Jupyter.
Open this notebook in the Jupyter Notebook Dashboard.
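For reference, the condensed steps above might look like the following shell session. This is a sketch, not an exact recipe: the environment name venv is arbitrary, and the commands assume a Unix-like shell with Python 3 installed.
pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
pip install jupyter
jupyter notebook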
The following steps are required, regardless of your notebook environment.
Select or create a GCP project. When you first create an account, you get a $300 free credit towards your compute/storage costs.
Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.
Note: Jupyter runs lines prefixed with ! as shell commands, and it interpolates Python variables prefixed with $ into these commands.
In [ ]:
PROJECT_ID = "[your-project-id]" #@param {type:"string"}
! gcloud config set project $PROJECT_ID
If you are using Colab, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.
Otherwise, follow these steps:
In the GCP Console, go to the Create service account key page.
From the Service account drop-down list, select New service account.
In the Service account name field, enter a name.
From the Role drop-down list, select Machine Learning Engine > AI Platform Admin and Storage > Storage Object Admin.
Click Create. A JSON file that contains your key downloads to your local environment.
Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.
In [ ]:
import sys
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.
if 'google.colab' in sys.modules:
    from google.colab import auth as google_auth
    google_auth.authenticate_user()
# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
    %env GOOGLE_APPLICATION_CREDENTIALS ''
The following steps are required, regardless of your notebook environment.
You need a "workspace" bucket that will hold the dataset and the output from the ML container. Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.
You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to choose a region where Cloud AI Platform services are available. You may not use a Multi-Regional Storage bucket for training with AI Platform.
In [ ]:
BUCKET_NAME = "[your-bucket-name]" #@param {type:"string"}
REGION = 'us-central1' #@param {type:"string"}
Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.
In [ ]:
! gsutil mb -l $REGION gs://$BUCKET_NAME
Finally, validate access to your Cloud Storage bucket by examining its contents:
In [ ]:
! gsutil ls -al gs://$BUCKET_NAME
In [ ]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import base64
import requests
import tensorflow as tf
from IPython.core.display import HTML
from googleapiclient import discovery
In [ ]:
output_location = os.path.join('gs://', BUCKET_NAME, 'output')
job_name = "resnet_{}".format(time.strftime("%Y%m%d%H%M%S"))
!gcloud beta ai-platform jobs submit training $job_name \
  --master-image-uri gcr.io/aihub-c2t-containers/kfp-components/oob_algorithm/resnet:latest \
  --region $REGION \
  --scale-tier CUSTOM \
  --master-machine-type standard \
  --worker-machine-type cloud_tpu \
  --worker-count 1 \
  --tpu-tf-version 1.14 \
  -- \
  --output-location $output_location \
  --data gs://cloud-training-demos/tpu/resnet/data \
  --number-of-classes 5 \
  --training-steps 100000
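The cell above submits an asynchronous training job; gcloud returns immediately while the job runs on AI Platform. You can check the job's status or stream its logs with the Cloud SDK:
In [ ]:
# Check the status of the training job and, optionally, stream its logs.
! gcloud ai-platform jobs describe $job_name
! gcloud ai-platform jobs stream-logs $job_name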
In [ ]:
try:
    # Use the TensorBoard notebook extension if it is available.
    %load_ext tensorboard
    %tensorboard --logdir {output_location}
except Exception:
    # Fall back to launching TensorBoard as a shell command.
    !tensorboard --logdir {output_location}
In [11]:
if not tf.io.gfile.exists(os.path.join(output_location, 'report.html')):
    raise RuntimeError('The file report.html was not found. Did the training job finish?')
with tf.io.gfile.GFile(os.path.join(output_location, 'report.html')) as f:
    display(HTML(f.read()))
In [ ]:
#@markdown ---
model = 'resnet' #@param {type:"string"}
version = 'v1' #@param {type:"string"}
#@markdown ---
In [ ]:
# The exact location of the exported model is stored in model_uri.txt.
with tf.io.gfile.GFile(os.path.join(output_location, 'model_uri.txt')) as f:
    model_uri = f.read().strip()
# Create a model.
! gcloud ai-platform models create $model --regions $REGION
# Create a version.
! gcloud ai-platform versions create $version \
  --model $model \
  --runtime-version 1.15 \
  --origin $model_uri \
  --project $PROJECT_ID
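Deploying a version can take several minutes. To confirm that the version was created and is ready to serve, you can describe it:
In [ ]:
# Verify that the new version exists and check its deployment state.
! gcloud ai-platform versions describe $version --model $model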
In [18]:
!wget --output-document /tmp/image.jpeg \
  https://fyf.tac-cdn.net/images/products/large/F-395.jpg
# Read the image, decode, resize, and base64-encode it.
with tf.Session() as sess:
    img = tf.io.read_file('/tmp/image.jpeg')
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [192, 192])
    # tf.image.resize returns floats; cast back to uint8 before JPEG encoding.
    img = tf.cast(img, tf.uint8)
    img = tf.image.encode_jpeg(img)
    encoded_image = sess.run(img)
encoded_image = base64.b64encode(encoded_image).decode()
encoded_image[:200]
Out[18]:
In [20]:
# Make a REST call for online inference.
service = discovery.build('ml', 'v1')
name = 'projects/{project}/models/{model}/versions/{version}'.format(
    project=PROJECT_ID, model=model, version=version)
# The online prediction API expects a list of instances.
body = {'instances': [{'input': {'b64': encoded_image}}]}
response = service.projects().predict(name=name, body=body).execute()
if 'error' in response:
    raise RuntimeError(response['error'])
print('predicted probabilities: {}'.format(response['predictions'][0]['probabilities']))
print('predicted class: {}'.format(response['predictions'][0]['classes'] + 1))
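If you also want a human-readable label, you can map the predicted class ID back to a flower name. The mapping below is an assumption that simply follows the order in which the classes are listed in the dataset description above; verify it against your training data.
In [ ]:
# Hypothetical ID-to-name mapping, assuming class IDs follow the order listed
# in the dataset description (daisy, dandelion, roses, sunflowers, tulips).
class_names = {1: 'daisy', 2: 'dandelion', 3: 'roses', 4: 'sunflowers', 5: 'tulips'}
predicted_id = response['predictions'][0]['classes'] + 1
print('predicted label: {}'.format(class_names[predicted_id]))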