In this notebook you will modify an implementation of a convolutional neural network to prepare it for training, validate the implementation by training it in your local environment, and then train and deploy the model using Cloud ML Engine.
In [ ]:
import os
PROJECT = 'my-project-id' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'my-bucket-name' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
MODEL_TYPE='cnn' # 'dnn' or 'cnn'
# do not change these
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['MODEL_TYPE'] = MODEL_TYPE
os.environ['TFVERSION'] = '1.8' # Tensorflow version
In [ ]:
%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION
Complete the TODOs before executing the next cell to train the model in your local environment
Modify the model.py
file containing the convolutional neural network layer definitions in the cnn_model
method per the instructions in the TODOS. Make sure to use the hyperparameter values specified by nfil2
and ksize2
variables. Open fashionmodel/trainer to find the model.py
file.
In [ ]:
%bash
rm -rf fashionmodel.tar.gz fashion_trained
gcloud ml-engine local train \
--module-name=trainer.task \
--package-path=${PWD}/fashionmodel/trainer \
-- \
--output_dir=${PWD}/fashion_trained \
--train_steps=1 \
--learning_rate=0.01 \
--model=$MODEL_TYPE
Make sure that local training completed successfully before training using Cloud ML Engine
Note that GPU speed up depends on the model type. You'll notice that more complex models train substantially faster on GPUs. When you are working with simple models that take just seconds to minutes to train on a single node, keep in mind that Cloud ML Engine introduces a few minutes of overhead for training job setup & teardown.
In [ ]:
%bash
OUTDIR=gs://${BUCKET}/fashion/trained_${MODEL_TYPE}
JOBNAME=fashion_${MODEL_TYPE}_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
--region=$REGION \
--module-name=trainer.task \
--package-path=${PWD}/fashionmodel/trainer \
--job-dir=$OUTDIR \
--staging-bucket=gs://$BUCKET \
--scale-tier=BASIC_GPU \
--runtime-version=$TFVERSION \
-- \
--output_dir=$OUTDIR \
--train_steps=10000 --learning_rate=0.01 --train_batch_size=512 \
--model=$MODEL_TYPE
In [ ]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://{}/fashion/trained_{}'.format(BUCKET, MODEL_TYPE))
In [ ]:
for pid in TensorBoard.list()['pid']:
TensorBoard().stop(pid)
print 'Stopped TensorBoard with pid {}'.format(pid)
In [ ]:
%bash
MODEL_NAME="fashion"
MODEL_VERSION=${MODEL_TYPE}
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/fashion/trained_${MODEL_TYPE}/export/exporter | tail -1)
echo "Deleting and deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes"
#gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ml-engine models delete ${MODEL_NAME}
gcloud ml-engine models create ${MODEL_NAME} --regions $REGION
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=$TFVERSION
The previous step of deploying the model can take a few minutes. If it is successful, you should see an output similar to this one:
Created ml engine model [projects/qwiklabs-gcp-27eb45524d98e9a5/models/fashion]. Creating version (this might take a few minutes)...... ...................................................................................................................done.
Next, download a local copy of the Fashion MNIST dataset to use with Cloud ML Engine for predictions.
In [ ]:
import tensorflow as tf
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
LABELS = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
To predict with the model, save one of the test images as a JavaScript Object Notation (JSON) file. Also, take a look at it as a graphic and notice the expected class value in the title.
In [ ]:
HEIGHT=28
WIDTH=28
IMGNO=12 #CHANGE THIS to get different images
#Convert raw image data to a test.json file and persist it to disk
import json, codecs
jsondata = {'image': test_images[IMGNO].reshape(HEIGHT, WIDTH).tolist()}
json.dump(jsondata, codecs.open('test.json', 'w', encoding='utf-8'))
#Take a look at a sample image and the correct label from the test dataset
import matplotlib.pyplot as plt
plt.imshow(test_images[IMGNO].reshape(HEIGHT, WIDTH))
title = plt.title('{} / Class #{}'.format(LABELS[test_labels[IMGNO]], test_labels[IMGNO]))
Here's how the same image looks when it saved in the test.json file for use with the prediction API.
In [ ]:
%bash
cat test.json
Send the file to the prediction service and check whether the model you trained returns the correct prediction.
In [ ]:
%bash
gcloud ml-engine predict \
--model=fashion \
--version=${MODEL_TYPE} \
--json-instances=./test.json
Here is what my prediction service returned based on a model that I trained with Cloud ML Engine. Notice that the model predicts a probability of roughly 0.87 for the sneaker (class #7).
CLASSES PROBABILITIES 7 [2.627301398661075e-08, 1.843199037843135e-09, 6.106044111220399e-06, 5.581538289334276e-07, 9.015485602503759e-08, 0.09837665408849716, 3.8953788816797896e-07, 0.8761420249938965, 0.0034074552822858095, 0.02206665836274624]
# Copyright 2017 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.
In [ ]: