In this notebook, we take a previously developed TensorFlow model that predicts taxi fares and package it up so that it can be run on Cloud AI Platform. For now, we'll run this on a small dataset. The model that was developed is rather simplistic, so its accuracy is not great either. However, this notebook illustrates how to package up a TensorFlow model so that it can run within Cloud AI Platform.
Later in the course, we will look at ways to make a more effective machine learning model.
Note that the PROJECT, BUCKET, and REGION values in the next cell must be replaced with your own project ID, bucket name, and bucket region before you run it.
In [ ]:
import os
PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
In [ ]:
# for bash
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TFVERSION'] = '1.14' # TensorFlow version
In [ ]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION
Allow the Cloud AI Platform service account to read/write to the bucket containing training data.
In [ ]:
%%bash
PROJECT_ID=$PROJECT
AUTH_TOKEN=$(gcloud auth print-access-token)
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
| python -c "import json; import sys; response = json.load(sys.stdin); \
print(response['serviceAccount'])")
echo "Authorizing the Cloud AI Platform account $SVC_ACCOUNT to access files in $BUCKET"
gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET
gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET # error message (if bucket is empty) can be ignored
gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET
Take your code and put it into a standard Python package structure. model.py and task.py contain the TensorFlow code from earlier (explore the directory structure).
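For reference, the layout the rest of this notebook assumes is a taxifare/ directory containing a trainer/ Python package with model.py and task.py (plus an __init__.py so that trainer is importable). The cell below is a quick sketch that simply lists whatever Python files are actually there.
In [ ]:
%%bash
# List the files that make up the trainer package.
# Expected layout (a sketch of the standard structure used in this lab):
#   taxifare/trainer/__init__.py
#   taxifare/trainer/model.py
#   taxifare/trainer/task.py
find taxifare -name '*.py' | sort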
In [ ]:
%%bash
## check whether there are any more TODOs
## exit with 0 to avoid notebook process error
grep TODO taxifare/trainer/*.py; rc=$?
case $rc in
0) ;;
1) echo "No more TODOs!"; exit 0;;
esac
Note the absolute paths below. /content is mapped in Datalab to where the home icon takes you.
In [ ]:
%%bash
echo $PWD
rm -rf $PWD/taxi_trained
head -1 $PWD/taxi-train.csv
head -1 $PWD/taxi-valid.csv
In [ ]:
%%bash
rm -rf taxifare.tar.gz taxi_trained
export PYTHONPATH=${PYTHONPATH}:${PWD}/taxifare
python -m trainer.task \
--train_data_paths="${PWD}/taxi-train*" \
--eval_data_paths=${PWD}/taxi-valid.csv \
--output_dir=${PWD}/taxi_trained \
--train_steps=100 --job-dir=./tmp
In [ ]:
%%bash
ls $PWD/taxi_trained/export/exporter/
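Optionally, you can inspect the exported SavedModel's serving signature with TensorFlow's saved_model_cli -- a quick sanity check that its inputs match the JSON instance we are about to send. This sketch assumes the tool is on the path, which it is wherever TensorFlow is installed.
In [ ]:
%%bash
# Pick the most recent export and print its signature(s)
model_dir=$(ls ${PWD}/taxi_trained/export/exporter | tail -1)
saved_model_cli show --dir ${PWD}/taxi_trained/export/exporter/${model_dir} --all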
In [ ]:
%%writefile ./test.json
{"pickuplon": -73.885262,"pickuplat": 40.773008,"dropofflon": -73.987232,"dropofflat": 40.732403,"passengers": 2}
In [ ]:
%%bash
model_dir=$(ls ${PWD}/taxi_trained/export/exporter)
gcloud ai-platform local predict \
--model-dir=${PWD}/taxi_trained/export/exporter/${model_dir} \
--json-instances=./test.json
To activate TensorBoard within the JupyterLab UI, navigate to "File" > "New Launcher", then double-click the 'Tensorboard' icon on the bottom row.
A tab named 'TensorBoard 1' will open. Navigate through its tabs to see the active TensorBoard; the 'Graphs' and 'Projector' tabs offer particularly interesting views of the training run.
You may close the TensorBoard tab when you are finished exploring.
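If you prefer the command line to the JupyterLab launcher, you can also start TensorBoard yourself, pointed at the local output directory. A sketch: since it runs as a long-lived server, launch it from a terminal rather than a notebook cell, and stop it with Ctrl+C when done.
tensorboard --logdir=./taxi_trained --port=6006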
In [ ]:
%%bash
rm -rf taxifare.tar.gz taxi_trained
gcloud ai-platform local train \
--module-name=trainer.task \
--package-path=${PWD}/taxifare/trainer \
-- \
--train_data_paths=${PWD}/taxi-train.csv \
--eval_data_paths=${PWD}/taxi-valid.csv \
--train_steps=1000 \
--output_dir=${PWD}/taxi_trained
When I ran it (due to random seeds, your results will be different), the average_loss (Mean Squared Error) on the evaluation dataset was 187, meaning that the RMSE (the square root of the MSE) was around 13.7.
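As a quick sanity check on that conversion (RMSE is just the square root of the reported average_loss):
In [ ]:
%%bash
python -c "print(187 ** 0.5)"   # ~13.67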
In [ ]:
!ls $PWD/taxi_trained
First copy the training data to the cloud. Then, launch a training job.
After you submit the job, go to the cloud console (http://console.cloud.google.com) and select AI Platform | Jobs to monitor progress.
Note: Don't be concerned if the notebook stalls (with a blue progress bar) or returns with an error about being unable to refresh auth tokens. This is a long-lived Cloud job and work is going on in the cloud. Use the Cloud Console link (above) to monitor the job.
In [ ]:
%%bash
echo $BUCKET
gsutil -m rm -rf gs://${BUCKET}/taxifare/smallinput/
gsutil -m cp ${PWD}/*.csv gs://${BUCKET}/taxifare/smallinput/
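You can list the bucket to confirm that the CSV files were copied before launching the training job:
In [ ]:
%%bash
gsutil ls gs://${BUCKET}/taxifare/smallinput/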
In [ ]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/smallinput/taxi_trained
JOBNAME=lab3a_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ai-platform jobs submit training $JOBNAME \
--region=$REGION \
--module-name=trainer.task \
--package-path=${PWD}/taxifare/trainer \
--job-dir=$OUTDIR \
--staging-bucket=gs://$BUCKET \
--scale-tier=BASIC \
--runtime-version=$TFVERSION \
-- \
--train_data_paths="gs://${BUCKET}/taxifare/smallinput/taxi-train*" \
--eval_data_paths="gs://${BUCKET}/taxifare/smallinput/taxi-valid*" \
--output_dir=$OUTDIR \
--train_steps=10000
Use the Cloud Console link to monitor the job and do NOT proceed until the job is done.
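You can also monitor the job from this notebook with gcloud. The sketch below assumes you paste in the job name that was echoed by the submission cell above (shell variables are not shared between %%bash cells).
In [ ]:
%%bash
JOBNAME=lab3a_YYMMDD_HHMMSS   # <-- replace with the job name echoed above
gcloud ai-platform jobs describe $JOBNAME      # current state: QUEUED / RUNNING / SUCCEEDED
gcloud ai-platform jobs stream-logs $JOBNAME   # tails the training logs until the job ends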
Find out the actual name of the subdirectory where the model is stored and use it to deploy the model. Deploying the model will take up to 5 minutes.
In [ ]:
%%bash
gsutil ls gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/exporter
In [ ]:
%%bash
MODEL_NAME="taxifare"
MODEL_VERSION="v1"
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/exporter | tail -1)
echo "Run these commands one-by-one (the very first time, you'll create a model and then create a version)"
#gcloud ai-platform versions delete ${MODEL_VERSION} --model ${MODEL_NAME}
#gcloud ai-platform models delete ${MODEL_NAME}
gcloud ai-platform models create ${MODEL_NAME} --regions $REGION
gcloud ai-platform versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version $TFVERSION
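Once the version has been created, you can confirm that it is ready to serve (the model and version names below match those created above):
In [ ]:
%%bash
gcloud ai-platform models list
gcloud ai-platform versions list --model taxifare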
In [ ]:
%%bash
gcloud ai-platform predict --model=taxifare --version=v1 --json-instances=./test.json
In [ ]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

# Authenticate with application-default credentials and build a client
# for the Cloud AI Platform ('ml') v1 API.
credentials = GoogleCredentials.get_application_default()
api = discovery.build('ml', 'v1', credentials=credentials,
                      discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

# The request body is a list of instances; each instance has the same
# fields that the model was trained on.
request_data = {'instances':
  [
      {
        'pickuplon': -73.885262,
        'pickuplat': 40.773008,
        'dropofflon': -73.987232,
        'dropofflat': 40.732403,
        'passengers': 2,
      }
  ]
}

# Address the deployed model version and send the prediction request.
parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'taxifare', 'v1')
response = api.projects().predict(body=request_data, name=parent).execute()
print("response={0}".format(response))
I have already followed the steps below and the resulting files are already available, so you don't need to repeat them. In the next chapter (on feature engineering), we will avoid all this manual processing by using Cloud Dataflow.
Go to http://bigquery.cloud.google.com/ and type the query:
SELECT
  (tolls_amount + fare_amount) AS fare_amount,
  pickup_longitude AS pickuplon,
  pickup_latitude AS pickuplat,
  dropoff_longitude AS dropofflon,
  dropoff_latitude AS dropofflat,
  passenger_count*1.0 AS passengers,
  'nokeyindata' AS key
FROM [nyc-tlc:yellow.trips]
WHERE trip_distance > 0
  AND fare_amount >= 2.5
  AND pickup_longitude > -78 AND pickup_longitude < -70
  AND dropoff_longitude > -78 AND dropoff_longitude < -70
  AND pickup_latitude > 37 AND pickup_latitude < 45
  AND dropoff_latitude > 37 AND dropoff_latitude < 45
  AND passenger_count > 0
  AND ABS(HASH(pickup_datetime)) % 1000 == 1
Note that this is now about 1,000,000 rows (i.e. 100x the original dataset). Export the result to CSV on Google Cloud Storage. (I have already done this and made the resulting GCS data publicly available, so you don't need to do it.)
This took 60 minutes and used 1 million rows as input. The model is exactly the same as above; the only changes are to the input (to use the larger dataset) and to the Cloud AI Platform scale tier (STANDARD_1 instead of BASIC -- STANDARD_1 is approximately 10x more powerful than BASIC). At the end of training, the loss was 32, but the RMSE (calculated on the validation dataset) was stubbornly stuck at 9.03. So, simply adding more data doesn't help.
In [ ]:
%%bash
XXXXX This takes 60 minutes. If you are sure you want to run it, remove this line.
OUTDIR=gs://${BUCKET}/taxifare/ch3/taxi_trained
JOBNAME=lab3a_$(date -u +%y%m%d_%H%M%S)
CRS_BUCKET=cloud-training-demos # use the already exported data
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ai-platform jobs submit training $JOBNAME \
--region=$REGION \
--module-name=trainer.task \
--package-path=${PWD}/taxifare/trainer \
--job-dir=$OUTDIR \
--staging-bucket=gs://$BUCKET \
--scale-tier=STANDARD_1 \
--runtime-version=$TFVERSION \
-- \
--train_data_paths="gs://${CRS_BUCKET}/taxifare/ch3/train.csv" \
--eval_data_paths="gs://${CRS_BUCKET}/taxifare/ch3/valid.csv" \
--output_dir=$OUTDIR \
--train_steps=100000
Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License