BigQuery ML (BQML) provides a user-friendly interface for creating ML models from datasets of any size and for making batch predictions at scale. For online predictions, BQML provides an export function that produces a TensorFlow SavedModel. This tutorial covers the steps in three parts: training a linear regression model with BQML, exporting the trained model as a TensorFlow SavedModel to Cloud Storage, and deploying the exported model to AI Platform Prediction to serve online predictions.
In [ ]:
PROJECT_ID = ''  # enter your project ID
In [ ]:
!gcloud config set project $PROJECT_ID
In [ ]:
!bq show natality || bq mk natality
In [ ]:
!bq ls -a
In [ ]:
!bq show --format=pretty natality
In [ ]:
%load_ext google.cloud.bigquery
In [ ]:
%%bigquery
SELECT
weight_pounds,
is_male,
gestation_weeks,
mother_age
FROM
`bigquery-public-data.samples.natality`
WHERE
weight_pounds IS NOT NULL
LIMIT 5
The following query creates a view that adds a computed column, datasplit, used to divide the data into three parts. The training split is the largest (roughly 80% of rows) and is used by BQML to train the model; internally, BQML further splits it into training and validation data. The evaluation split is the holdout set used to test model performance. The prediction split is used after the model is exported from BQML and hosted on AI Platform Prediction. A quick aggregate query below verifies the relative sizes of the three splits.
In [ ]:
%%bigquery
CREATE OR REPLACE VIEW
`natality.input_view` AS
SELECT
weight_pounds,
is_male,
gestation_weeks,
mother_age,
CASE
WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) < 8 THEN 'training'
WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) = 8 THEN 'evaluation'
WHEN MOD(CAST(ROUND(weight_pounds*100) as int64), 10) = 9 THEN 'prediction'
END AS datasplit
FROM
`bigquery-public-data.samples.natality`
WHERE
weight_pounds IS NOT NULL
In [ ]:
%%bigquery
SELECT *
FROM
`natality.input_view`
LIMIT 100
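As a quick sanity check, you can verify the relative sizes of the three splits with an aggregate query; training should hold roughly 80% of the rows.
In [ ]:
%%bigquery
SELECT
  datasplit,
  COUNT(*) AS row_count
FROM
  `natality.input_view`
GROUP BY
  datasplit
ORDER BY
  datasplit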
In [ ]:
%%bigquery
CREATE OR REPLACE MODEL
`natality.natality_model`
OPTIONS
(model_type='linear_reg',
input_label_cols=['weight_pounds']) AS
SELECT
weight_pounds,
is_male,
gestation_weeks,
mother_age
FROM
`natality.input_view`
WHERE
datasplit = 'training'
List models with the bq ls -m option.
In [ ]:
!bq ls -m --format=pretty natality
Get details about natality_model.
In [ ]:
!bq show -m --format=pretty natality.natality_model
In [ ]:
%%bigquery
SELECT
*
FROM
ML.EVALUATE(MODEL `natality.natality_model`,
( SELECT
weight_pounds,
is_male,
gestation_weeks,
mother_age
FROM
`natality.input_view`
WHERE
datasplit = 'evaluation'))
In [ ]:
%%bigquery
SELECT
predicted_weight_pounds
FROM
ML.PREDICT(MODEL `natality.natality_model`,
(
SELECT
is_male,
gestation_weeks,
mother_age
FROM
`natality.input_view`
WHERE
datasplit = 'prediction'
LIMIT 100))
AI Platform Prediction supports a model/version structure, where one model can have multiple versions; the BQML model structure does not include versions. In this part of the tutorial, the BQML model is extracted into a version directory in Cloud Storage (GCS), and AI Platform Prediction later picks up that directory as a version of a model. After running bq extract below, the exported files are listed so you can verify the SavedModel layout.
In [ ]:
import os
import random
REGION = ''  # e.g. 'us-central1'
MODEL_VERSION = ''  # e.g. 'v1'
MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,str(random.randrange(1000,10000)))
MODEL_PATH=os.path.join(MODEL_BUCKET,'export/natality_model',MODEL_VERSION)
In [ ]:
!gsutil mb $MODEL_BUCKET
In [ ]:
!bq ls -m natality
In [ ]:
!bq extract -m natality.natality_model $MODEL_PATH
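To confirm that the export produced a TensorFlow SavedModel under the version path, you can optionally list the files in MODEL_PATH; expect a saved_model.pb file and a variables/ directory (and possibly an assets/ directory).
In [ ]:
!gsutil ls -r $MODEL_PATH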
The saved_model_cli output shows the input tensors the model expects. This becomes important when structuring the prediction request payloads. As seen below, the model expects requests in the format: {'is_male': [DT_BOOL], 'gestation_weeks': [DT_FLOAT], 'mother_age': [DT_FLOAT]}
In [ ]:
!saved_model_cli show --dir $MODEL_PATH --tag_set serve --signature_def serving_default
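For illustration only, a single prediction instance matching this signature is a dict with one value per input tensor; the values below are made up.
In [ ]:
# Hypothetical instance for illustration; real instances come from the prediction split below.
sample_instance = {'is_male': True, 'gestation_weeks': 39.0, 'mother_age': 29.0}
sample_instance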
In [ ]:
!gcloud ai-platform models create natality_model --regions=$REGION
In [ ]:
!gcloud ai-platform versions create $MODEL_VERSION --model=natality_model --framework=tensorflow --runtime-version=1.15 --origin=$MODEL_PATH
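As an optional smoke test from the shell, you can write a few newline-delimited JSON instances to a file and send them with gcloud ai-platform predict; the file name and feature values here are purely illustrative.
In [ ]:
%%writefile instances.json
{"is_male": true, "gestation_weeks": 39.0, "mother_age": 29.0}
{"is_male": false, "gestation_weeks": 40.0, "mother_age": 33.0}
In [ ]:
!gcloud ai-platform predict --model=natality_model --version=$MODEL_VERSION --json-instances=instances.json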
In [ ]:
from google.cloud import bigquery
client = bigquery.Client()
In [ ]:
sql = """
SELECT
is_male,
gestation_weeks,
mother_age
FROM
`natality.input_view`
WHERE
datasplit = 'prediction'
LIMIT 100
"""
df = client.query(sql).to_dataframe()
In [ ]:
request = df.to_dict(orient='records')[2:7]  # take five of the returned rows as prediction instances
In [ ]:
request
In [ ]:
!pip install --upgrade google-api-python-client
In [ ]:
import googleapiclient.discovery
In [ ]:
service = googleapiclient.discovery.build('ml','v1')
In [ ]:
name = 'projects/{}/models/natality_model/versions/{}'.format(PROJECT_ID, MODEL_VERSION)
In [ ]:
response = service.projects().predict(
    name=name,
    body={'instances': request}
).execute()
In [ ]:
response
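The online prediction response is a dict with a top-level 'predictions' key containing one entry per instance; a minimal sketch of pairing each prediction back up with its input (the per-entry keys mirror the outputs reported by saved_model_cli above):
In [ ]:
# Each element of response['predictions'] corresponds to one input instance, in order.
for instance, prediction in zip(request, response['predictions']):
    print(instance, '->', prediction)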
In [ ]:
!gcloud ai-platform versions delete $MODEL_VERSION --model=natality_model --quiet
In [ ]:
!gcloud ai-platform models delete natality_model --quiet
In [ ]:
!bq rm -r -f natality
In [ ]:
!gsutil rm -r $MODEL_BUCKET