LAB 3c: BigQuery ML Model Deep Neural Network.

Learning Objectives

  1. Create and evaluate DNN model with BigQuery ML
  2. Create and evaluate DNN model with feature engineering with ML.TRANSFORM.
  3. Calculate predictions with BigQuery's ML.PREDICT

Introduction

In this notebook, we will create multiple deep neural network models to predict the weight of a baby before it is born, using first no feature engineering and then the feature engineering from the previous lab using BigQuery ML.

We will create and evaluate a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculate predictions with BigQuery's ML.PREDICT. If you need a refresher, you can go back and look how we made a baseline model in the notebook BQML Baseline Model or how we combined linear models with feature engineering in the notebook BQML Linear Models with Feature Engineering.

Load necessary libraries

Check that the Google BigQuery library is installed and if not, install it.


In [ ]:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

In [1]:
%%bash
sudo pip freeze | grep google-cloud-bigquery==1.6.1 || \
sudo pip install google-cloud-bigquery==1.6.1


google-cloud-bigquery==1.6.1

Verify tables exist

Run the following cells to verify that we have previously created the dataset and data tables. If not, go back to lab 1b_prepare_data_babyweight to create them.


In [2]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_train
LIMIT 0


Out[2]:
weight_pounds is_male mother_age plurality gestation_weeks

In [3]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_eval
LIMIT 0


Out[3]:
weight_pounds is_male mother_age plurality gestation_weeks

Model 4: Increase complexity of model using DNN_REGRESSOR

DNN_REGRESSOR is a new regression model_type vs. the LINEAR_REG that we have been using in previous labs.

  • MODEL_TYPE="DNN_REGRESSOR"

  • hidden_units: List of hidden units per layer; all layers are fully connected. Number of elements in the array will be the number of hidden layers. The default value for hidden_units is [Min(128, N / (𝜶(Ni+No)))] (1 hidden layer), with N the training data size, Ni, No the input layer and output layer units, respectively, 𝜶 is constant with value 10. The upper bound of the rule will make sure the model won’t be over fitting. Note that, we currently have a model size limitation to 256MB.

  • dropout: Probability to drop a given coordinate during training; dropout is a very common technique to avoid overfitting in DNNs. The default value is zero, which means we will not drop out any coordinate during training.

  • batch_size: Number of samples that will be served to train the network for each sub iteration. The default value is Min(1024, num_examples) to balance the training speed and convergence. Serving all training data in each sub-iteration may lead to convergence issues, and is not advised.

Create DNN_REGRESSOR model

Let's train a DNN regressor model in BQ using MODEL_TYPE=DNN_REGRESSOR with 2 hidden layers with 64 and 32 neurons each (HIDDEN_UNITS=[64, 32]) and a batch size of 32 (BATCH_SIZE=32):


In [4]:
%%bigquery
CREATE OR REPLACE MODEL
    babyweight.model_4
OPTIONS (
    MODEL_TYPE="DNN_REGRESSOR",
    HIDDEN_UNITS=[64, 32],
    BATCH_SIZE=32,
    INPUT_LABEL_COLS=["weight_pounds"],
    DATA_SPLIT_METHOD="NO_SPLIT") AS

SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks
FROM
    babyweight.babyweight_data_train


Out[4]:

Get training information and evaluate

Let's first look at our training statistics.


In [5]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.model_4)


Out[5]:
training_run iteration loss eval_loss learning_rate duration_ms
0 0 4 1.738382 2.713294 0.1 644721
1 0 3 2.405464 2.604605 0.1 606809
2 0 2 0.732892 2.641865 0.1 599237
3 0 1 1.014293 2.191636 0.1 618610
4 0 0 1.001003 2.943201 0.1 830424

Now let's evaluate our trained model on our eval dataset.


In [6]:
%%bigquery
SELECT
    *
FROM
    ML.EVALUATE(MODEL babyweight.model_4,
    (
    SELECT
        weight_pounds,
        is_male,
        mother_age,
        plurality,
        gestation_weeks
    FROM
        babyweight.babyweight_data_eval
    ))


Out[6]:
mean_absolute_error mean_squared_error mean_squared_log_error median_absolute_error r2_score explained_variance
0 0.803307 1.065722 0.018036 0.658235 0.387528 0.390166

Let's use our evaluation's mean_squared_error to calculate our model's RMSE.


In [7]:
%%bigquery
SELECT
    SQRT(mean_squared_error) AS rmse
FROM
    ML.EVALUATE(MODEL babyweight.model_4,
    (
    SELECT
        weight_pounds,
        is_male,
        mother_age,
        plurality,
        gestation_weeks
    FROM
        babyweight.babyweight_data_eval
    ))


Out[7]:
rmse
0 1.032338

Final Model: Apply the TRANSFORM clause

Before we perform our prediction, we should encapsulate the entire feature set in a TRANSFORM clause as we did in the last notebook. This way we can have the same transformations applied for training and prediction without modifying the queries.

Let's apply the TRANSFORM clause to the final model and run the query.


In [1]:
%%bigquery
CREATE OR REPLACE MODEL
    babyweight.final_model

TRANSFORM(
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks,
    ML.FEATURE_CROSS(
        STRUCT(
            is_male,
            ML.BUCKETIZE(
                mother_age,
                GENERATE_ARRAY(15, 45, 1)
            ) AS bucketed_mothers_age,
            plurality,
            ML.BUCKETIZE(
                gestation_weeks,
                GENERATE_ARRAY(17, 47, 1)
            ) AS bucketed_gestation_weeks)
    ) AS crossed)

OPTIONS (
    MODEL_TYPE="DNN_REGRESSOR",
    HIDDEN_UNITS=[64, 32],
    BATCH_SIZE=32,
    INPUT_LABEL_COLS=["weight_pounds"],
    DATA_SPLIT_METHOD="NO_SPLIT") AS

SELECT
    *
FROM
    babyweight.babyweight_data_train


Out[1]:

Let's first look at our training statistics.


In [2]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.final_model)


Out[2]:
training_run iteration loss eval_loss learning_rate duration_ms
0 0 3 0.681415 2.200278 0.1 147373
1 0 2 0.970840 1.803952 0.1 148089
2 0 1 0.704043 1.594089 0.1 146653
3 0 0 1.400467 2.130462 0.1 216121

Now let's evaluate our trained model on our eval dataset.


In [3]:
%%bigquery
SELECT
    *
FROM
    ML.EVALUATE(MODEL babyweight.final_model,
    (
    SELECT
        *
    FROM
        babyweight.babyweight_data_eval
    ))


Out[3]:
mean_absolute_error mean_squared_error mean_squared_log_error median_absolute_error r2_score explained_variance
0 1.230722 2.346158 0.037347 1.055479 -0.34834 0.215212

Let's use our evaluation's mean_squared_error to calculate our model's RMSE.


In [4]:
%%bigquery
SELECT
    SQRT(mean_squared_error) AS rmse
FROM
    ML.EVALUATE(MODEL babyweight.final_model,
    (
    SELECT
        *
    FROM
        babyweight.babyweight_data_eval
    ))


Out[4]:
rmse
0 1.531717

Predict with final model

Now that you have evaluated your model, the next step is to use it to predict the weight of a baby before it is born, using BigQuery ML.PREDICT function.

Predict from final model using an example from original dataset


In [5]:
%%bigquery
SELECT
    *
FROM
    ML.PREDICT(MODEL babyweight.final_model,
    (
    SELECT
        "true" AS is_male,
        32 AS mother_age,
        "Twins(2)" AS plurality,
        30 AS gestation_weeks
    ))


Out[5]:
predicted_weight_pounds is_male mother_age plurality gestation_weeks
0 3.675936 true 32 Twins(2) 30

Modify above prediction query using example from simulated dataset

Use the feature values you made up above, however set is_male to "Unknown" and plurality to "Multiple(2+)". This is simulating us not knowing the gender or the exact plurality.


In [6]:
%%bigquery
SELECT
    *
FROM
    ML.PREDICT(MODEL babyweight.final_model,
    (
    SELECT
        "Unknown" AS is_male,
        32 AS mother_age,
        "Multiple(2+)" AS plurality,
        30 AS gestation_weeks
    ))


Out[6]:
predicted_weight_pounds is_male mother_age plurality gestation_weeks
0 3.477121 Unknown 32 Multiple(2+) 30

Lab Summary:

In this lab, we created and evaluated a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculated predictions with BigQuery's ML.PREDICT.

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License


In [ ]: