LAB 3c: BigQuery ML Model Deep Neural Network.

Learning Objectives

  1. Create and evaluate DNN model with BigQuery ML
  2. Create and evaluate DNN model with feature engineering with ML.TRANSFORM.
  3. Calculate predictions with BigQuery's ML.PREDICT

Introduction

In this notebook, we will create multiple deep neural network models to predict the weight of a baby before it is born, using first no feature engineering and then the feature engineering from the previous lab using BigQuery ML.

We will create and evaluate a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculate predictions with BigQuery's ML.PREDICT. If you need a refresher, you can go back and look how we made a baseline model in the notebook BQML Baseline Model or how we combined linear models with feature engineering in the notebook BQML Linear Models with Feature Engineering.

Each learning objective will correspond to a #TODO in this student lab notebook -- try to complete this notebook first and then review the solution notebook.

Load necessary libraries

Check that the Google BigQuery library is installed and if not, install it.


In [ ]:
%%bash
sudo pip freeze | grep google-cloud-bigquery==1.6.1 || \
sudo pip install google-cloud-bigquery==1.6.1

Verify tables exist

Run the following cells to verify that we have previously created the dataset and data tables. If not, go back to lab 1b_prepare_data_babyweight to create them.


In [ ]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_train
LIMIT 0

In [ ]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_eval
LIMIT 0

Lab Task #1: Model 4: Increase complexity of model using DNN_REGRESSOR

DNN_REGRESSOR is a new regression model_type vs. the LINEAR_REG that we have been using in previous labs.

  • MODEL_TYPE="DNN_REGRESSOR"

  • hidden_units: List of hidden units per layer; all layers are fully connected. Number of elements in the array will be the number of hidden layers. The default value for hidden_units is [Min(128, N / (𝜶(Ni+No)))] (1 hidden layer), with N the training data size, Ni, No the input layer and output layer units, respectively, 𝜶 is constant with value 10. The upper bound of the rule will make sure the model won’t be over fitting. Note that, we currently have a model size limitation to 256MB.

  • dropout: Probability to drop a given coordinate during training; dropout is a very common technique to avoid overfitting in DNNs. The default value is zero, which means we will not drop out any coordinate during training.

  • batch_size: Number of samples that will be served to train the network for each sub iteration. The default value is Min(1024, num_examples) to balance the training speed and convergence. Serving all training data in each sub-iteration may lead to convergence issues, and is not advised.

Create DNN_REGRESSOR model

Change model type to use DNN_REGRESSOR, add a list of integer HIDDEN_UNITS, and add an integer BATCH_SIZE.

  • Hint: Create a model_4.

In [ ]:
%%bigquery
CREATE OR REPLACE MODEL
    babyweight.model_4
OPTIONS (
    # TODO: Add DNN options
    INPUT_LABEL_COLS=["weight_pounds"],
    DATA_SPLIT_METHOD="NO_SPLIT") AS

SELECT
    # TODO: Add base features and label
FROM
    babyweight.babyweight_data_train

Get training information and evaluate

Let's first look at our training statistics.


In [ ]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.model_4)

Now let's evaluate our trained model on our eval dataset.


In [ ]:
%%bigquery
SELECT
    *
FROM
    ML.EVALUATE(MODEL babyweight.model_4,
    (
    SELECT
        weight_pounds,
        is_male,
        mother_age,
        plurality,
        gestation_weeks
    FROM
        babyweight.babyweight_data_eval
    ))

Let's use our evaluation's mean_squared_error to calculate our model's RMSE.


In [ ]:
%%bigquery
SELECT
    SQRT(mean_squared_error) AS rmse
FROM
    ML.EVALUATE(MODEL babyweight.model_4,
    (
    SELECT
        weight_pounds,
        is_male,
        mother_age,
        plurality,
        gestation_weeks
    FROM
        babyweight.babyweight_data_eval
    ))

Lab Task #2: Final Model: Apply the TRANSFORM clause

Before we perform our prediction, we should encapsulate the entire feature set in a TRANSFORM clause as we did in the last notebook. This way we can have the same transformations applied for training and prediction without modifying the queries.

Let's apply the TRANSFORM clause to the final model and run the query.


In [ ]:
%%bigquery
CREATE OR REPLACE MODEL
    babyweight.final_model

TRANSFORM(
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks,
    # TODO: Add FEATURE CROSS of:
    # is_male, bucketed_mother_age, plurality, and bucketed_gestation_weeks

OPTIONS (
    # TODO: Add DNN options
    INPUT_LABEL_COLS=["weight_pounds"],
    DATA_SPLIT_METHOD="NO_SPLIT") AS

SELECT
    *
FROM
    babyweight.babyweight_data_train

Let's first look at our training statistics.


In [ ]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.final_model)

Now let's evaluate our trained model on our eval dataset.


In [ ]:
%%bigquery
SELECT
    *
FROM
    ML.EVALUATE(MODEL babyweight.final_model,
    (
    SELECT
        *
    FROM
        babyweight.babyweight_data_eval
    ))

Let's use our evaluation's mean_squared_error to calculate our model's RMSE.


In [ ]:
%%bigquery
SELECT
    SQRT(mean_squared_error) AS rmse
FROM
    ML.EVALUATE(MODEL babyweight.final_model,
    (
    SELECT
        *
    FROM
        babyweight.babyweight_data_eval
    ))

Lab Task #3: Predict with final model.

Now that you have evaluated your model, the next step is to use it to predict the weight of a baby before it is born, using BigQuery ML.PREDICT function.

Predict from final model using an example from original dataset


In [ ]:
%%bigquery
SELECT
    *
FROM
    ML.PREDICT(MODEL babyweight.final_model,
    (
    SELECT
        # TODO Add base features example from original dataset
    ))

Modify above prediction query using example from simulated dataset

Use the feature values you made up above, however set is_male to "Unknown" and plurality to "Multiple(2+)". This is simulating us not knowing the gender or the exact plurality.


In [ ]:
%%bigquery
SELECT
    *
FROM
    ML.PREDICT(MODEL babyweight.final_model,
    (
    SELECT
        # TODO Add base features example from simulated dataset
    ))

Lab Summary:

In this lab, we created and evaluated a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculated predictions with BigQuery's ML.PREDICT.

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License


In [ ]: