Learning Objectives
In this notebook, we will create multiple deep neural network models to predict the weight of a baby before it is born, using first no feature engineering and then the feature engineering from the previous lab using BigQuery ML.
We will create and evaluate a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculate predictions with BigQuery's ML.PREDICT. If you need a refresher, you can go back and look how we made a baseline model in the notebook BQML Baseline Model or how we combined linear models with feature engineering in the notebook BQML Linear Models with Feature Engineering.
Each learning objective will correspond to a #TODO in this student lab notebook -- try to complete this notebook first and then review the solution notebook.
Check that the Google BigQuery library is installed and if not, install it.
In [ ]:
%%bash
sudo pip freeze | grep google-cloud-bigquery==1.6.1 || \
sudo pip install google-cloud-bigquery==1.6.1
Run the following cells to verify that we have previously created the dataset and data tables. If not, go back to lab 1b_prepare_data_babyweight to create them.
In [ ]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_train
LIMIT 0
In [ ]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_data_eval
LIMIT 0
DNN_REGRESSOR is a new regression model_type vs. the LINEAR_REG that we have been using in previous labs.
MODEL_TYPE="DNN_REGRESSOR"
hidden_units: List of hidden units per layer; all layers are fully connected. Number of elements in the array will be the number of hidden layers. The default value for hidden_units is [Min(128, N / (𝜶(Ni+No)))] (1 hidden layer), with N the training data size, Ni, No the input layer and output layer units, respectively, 𝜶 is constant with value 10. The upper bound of the rule will make sure the model won’t be over fitting. Note that, we currently have a model size limitation to 256MB.
dropout: Probability to drop a given coordinate during training; dropout is a very common technique to avoid overfitting in DNNs. The default value is zero, which means we will not drop out any coordinate during training.
batch_size: Number of samples that will be served to train the network for each sub iteration. The default value is Min(1024, num_examples) to balance the training speed and convergence. Serving all training data in each sub-iteration may lead to convergence issues, and is not advised.
In [ ]:
%%bigquery
CREATE OR REPLACE MODEL
babyweight.model_4
OPTIONS (
# TODO: Add DNN options
INPUT_LABEL_COLS=["weight_pounds"],
DATA_SPLIT_METHOD="NO_SPLIT") AS
SELECT
# TODO: Add base features and label
FROM
babyweight.babyweight_data_train
Let's first look at our training statistics.
In [ ]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.model_4)
Now let's evaluate our trained model on our eval dataset.
In [ ]:
%%bigquery
SELECT
*
FROM
ML.EVALUATE(MODEL babyweight.model_4,
(
SELECT
weight_pounds,
is_male,
mother_age,
plurality,
gestation_weeks
FROM
babyweight.babyweight_data_eval
))
Let's use our evaluation's mean_squared_error
to calculate our model's RMSE.
In [ ]:
%%bigquery
SELECT
SQRT(mean_squared_error) AS rmse
FROM
ML.EVALUATE(MODEL babyweight.model_4,
(
SELECT
weight_pounds,
is_male,
mother_age,
plurality,
gestation_weeks
FROM
babyweight.babyweight_data_eval
))
Let's apply the TRANSFORM clause to the final model and run the query.
In [ ]:
%%bigquery
CREATE OR REPLACE MODEL
babyweight.final_model
TRANSFORM(
weight_pounds,
is_male,
mother_age,
plurality,
gestation_weeks,
# TODO: Add FEATURE CROSS of:
# is_male, bucketed_mother_age, plurality, and bucketed_gestation_weeks
OPTIONS (
# TODO: Add DNN options
INPUT_LABEL_COLS=["weight_pounds"],
DATA_SPLIT_METHOD="NO_SPLIT") AS
SELECT
*
FROM
babyweight.babyweight_data_train
Let's first look at our training statistics.
In [ ]:
%%bigquery
SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.final_model)
Now let's evaluate our trained model on our eval dataset.
In [ ]:
%%bigquery
SELECT
*
FROM
ML.EVALUATE(MODEL babyweight.final_model,
(
SELECT
*
FROM
babyweight.babyweight_data_eval
))
Let's use our evaluation's mean_squared_error
to calculate our model's RMSE.
In [ ]:
%%bigquery
SELECT
SQRT(mean_squared_error) AS rmse
FROM
ML.EVALUATE(MODEL babyweight.final_model,
(
SELECT
*
FROM
babyweight.babyweight_data_eval
))
In [ ]:
%%bigquery
SELECT
*
FROM
ML.PREDICT(MODEL babyweight.final_model,
(
SELECT
# TODO Add base features example from original dataset
))
In [ ]:
%%bigquery
SELECT
*
FROM
ML.PREDICT(MODEL babyweight.final_model,
(
SELECT
# TODO Add base features example from simulated dataset
))
Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License
In [ ]: