LAB 2: AutoML Tables Babyweight Training.

Learning Objectives

  1. Set up AutoML Tables
  2. Create and import AutoML Tables dataset from BigQuery
  3. Analyze AutoML Tables dataset
  4. Train AutoML Tables model
  5. Check evaluation metrics
  6. Deploy model
  7. Make batch predictions
  8. Make online predictions

Introduction

In this notebook, we will use AutoML Tables to train a model to predict the weight of a baby before it is born. We will use the AutoML Tables UI to create a training dataset from BigQuery and will then train, evaluate, and predict with an AutoML Tables model.

In this lab, we will set up AutoML Tables, create and import an AutoML Tables dataset from BigQuery, analyze the dataset, train an AutoML Tables model, check the evaluation metrics of the trained model, deploy the trained model, and finally make both batch and online predictions with it.

Each learning objective will correspond to a series of steps to complete in this student lab notebook.

Verify tables exist

Run the following cells to verify that we previously created the dataset and data tables. If they do not exist, go back to lab 1b_prepare_data_babyweight to create them.


In [ ]:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

In [ ]:
%%bigquery
-- LIMIT 0 is a free query; this allows us to check that the table exists.
SELECT * FROM babyweight.babyweight_augmented_data
LIMIT 0

Set up AutoML Tables

Step 1: Open AutoML Tables

Go to the GCP console and open the console menu in the upper left corner. Then scroll down to the bottom to get to the Artificial Intelligence section. Click on Tables to open AutoML Tables.

Step 2: Enable API

If you haven't already enabled the AutoML Tables API, then you'll see the screen below. Make sure to click the ENABLE API button.

Step 3: Get started

If this is your first time using AutoML Tables, then you'll see the screen below. Make sure to click the GET STARTED button.

Create and import AutoML Tables dataset from BigQuery

Step 4: Datasets

You should now be on the AutoML Tables Datasets page. This is where all imported datasets are shown. We'll want to add our babyweight dataset, so click the + NEW DATASET button.

Step 5: Create new dataset

We need to give our new dataset a unique name. I named mine babyweight_automl but feel free to name yours whatever you want. When you are done choosing a unique name, click the CREATE DATASET button.

Step 6: Import your data

Now that we've created a dataset, let's import our data so that AutoML Tables can use it for training. Our data is already in BigQuery, so we will select the radio button Import data from BigQuery. This will give us some text boxes to fill in with our data's BigQuery Project ID, BigQuery Dataset ID, and BigQuery Table or View ID. Once you are done entering those, click the IMPORT button.
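If you later script this import instead of using the UI, the three IDs from the form are typically combined into a single bq:// URI. Here is a minimal sketch of that, assuming the project.dataset.table form; my-project is a placeholder project ID, and the dataset and table names are the ones from the verification query earlier in this notebook. The helper name bigquery_uri is hypothetical.

```python
import re

# Conservative pattern for dataset and table IDs (letters, digits, underscores).
# This is an assumption for the sketch, not the full BigQuery naming rules.
_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def bigquery_uri(project_id: str, dataset_id: str, table_id: str) -> str:
    """Join the three IDs from the import form into a single bq:// URI."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    for label, value in (("dataset", dataset_id), ("table", table_id)):
        if not _NAME_RE.match(value):
            raise ValueError(f"Unexpected characters in {label} ID: {value!r}")
    return f"bq://{project_id}.{dataset_id}.{table_id}"


# "my-project" is a placeholder; replace it with your own project ID.
uri = bigquery_uri("my-project", "babyweight", "babyweight_augmented_data")
print(uri)
```

Validating the IDs up front catches stray spaces or a pasted full path before the import job is submitted.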

Step 7: Wait for your data to be imported

AutoML Tables should now be importing your data from BigQuery. Depending on the size of your dataset, this could take a while, so this step is about just waiting and being patient.

Step 8: Select target column

Awesome! Our dataset has been successfully imported! You can now look at the dataset's schema, which shows the column name, the data type, and the nullability of each column. Out of these columns we need to select the one we want to be our target or label column. Click the drop down for Target column and choose weight_pounds.

Step 9: Approve target and schema

When you successfully choose your target column you will see a green checkmark and the tag target added to the column row on the right. It will also disable its nullability since machine learning doesn't do too well with null labels. Once you've verified everything is correct with your target column and the schema, then click the CONTINUE button.

Analyze AutoML Tables dataset

Step 10: Analyze

The next tab we're brought to is ANALYZE. This is where some basic statistics are shown. We can see that we have 6 features, 4 of which are numeric and 2 of which are categorical. We can also see that there are 0% missing and 0 invalid values across all of our columns, which is great! We can also see the number of distinct values which we can compare with our expectations. Additionally, the linear correlation with the target column, weight_pounds in this instance, is shown as well as the mean and standard deviation for each column. Once you are satisfied with the analysis, then click the TRAIN tab.
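To make the statistics on this tab concrete, here is a small sketch of how a mean, a standard deviation, and a linear (Pearson) correlation with the target can be computed by hand. The numbers are made up for illustration and are not taken from the babyweight table, and the sketch uses the sample standard deviation, which may differ from the exact definition AutoML Tables uses.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_correlation(xs, ys):
    """Linear correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example rows: one feature column and the target column.
gestation_weeks = [36, 38, 39, 40, 41]
weight_pounds = [5.9, 6.8, 7.3, 7.6, 8.0]

print("mean:", mean(gestation_weeks))
print("std:", sample_std(gestation_weeks))
print("correlation with target:", pearson_correlation(gestation_weeks, weight_pounds))
```

A correlation close to 1 or -1 suggests a strong linear relationship with the target, while a value near 0 only rules out a linear one.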

Train AutoML Tables model

Step 11: Set up training

We are almost ready to train our model. It took a lot of steps to get here, but those were mainly to import the data and make sure the data is alright. As we all know, data is extremely important for ML, and if it is not what we expect then our model will not perform as we expect either. Garbage in, garbage out. We need to set the Training budget, which is the maximum number of node hours to spend training our model. Thankfully, if improvement stops before that, then the training will stop and you'll only be charged for the actual node hours you used. For this dataset, I got decent results with a budget of just 1 to 3 node hours. We also need to choose which features to use in our model, out of the superset of features, with the Input feature selection dropdown; the next step covers the details. Once all of that is set, click the TRAIN MODEL button.
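If you ever set the budget through the API instead of the UI, it is expressed in milli node hours rather than node hours, so 1 node hour becomes 1000. The sketch below does that conversion. The 1,000 to 72,000 bounds are what I recall from the AutoML Tables documentation, so treat them as an assumption and check the current docs; the function name is hypothetical.

```python
# Assumed API limits, in milli node hours (1 to 72 node hours).
MIN_MILLI_NODE_HOURS = 1_000
MAX_MILLI_NODE_HOURS = 72_000


def to_milli_node_hours(node_hours: float) -> int:
    """Convert a training budget in node hours to milli node hours."""
    milli = round(node_hours * 1000)
    if not MIN_MILLI_NODE_HOURS <= milli <= MAX_MILLI_NODE_HOURS:
        raise ValueError(
            f"Budget of {node_hours} node hours is outside the assumed "
            f"range of {MIN_MILLI_NODE_HOURS // 1000} to "
            f"{MAX_MILLI_NODE_HOURS // 1000} node hours"
        )
    return milli


# The budgets mentioned in this step: 1 to 3 node hours.
for hours in (1, 2, 3):
    print(hours, "node hours ->", to_milli_node_hours(hours), "milli node hours")
```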

Step 12: Input feature selection

We imported six columns, one of which, weight_pounds, we have set aside to be our target or label column. This leaves five columns left over. Clicking the Input feature selection dropdown provides you with a list of all of the remaining columns. We want is_male, mother_age, plurality, and gestation_weeks as our four features. hashmonth is left over from the repeatable splitting we did in the 2_prepare_babyweight lab. Every selected column is used for training, so click the hashmonth checkbox to deselect it.
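The selection in this step boils down to "all columns, minus the target, minus anything that only exists for bookkeeping". A tiny sketch of that logic, using the column names from this lab (the column order and the helper name are assumptions made for illustration):

```python
# The six imported columns named in this lab (order is an assumption).
ALL_COLUMNS = [
    "weight_pounds",
    "is_male",
    "mother_age",
    "plurality",
    "gestation_weeks",
    "hashmonth",
]
TARGET_COLUMN = "weight_pounds"
# hashmonth was only used for splitting, so it should not be a feature.
EXCLUDED_COLUMNS = {"hashmonth"}


def select_features(columns, target, excluded):
    """Return the columns to train on, preserving their original order."""
    return [c for c in columns if c != target and c not in excluded]


features = select_features(ALL_COLUMNS, TARGET_COLUMN, EXCLUDED_COLUMNS)
print(features)
```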

Step 13: Wait for model to train

Woohoo! Our model is training! We are going to have an awesome model when it finishes! And now we wait. Depending on the size of your dataset, your training budget, and other factors, this could take a while, anywhere from a couple of hours to over a day, so this step is about just waiting and being patient. A good thing to do while you are waiting is to keep going through the next labs in this series and then come back to this lab once training completes.

Check evaluation metrics

Step 14: Evaluate model

Yay! Our model is done training! Now we can check the EVALUATE tab and see how well we did. It reminds you what the target was, weight_pounds, what the training was optimized for, RMSE, and then many evaluation metrics like MAE, RMSE, etc. My training run did great with an RMSE of 1.030 after only an hour of training! It really shows you the amazing power of AutoML! Below you can see a feature importance bar chart. gestation_weeks is by far the most important which makes sense because usually the longer someone has been pregnant, the longer the baby has had time to grow, and therefore the heavier the baby weighs.
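As a reminder of what the two headline metrics measure, here is a short sketch that computes MAE and RMSE by hand. The actual and predicted weights are made-up numbers, not output from the trained model.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Made-up weights in pounds, purely for illustration.
actual_pounds = [7.0, 6.5, 8.0, 5.5]
predicted_pounds = [7.5, 6.0, 7.0, 5.5]

print("MAE: ", mae(actual_pounds, predicted_pounds))
print("RMSE:", rmse(actual_pounds, predicted_pounds))
```

Both metrics are in the units of the target, so an RMSE of about 1 here means predictions are typically off by roughly a pound. RMSE is never smaller than MAE, and a large gap between the two hints at a few big misses.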

Deploy model

Step 15: Deploy model for predictions

So if you are satisfied with how well our brand new AutoML Tables model trained and evaluated, then you'll probably want to do next what ML is all about: making great predictions! To do that, we'll have to deploy our trained model. If you go to the main Models page for AutoML Tables you'll see your trained model listed. It gives the model name, the dataset used, the problem type, the time of creation, the model size, and whether the model is deployed or not. Since we just finished training our model, Deployed should say No. Click the three vertical dots to the right and then click Deploy model.

Step 16: Deploy model confirmation

You should now see a confirmation box pop up on your screen. This is just a confirmation making sure you really want to deploy your model because then there will be charges depending on the model size and the number of machines used. Please click the DEPLOY button.

Make batch predictions

Step 17: Create batch prediction job

Great! Once it is done deploying, Deployed should say Yes and you can now click your model name and then the PREDICT tab. You'll start out with batch prediction. To keep things easy, for now we can just predict on the BigQuery table that we used to train and evaluate on. To do that, select the radio button Data from BigQuery and then enter your BigQuery Project Id, BigQuery Dataset Id, and BigQuery Table or View Id. We could have also used CSVs from Google Cloud Storage. Then we need to select where we want to put our Result. Let's select the radio button BigQuery project and then enter our BigQuery Project Id. We also could have written the results to Google Cloud Storage. Once all of that is set, please click SEND BATCH PREDICTION, which will submit a batch prediction job using our trained AutoML Tables model and the data at the location we chose above.
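The same form can be thought of as a small request body: one input location and one output location. Here is a sketch of that body as a Python dict, assuming the field names of the AutoML REST batchPredict method as I understand them (inputConfig.bigquerySource.inputUri and outputConfig.bigqueryDestination.outputUri); please check them against the current API reference before relying on this. my-project is a placeholder, and the helper name is hypothetical.

```python
import json


def build_batch_predict_body(project_id, dataset_id, table_id):
    """Build a request body that reads one BigQuery table and writes to a project."""
    return {
        "inputConfig": {
            # Input is a single table: bq://project.dataset.table
            "bigquerySource": {
                "inputUri": f"bq://{project_id}.{dataset_id}.{table_id}"
            }
        },
        "outputConfig": {
            # Output is only a project; the service creates the result dataset.
            "bigqueryDestination": {"outputUri": f"bq://{project_id}"}
        },
    }


# "my-project" is a placeholder; the table is the one we trained on.
body = build_batch_predict_body(
    "my-project", "babyweight", "babyweight_augmented_data"
)
print(json.dumps(body, indent=2))
```

Note that the output side names only a project, which matches the UI: the result dataset is created for you and shows up in the next steps.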

Step 18: Batch prediction job finished

After just a little bit of waiting, your batch predictions should be done. For me with my dataset it took just over 15 minutes. At the bottom of the BATCH PREDICTION page you should see a section labeled Recent Predictions. It shows the data input, where the results are stored, when it was created, and how long it took to process. Let's now move to the BigQuery Console UI to have a look.

Step 19: Batch prediction dataset

On your list of projects on the far left, you will see the project you have been working in. Click the arrow to expand the dropdown list of all of the BigQuery datasets within the project. You'll see a new dataset there which is the same as what was shown for the Results directory from the last step. Expanding that dataset dropdown list you will see two BigQuery tables that have been created: predictions and errors. Let's first look at the predictions table.

Step 20: Batch prediction predictions

The predictions BigQuery table has essentially taken your input data to the batch prediction job and appended three new columns to it. Notice that even columns you did not use as features in your model, such as hashmonth, are still here. You should see the two prediction_interval columns for start and end. The last column is the prediction value, which for us is the predicted weight_pounds that our trained AutoML Tables model calculated using the corresponding features in the row.
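Because we predicted on rows that still carry the true weight_pounds, one quick sanity check is to see how often the actual value falls inside the predicted interval. The sketch below does this on a few made-up rows. The keys interval_start, interval_end, and predicted are simplified, hypothetical names, so map them to the real column names in your predictions table.

```python
def interval_report(rows):
    """Summarize how often the actual label falls inside the predicted interval."""
    inside = 0
    total_width = 0.0
    for row in rows:
        if row["interval_start"] <= row["weight_pounds"] <= row["interval_end"]:
            inside += 1
        total_width += row["interval_end"] - row["interval_start"]
    return {
        "coverage": inside / len(rows),
        "mean_width": total_width / len(rows),
    }


# Made-up rows for illustration; real rows would come from the predictions table.
rows = [
    {"weight_pounds": 7.2, "interval_start": 5.5, "interval_end": 9.0, "predicted": 7.4},
    {"weight_pounds": 6.1, "interval_start": 5.0, "interval_end": 8.5, "predicted": 6.9},
    {"weight_pounds": 9.6, "interval_start": 5.8, "interval_end": 9.2, "predicted": 7.6},
    {"weight_pounds": 5.4, "interval_start": 4.0, "interval_end": 7.5, "predicted": 5.9},
]

report = interval_report(rows)
print(report)
```

If the interval is meant to cover most outcomes, a coverage far below that on a large sample would be worth investigating, although predicting on the training data will tend to look optimistic.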

Step 21: Batch prediction errors

We can also look at the errors table for any possible errors. When I ran my batch prediction job, thankfully I didn't have any errors, but this is definitely the place to check in case you did. Since my errors table was empty, below you'll see the schema. Once again it has essentially taken your input data to the batch prediction job and appended three new columns to it. There is a record stored as well as an error code and error message. These could be helpful in debugging any unwanted behavior.

Make online predictions

Step 22: Online prediction setup

We can also perform online prediction with our trained AutoML Tables model. To do that, in the PREDICT tab, click ONLINE PREDICTION. You'll see on your screen something similar to below, with a table of our model's features. Each feature has the column name, the column ID, the data type, the status (whether it is required or not), and a prepopulated value. You can leave those values as is or enter your own. For Categorical features, make sure to use valid values or else they will just end up in the OOV (out of vocabulary) spill-over and not take full advantage of the training. When you're done setting your values, click the PREDICT button.
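One way to avoid accidental OOV values is to check an instance against the categories seen during training before sending it. Here is a minimal sketch of such a check. The vocabularies in KNOWN_VALUES are placeholders I made up for illustration; in practice you would take the distinct values of each categorical column from your own training table.

```python
# Placeholder vocabularies: replace with the distinct values in your table.
KNOWN_VALUES = {
    "is_male": {"True", "False", "Unknown"},
    "plurality": {"Single(1)", "Twins(2)", "Triplets(3)", "Multiple(2+)"},
}


def find_oov(instance, known_values):
    """Return the categorical features whose value was not seen in training."""
    return {
        name: value
        for name, value in instance.items()
        if name in known_values and value not in known_values[name]
    }


# A hypothetical instance; "true" differs in case from the known "True".
instance = {
    "is_male": "true",
    "mother_age": 28,
    "plurality": "Single(1)",
    "gestation_weeks": 39,
}

oov = find_oov(instance, KNOWN_VALUES)
print(oov if oov else "All categorical values are in vocabulary")
```

Numeric features are skipped on purpose, since the out-of-vocabulary concern described above only applies to categorical columns.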

Step 23: Online prediction result

After just a moment, you should see your online predictions appear on your screen. There will be a Prediction result as well as a 95% prediction interval returned. You can try other values for each feature and see what predictions they result in!

Lab Summary:

In this lab, we set up AutoML Tables, created and imported an AutoML Tables dataset from BigQuery, analyzed the dataset, trained an AutoML Tables model, checked the evaluation metrics of the trained model, deployed the trained model, and finally made both batch and online predictions with it.

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License

