In [ ]:
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Getting started: Training and prediction with Keras in AI Platform

View on GitHub

Overview

This tutorial shows how to train a neural network on AI Platform using the Keras sequential API and how to serve predictions from that model.

Keras is a high-level API for building and training deep learning models. tf.keras is TensorFlow’s implementation of this API.

The first two parts of the tutorial walk through training a model on Cloud AI Platform using prewritten Keras code, deploying the trained model to AI Platform, and serving online predictions from the deployed model.

The last part of the tutorial digs into the training code used for this model and ensuring it's compatible with AI Platform. To learn more about building machine learning models in Keras more generally, read TensorFlow's Keras tutorials.

Dataset

This tutorial uses the United States Census Income Dataset provided by the UC Irvine Machine Learning Repository. This dataset contains information about people from a 1994 Census database, including age, education, marital status, occupation, and whether they make more than $50,000 a year.

Objective

The goal is to train a deep neural network (DNN) using Keras that predicts whether a person makes more than $50,000 a year (target label) based on other Census information about the person (features).

This tutorial focuses more on using this model with AI Platform than on the design of the model itself. However, it's always important to think about potential problems and unintended consequences when building machine learning systems. See the Machine Learning Crash Course exercise about fairness to learn about sources of bias in the Census dataset, as well as machine learning fairness more generally.

Costs

This tutorial uses billable components of Google Cloud Platform (GCP):

  • AI Platform
  • Cloud Storage

Learn about AI Platform pricing and Cloud Storage pricing, and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

You must do several things before you can train and deploy a model in AI Platform:

  • Set up your local development environment.
  • Set up a GCP project with billing and the necessary APIs enabled.
  • Authenticate your GCP account in this notebook.
  • Create a Cloud Storage bucket to store your training package and your trained model.

Set up your local development environment

If you are using Colab or AI Platform Notebooks, your environment already meets all the requirements to run this notebook. You can skip this step.

Otherwise, make sure your environment meets this notebook's requirements. You need the following:

  • The Google Cloud SDK
  • Git
  • Python 3
  • virtualenv
  • Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to Setting up a Python development environment and the Jupyter installation guide provide detailed instructions for meeting these requirements. The following steps provide a condensed set of instructions:

  1. Install and initialize the Cloud SDK.

  2. Install Python 3.

  3. Install virtualenv and create a virtual environment that uses Python 3.

  4. Activate that environment and run pip install jupyter in a shell to install Jupyter.

  5. Run jupyter notebook in a shell to launch Jupyter.

  6. Open this notebook in the Jupyter Notebook Dashboard.

Set up your GCP project

The following steps are required, regardless of your notebook environment.

  1. Select or create a GCP project.

  2. Make sure that billing is enabled for your project.

  3. Enable the AI Platform ("Cloud Machine Learning Engine") and Compute Engine APIs.

  4. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

Note: Jupyter runs lines prefixed with ! as shell commands, and it interpolates Python variables prefixed with $ into these commands.


In [ ]:
PROJECT_ID = '[your-project-id]' #@param {type:"string"}
! gcloud config set project $PROJECT_ID

Authenticate your GCP account

If you are using AI Platform Notebooks, your environment is already authenticated. Skip this step.

If you are using Colab, run the cell below and follow the instructions when prompted to authenticate your account via oAuth.

Otherwise, follow these steps:

  1. In the GCP Console, go to the Create service account key page.

  2. From the Service account drop-down list, select New service account.

  3. In the Service account name field, enter a name.

  4. From the Role drop-down list, select Machine Learning Engine > AI Platform Admin and Storage > Storage Object Admin.

  5. Click Create. A JSON file that contains your key downloads to your local environment.

  6. Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.


In [ ]:
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
  %env GOOGLE_APPLICATION_CREDENTIALS ''

Create a Cloud Storage bucket

The following steps are required, regardless of your notebook environment.

When you submit a training job using the Cloud SDK, you upload a Python package containing your training code to a Cloud Storage bucket. AI Platform runs the code from this package. In this tutorial, AI Platform also saves the trained model that results from your job in the same bucket. You can then create an AI Platform model version based on this output in order to serve online predictions.

Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to choose a region where Cloud AI Platform services are available.


In [ ]:
BUCKET_NAME = '[your-bucket-name]' #@param {type:"string"}
REGION = 'us-central1' #@param {type:"string"}

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [ ]:
! gsutil mb -l $REGION gs://$BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:


In [ ]:
! gsutil ls -al gs://$BUCKET_NAME

Part 1. Quickstart for training in AI Platform

This section of the tutorial walks you through submitting a training job to Cloud AI Platform. This job runs sample code that uses Keras to train a deep neural network on the United States Census data. It outputs the trained model as a TensorFlow SavedModel directory in your Cloud Storage bucket.

Get training code and dependencies

First, download the training code and change the notebook's working directory:


In [ ]:
# Clone the repository of AI Platform samples
! git clone --depth 1 https://github.com/GoogleCloudPlatform/ai-platform-samples

# Set the working directory to the sample code directory
%cd ai-platform-samples/training/tensorflow/census/tf-keras

Notice that the training code is structured as a Python package in the trainer/ subdirectory:


In [ ]:
# `ls` shows the working directory's contents. The `p` flag adds trailing 
# slashes to subdirectory names. The `R` flag lists subdirectories recursively.
! ls -pR

Run the following cell to install Python dependencies needed to train the model locally. When you run the training job in AI Platform, dependencies are preinstalled based on the runtime version you choose.


In [ ]:
! pip install -r requirements.txt --user

Train your model locally

Before training on AI Platform, train the job locally to verify the file structure and packaging is correct.

For a complex or resource-intensive job, you may want to train locally on a small sample of your dataset to verify your code. Then you can run the job on AI Platform to train on the whole dataset.

This sample runs a relatively quick job on a small dataset, so the local training and the AI Platform job run the same code on the same data.

Run the following cell to train a model locally:


In [ ]:
# Explicitly tell `gcloud ai-platform local train` to use Python 3 
! gcloud config set ml_engine/local_python $(which python3)

# This is similar to `python -m trainer.task --job-dir local-training-output`
# but it better replicates the AI Platform environment, especially for
# distributed training (not applicable here).
! gcloud ai-platform local train \
  --package-path trainer \
  --module-name trainer.task \
  --job-dir local-training-output

Train your model using AI Platform

Next, submit a training job to AI Platform. This runs the training module in the cloud and exports the trained model to Cloud Storage.

First, give your training job a name and choose a directory within your Cloud Storage bucket for saving intermediate and output files:


In [ ]:
import uuid

JOB_NAME = "my_first_keras_job_" + uuid.uuid4().hex[:10]
JOB_DIR = 'gs://' + BUCKET_NAME + '/keras-job-dir'

Run the following command to package the trainer/ directory, upload it to the specified --job-dir, and instruct AI Platform to run the trainer.task module from that package.

The --stream-logs flag lets you view training logs in the cell below. You can also see logs and other job details in the GCP Console.

Hyperparameter tuning

You can optionally perform hyperparameter tuning by using the included hptuning_config.yaml configuration file. This file tells AI Platform to tune the batch size and learning rate for training over multiple trials to maximize accuracy.

In this example, the training code uses a TensorBoard callback, which creates TensorFlow Summary Events during training. AI Platform uses these events to track the metric you want to optimize. Learn more about hyperparameter tuning in AI Platform Training.


In [ ]:
! gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.7 \
  --runtime-version 2.1 \
  --job-dir $JOB_DIR \
  --stream-logs

Part 2. Quickstart for online predictions in AI Platform

This section shows how to use AI Platform and your trained model from Part 1 to predict a person's income bracket from other Census information about them.

Create model and version resources in AI Platform

To serve online predictions using the model you trained and exported in Part 1, create a model resource in AI Platform and a version resource within it. The version resource is what actually uses your trained model to serve predictions. This structure lets you adjust and retrain your model many times and organize all the versions together in AI Platform. Learn more about models and versions.

First, name and create the model resource:


In [ ]:
MODEL_NAME = 'my_first_keras_model'
MODEL_VERSION = 'v1'

In [ ]:
# Delete model version resource
! gcloud ai-platform versions delete $MODEL_VERSION --quiet --model $MODEL_NAME 

# Delete model resource
! gcloud ai-platform models delete $MODEL_NAME --quiet

In [ ]:
# Create Model
! gcloud ai-platform models create $MODEL_NAME --regions $REGION

Next, create the model version. The training job from Part 1 exported a timestamped TensorFlow SavedModel directory to your Cloud Storage bucket. AI Platform uses this directory to create a model version. Learn more about SavedModel and AI Platform.

You may be able to find the path to this directory in your training job's logs. Look for a line like:

Model exported to:  gs://<your-bucket-name>/keras-job-dir/keras_export/1545439782

Execute the following command to identify your SavedModel directory and use it to create a model version resource:


In [ ]:
# Get a list of directories in the `keras_export` parent directory
! gsutil ls $JOB_DIR/keras_export/

In [ ]:
# Pick the directory with the latest timestamp, in case you've trained
# multiple times
SAVED_MODEL_PATH = 'gs://{}/keras-job-dir/keras_export/'.format(BUCKET_NAME)

In [ ]:
# Create model version based on that SavedModel directory
! gcloud ai-platform versions create $MODEL_VERSION \
  --model $MODEL_NAME \
  --runtime-version 2.1 \
  --python-version 3.7 \
  --framework tensorflow \
  --origin $SAVED_MODEL_PATH

Prepare input for prediction

To receive valid and useful predictions, you must preprocess input for prediction in the same way that training data was preprocessed. In a production system, you may want to create a preprocessing pipeline that can be used identically at training time and prediction time.

For this exercise, use the training package's data-loading code to select a random sample from the evaluation data. This data is in the form that was used to evaluate accuracy after each epoch of training, so it can be used to send test predictions without further preprocessing:


In [ ]:
from trainer import util

_, _, eval_x, eval_y = util.load_data()

prediction_input = eval_x.sample(20)
prediction_targets = eval_y[prediction_input.index]

prediction_input

Notice that categorical fields, like occupation, have already been converted to integers (with the same mapping that was used for training). Numerical fields, like age, have been scaled to a z-score. Some fields have been dropped from the original data. Compare the prediction input with the raw data for the same examples:


In [ ]:
import pandas as pd

_, eval_file_path = util.download(util.DATA_DIR)
raw_eval_data = pd.read_csv(eval_file_path,
                            names=util._CSV_COLUMNS,
                            na_values='?')

raw_eval_data.iloc[prediction_input.index]

Export the prediction input to a newline-delimited JSON file:


In [ ]:
import json

with open('prediction_input.json', 'w') as json_file:
  for row in prediction_input.values.tolist():
    json.dump(row, json_file)
    json_file.write('\n')

! cat prediction_input.json

The gcloud command-line tool accepts newline-delimited JSON for online prediction, and this particular Keras model expects a flat list of numbers for each input example.

AI Platform requires a different format when you make online prediction requests to the REST API without using the gcloud tool. The way you structure your model may also change how you must format data for prediction. Learn more about formatting data for online prediction.

Submit the online prediction request

Use gcloud to submit your online prediction request.


In [ ]:
! gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version $MODEL_VERSION \
  --json-instances prediction_input.json

Since the model's last layer uses a sigmoid function for its activation, outputs between 0 and 0.5 represent negative predictions ("<=50K") and outputs between 0.5 and 1 represent positive ones (">50K").

Do the predicted income brackets match the actual ones? Run the following cell to see the true labels.


In [ ]:
prediction_targets

Part 3. Developing the Keras model from scratch

At this point, you have trained a machine learning model on AI Platform, deployed the trained model as a version resource on AI Platform, and received online predictions from the deployment. The next section walks through recreating the Keras code used to train your model. It covers the following parts of developing a machine learning model for use with AI Platform:

  • Downloading and preprocessing data
  • Designing and training the model
  • Visualizing training and exporting the trained model

While this section provides more detailed insight to the tasks completed in previous parts, to learn more about using tf.keras, read TensorFlow's guide to Keras. To learn more about structuring code as a training packge for AI Platform, read Packaging a training application and reference the complete training code, which is structured as a Python package.

Import libraries and define constants

First, import Python libraries required for training:


In [ ]:
import os
from six.moves import urllib
import tempfile

import numpy as np
import pandas as pd
import tensorflow as tf

# Examine software versions
print(__import__('sys').version)
print(tf.__version__)
print(tf.keras.__version__)

Then, define some useful constants:

  • Information for downloading training and evaluation data
  • Information required for Pandas to interpret the data and convert categorical fields into numeric features
  • Hyperparameters for training, such as learning rate and batch size

In [ ]:
### For downloading data ###

# Storage directory
DATA_DIR = os.path.join(tempfile.gettempdir(), 'census_data')

# Download options.
DATA_URL = 'https://storage.googleapis.com/cloud-samples-data/ai-platform' \
           '/census/data'
TRAINING_FILE = 'adult.data.csv'
EVAL_FILE = 'adult.test.csv'
TRAINING_URL = '%s/%s' % (DATA_URL, TRAINING_FILE)
EVAL_URL = '%s/%s' % (DATA_URL, EVAL_FILE)

### For interpreting data ###

# These are the features in the dataset.
# Dataset information: https://archive.ics.uci.edu/ml/datasets/census+income
_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]

_CATEGORICAL_TYPES = {
    'workclass': pd.api.types.CategoricalDtype(categories=[
        'Federal-gov', 'Local-gov', 'Never-worked', 'Private', 'Self-emp-inc',
        'Self-emp-not-inc', 'State-gov', 'Without-pay'
    ]),
    'marital_status': pd.api.types.CategoricalDtype(categories=[
        'Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
        'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'
    ]),
    'occupation': pd.api.types.CategoricalDtype([
        'Adm-clerical', 'Armed-Forces', 'Craft-repair', 'Exec-managerial',
        'Farming-fishing', 'Handlers-cleaners', 'Machine-op-inspct',
        'Other-service', 'Priv-house-serv', 'Prof-specialty', 'Protective-serv',
        'Sales', 'Tech-support', 'Transport-moving'
    ]),
    'relationship': pd.api.types.CategoricalDtype(categories=[
        'Husband', 'Not-in-family', 'Other-relative', 'Own-child', 'Unmarried',
        'Wife'
    ]),
    'race': pd.api.types.CategoricalDtype(categories=[
        'Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'
    ]),
    'native_country': pd.api.types.CategoricalDtype(categories=[
        'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba', 'Dominican-Republic',
        'Ecuador', 'El-Salvador', 'England', 'France', 'Germany', 'Greece',
        'Guatemala', 'Haiti', 'Holand-Netherlands', 'Honduras', 'Hong', 'Hungary',
        'India', 'Iran', 'Ireland', 'Italy', 'Jamaica', 'Japan', 'Laos', 'Mexico',
        'Nicaragua', 'Outlying-US(Guam-USVI-etc)', 'Peru', 'Philippines', 'Poland',
        'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan', 'Thailand',
        'Trinadad&Tobago', 'United-States', 'Vietnam', 'Yugoslavia'
    ]),
    'income_bracket': pd.api.types.CategoricalDtype(categories=[
        '<=50K', '>50K'
    ])
}

# This is the label (target) we want to predict.
_LABEL_COLUMN = 'income_bracket'

### Hyperparameters for training ###

# This the training batch size
BATCH_SIZE = 128

# This is the number of epochs (passes over the full training data)
NUM_EPOCHS = 20

# Define learning rate.
LEARNING_RATE = .01

Download and preprocess data

Download the data

Next, define functions to download training and evaluation data. These functions also fix minor irregularities in the data's formatting.


In [ ]:
def _download_and_clean_file(filename, url):
    """Downloads data from url, and makes changes to match the CSV format.
  
    The CSVs may use spaces after the comma delimters (non-standard) or include
    rows which do not represent well-formed examples. This function strips out
    some of these problems.
  
    Args:
      filename: filename to save url to
      url: URL of resource to download
    """
    temp_file, _ = urllib.request.urlretrieve(url)
    with tf.io.gfile.GFile(temp_file, 'r') as temp_file_object:
        with tf.io.gfile.GFile(filename, 'w') as file_object:
            for line in temp_file_object:
                line = line.strip()
                line = line.replace(', ', ',')
                if not line or ',' not in line:
                    continue
                if line[-1] == '.':
                    line = line[:-1]
                line += '\n'
                file_object.write(line)
    tf.io.gfile.remove(temp_file)


def download(data_dir):
    """Downloads census data if it is not already present.
  
    Args:
      data_dir: directory where we will access/save the census data
    """
    tf.io.gfile.makedirs(data_dir)

    training_file_path = os.path.join(data_dir, TRAINING_FILE)
    if not tf.io.gfile.exists(training_file_path):
        _download_and_clean_file(training_file_path, TRAINING_URL)

    eval_file_path = os.path.join(data_dir, EVAL_FILE)
    if not tf.io.gfile.exists(eval_file_path):
        _download_and_clean_file(eval_file_path, EVAL_URL)

    return training_file_path, eval_file_path

Use those functions to download the data for training and verify that you have CSV files for training and evaluation:


In [ ]:
training_file_path, eval_file_path = download(DATA_DIR)

# You should see 2 files: adult.data.csv and adult.test.csv
!ls -l $DATA_DIR

Next, load these files using Pandas and examine the data:


In [ ]:
# This census data uses the value '?' for fields (column) that are missing data. 
# We use na_values to find ? and set it to NaN values.
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

train_df = pd.read_csv(training_file_path, names=_CSV_COLUMNS, na_values='?')
eval_df = pd.read_csv(eval_file_path, names=_CSV_COLUMNS, na_values='?')

# Here's what the data looks like before we preprocess the data.
train_df.head()

Preprocess the data

The first preprocessing step removes certain features from the data and converts categorical features to numerical values for use with Keras.

Learn more about feature engineering and bias in data.


In [ ]:
UNUSED_COLUMNS = ['fnlwgt', 'education', 'gender']


def preprocess(dataframe):
    """Converts categorical features to numeric. Removes unused columns.
  
    Args:
      dataframe: Pandas dataframe with raw data
  
    Returns:
      Dataframe with preprocessed data
    """
    dataframe = dataframe.drop(columns=UNUSED_COLUMNS)

    # Convert integer valued (numeric) columns to floating point
    numeric_columns = dataframe.select_dtypes(['int64']).columns
    dataframe[numeric_columns] = dataframe[numeric_columns].astype('float32')

    # Convert categorical columns to numeric
    cat_columns = dataframe.select_dtypes(['object']).columns
    dataframe[cat_columns] = dataframe[cat_columns].apply(lambda x: x.astype(
        _CATEGORICAL_TYPES[x.name]))
    dataframe[cat_columns] = dataframe[cat_columns].apply(lambda x: x.cat.codes)
    return dataframe


prepped_train_df = preprocess(train_df)
prepped_eval_df = preprocess(eval_df)

Run the following cell to see how preprocessing changed the data. Notice in particular that income_bracket, the label that you're training the model to predict, has changed from <=50K and >50K to 0 and 1:


In [ ]:
prepped_train_df.head()

Next, separate the data into features ("x") and labels ("y"), and reshape the label arrays into a format for use with tf.data.Dataset later:


In [ ]:
# Split train and test data with labels.
# The pop() method will extract (copy) and remove the label column from the dataframe
train_x, train_y = prepped_train_df, prepped_train_df.pop(_LABEL_COLUMN)
eval_x, eval_y = prepped_eval_df, prepped_eval_df.pop(_LABEL_COLUMN)

# Reshape label columns for use with tf.data.Dataset
train_y = np.asarray(train_y).astype('float32').reshape((-1, 1))
eval_y = np.asarray(eval_y).astype('float32').reshape((-1, 1))

Scaling training data so each numerical feature column has a mean of 0 and a standard deviation of 1 can improve your model.

In a production system, you may want to save the means and standard deviations from your training set and use them to perform an identical transformation on test data at prediction time. For convenience in this exercise, temporarily combine the training and evaluation data to scale all of them:


In [ ]:
def standardize(dataframe):
    """Scales numerical columns using their means and standard deviation to get
    z-scores: the mean of each numerical column becomes 0, and the standard
    deviation becomes 1. This can help the model converge during training.
  
    Args:
      dataframe: Pandas dataframe
  
    Returns:
      Input dataframe with the numerical columns scaled to z-scores
    """
    dtypes = list(zip(dataframe.dtypes.index, map(str, dataframe.dtypes)))
    # Normalize numeric columns.
    for column, dtype in dtypes:
        if dtype == 'float32':
            dataframe[column] -= dataframe[column].mean()
            dataframe[column] /= dataframe[column].std()
    return dataframe


# Join train_x and eval_x to normalize on overall means and standard
# deviations. Then separate them again.
all_x = pd.concat([train_x, eval_x], keys=['train', 'eval'])
all_x = standardize(all_x)
train_x, eval_x = all_x.xs('train'), all_x.xs('eval')

Finally, examine some of your fully preprocessed training data:


In [ ]:
# Verify dataset features
# Note how only the numeric fields (not categorical) have been standardized
train_x.head()

Design and train the model

Create training and validation datasets

Create an input function to convert features and labels into a tf.data.Dataset for training or evaluation:


In [ ]:
def input_fn(features, labels, shuffle, num_epochs, batch_size):
    """Generates an input function to be used for model training.
  
    Args:
      features: numpy array of features used for training or inference
      labels: numpy array of labels for each example
      shuffle: boolean for whether to shuffle the data or not (set True for
        training, False for evaluation)
      num_epochs: number of epochs to provide the data for
      batch_size: batch size for training
  
    Returns:
      A tf.data.Dataset that can provide data to the Keras model for training or
        evaluation
    """
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(features))

    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)
    return dataset

Next, create these training and evaluation datasets.Use the NUM_EPOCHS and BATCH_SIZE hyperparameters defined previously to define how the training dataset provides examples to the model during training. Set up the validation dataset to provide all its examples in one batch, for a single validation step at the end of each training epoch.


In [ ]:
# Pass a numpy array by using DataFrame.values
training_dataset = input_fn(features=train_x.values, 
                        labels=train_y, 
                        shuffle=True, 
                        num_epochs=NUM_EPOCHS, 
                        batch_size=BATCH_SIZE)

num_eval_examples = eval_x.shape[0]

# Pass a numpy array by using DataFrame.values
validation_dataset = input_fn(features=eval_x.values, 
                        labels=eval_y, 
                        shuffle=False, 
                        num_epochs=NUM_EPOCHS, 
                        batch_size=num_eval_examples)

Design a Keras Model

Design your neural network using the Keras Sequential API.

This deep neural network (DNN) has several hidden layers, and the last layer uses a sigmoid activation function to output a value between 0 and 1:

  • The input layer has 100 units using the ReLU activation function.
  • The hidden layer has 75 units using the ReLU activation function.
  • The hidden layer has 50 units using the ReLU activation function.
  • The hidden layer has 25 units using the ReLU activation function.
  • The output layer has 1 units using a sigmoid activation function.
  • The optimizer uses the binary cross-entropy loss function, which is appropriate for a binary classification problem like this one.

Feel free to change these layers to try to improve the model:


In [ ]:
def create_keras_model(input_dim, learning_rate):
    """Creates Keras Model for Binary Classification.
  
    Args:
      input_dim: How many features the input has
      learning_rate: Learning rate for training
  
    Returns:
      The compiled Keras model (still needs to be trained)
    """
    Dense = tf.keras.layers.Dense
    model = tf.keras.Sequential(
      [
          Dense(100, activation=tf.nn.relu, kernel_initializer='uniform',
                  input_shape=(input_dim,)),
          Dense(75, activation=tf.nn.relu),
          Dense(50, activation=tf.nn.relu),
          Dense(25, activation=tf.nn.relu),
          Dense(1, activation=tf.nn.sigmoid)
      ])
    # Custom Optimizer:
    # https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer
    optimizer = tf.keras.optimizers.RMSprop(
        lr=learning_rate)

    # Compile Keras model
    model.compile(
        loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

Next, create the Keras model object and examine its structure:


In [ ]:
num_train_examples, input_dim = train_x.shape
print('Number of features: {}'.format(input_dim))
print('Number of examples: {}'.format(num_train_examples))

keras_model = create_keras_model(
    input_dim=input_dim,
    learning_rate=LEARNING_RATE)

# Take a detailed look inside the model
keras_model.summary()

Train and evaluate the model

Define a learning rate decay to encourage model paramaters to make smaller changes as training goes on:


In [ ]:
class CustomCallback(tf.keras.callbacks.TensorBoard):
    """Callback to write out a custom metric used by CAIP for HP Tuning."""

    def on_epoch_end(self, epoch, logs=None):  # pylint: disable=no-self-use
        """Write tf.summary.scalar on epoch end."""
        tf.summary.scalar('epoch_accuracy', logs['accuracy'], epoch)
        

# Setup TensorBoard callback.
tensorboard_cb = CustomCallback(os.path.join(JOB_DIR, 'keras_tensorboard'),
                               histogram_freq=1)

# Setup Learning Rate decay.
lr_decay_cb = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: LEARNING_RATE + 0.02 * (0.5 ** (1 + epoch)),
    verbose=True)

Finally, train the model. Provide the appropriate steps_per_epoch for the model to train on the entire training dataset (with BATCH_SIZE examples per step) during each epoch. And instruct the model to calculate validation accuracy with one big validation batch at the end of each epoch.


In [ ]:
history = keras_model.fit(training_dataset, 
                          epochs=NUM_EPOCHS, 
                          steps_per_epoch=int(num_train_examples/BATCH_SIZE), 
                          validation_data=validation_dataset, 
                          validation_steps=1, 
                          callbacks=[tensorboard_cb, lr_decay_cb],
                          verbose=1)

Visualize training and export the trained model

Visualize training

Import matplotlib to visualize how the model learned over the training period.


In [ ]:
from matplotlib import pyplot as plt

%matplotlib inline

Plot the model's loss (binary cross-entropy) and accuracy, as measured at the end of each training epoch:


In [ ]:
# Visualize History for Loss.
plt.title('Keras model loss')
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['training', 'validation'], loc='upper right')
plt.show()

# Visualize History for Accuracy.
plt.title('Keras model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training', 'validation'], loc='lower right')
plt.show()

Over time, loss decreases and accuracy increases. But do they converge to a stable level? Are there big differences between the training and validation metrics (a sign of overfitting)?

Learn about how to improve your machine learning model. Then, feel free to adjust hyperparameters or the model architecture and train again.

Export the model for serving

AI Platform requires when you create a model version resource.

Since not all optimizers can be exported to the SavedModel format, you may see warnings during the export process. As long you successfully export a serving graph, AI Platform can used the SavedModel to serve predictions.


In [ ]:
# Export the model to a local SavedModel directory 
tf.keras.models.save_model(keras_model, 'keras_export')

You may export a SavedModel directory to your local filesystem or to Cloud Storage, as long as you have the necessary permissions. In your current environment, you granted access to Cloud Storage by authenticating your GCP account and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. AI Platform training jobs can also export directly to Cloud Storage, because AI Platform service accounts have access to Cloud Storage buckets in their own project.

Try exporting directly to Cloud Storage:


In [ ]:
# Export the model to a SavedModel directory in Cloud Storage
export_path = os.path.join(JOB_DIR, 'keras_export')
tf.keras.models.save_model(keras_model, export_path)
print('Model exported to: ', export_path)

You can now deploy this model to AI Platform and serve predictions by following the steps from Part 2.

Cleaning up

To clean up all GCP resources used in this project, you can delete the GCP project you used for the tutorial.

Alternatively, you can clean up individual resources by running the following commands:


In [ ]:
# Delete model version resource
! gcloud ai-platform versions delete $MODEL_VERSION --quiet --model $MODEL_NAME 

# Delete model resource
! gcloud ai-platform models delete $MODEL_NAME --quiet

# Delete Cloud Storage objects that were created
! gsutil -m rm -r $JOB_DIR

# If the training job is still running, cancel it
! gcloud ai-platform jobs cancel $JOB_NAME --quiet --verbosity critical

If your Cloud Storage bucket doesn't contain any other objects and you would like to delete it, run gsutil rm -r gs://$BUCKET_NAME.

What's next?