Learning Objectives
In the previous notebook we achieved an RMSE of 4.13. Let's see if we can improve upon that by tuning our hyperparameters.
Hyperparameters are parameters that are set prior to training a model, as opposed to parameters which are learned during training.
These include learning rate and batch size, but also model design parameters such as type of activation function and number of hidden units.
Here are the four most common ways of finding the ideal hyperparameters:
1. Manual
Traditionally, hyperparameter tuning is a manual trial-and-error process. A data scientist has some intuition about suitable hyperparameters and uses it as a starting point. They then observe the result and use that information to choose a new set of hyperparameters, trying to beat the existing performance.
Pros
Cons
2. Grid Search
At the other extreme, we can use grid search: define a discrete set of values to try for each hyperparameter, then try every possible combination.
Pros
Cons
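As a rough illustration of the idea, the sketch below enumerates every combination of a small search space using only the Python standard library. The hyperparameter names and candidate values are assumptions made up for this example; they are not the settings used later in this notebook.

```python
import itertools

# Hypothetical search space: names and values are illustrative only.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 128],
    "hidden_units": ["10,10", "64,32"],
}

def grid(space):
    """Yield one dict per combination of candidate values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))

trials = list(grid(search_space))
print(len(trials))  # 3 * 2 * 2 combinations
print(trials[0])    # first combination in iteration order
```

The number of trials is the product of the number of candidate values per hyperparameter, so it grows quickly as more hyperparameters are added.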
3. Random Search
Alternatively, define a range for each hyperparameter (e.g. 0-256) and sample uniformly at random from that range.
Pros
Cons
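A minimal sketch of random search, again using only the standard library. The hyperparameter names and ranges below are hypothetical and chosen purely for illustration. Integer ranges are sampled with both ends included, and continuous ranges are sampled uniformly.

```python
import random

# Hypothetical ranges: (low, high) bounds are illustrative only.
search_ranges = {
    "first_layer_units": (1, 256),   # integer range
    "learning_rate": (0.0001, 0.1),  # continuous range
}

def sample_trial(ranges, rng):
    """Draw one value per hyperparameter, uniformly within its range."""
    trial = {}
    for name, (low, high) in ranges.items():
        if isinstance(low, int) and isinstance(high, int):
            trial[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            trial[name] = rng.uniform(low, high)
    return trial

rng = random.Random(42)  # seeded so the draws are repeatable
trials = [sample_trial(search_ranges, rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

Unlike grid search, the number of trials here is chosen up front and does not depend on how many hyperparameters are tuned.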
4. Bayesian Optimization
Unlike Grid Search and Random Search, Bayesian Optimization takes into account information from past trials to select parameters for future trials. The details of how this is done are beyond the scope of this notebook, but if you're interested you can read how it works here.
Pros
Cons
AI Platform HyperTune
AI Platform HyperTune, powered by Google Vizier, uses Bayesian Optimization by default, but also supports Grid Search and Random Search.
When tuning just a few hyperparameters (say fewer than 4), Grid Search and Random Search work well, but when tuning several hyperparameters over a large search space, Bayesian Optimization is best.
In [ ]:
PROJECT = "cloud-training-demos" # Replace with your PROJECT
BUCKET = "cloud-training-bucket" # Replace with your BUCKET
REGION = "us-central1" # Choose an available region for AI Platform
TFVERSION = "1.14" # TF version for AI Platform
In [ ]:
import os
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["TFVERSION"] = TFVERSION
In [ ]:
%%bash
mkdir -p taxifaremodel
touch taxifaremodel/__init__.py
In [ ]:
%%writefile taxifaremodel/model.py
import tensorflow as tf
import numpy as np
import shutil
print(tf.__version__)
#1. Train and Evaluate Input Functions
CSV_COLUMN_NAMES = ["fare_amount","dayofweek","hourofday","pickuplon","pickuplat","dropofflon","dropofflat"]
CSV_DEFAULTS = [[0.0],[1],[0],[-74.0],[40.0],[-74.0],[40.7]]
def read_dataset(csv_path):
    def _parse_row(row):
        # Decode the CSV row into list of TF tensors
        fields = tf.decode_csv(records = row, record_defaults = CSV_DEFAULTS)
        # Pack the result into a dictionary
        features = dict(zip(CSV_COLUMN_NAMES, fields))
        # NEW: Add engineered features
        features = add_engineered_features(features)
        # Separate the label from the features
        label = features.pop("fare_amount") # remove label from features and store
        return features, label
    # Create a dataset containing the text lines.
    dataset = tf.data.Dataset.list_files(file_pattern = csv_path) # (i.e. data_file_*.csv)
    dataset = dataset.flat_map(map_func = lambda filename:tf.data.TextLineDataset(filenames = filename).skip(count = 1))
    # Parse each CSV row into correct (features,label) format for Estimator API
    dataset = dataset.map(map_func = _parse_row)
    return dataset
def train_input_fn(csv_path, batch_size = 128):
    #1. Convert CSV into tf.data.Dataset with (features,label) format
    dataset = read_dataset(csv_path)
    #2. Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(buffer_size = 1000).repeat(count = None).batch(batch_size = batch_size)
    return dataset
def eval_input_fn(csv_path, batch_size = 128):
    #1. Convert CSV into tf.data.Dataset with (features,label) format
    dataset = read_dataset(csv_path)
    #2. Batch the examples.
    dataset = dataset.batch(batch_size = batch_size)
    return dataset
#2. Feature Engineering
# One hot encode dayofweek and hourofday
fc_dayofweek = tf.feature_column.categorical_column_with_identity(key = "dayofweek", num_buckets = 7)
fc_hourofday = tf.feature_column.categorical_column_with_identity(key = "hourofday", num_buckets = 24)
# Cross features to get combination of day and hour
fc_day_hr = tf.feature_column.crossed_column(keys = [fc_dayofweek, fc_hourofday], hash_bucket_size = 24 * 7)
# Bucketize latitudes and longitudes
NBUCKETS = 16
latbuckets = np.linspace(start = 38.0, stop = 42.0, num = NBUCKETS).tolist()
lonbuckets = np.linspace(start = -76.0, stop = -72.0, num = NBUCKETS).tolist()
fc_bucketized_plon = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "pickuplon"), boundaries = lonbuckets)
fc_bucketized_plat = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "pickuplat"), boundaries = latbuckets)
fc_bucketized_dlon = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "dropofflon"), boundaries = lonbuckets)
fc_bucketized_dlat = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "dropofflat"), boundaries = latbuckets)
def add_engineered_features(features):
    features["dayofweek"] = features["dayofweek"] - 1 # subtract one since our days of week are 1-7 instead of 0-6
    features["latdiff"] = features["pickuplat"] - features["dropofflat"] # North/South
    features["londiff"] = features["pickuplon"] - features["dropofflon"] # East/West
    features["euclidean_dist"] = tf.sqrt(features["latdiff"]**2 + features["londiff"]**2)
    return features
feature_cols = [
    #1. Engineered using tf.feature_column module
    tf.feature_column.indicator_column(categorical_column = fc_day_hr),
    fc_bucketized_plon,
    fc_bucketized_plat,
    fc_bucketized_dlon,
    fc_bucketized_dlat,
    #2. Engineered in input functions
    tf.feature_column.numeric_column(key = "latdiff"),
    tf.feature_column.numeric_column(key = "londiff"),
    tf.feature_column.numeric_column(key = "euclidean_dist")
]
#3. Serving Input Receiver Function
def serving_input_receiver_fn():
    receiver_tensors = {
        'dayofweek' : tf.placeholder(dtype = tf.int32, shape = [None]), # shape is vector to allow batch of requests
        'hourofday' : tf.placeholder(dtype = tf.int32, shape = [None]),
        'pickuplon' : tf.placeholder(dtype = tf.float32, shape = [None]),
        'pickuplat' : tf.placeholder(dtype = tf.float32, shape = [None]),
        'dropofflat' : tf.placeholder(dtype = tf.float32, shape = [None]),
        'dropofflon' : tf.placeholder(dtype = tf.float32, shape = [None]),
    }
    # Pass a copy: add_engineered_features modifies its argument in place, and
    # receiver_tensors should contain only the raw inputs a client sends
    features = add_engineered_features(dict(receiver_tensors)) # 'features' is what is passed on to the model
    return tf.estimator.export.ServingInputReceiver(features = features, receiver_tensors = receiver_tensors)
#4. Train and Evaluate
def train_and_evaluate(params):
    OUTDIR = params["output_dir"]
    model = tf.estimator.DNNRegressor(
        hidden_units = [int(n) for n in params["hidden_units"].split(",")], # NEW: parameterize architecture, e.g. "64,32" -> [64, 32]
        feature_columns = feature_cols,
        model_dir = OUTDIR,
        config = tf.estimator.RunConfig(
            tf_random_seed = 1, # for reproducibility
            save_checkpoints_steps = max(100, params["train_steps"] // 10) # checkpoint every N steps
        )
    )
    # Add custom evaluation metric
    def my_rmse(labels, predictions):
        pred_values = tf.squeeze(input = predictions["predictions"], axis = -1)
        return {"rmse": tf.metrics.root_mean_squared_error(labels = labels, predictions = pred_values)}
    model = tf.contrib.estimator.add_metrics(model, my_rmse)
    train_spec = tf.estimator.TrainSpec(
        input_fn = lambda: train_input_fn(params["train_data_path"]),
        max_steps = params["train_steps"])
    exporter = tf.estimator.FinalExporter(name = "exporter", serving_input_receiver_fn = serving_input_receiver_fn) # export SavedModel once at the end of training
    # Note: alternatively use tf.estimator.BestExporter to export at every checkpoint that has lower loss than the previous checkpoint
    eval_spec = tf.estimator.EvalSpec(
        input_fn = lambda: eval_input_fn(params["eval_data_path"]),
        steps = None,
        start_delay_secs = 1, # wait at least N seconds before first evaluation (default 120)
        throttle_secs = 1, # wait at least N seconds before each subsequent evaluation (default 600)
        exporters = exporter) # export SavedModel once at the end of training
    tf.logging.set_verbosity(v = tf.logging.INFO) # so loss is printed during training
    shutil.rmtree(path = OUTDIR, ignore_errors = True) # start fresh each time
    tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
The code cell below has two TODOs for you to complete.
First, in model.py above we made the number of hidden units in our model a hyperparameter. This means hidden_units must be exposed as a command line argument when we submit our training job to AI Platform. Modify the code below to add a flag for hidden_units. Be sure to include a description for the help field and specify the data type that the model should expect to receive. You can also include a default value. Look at the other parser arguments to make sure you have the formatting correct.
Second, when doing hyperparameter tuning we need to make sure the output directory is different for each run, otherwise successive runs will overwrite previous runs. In task.py below, add some code to append the trial_id to the output directory of the training job.
Hint: You can use json.loads(os.environ.get('TF_CONFIG', '{}')).get('task', {}).get('trial', '') to extract the trial id of the training job. You will want to append this quantity to the output directory args['output_dir'] to make sure the output directory is different for each run.
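The standalone sketch below shows the two mechanisms described above, outside of task.py: declaring a string flag with argparse, and reading a trial id from a TF_CONFIG-style JSON string. It is one possible approach under stated assumptions, not the reference solution. The help text, the default value, the helper name trial_output_dir, the bucket path, and the fake TF_CONFIG value are all made up for illustration.

```python
import argparse
import json
import os

def trial_output_dir(base_dir, tf_config_json):
    """Append the trial id found in a TF_CONFIG-style JSON string to base_dir.

    Hypothetical helper: uses the expression from the hint above.
    """
    trial_id = json.loads(tf_config_json or "{}").get("task", {}).get("trial", "")
    return os.path.join(base_dir, trial_id)

parser = argparse.ArgumentParser()
parser.add_argument(
    "--hidden_units",
    help = "Comma-separated layer sizes, e.g. 64,32",  # assumed wording
    type = str,
    default = "10,10"  # assumed default
)
# Parse a hand-written argument list instead of sys.argv so the sketch runs anywhere
args = parser.parse_args(["--hidden_units", "128,64,32"]).__dict__
layer_sizes = [int(n) for n in args["hidden_units"].split(",")]
print(layer_sizes)

# During a tuning job the service sets TF_CONFIG; here we fake it
fake_tf_config = json.dumps({"task": {"trial": "3"}})
print(trial_output_dir("gs://my-bucket/out", fake_tf_config))
print(trial_output_dir("gs://my-bucket/out", ""))  # no trial id: base path plus a trailing separator
```

Passing the layer sizes as a single comma-separated string keeps the flag compatible with a categorical hyperparameter whose values are strings such as 64,32.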
In [ ]:
%%writefile taxifaremodel/task.py
import argparse
import json
import os
from . import model
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        # TODO: Your code goes here
    )
    parser.add_argument(
        "--train_data_path",
        help = "GCS or local path to training data",
        required = True
    )
    parser.add_argument(
        "--train_steps",
        help = "Steps to run the training job for (default: 1000)",
        type = int,
        default = 1000
    )
    parser.add_argument(
        "--eval_data_path",
        help = "GCS or local path to evaluation data",
        required = True
    )
    parser.add_argument(
        "--output_dir",
        help = "GCS location to write checkpoints and export models",
        required = True
    )
    parser.add_argument(
        "--job-dir",
        help="This is not used by our model, but it is required by gcloud",
    )
    args = parser.parse_args().__dict__
    # Append trial_id to path so trials don't overwrite each other
    # This code can be removed if you are not using hyperparameter tuning
    args["output_dir"] = os.path.join(
        # TODO: Your code goes here
    )
    # Run the training job
    model.train_and_evaluate(args)
We specify:
1. How many trials to run (maxTrials) and how many of those trials can be run in parallel (maxParallelTrials)
2. Which algorithm to use (in this case GRID_SEARCH)
3. Which metric to optimize (hyperparameterMetricTag)
Full specification options here.
Here we are tuning just one parameter, the number of hidden units, and we'll run all trials in parallel. More commonly, however, you would tune multiple hyperparameters.
In [ ]:
%%writefile hyperparam.yaml
trainingInput:
  scaleTier: BASIC
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 10
    maxParallelTrials: 10
    hyperparameterMetricTag: rmse
    enableTrialEarlyStopping: True
    algorithm: GRID_SEARCH
    params:
    - parameterName: hidden_units
      type: CATEGORICAL
      categoricalValues:
      - 10,10
      - 64,32
      - 128,64,32
      - # TODO: Your code goes here
Same as before, with the addition of --config=hyperparam.yaml to reference the file we just created.
This will take about 20 minutes. Go to the Cloud Console and click on the job ID. Once the job is completed, the chosen hyperparameters and the resulting objective value (RMSE in this case) will be shown. Trials will be sorted from best to worst.
In [ ]:
OUTDIR="gs://{}/taxifare/trained_hp_tune".format(BUCKET)
!gsutil -m rm -rf # TODO: Your code goes here
!gcloud ai-platform # TODO: Your code goes here
--package-path= # TODO: Your code goes here
--module-name= # TODO: Your code goes here
--config= # TODO: Your code goes here
--job-dir= # TODO: Your code goes here
--python-version= # TODO: Your code goes here
--runtime-version= # TODO: Your code goes here
--region= # TODO: Your code goes here
-- \
--train_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-train.csv \
--eval_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-valid.csv \
--train_steps=5000 \
--output_dir={OUTDIR}
In [ ]:
OUTDIR="gs://{}/taxifare/trained_large_tuned".format(BUCKET)
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ai-platform jobs submit training taxifare_large_$(date -u +%y%m%d_%H%M%S) \
--package-path=taxifaremodel \
--module-name=taxifaremodel.task \
--job-dir=gs://{BUCKET}/taxifare \
--python-version=3.5 \
--runtime-version={TFVERSION} \
--region={REGION} \
--scale-tier=STANDARD_1 \
-- \
--train_data_path=gs://cloud-training-demos/taxifare/large/taxi-train*.csv \
--eval_data_path=gs://cloud-training-demos/taxifare/small/taxi-valid.csv \
--train_steps=200000 \
--output_dir={OUTDIR} \
--hidden_units="128,64,32"
Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License