Learning Objectives
In the previous notebook we achieved an RMSE of 4.13. Let's see if we can improve upon that by tuning our hyperparameters.
Hyperparameters are parameters that are set prior to training a model, as opposed to parameters which are learned during training.
These include learning rate and batch size, but also model design parameters such as type of activation function and number of hidden units.
Here are the four most common ways of finding the ideal hyperparameters:
1. Manual
Traditionally, hyperparameter tuning is a manual trial-and-error process. A data scientist starts from some intuition about suitable hyperparameters, observes the result, and then uses that information to pick a new set of hyperparameters in an attempt to beat the existing performance.
Pros
Leverages expert intuition, and each new trial is informed by the results of the previous ones.
Cons
Slow and sequential, hard to reproduce, and the quality of the result depends heavily on the practitioner's experience.
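To make this concrete, here is a minimal sketch of the manual loop in plain Python. train_and_eval is a hypothetical stub standing in for "train a model with these hyperparameters and return its validation RMSE":

def train_and_eval(**hparams):
    # Hypothetical stub: a real version would train a model with these
    # hyperparameters and return its validation RMSE.
    return 4.13 # placeholder value so the sketch runs

results = {}
# Trial 1: start from intuition.
results[(0.01, 128)] = train_and_eval(learning_rate = 0.01, batch_size = 128)
# Trial 2: based on trial 1, try a smaller learning rate and compare.
results[(0.001, 128)] = train_and_eval(learning_rate = 0.001, batch_size = 128)
best_params = min(results, key = results.get)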
2. Grid Search
At the other extreme we can use grid search: define a discrete set of values to try for each hyperparameter, then try every possible combination.
Pros
Simple to implement, exhaustive within the grid, and the trials are independent so they can all run in parallel.
Cons
The number of trials grows exponentially with the number of hyperparameters, and continuous values must be discretized up front.
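As a sketch, reusing the hypothetical train_and_eval stub from above:

import itertools

def train_and_eval(**hparams):
    return 4.13 # hypothetical stub, as in the manual sketch above

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [64, 128, 256],
}
best_rmse, best_params = float("inf"), None
for combo in itertools.product(*grid.values()): # 3 x 3 = 9 trials, all independent
    params = dict(zip(grid.keys(), combo))
    rmse = train_and_eval(**params)
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params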
3. Random Search
Alternatively, define a range for each hyperparameter (e.g. 0-256) and sample uniformly at random from that range.
Pros
Handles continuous ranges without discretization and in practice often finds good values with far fewer trials than grid search; trials can still run in parallel.
Cons
No guarantee of finding the best combination, and like grid search it learns nothing from past trials.
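As a sketch, again with the hypothetical train_and_eval stub:

import random

def train_and_eval(**hparams):
    return 4.13 # hypothetical stub, as in the sketches above

best_rmse, best_params = float("inf"), None
for _ in range(20): # the trial budget is ours to choose
    params = {
        "learning_rate": 10 ** random.uniform(-4, -1), # log-uniform between 1e-4 and 1e-1
        "hidden_units": random.randint(0, 256), # uniform over the 0-256 range above
    }
    rmse = train_and_eval(**params)
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params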
4. Bayesian Optimization
Unlike Grid Search and Random Search, Bayesian Optimization takes into account information from past trials when selecting hyperparameters for future trials. The details of how it does this are beyond the scope of this notebook, but if you're interested you can read about how it works here.
Pros
Usually reaches a good result in fewer trials, because each new trial is informed by the ones before it.
Cons
Trials are inherently more sequential, so it is harder to parallelize, and it is considerably more complex to implement yourself.
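For a taste of what this looks like in code, here is a minimal sketch using the scikit-optimize library (our choice for illustration only; it is not used elsewhere in this notebook). Its gp_minimize fits a probabilistic surrogate model to the trials seen so far and uses it to pick the next hyperparameters to try:

from skopt import gp_minimize

def train_and_eval(**hparams):
    return 4.13 # hypothetical stub, as in the sketches above

def objective(params):
    learning_rate, hidden_units = params
    return train_and_eval(learning_rate = learning_rate, hidden_units = hidden_units) # RMSE to minimize

result = gp_minimize(
    func = objective,
    dimensions = [(1e-4, 1e-1, "log-uniform"), (8, 256)], # one search range per hyperparameter
    n_calls = 20, # total trial budget
    random_state = 1)
print(result.x, result.fun) # best hyperparameters and the RMSE they achieved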
AI Platform HyperTune
AI Platform HyperTune, powered by Google Vizier, uses Bayesian Optimization by default, but also supports Grid Search and Random Search.
When tuning just a few hyperparameters (say, fewer than four), Grid Search and Random Search work well, but when tuning several hyperparameters and the search space is large, Bayesian Optimization is best.
In [10]:
PROJECT = "cloud-training-demos" # Replace with your PROJECT
BUCKET = "cloud-training-bucket" # Replace with your BUCKET
REGION = "us-central1" # Choose an available region for AI Platform
TFVERSION = "1.14" # TF version for AI Platform to use
In [ ]:
import os
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["TFVERSION"] = TFVERSION
In [ ]:
%%bash
mkdir -p taxifaremodel
touch taxifaremodel/__init__.py
In [ ]:
%%writefile taxifaremodel/model.py
import tensorflow as tf
import numpy as np
import shutil
print(tf.__version__)
#1. Train and Evaluate Input Functions
CSV_COLUMN_NAMES = ["fare_amount","dayofweek","hourofday","pickuplon","pickuplat","dropofflon","dropofflat"]
CSV_DEFAULTS = [[0.0],[1],[0],[-74.0],[40.0],[-74.0],[40.7]]
def read_dataset(csv_path):
def _parse_row(row):
# Decode the CSV row into list of TF tensors
fields = tf.decode_csv(records = row, record_defaults = CSV_DEFAULTS)
# Pack the result into a dictionary
features = dict(zip(CSV_COLUMN_NAMES, fields))
# NEW: Add engineered features
features = add_engineered_features(features)
# Separate the label from the features
label = features.pop("fare_amount") # remove label from features and store
return features, label
# Create a dataset containing the text lines.
dataset = tf.data.Dataset.list_files(file_pattern = csv_path) # (i.e. data_file_*.csv)
dataset = dataset.flat_map(map_func = lambda filename:tf.data.TextLineDataset(filenames = filename).skip(count = 1))
# Parse each CSV row into correct (features,label) format for Estimator API
dataset = dataset.map(map_func = _parse_row)
return dataset
def train_input_fn(csv_path, batch_size = 128):
#1. Convert CSV into tf.data.Dataset with (features,label) format
dataset = read_dataset(csv_path)
#2. Shuffle, repeat, and batch the examples.
dataset = dataset.shuffle(buffer_size = 1000).repeat(count = None).batch(batch_size = batch_size)
return dataset
def eval_input_fn(csv_path, batch_size = 128):
#1. Convert CSV into tf.data.Dataset with (features,label) format
dataset = read_dataset(csv_path)
#2.Batch the examples.
dataset = dataset.batch(batch_size = batch_size)
return dataset
#2. Feature Engineering
# One hot encode dayofweek and hourofday
fc_dayofweek = tf.feature_column.categorical_column_with_identity(key = "dayofweek", num_buckets = 7)
fc_hourofday = tf.feature_column.categorical_column_with_identity(key = "hourofday", num_buckets = 24)
# Cross features to get combination of day and hour
fc_day_hr = tf.feature_column.crossed_column(keys = [fc_dayofweek, fc_hourofday], hash_bucket_size = 24 * 7)
# Bucketize latitudes and longitudes
NBUCKETS = 16
latbuckets = np.linspace(start = 38.0, stop = 42.0, num = NBUCKETS).tolist()
lonbuckets = np.linspace(start = -76.0, stop = -72.0, num = NBUCKETS).tolist()
fc_bucketized_plat = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "pickuplat"), boundaries = latbuckets)
fc_bucketized_plon = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "pickuplon"), boundaries = lonbuckets)
fc_bucketized_dlat = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "dropofflat"), boundaries = latbuckets)
fc_bucketized_dlon = tf.feature_column.bucketized_column(source_column = tf.feature_column.numeric_column(key = "dropofflon"), boundaries = lonbuckets)
def add_engineered_features(features):
features["dayofweek"] = features["dayofweek"] - 1 # subtract one since our days of week are 1-7 instead of 0-6
features["latdiff"] = features["pickuplat"] - features["dropofflat"] # East/West
features["londiff"] = features["pickuplon"] - features["dropofflon"] # North/South
features["euclidean_dist"] = tf.sqrt(features["latdiff"]**2 + features["londiff"]**2)
return features
feature_cols = [
#1. Engineered using tf.feature_column module
tf.feature_column.indicator_column(categorical_column = fc_day_hr),
fc_bucketized_plat,
fc_bucketized_plon,
fc_bucketized_dlat,
fc_bucketized_dlon,
#2. Engineered in input functions
tf.feature_column.numeric_column(key = "latdiff"),
tf.feature_column.numeric_column(key = "londiff"),
tf.feature_column.numeric_column(key = "euclidean_dist")
]
#3. Serving Input Receiver Function
def serving_input_receiver_fn():
receiver_tensors = {
'dayofweek' : tf.placeholder(dtype = tf.int32, shape = [None]), # shape is vector to allow batch of requests
'hourofday' : tf.placeholder(dtype = tf.int32, shape = [None]),
'pickuplon' : tf.placeholder(dtype = tf.float32, shape = [None]),
'pickuplat' : tf.placeholder(dtype = tf.float32, shape = [None]),
'dropofflat' : tf.placeholder(dtype = tf.float32, shape = [None]),
'dropofflon' : tf.placeholder(dtype = tf.float32, shape = [None]),
}
features = add_engineered_features(receiver_tensors) # 'features' is what is passed on to the model
return tf.estimator.export.ServingInputReceiver(features = features, receiver_tensors = receiver_tensors)
#4. Train and Evaluate
def train_and_evaluate(params):
OUTDIR = params["output_dir"]
model = tf.estimator.DNNRegressor(
hidden_units = [int(units) for units in params["hidden_units"].split(",")], # NEW: parameterize architecture
feature_columns = feature_cols,
model_dir = OUTDIR,
config = tf.estimator.RunConfig(
tf_random_seed = 1, # for reproducibility
save_checkpoints_steps = max(100, params["train_steps"] // 10) # checkpoint every N steps
)
)
# Add custom evaluation metric
def my_rmse(labels, predictions):
pred_values = tf.squeeze(input = predictions["predictions"], axis = -1)
return {"rmse": tf.metrics.root_mean_squared_error(labels = labels, predictions = pred_values)}
model = tf.contrib.estimator.add_metrics(model, my_rmse)
train_spec = tf.estimator.TrainSpec(
input_fn = lambda: train_input_fn(params["train_data_path"]),
max_steps = params["train_steps"])
exporter = tf.estimator.FinalExporter(name = "exporter", serving_input_receiver_fn = serving_input_receiver_fn) # export SavedModel once at the end of training
# Note: alternatively use tf.estimator.BestExporter to export at every checkpoint that has lower loss than the previous checkpoint
eval_spec = tf.estimator.EvalSpec(
input_fn = lambda: eval_input_fn(params["eval_data_path"]),
steps = None,
start_delay_secs = 1, # wait at least N seconds before first evaluation (default 120)
throttle_secs = 1, # wait at least N seconds before each subsequent evaluation (default 600)
exporters = exporter) # export SavedModel once at the end of training
tf.logging.set_verbosity(v = tf.logging.INFO) # so loss is printed during training
shutil.rmtree(path = OUTDIR, ignore_errors = True) # start fresh each time
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
In [ ]:
%%writefile taxifaremodel/task.py
import argparse
import json
import os
from . import model
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--hidden_units",
help = "Hidden layer sizes to use for DNN feature columns -- provide space-separated layers",
type = str,
default = "10,10"
)
parser.add_argument(
"--train_data_path",
help = "GCS or local path to training data",
required = True
)
parser.add_argument(
"--train_steps",
help = "Steps to run the training job for (default: 1000)",
type = int,
default = 1000
)
parser.add_argument(
"--eval_data_path",
help = "GCS or local path to evaluation data",
required = True
)
parser.add_argument(
"--output_dir",
help = "GCS location to write checkpoints and export models",
required = True
)
parser.add_argument(
"--job-dir",
help="This is not used by our model, but it is required by gcloud",
)
args = parser.parse_args().__dict__
# Append trial_id to path so trials don't overwrite each other
args["output_dir"] = os.path.join(
args["output_dir"],
json.loads(
os.environ.get("TF_CONFIG", "{}")
).get("task", {}).get("trial", "")
)
# Run the training job
model.train_and_evaluate(args)
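A note on the TF_CONFIG handling at the end of task.py: during a hyperparameter tuning job, AI Platform sets the TF_CONFIG environment variable for each trial, with the trial id stored under task.trial. Here is a quick illustration of what that parse produces (the TF_CONFIG value and bucket below are made up for demonstration):

import json
import os

os.environ["TF_CONFIG"] = json.dumps({"task": {"type": "master", "trial": "7"}}) # illustrative value only
trial = json.loads(os.environ.get("TF_CONFIG", "{}")).get("task", {}).get("trial", "")
print(os.path.join("gs://my-bucket/taxifare/trained_hp_tune", trial))
# prints gs://my-bucket/taxifare/trained_hp_tune/7, so each trial writes to its own subdirectory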
We specify:
- how many trials to run in total (maxTrials) and how many of those trials can be run in parallel (maxParallelTrials)
- which search algorithm to use (GRID_SEARCH)
- which metric to optimize (hyperparameterMetricTag)
Full specification options here.
Here we are just tuning one parameter, the number of hidden units, and we'll run all trials in parallel. However, more commonly you would tune multiple hyperparameters.
In [ ]:
%%writefile hyperparam.yaml
trainingInput:
scaleTier: BASIC
hyperparameters:
goal: MINIMIZE
maxTrials: 10
maxParallelTrials: 10
hyperparameterMetricTag: rmse
enableTrialEarlyStopping: True
algorithm: GRID_SEARCH
params:
- parameterName: hidden_units
type: CATEGORICAL
categoricalValues:
- 10,10
- 64,32
- 128,64,32
- 32,64,128
- 128,128,128
- 32,32,32
- 256,128,64,32
- 256,256,256,32
- 256,256,256,256
- 512,256,128,64,32
Same as before, with the addition of --config=hyperparam.yaml to reference the file we just created.
This will take about 20 minutes. Go to the Cloud Console and click on the job id. Once the job is complete, the chosen hyperparameters and the resulting objective value (RMSE in this case) will be shown. Trials will be sorted from best to worst.
In [ ]:
OUTDIR="gs://{}/taxifare/trained_hp_tune".format(BUCKET)
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ai-platform jobs submit training taxifare_$(date -u +%y%m%d_%H%M%S) \
--package-path=taxifaremodel \
--module-name=taxifaremodel.task \
--config=hyperparam.yaml \
--job-dir=gs://{BUCKET}/taxifare \
--python-version=3.5 \
--runtime-version={TFVERSION} \
--region={REGION} \
-- \
--train_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-train.csv \
--eval_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-valid.csv \
--train_steps=5000 \
--output_dir={OUTDIR}
In [ ]:
OUTDIR="gs://{}/taxifare/trained_large_tuned".format(BUCKET)
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ai-platform jobs submit training taxifare_large_$(date -u +%y%m%d_%H%M%S) \
--package-path=taxifaremodel \
--module-name=taxifaremodel.task \
--job-dir=gs://{BUCKET}/taxifare \
--python-version=3.5 \
--runtime-version={TFVERSION} \
--region={REGION} \
--scale-tier=STANDARD_1 \
-- \
--train_data_path=gs://cloud-training-demos/taxifare/large/taxi-train*.csv \
--eval_data_path=gs://cloud-training-demos/taxifare/small/taxi-valid.csv \
--train_steps=200000 \
--output_dir={OUTDIR} \
--hidden_units="128,64,32"
Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License