In [0]:
#@title Copyright 2020 Google LLC. Double-click here for license information.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
After doing this exercise, you'll know how to do the following:
- Run code cells in a Colab.
- Tune the following hyperparameters: learning rate, number of epochs, and batch size.
- Interpret different kinds of loss curves.
Machine Learning Crash Course uses Colaboratories (Colabs) for all programming exercises. Colab is Google's implementation of Jupyter Notebook. Like all Jupyter Notebooks, a Colab consists of two kinds of components:
- text cells, which contain explanations
- code cells, which contain Python code
You read the text cells and run the code cells.
You must run code cells in order. In other words, you may only run a code cell once all the code cells preceding it have already been run.
To run a code cell:
- Click anywhere inside the code cell.
- Click the Run (play) icon on the left side of the cell, or press Shift+Enter.
Alternatively, you may invoke Runtime->Run all. Note, though, that some of the code cells will fail because not all the coding is complete. (You'll complete the coding as part of the exercise.)
We've hidden the code in code cells that don't advance the learning objectives. For example, we've hidden the code that plots graphs. However, you must still run code cells containing hidden code. You'll know that the code is hidden because you'll see a title (for example, "Define the functions that build and train a model") without seeing the code.
To view the hidden code, just double-click the title.
In [0]:
#@title Run this Colab on TensorFlow 2.x
%tensorflow_version 2.x
In [0]:
import pandas as pd
import tensorflow as tf
from matplotlib import pyplot as plt
The following code defines two functions:
- build_model(my_learning_rate), which builds an empty model.
- train_model(model, feature, label, epochs, batch_size), which trains the model from the examples (feature and label) you pass.
Since you don't need to understand model-building code right now, we've hidden this code cell. You may optionally double-click the title to explore this code.
In [0]:
#@title Define the functions that build and train a model
def build_model(my_learning_rate):
  """Create and compile a simple linear regression model."""
  # Most simple tf.keras models are sequential.
  # A sequential model contains one or more layers.
  model = tf.keras.models.Sequential()

  # Describe the topography of the model.
  # The topography of a simple linear regression model
  # is a single node in a single layer.
  model.add(tf.keras.layers.Dense(units=1,
                                  input_shape=(1,)))

  # Compile the model topography into code that
  # TensorFlow can efficiently execute. Configure
  # training to minimize the model's mean squared error.
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=my_learning_rate),
                loss="mean_squared_error",
                metrics=[tf.keras.metrics.RootMeanSquaredError()])

  return model


def train_model(model, feature, label, epochs, batch_size):
  """Train the model by feeding it data."""

  # Feed the feature values and the label values to the
  # model. The model will train for the specified number
  # of epochs, gradually learning how the feature values
  # relate to the label values.
  history = model.fit(x=feature,
                      y=label,
                      batch_size=batch_size,
                      epochs=epochs)

  # Gather the trained model's weight and bias.
  trained_weight = model.get_weights()[0]
  trained_bias = model.get_weights()[1]

  # The list of epochs is stored separately from the
  # rest of history.
  epochs = history.epoch

  # Gather the history (a snapshot) of each epoch.
  hist = pd.DataFrame(history.history)

  # Specifically gather the model's root mean
  # squared error at each epoch.
  rmse = hist["root_mean_squared_error"]

  return trained_weight, trained_bias, epochs, rmse

print("Defined build_model and train_model")
We're using a popular Python library called Matplotlib to create the following two plots:
- a plot of the feature values vs. the label values, along with a red line showing the output of the trained model
- a loss curve, which plots loss vs. epoch
We hid the following code cell because learning Matplotlib is not relevant to the learning objectives. Regardless, you must still run all hidden code cells.
In [0]:
#@title Define the plotting functions
def plot_the_model(trained_weight, trained_bias, feature, label):
  """Plot the trained model against the training feature and label."""

  # Label the axes.
  plt.xlabel("feature")
  plt.ylabel("label")

  # Plot the feature values vs. label values.
  plt.scatter(feature, label)

  # Create a red line representing the model. The red line starts
  # at coordinates (x0, y0) and ends at coordinates (x1, y1).
  x0 = 0
  y0 = trained_bias
  x1 = feature[-1]
  y1 = trained_bias + (trained_weight * x1)
  plt.plot([x0, x1], [y0, y1], c='r')

  # Render the scatter plot and the red line.
  plt.show()


def plot_the_loss_curve(epochs, rmse):
  """Plot the loss curve, which shows loss vs. epoch."""

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Root Mean Squared Error")

  plt.plot(epochs, rmse, label="Loss")
  plt.legend()
  plt.ylim([rmse.min()*0.97, rmse.max()])
  plt.show()

print("Defined the plot_the_model and plot_the_loss_curve functions.")
In [0]:
my_feature = ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
my_label = ([5.0, 8.8, 9.6, 14.2, 18.8, 19.5, 21.4, 26.8, 28.9, 32.0, 33.8, 38.2])
In [0]:
learning_rate=0.01
epochs=10
my_batch_size=12
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
Examine the top graph. The blue dots identify the actual data; the red line identifies the output of the trained model. Ideally, the red line should align nicely with the blue dots. Does it? Probably not.
A certain amount of randomness plays into training a model, so you'll get somewhat different results every time you train. That said, unless you are an extremely lucky person, the red line probably doesn't align nicely with the blue dots.
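Much of that randomness comes from random weight initialization and data shuffling. If you want runs that are more repeatable, you can seed the random number generators before building the model. The snippet below is just a sketch; even with seeds set, results may still vary slightly across environments.
In [0]:
import random
import numpy as np

# Seed Python, NumPy, and TensorFlow so that weight initialization and
# shuffling are repeatable from run to run. (This reduces, but does not
# eliminate, run-to-run variation.)
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)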
Examine the bottom graph, which shows the loss curve. Notice that the loss curve decreases but doesn't flatten out, which is a sign that the model hasn't trained sufficiently.
Training loss should steadily decrease, steeply at first, and then more slowly. Eventually, training loss should stay steady (zero slope or nearly zero slope), which indicates that training has converged.
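If you'd rather not judge convergence by eye alone, one rough check is to compare the RMSE over the last few epochs. The helper below is only a sketch; the function name has_converged and the 1% tolerance are our own choices, not part of this exercise.
In [0]:
def has_converged(rmse, window=5, tolerance=0.01):
  """Return True if the RMSE changed by less than `tolerance` (as a
  fraction of its value) over the last `window` epochs. This is a rough
  heuristic, not a formal convergence test."""
  if len(rmse) <= window:
    return False
  recent = rmse.iloc[-window:]
  relative_change = abs(recent.iloc[-1] - recent.iloc[0]) / recent.iloc[0]
  return relative_change < tolerance

# Example usage with the rmse Series returned by train_model:
# print(has_converged(rmse))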
In Task 1, the training loss did not converge. One possible solution is to train for more epochs. Your task is to increase the number of epochs sufficiently to get the model to converge. However, it is inefficient to train past convergence, so don't just set the number of epochs to an arbitrarily high value.
Examine the loss curve. Does the model converge?
In [0]:
learning_rate=0.01
epochs= ? # Replace ? with an integer.
my_batch_size=12
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
In [0]:
#@title Double-click to view a possible solution
learning_rate=0.01
epochs=450
my_batch_size=12
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
# The loss curve suggests that the model does converge.
In Task 2, you increased the number of epochs to get the model to converge. Sometimes, you can get the model to converge more quickly by increasing the learning rate. However, setting the learning rate too high often makes it impossible for a model to converge. In Task 3, we've intentionally set the learning rate too high. Run the following code cell and see what happens.
In [0]:
# Increase the learning rate (intentionally set much too high).
learning_rate=100
epochs=500
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
The resulting model is terrible; the red line doesn't align with the blue dots. Furthermore, the loss curve oscillates like a roller coaster. An oscillating loss curve strongly suggests that the learning rate is too high.
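One crude way to quantify that "roller coaster" behavior is to count how often the loss switches direction from one epoch to the next. The snippet below is only a sketch; the function name and the interpretation threshold are our own, not part of the exercise.
In [0]:
import numpy as np

def fraction_of_direction_changes(rmse):
  """Return the fraction of epoch-to-epoch steps where the loss switched
  between increasing and decreasing. Values near 0.5 or higher suggest
  the loss is oscillating rather than steadily decreasing."""
  diffs = np.diff(rmse.to_numpy())
  signs = np.sign(diffs)
  changes = np.sum(signs[1:] != signs[:-1])
  return changes / max(len(diffs) - 1, 1)

# Example usage:
# print(fraction_of_direction_changes(rmse))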
In [0]:
# Set the learning rate and number of epochs
learning_rate= ? # Replace ? with a floating-point number
epochs= ? # Replace ? with an integer
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
In [0]:
#@title Double-click to view a possible solution
learning_rate=0.14
epochs=70
my_batch_size=12
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
The system recalculates the model's loss value and adjusts the model's weights and bias after each iteration. Each iteration is the span in which the system processes one batch. For example, if the batch size is 6, then the system recalculates the model's loss value and adjusts the model's weights and bias after processing every 6 examples.
One epoch spans sufficient iterations to process every example in the dataset. For example, if the batch size is 12, then each epoch lasts one iteration. However, if the batch size is 6, then each epoch consumes two iterations.
It is tempting to simply set the batch size to the number of examples in the dataset (12, in this case). However, the model might actually train faster on smaller batches. Conversely, very small batches might not contain enough information to help the model converge.
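To make the relationship between batch size, iterations, and epochs concrete, here is a small sketch that computes the iterations per epoch for this 12-example dataset. The formula is simply the number of examples divided by the batch size, rounded up.
In [0]:
import math

num_examples = len(my_feature)  # 12 examples in this dataset

for batch_size in [12, 6, 4, 1]:
  iterations_per_epoch = math.ceil(num_examples / batch_size)
  print(f"batch_size={batch_size:>2} -> {iterations_per_epoch} iteration(s) per epoch")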
Experiment with batch_size in the following code cell. What's the smallest integer you can set for batch_size and still have the model converge in a hundred epochs?
In [0]:
learning_rate=0.05
epochs=100
my_batch_size= ? # Replace ? with an integer.
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
In [0]:
#@title Double-click to view a possible solution
learning_rate=0.05
epochs=125
my_batch_size=1 # Wow, a batch size of 1 works!
my_model = build_model(learning_rate)
trained_weight, trained_bias, epochs, rmse = train_model(my_model, my_feature,
                                                          my_label, epochs,
                                                          my_batch_size)
plot_the_model(trained_weight, trained_bias, my_feature, my_label)
plot_the_loss_curve(epochs, rmse)
Most machine learning problems require a lot of hyperparameter tuning. Unfortunately, we can't provide concrete tuning rules for every model. Lowering the learning rate can help one model converge efficiently but make another model converge much too slowly. You must experiment to find the best set of hyperparameters for your dataset. That said, here are a few rules of thumb:
- Training loss should steadily decrease, steeply at first and then more slowly, until the slope of the curve reaches or approaches zero.
- If the training loss does not converge, train for more epochs.
- If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
- If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
- Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
- Setting the batch size to a very small number can also cause instability. First, try large batch size values; then decrease the batch size until you see degradation.
Remember: the ideal combination of hyperparameters is data dependent, so you must always experiment and verify.
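Because the ideal settings are data dependent, one practical approach is a small grid search that reuses the build_model and train_model functions defined above. The following is only a sketch; the candidate values are arbitrary examples, and on a real problem you would judge the result on held-out data rather than training loss alone.
In [0]:
# A minimal grid search over learning rate and batch size (sketch only).
results = []
for lr in [0.01, 0.05, 0.14]:
  for batch_size in [1, 6, 12]:
    model = build_model(lr)
    _, _, _, rmse = train_model(model, my_feature, my_label,
                                epochs=100, batch_size=batch_size)
    results.append((lr, batch_size, rmse.iloc[-1]))

# Report the combinations sorted by final training RMSE (lowest first).
for lr, batch_size, final_rmse in sorted(results, key=lambda r: r[2]):
  print(f"learning_rate={lr}, batch_size={batch_size}, final RMSE={final_rmse:.3f}")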