In [ ]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Note: This is an archived TF1 notebook. These notebooks are configured to run in TF2's compatibility mode but will run in TF1 as well. To use TF1 in Colab, use the magic.
TensorFlow's eager execution is an imperative programming environment that
evaluates operations immediately, without building graphs: operations return
concrete values instead of constructing a computational graph to run later. This
makes it easy to get started with TensorFlow and debug models, and it
reduces boilerplate as well. To follow along with this guide, run the code
samples below in an interactive Python interpreter.
Eager execution is a flexible machine learning platform for research and experimentation, providing:
Eager execution supports most TensorFlow operations and GPU acceleration. For a collection of examples running in eager execution, see: tensorflow/contrib/eager/python/examples.
Note: Some models may experience increased overhead with eager execution enabled. Performance improvements are ongoing, but please file a bug if you find a problem and share your benchmarks.
To start eager execution, add `tf.enable_eager_execution()` to the beginning of the program or console session. Do not add this operation to other modules that the program calls.
In [ ]:
import tensorflow.compat.v1 as tf
Now you can run TensorFlow operations and the results will return immediately:
In [ ]:
tf.executing_eagerly()
In [ ]:
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))
Enabling eager execution changes how TensorFlow operations behave: they now
immediately evaluate and return their values to Python. tf.Tensor objects
reference concrete values instead of symbolic handles to nodes in a computational
graph. Since there isn't a computational graph to build and run later in a
session, it's easy to inspect results using print() or a debugger. Evaluating,
printing, and checking tensor values does not break the flow for computing
gradients.
Eager execution works nicely with NumPy. NumPy operations accept tf.Tensor
arguments. TensorFlow math operations convert Python objects and NumPy arrays to
tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a
NumPy ndarray.
In [ ]:
a = tf.constant([[1, 2],
                 [3, 4]])
print(a)
In [ ]:
# Broadcasting support
b = tf.add(a, 1)
print(b)
In [ ]:
# Operator overloading is supported
print(a * b)
In [ ]:
# Use NumPy values
import numpy as np
c = np.multiply(a, b)
print(c)
In [ ]:
# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]
A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. So, for example, it is easy to write fizzbuzz:
In [ ]:
def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1
In [ ]:
fizzbuzz(15)
This has conditionals that depend on tensor values and it prints these values at runtime.
Many machine learning models are represented by composing layers. When
using TensorFlow with eager execution you can either write your own layers or
use a layer provided in the tf.keras.layers package.
While you can use any Python object to represent a layer,
TensorFlow has tf.keras.layers.Layer as a convenient base class. Inherit from
it to implement your own layer:
In [ ]:
class MySimpleLayer(tf.keras.layers.Layer):
  def __init__(self, output_units):
    super(MySimpleLayer, self).__init__()
    self.output_units = output_units

  def build(self, input_shape):
    # The build method gets called the first time your layer is used.
    # Creating variables on build() allows you to make their shape depend
    # on the input shape and hence removes the need for the user to specify
    # full shapes. It is possible to create variables during __init__() if
    # you already know their full shapes.
    # add_weight is used here in place of the deprecated add_variable alias.
    self.kernel = self.add_weight(
      name="kernel", shape=[input_shape[-1], self.output_units])

  def call(self, input):
    # Override call() instead of __call__ so we can perform some bookkeeping.
    return tf.matmul(input, self.kernel)
Use the tf.keras.layers.Dense layer instead of MySimpleLayer above, as it has
a superset of its functionality (it can also add a bias).
When composing layers into models you can use tf.keras.Sequential to represent
models which are a linear stack of layers. It is easy to use for basic models:
In [ ]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape
  tf.keras.layers.Dense(10)
])
Alternatively, organize models in classes by inheriting from tf.keras.Model.
This is a container for layers that is a layer itself, allowing tf.keras.Model
objects to contain other tf.keras.Model objects.
In [ ]:
class MNISTModel(tf.keras.Model):
  def __init__(self):
    super(MNISTModel, self).__init__()
    self.dense1 = tf.keras.layers.Dense(units=10)
    self.dense2 = tf.keras.layers.Dense(units=10)

  def call(self, input):
    """Run the model."""
    result = self.dense1(input)
    result = self.dense2(result)
    result = self.dense2(result)  # reuse variables from dense2 layer
    return result

model = MNISTModel()
It's not required to set an input shape for the tf.keras.Model class since
the parameters are set the first time input is passed to the layer.
tf.keras.layers classes create and contain their own model variables that
are tied to the lifetime of their layer objects. To share layer variables, share
their objects.
Automatic differentiation is useful for implementing machine learning
algorithms such as backpropagation for training neural networks. During eager
execution, use tf.GradientTape to trace operations for computing gradients later.
tf.GradientTape is an opt-in feature to provide maximal performance when
not tracing. Since different operations can occur during each call, all
forward-pass operations get recorded to a "tape". To compute the gradient, play
the tape backwards and then discard it. A particular tf.GradientTape can only
compute one gradient; subsequent calls throw a runtime error.
In [ ]:
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
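To make the "record, play backwards, discard" idea concrete, here is a minimal pure-Python sketch of a tape. It is an illustration only: `Value` and `ToyTape` are hypothetical names, the tape supports just multiplication of scalars, and it says nothing about how TensorFlow implements tf.GradientTape internally.

```python
class Value:
  """A scalar wrapper so the tape can track objects by identity."""
  def __init__(self, data):
    self.data = data


class ToyTape:
  """Hypothetical stand-in for a gradient tape (not TensorFlow's code)."""
  def __init__(self):
    self.records = []  # each entry: (output, [(input, local derivative), ...])
    self.used = False

  def mul(self, a, b):
    out = Value(a.data * b.data)
    # d(a*b)/da = b and d(a*b)/db = a
    self.records.append((out, [(a, b.data), (b, a.data)]))
    return out

  def gradient(self, target, source):
    if self.used:
      raise RuntimeError("this toy tape can compute only one gradient")
    self.used = True
    grads = {id(target): 1.0}
    # Play the tape backwards, applying the chain rule at each record.
    for out, parents in reversed(self.records):
      upstream = grads.get(id(out), 0.0)
      for parent, local in parents:
        grads[id(parent)] = grads.get(id(parent), 0.0) + upstream * local
    self.records = []  # discard the tape
    return grads.get(id(source), 0.0)


w = Value(1.0)
tape = ToyTape()
loss = tape.mul(w, w)
grad = tape.gradient(loss, w)
print(grad)  # 2.0
```

As in the cell above, the gradient of w * w at w = 1 is 2, and asking the same toy tape for a second gradient raises a RuntimeError.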
In [ ]:
# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),
   tf.cast(mnist_labels,tf.int64)))
dataset = dataset.shuffle(1000).batch(32)
In [ ]:
# Build the model
mnist_model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])
Even without training, call the model and inspect the output in eager execution:
In [ ]:
for images,labels in dataset.take(1):
  print("Logits: ", mnist_model(images[0:1]).numpy())
While Keras models have a built-in training loop (using the fit method), sometimes you need more customization. Here's an example of a training loop implemented with eager execution:
In [ ]:
optimizer = tf.train.AdamOptimizer()
loss_history = []
In [ ]:
for (batch, (images, labels)) in enumerate(dataset.take(400)):
  if batch % 10 == 0:
    print('.', end='')
  with tf.GradientTape() as tape:
    logits = mnist_model(images, training=True)
    loss_value = tf.losses.sparse_softmax_cross_entropy(labels, logits)

  loss_history.append(loss_value.numpy())
  grads = tape.gradient(loss_value, mnist_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables),
                            global_step=tf.train.get_or_create_global_step())
In [ ]:
import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')
tf.Variable objects store mutable tf.Tensor values accessed during
training to make automatic differentiation easier. The parameters of a model can
be encapsulated in classes as variables.
Better encapsulate model parameters by using tf.Variable with
tf.GradientTape. For example, the automatic differentiation example above
can be rewritten:
In [ ]:
class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.W = tf.Variable(5., name='weight')
    self.B = tf.Variable(10., name='bias')

  def call(self, inputs):
    return inputs * self.W + self.B

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]),
                            global_step=tf.train.get_or_create_global_step())
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))
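To see what the tape and the gradient descent optimizer contribute in the example above, the same fit can be sketched in plain Python with hand-derived derivatives. This is an illustration under stated assumptions, not the notebook's method: the helper `loss_and_grads`, the fixed random seed, the larger learning rate of 0.1, and the 100-step loop are all choices made for this sketch.

```python
import random

random.seed(0)  # assumption: fixed seed so the sketch is repeatable

# Toy data around 3 * x + 2, mirroring the example above.
NUM_EXAMPLES = 2000
xs = [random.gauss(0.0, 1.0) for _ in range(NUM_EXAMPLES)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs]

def loss_and_grads(w, b):
  """Mean squared error and its hand-derived partial derivatives."""
  n = len(xs)
  errors = [w * x + b - y for x, y in zip(xs, ys)]
  loss = sum(e * e for e in errors) / n
  grad_w = sum(2.0 * e * x for e, x in zip(errors, xs)) / n
  grad_b = sum(2.0 * e for e in errors) / n
  return loss, grad_w, grad_b

w, b = 5.0, 10.0
learning_rate = 0.1  # assumption: larger than above so fewer steps are needed
initial_loss, _, _ = loss_and_grads(w, b)

for step in range(100):
  _, grad_w, grad_b = loss_and_grads(w, b)
  # Plain gradient descent: move each parameter against its gradient.
  w -= learning_rate * grad_w
  b -= learning_rate * grad_b

final_loss, _, _ = loss_and_grads(w, b)
print("W = {:.2f}, B = {:.2f}".format(w, b))
```

With tf.GradientTape, the two derivative lines in `loss_and_grads` do not have to be written by hand; the tape derives them from the forward computation.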
With graph execution, program state (such as the variables) is stored in global
collections and their lifetime is managed by the tf.Session
object. In
contrast, during eager execution the lifetime of state objects is determined by
the lifetime of their corresponding Python object.
During eager execution, a variable persists until the last reference to the object is removed, and is then deleted.
In [ ]:
if tf.test.is_gpu_available():
  with tf.device("gpu:0"):
    v = tf.Variable(tf.random_normal([1000, 1000]))
    v = None  # v no longer takes up GPU memory
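The lifetime rule described above is ordinary Python reference semantics. The following sketch shows it without TensorFlow; `FakeVariable` is a hypothetical stand-in for any object that owns state, and the weak reference is only a probe to observe when the object goes away.

```python
import gc
import weakref

class FakeVariable:
  """Hypothetical stand-in for an object that owns some state."""
  def __init__(self, values):
    self.values = values

v = FakeVariable([0.0] * 1000)
probe = weakref.ref(v)  # a weak reference does not keep the object alive
alive_before = probe() is not None

v = None      # drop the last strong reference
gc.collect()  # not required in CPython, but harmless on other interpreters
alive_after = probe() is not None

print(alive_before, alive_after)
```

Once the name `v` is rebound, nothing else refers to the object, so the probe no longer resolves to it.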
In [ ]:
x = tf.Variable(10.)
checkpoint = tf.train.Checkpoint(x=x)
In [ ]:
x.assign(2.) # Assign a new value to the variables and save.
checkpoint_path = './ckpt/'
checkpoint.save('./ckpt/')
In [ ]:
x.assign(11.) # Change the variable after saving.
# Restore values from the checkpoint
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))
print(x) # => 2.0
To save and load models, tf.train.Checkpoint stores the internal state of objects,
without requiring hidden variables. To record the state of a model,
an optimizer, and a global step, pass them to a tf.train.Checkpoint:
In [ ]:
import os
import tempfile
model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
checkpoint_dir = tempfile.mkdtemp()
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
root = tf.train.Checkpoint(optimizer=optimizer,
                           model=model,
                           optimizer_step=tf.train.get_or_create_global_step())

root.save(checkpoint_prefix)
root.restore(tf.train.latest_checkpoint(checkpoint_dir))
In [ ]:
m = tf.keras.metrics.Mean("loss")
m(0)
m(5)
m.result() # => 2.5
m([8, 9])
m.result() # => 5.5
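The metric in the cell above keeps state across calls: each call folds new values into a running total, and result() reports the mean of everything seen so far. A rough pure-Python analogue, assuming unweighted values and using the hypothetical name `RunningMean`, could look like this:

```python
class RunningMean:
  """Hypothetical pure-Python analogue of a streaming mean metric."""
  def __init__(self):
    self.total = 0.0
    self.count = 0

  def __call__(self, values):
    # Accept either a single number or a list/tuple of numbers.
    if not isinstance(values, (list, tuple)):
      values = [values]
    self.total += sum(values)
    self.count += len(values)

  def result(self):
    return self.total / self.count if self.count else 0.0

m = RunningMean()
m(0)
m(5)
first = m.result()   # mean of 0 and 5
m([8, 9])
second = m.result()  # mean of 0, 5, 8 and 9
print(first, second)  # 2.5 5.5
```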
TensorBoard is a visualization tool for understanding, debugging and optimizing the model training process. It uses summary events that are written while executing the program.
TensorFlow 1 summaries only work in eager mode, but can be run with the compat.v2
module:
In [ ]:
from tensorflow.compat.v2 import summary
global_step = tf.train.get_or_create_global_step()
logdir = "./tb/"
writer = summary.create_file_writer(logdir)
writer.set_as_default()
for _ in range(10):
  global_step.assign_add(1)
  # your model code goes here
  summary.scalar('global_step', global_step, step=global_step)
In [ ]:
!ls tb/
tf.GradientTape can also be used in dynamic models. This example of a
backtracking line search algorithm looks like normal NumPy code, except that it
computes gradients and is differentiable, despite the complex control flow:
In [ ]:
def line_search_step(fn, init_x, rate=1.0):
  with tf.GradientTape() as tape:
    # Variables are automatically recorded, but manually watch a tensor
    tape.watch(init_x)
    value = fn(init_x)
  grad = tape.gradient(value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)
  init_value = value
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value
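The control flow of the line search can be traced by hand on a small case. The sketch below mirrors the loop above in plain Python for a scalar input, with the gradient supplied explicitly instead of coming from a tape. The name `line_search_step_scalar`, the `grad_fn` parameter, and the choice of f(x) = x² are assumptions made for illustration.

```python
def line_search_step_scalar(fn, grad_fn, init_x, rate=1.0):
  """Scalar analogue of the loop above; grad_fn replaces the tape."""
  value = fn(init_x)
  grad = grad_fn(init_x)
  grad_norm = grad * grad
  init_value = value
  x = init_x  # returned unchanged if the loop body never runs
  # Keep halving the step size until the value has decreased enough.
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value

# f(x) = x^2 with derivative 2x, starting from x = 3.
x, value = line_search_step_scalar(lambda t: t * t, lambda t: 2.0 * t, 3.0)
print(x, value)  # 0.0 0.0
```

Starting from x = 3, the first trial step (rate 1.0) overshoots to x = -3 with no improvement, the second (rate 0.5) lands on the minimum at x = 0, and the loop then stops.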
In [ ]:
@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn
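In the cell above the forward pass is the identity, and only the backward pass is changed: the incoming gradient is clipped by its L2 norm. As a rough illustration of what that clipping does to a vector, here is a plain-Python sketch; the function name `clip_by_norm_list` and the use of Python lists are assumptions for this example.

```python
import math

def clip_by_norm_list(values, clip_norm):
  """Rescale values so their L2 norm is at most clip_norm."""
  l2 = math.sqrt(sum(v * v for v in values))
  if l2 <= clip_norm:
    return list(values)  # already within the limit, leave unchanged
  scale = clip_norm / l2
  return [v * scale for v in values]

clipped = clip_by_norm_list([3.0, 4.0], 1.0)      # norm 5 is scaled down to 1
unchanged = clip_by_norm_list([0.3, 0.4], 1.0)    # norm 0.5 is left alone
print(clipped, unchanged)
```

The direction of the gradient is preserved; only its magnitude is capped.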
Custom gradients are commonly used to provide a numerically stable gradient for a sequence of operations:
In [ ]:
def log1pexp(x):
  return tf.log(1 + tf.exp(x))

class Grad(object):
  def __init__(self, f):
    self.f = f

  def __call__(self, x):
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
      tape.watch(x)
      r = self.f(x)
    g = tape.gradient(r, x)
    return g
In [ ]:
grad_log1pexp = Grad(log1pexp)
In [ ]:
# The gradient computation works fine at x = 0.
grad_log1pexp(0.).numpy()
In [ ]:
# However, x = 100 fails because of numerical instability.
grad_log1pexp(100.).numpy()
Here, the log1pexp function can be analytically simplified with a custom
gradient. The implementation below reuses the value for tf.exp(x) that is
computed during the forward pass, making it more efficient by eliminating
redundant calculations:
In [ ]:
@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.log(1 + e), grad

grad_log1pexp = Grad(log1pexp)
In [ ]:
# As before, the gradient computation works fine at x = 0.
grad_log1pexp(0.).numpy()
In [ ]:
# And the gradient computation also works at x = 100.
grad_log1pexp(100.).numpy()
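The expression used in the custom gradient follows from d/dx log(1 + e^x) = e^x / (1 + e^x) = 1 - 1 / (1 + e^x). The sketch below checks that identity against a finite-difference estimate in plain Python. The helper names are assumptions for this example. Note that Python floats are double precision, so exp(100) does not overflow here, whereas the float32 tensors in the cells above most likely do, which is what makes the automatically derived gradient fail at x = 100.

```python
import math

def log1pexp_py(x):
  return math.log(1.0 + math.exp(x))

def analytic_grad(x):
  # Same expression as the custom gradient above, with dy = 1.
  e = math.exp(x)
  return 1.0 - 1.0 / (1.0 + e)

def numeric_grad(f, x, eps=1e-6):
  # Central finite difference, used only as an independent check.
  return (f(x + eps) - f(x - eps)) / (2.0 * eps)

max_error = max(
  abs(analytic_grad(x) - numeric_grad(log1pexp_py, x))
  for x in (-2.0, 0.0, 3.0))

print(analytic_grad(0.0))    # 0.5
print(analytic_grad(100.0))  # 1.0
```

In this form, an overflowing e only drives 1 / (1 + e) to zero, so the result saturates at 1 instead of becoming undefined.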
In [ ]:
import time

def measure(x, steps):
  # TensorFlow initializes a GPU the first time it's used, exclude from timing.
  tf.matmul(x, x)
  start = time.time()
  for i in range(steps):
    x = tf.matmul(x, x)
  # tf.matmul can return before completing the matrix multiplication
  # (e.g., can return after enqueuing the operation on a CUDA stream).
  # The x.numpy() call below will ensure that all enqueued operations
  # have completed (and will also copy the result to host memory,
  # so we're including a little more than just the matmul operation
  # time).
  _ = x.numpy()
  end = time.time()
  return end - start

shape = (1000, 1000)
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
  print("CPU: {} secs".format(measure(tf.random_normal(shape), steps)))

# Run on GPU, if available:
if tf.test.is_gpu_available():
  with tf.device("/gpu:0"):
    print("GPU: {} secs".format(measure(tf.random_normal(shape), steps)))
else:
  print("GPU: not found")
A tf.Tensor object can be copied to a different device to execute its
operations:
In [ ]:
if tf.test.is_gpu_available():
  x = tf.random_normal([10, 10])

  x_gpu0 = x.gpu()
  x_cpu = x.cpu()

  _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU
  _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0
For compute-heavy models, such as ResNet50 training on a GPU, eager execution performance is comparable to graph execution. But this gap grows larger for models with less computation and there is work to be done for optimizing hot code paths for models with lots of small operations.
While eager execution makes development and debugging more interactive, TensorFlow graph execution has advantages for distributed training, performance optimizations, and production deployment. However, writing graph code can feel different from writing regular Python code and can be more difficult to debug.
For building and training graph-constructed models, the Python program first
builds a graph representing the computation, then invokes Session.run to send
the graph for execution on the C++-based runtime. This provides:
Deploying code written for eager execution is more difficult: either generate a graph from the model, or run the Python runtime and code directly on the server.
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.
Most TensorFlow operations work during eager execution, but there are some things to keep in mind:
Use tf.data for input processing instead of queues. It's faster and easier.
Use tf.keras.layers and tf.keras.Model, since they have explicit storage for variables.
Once eager execution is enabled with tf.enable_eager_execution, it cannot be turned off. Start a new Python session to return to graph execution.
It's best to write code for both eager execution and graph execution. This gives you eager's interactive experimentation and debuggability with the distributed performance benefits of graph execution.
Write, debug, and iterate in eager execution, then import the model graph for
production deployment. Use tf.train.Checkpoint to save and restore model
variables; this allows movement between eager and graph execution environments.
See the examples in: tensorflow/contrib/eager/python/examples.
In [ ]:
def my_py_func(x):
  x = tf.matmul(x, x)  # You can use tf ops
  print(x)  # but it's eager!
  return x

with tf.Session() as sess:
  x = tf.placeholder(dtype=tf.float32)
  # Call eager function in graph!
  pf = tf.py_func(my_py_func, [x], tf.float32)

  sess.run(pf, feed_dict={x: [[2.0]]})  # [[4.0]]