This notebook introduces eager execution for TensorFlow, a low-level interface that allows a more dynamic programming experience. Eager execution greatly simplifies writing and debugging models by softening the strict separation, typical of the standard TensorFlow interface, between defining operations and executing them.
Static and dynamic graph computation
Additional: experimenting with eager execution in the nightly branch
A neural network with eager execution
Variables and gradients with eager execution
Gradients' computation and model's optimization
Evaluating the model and plotting the results
To understand the need for eager execution, consider a simple example:
In [1]:
import tensorflow as tf
a = tf.constant(3.0)
b = a + 2.0
print(b)
If you have never played with the low-level components of TensorFlow before, you probably expected the print operation to show the value of b at this point. Instead, we have to fetch the value of the tensor by running the operation inside a Session object:
In [2]:
sess = tf.Session()
with sess.as_default():
    print(sess.run(b))
Now, let's keep exploring and add some intermediate computations:
In [3]:
with tf.Session() as sess:
    c = sess.run(1.5*b)
print(b)
Once again, you might have expected to obtain the value of b, since it was clearly computed to obtain c. While there are several ways to get it (such as calling its eval method), none of them is as satisfying as writing pure NumPy code, and none of them lends itself to immediate debugging in a visual editor.
Now consider the same code written with the eager execution enabled:
a = tf.constant(3.0)
b = a + 2.0
print(b)  # prints tf.Tensor(5.0, shape=(), dtype=float32)
We can now get the value immediately! With eager execution enabled, definition and execution are tied, with one following the other automatically. Let us see how to enable it before moving on to more interesting examples.
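The difference between the two styles can be pictured with a toy sketch in plain Python. This is only an analogy under simplifying assumptions, not how TensorFlow is implemented; the LazyNode class, its run method, and lazy_constant are hypothetical names introduced here for illustration.

```python
class LazyNode:
    """A deferred computation: it stores a recipe, not a value (hypothetical)."""

    def __init__(self, fn, *parents):
        self.fn = fn
        self.parents = parents

    def run(self):
        # Evaluate the parent nodes first, then apply this node's function
        return self.fn(*(p.run() for p in self.parents))

    def __repr__(self):
        return "<LazyNode: value unknown until run() is called>"


def lazy_constant(value):
    # A node with no parents that simply returns a fixed value
    return LazyNode(lambda: value)


# Deferred style: building b computes nothing yet
a = lazy_constant(3.0)
b = LazyNode(lambda x: x + 2.0, a)
deferred_repr = repr(b)      # printing b shows no number
deferred_value = b.run()     # the value only exists after an explicit run

# Eager style: the value is available as soon as the line executes
a_eager = 3.0
b_eager = a_eager + 2.0

print(deferred_repr)
print(deferred_value, b_eager)
```

In the deferred style the explicit run call plays the role of sess.run, while in the eager style there is nothing to run: each line yields a concrete value that can be inspected in a debugger.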
Eager execution was introduced experimentally in TensorFlow v1.5, but to use all of its functionality we need to install the latest version (v1.7rc0 as of this writing). To upgrade from a notebook cell (drop the leading ! to run the same command in a terminal, and replace tensorflow with tensorflow-gpu for the version with GPU support):
In [10]:
!pip install tensorflow==v1.7rc0
Eager is enabled with a single line:
In [1]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tf.enable_eager_execution()
The previous instruction should always be run at the beginning of a program. If it fails, simply reset the runtime of the notebook from Runtime >> Restart runtime. If you are working with v1.5 or v1.6, replace tf.enable_eager_execution() with tfe.enable_eager_execution().
Since eager is under active development, new features (e.g., additional layers) are constantly being added and are not immediately available on the main branch. If you want to experiment with the latest eager module, the recommended way is to set up a virtual environment and install the nightly build of TF.
The virtual environment can be created using either virtualenv or the Anaconda distribution, and the installation is specific to the OS. For example, using virtualenv on a Unix platform we can create and activate a new environment as follows:
virtualenv --system-site-packages -p python3 ~/tfeager
source ~/tfeager/bin/activate
The official installation guide of TensorFlow describes the proper commands for other systems as well. Once a clean virtual environment is running, you can install the nightly build of TensorFlow as a standard Python package:
!pip install tf-nightly # replace with tf-nightly-gpu for GPU support
We are going to code a simple neural network on the Iris dataset to highlight some differences with the standard TF execution. For the purpose of this demo, we will load the scikit-learn version of Iris, preprocess it, and split it into two sets:
In [0]:
from sklearn import datasets, preprocessing, model_selection
data = datasets.load_iris()
# Feature normalization on the input
X = preprocessing.MinMaxScaler(feature_range=(-1,+1)).fit_transform(data['data'])
# Encode the output using the one-hot encoding
y = preprocessing.OneHotEncoder(sparse=False).fit_transform(data['target'].reshape(-1, 1))
# Split in train/test sets
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25, stratify=y)
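To make the two preprocessing steps concrete, the following plain-Python sketch reproduces their arithmetic on a handful of values. It is an illustration only, not the scikit-learn implementation; min_max_scale and one_hot are hypothetical helper names, and the sample numbers are made up for the example.

```python
def min_max_scale(column, low=-1.0, high=1.0):
    """Linearly rescale a list of numbers to the interval [low, high].

    Assumes the column is not constant (otherwise the span would be zero).
    """
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    return [low + (high - low) * (v - c_min) / span for v in column]


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1.0 entry."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]


scaled = min_max_scale([4.3, 5.8, 7.9])    # smallest maps to -1, largest to +1
encoded = one_hot([0, 2, 1], num_classes=3)

print(scaled)
print(encoded)
```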
Before going straight to the model definition, let us see how variables and gradients are handled under eager execution. First, eager comes with its own implementation of variables, which are automatically initialized when requested:
In [3]:
W = tfe.Variable(0.5, name='w')
print(W)
To further simplify development, in eager execution you can mix TF Tensors and NumPy arrays with automatic casting underneath:
In [4]:
print(W + np.asarray([1, 3]))
And you can extract the NumPy representation of a Tensor with a new numpy method:
In [6]:
W.numpy()
Out[6]:
Finally, we can initialize an eager variable using the standard get_variable method, by setting the use_resource flag:
In [7]:
W2 = tf.get_variable('W2', shape=[1], use_resource=True)
print(W2)
Gradient computation is where the simplicity of eager execution really comes to fruition. When eager execution is enabled, functions can be defined using standard Python syntax, and tfe.gradients_function returns a handle to another Python function that computes the gradient:
In [8]:
def f(X):
    return 1.0 + 2.0 * X
f_grad = tfe.gradients_function(f)
print(f_grad(0.3))
In programming terms, we are now dealing with an imperative definition, as opposed to a declarative interface.
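As an independent check on the gradient of this function, a finite-difference approximation can be computed in plain Python. This sketch does not use TensorFlow; numerical_gradient is a hypothetical helper introduced here, and the step size eps is an arbitrary choice.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx at the point x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def f(x):
    return 1.0 + 2.0 * x


approx = numerical_gradient(f, 0.3)
print(approx)  # close to 2.0, the analytic derivative of 1 + 2x
```

Since f is linear, its derivative is 2 everywhere, which is the value one would expect the eager gradient function to report as well.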
We can also use tfe.value_and_gradients_function to compute the value and the gradients simultaneously. By default, the gradient is computed with respect to all the parameters of the function. We can specify a list of parameters explicitly, either by name or by index:
In [9]:
a = tf.constant(0.3)
b = tf.constant(0.5)
def f(a, b):
    return a*b
# Return the gradient for the first parameter only
print(tfe.gradients_function(f, params=[0])(1.0, 1.0))
# Alternative definition (by name)
# print(tfe.gradients_function(f, params=['a'])(1.0, 1.0))
And we can chain calls to tfe.gradients_function to get higher-order derivatives:
In [10]:
def f(X):
    return tf.square(X)
# Second-order derivative
f_gg = tfe.gradients_function(tfe.gradients_function(f))
f_gg(1.0)
Out[10]:
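The chaining idea (a function that takes a function and returns its derivative, applied twice) can be mimicked in plain Python with finite differences. This is a numerical sketch rather than automatic differentiation; gradient_function is a hypothetical name and eps is an arbitrary step size.

```python
def gradient_function(f, eps=1e-4):
    """Return a new function that approximates the derivative of f."""
    def df(x):
        # Central difference around x
        return (f(x + eps) - f(x - eps)) / (2.0 * eps)
    return df


def square(x):
    return x * x


# Applying the transformation twice gives an approximate second derivative
second = gradient_function(gradient_function(square))
second_at_one = second(1.0)
print(second_at_one)  # close to 2.0, since d^2/dx^2 of x^2 is 2
```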
While we can use variables to directly define our model, this is clearly impractical as soon as the model starts to get complicated. To simplify development with eager execution enabled, there are a number of strategies we can adopt.
First, we can use all the blocks defined in the tf.layers module, which automatically detect that we are working with eager enabled and initialize their variables accordingly. For example, this is a simple linear model:
In [0]:
lin = tf.layers.Dense(units=3, use_bias=True, activation=None)
We can stack multiple layers to create more complicated models. For example, a neural network with one hidden layer and dropout in the middle:
In [0]:
hid = tf.layers.Dense(units=10, activation=tf.nn.relu)
drop = tf.layers.Dropout()
out = tf.layers.Dense(units=3, activation=None)
def nn_model(x, training=False):
    return out(drop(hid(x), training=training))
Note the training flag, which we can use to differentiate between training and test behaviour (here, for the dropout layer). This is standard practice for all layers that behave differently in the two cases.
For more complicated models, it becomes useful to have an object-oriented interface to keep track of how the layers are organized. TF provides such a functionality with the Model object. Consider again a neural network with just one hidden layer:
In [0]:
class SingleHiddenLayerNetwork(tf.keras.Model):
    def __init__(self):
        super(SingleHiddenLayerNetwork, self).__init__()
        self.hidden_layer = tf.layers.Dense(10, activation=tf.nn.tanh, use_bias=True)
        self.output_layer = tf.layers.Dense(3, use_bias=True, activation=None)
    def call(self, x):
        return self.output_layer(self.hidden_layer(x))
net = SingleHiddenLayerNetwork()
It is relatively similar to the functional interface of Keras. If you are working with v1.5 or v1.6, you need to use tfe.Network instead of tf.keras.Model. Note that variables are not yet initialized:
In [14]:
len(net.variables)
Out[14]:
However, it is enough to use the model a single time to automatically trigger the initialization. Network objects can be called as if they were functions:
In [15]:
net(tf.constant(X_train[0:1]))
len(net.variables)
Out[15]:
Networks have several additional utilities to handle the models. For example, we can count the number of adaptable parameters of the model:
In [16]:
net.count_params()
Out[16]:
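The parameter count can be verified by hand. The sketch below assumes the network defined above (4 Iris input features, a hidden layer of 10 units, an output layer of 3 units, both with biases); dense_param_count is a hypothetical helper introduced for this calculation.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Weights plus (optionally) biases of one fully connected layer."""
    return n_inputs * n_units + (n_units if use_bias else 0)


n_features = 4                               # Iris has four input features
hidden = dense_param_count(n_features, 10)   # 4*10 weights + 10 biases
output = dense_param_count(10, 3)            # 10*3 weights + 3 biases
total = hidden + output

print(hidden, output, total)  # 50 33 83
```

The same reasoning explains the number of variables after initialization: each of the two layers contributes one weight matrix and one bias vector.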
For simple sequential models, eager execution also provides a shorthand with the Sequential object; the following is equivalent to the previous model definition:
In [0]:
net = tfe.Sequential(layers_funcs=[
    tf.layers.Dense(10, activation=tf.nn.tanh, use_bias=True),
    tf.layers.Dense(3, use_bias=True, activation=None)
])
The next step is to define the input function to handle our dataset. Since the latest versions of TF, the preferred method is the use of the data pipeline with the Dataset object:
In [0]:
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
To cycle over the data, we can use the eager module tfe.Iterator instead of tf.data.Iterator, which acts as a standard Python iterator. For example, we can cycle over each batch in the dataset and print the proportion of the first class in each batch:
In [24]:
import numpy as np
for xb, yb in tfe.Iterator(train_dataset.batch(32)):
    print('Percentage of class [0]: ', tf.reduce_mean(yb[:, 0]).numpy()*100, ' %')
from_tensor_slices creates a dataset having one element for each row of our original tensors. If you don't need batching, from_tensors will treat the entire tensor as a single element:
In [25]:
train_dataset_alt = tf.data.Dataset.from_tensors((X_train, y_train))
for xb, yb in tfe.Iterator(train_dataset_alt):
    # Check that the batch is equivalent to the entire training array
    assert np.all(X_train == xb.numpy())
    # Compute the percentage of labels for the first class
    print('Percentage of class [0] (entire dataset): ', tf.reduce_mean(yb[:, 0]).numpy()*100, ' %')
We can apply further transformations to the dataset before processing it, such as repeating the entire dataset twice:
In [26]:
for xb, yb in tfe.Iterator(train_dataset.repeat(2).batch(32)):
    print('Percentage of class [0]: ', tf.reduce_mean(yb[:, 0]).numpy()*100, ' %')
Or shuffling the dataset each time we cycle over it:
In [27]:
for xb, yb in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
    print('Percentage of class [0]: ', tf.reduce_mean(yb[:, 0]).numpy()*100, ' %')
The parameter of the shuffle method is a buffer size. For very large datasets, shuffling all elements uniformly would be infeasible, so the dataset instead keeps a buffer of that many elements, draws each output element at random from the buffer, and refills the freed slot with the next input element. If the buffer size is at least the size of the dataset, this is equivalent to a uniform shuffle.
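One way to picture the effect of the buffer is the following plain-Python generator. It is a simplified model of buffered shuffling written for this tutorial, not TensorFlow's implementation; buffered_shuffle and its seed argument are hypothetical names.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield the items in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit what is left in random order
    rng.shuffle(buffer)
    for item in buffer:
        yield item


data = list(range(10))
unshuffled = list(buffered_shuffle(data, buffer_size=1))        # buffer of 1: order unchanged
partial = list(buffered_shuffle(data, buffer_size=4, seed=0))   # elements move only locally
full = list(buffered_shuffle(data, buffer_size=100, seed=0))    # buffer >= dataset: full shuffle

print(unshuffled)
print(partial)
print(full)
```

With a small buffer, an element can never appear much earlier than its original position, which is why a buffer smaller than the dataset only shuffles locally.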
It is now time to define our cost function:
In [0]:
def loss(net, inputs, labels):
    return tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=net(inputs), labels=labels))
We are using softmax_cross_entropy_with_logits_v2 instead of the older softmax_cross_entropy_with_logits, which would throw a warning as it will be deprecated in future releases. The two are identical, except that the new version back-propagates into the labels by default: while this is entirely useless in this case, it is needed when the labels come from another neural model (as in generative networks).
Note again that the cost is defined explicitly as a Python function, instead of as a given node in a TensorFlow graph.
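For reference, the quantity computed for a single example can be written out in plain Python. This is a sketch of the standard softmax cross-entropy formula, not the TensorFlow kernel; softmax_cross_entropy is a hypothetical helper and the logits below are made-up values.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot (or soft) labels and softmax(logits)."""
    # Subtract the maximum logit before exponentiating, for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(y * lp for y, lp in zip(labels, log_probs))


label = [1.0, 0.0, 0.0]
uniform = softmax_cross_entropy([0.0, 0.0, 0.0], label)     # equals log(3)
confident = softmax_cross_entropy([10.0, 0.0, 0.0], label)  # much smaller loss

print(uniform, confident)
```

The loss function defined above sums this per-example quantity over the batch, with the logits produced by the network.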
To better understand eager execution, we are going to explicitly compute the gradients and apply them via the apply_gradients function of the optimizer. In particular, tfe.implicit_gradients works like tfe.gradients_function, but automatically returns the gradients of all variables of our model:
In [0]:
loss_grad = tfe.implicit_gradients(loss)
Once again, another variant is implicit_value_and_gradients, which returns both the value of the function and its gradients:
In [0]:
loss_and_grads = tfe.implicit_value_and_gradients(loss)
Using the first syntax, the optimization cycle is now a trivial matter:
In [31]:
net = SingleHiddenLayerNetwork()
opt = tf.train.AdamOptimizer(learning_rate=0.01)
# Loop over the epochs
for epoch in range(50):
    # For each epoch we shuffle the dataset
    for (xb, yb) in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
        opt.apply_gradients(loss_grad(net, xb, yb))
    # Print the training loss every tenth epoch
    if epoch % 10 == 0:
        print("Epoch %d: Loss on training set : %f" %
              (epoch, loss(net, X_train, y_train).numpy()))
Note how, when compared to the classical TensorFlow low-level interface, the previous code tends to be more readable, and closely resembles what the code would have looked like had we only used NumPy.
In the latest release, the inner optimization can also be simplified using the standard minimize interface of the optimizer, by constructing a parameter-free anonymous function:
In [0]:
opt.minimize(lambda: loss(net, xb, yb))
Two common debugging tools in TensorFlow are summaries and metrics, both of which have a new implementation suitable for eager execution. Starting from the latter, a set of pre-defined metrics can be found in the tfe.metrics module, which can be used in a standard object-oriented fashion:
In [0]:
accuracy = tfe.metrics.Accuracy()
We can accumulate values inside the metric and print an average result as follows:
In [34]:
accuracy(tf.argmax(net(tf.constant(X_test)), axis=1), tf.argmax(tf.constant(y_test), axis=1))
print('Final test accuracy is: ', accuracy.result().numpy())
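The accumulate-then-average pattern behind such a metric can be sketched in a few lines of plain Python. The class below is a hypothetical stand-in written for illustration and is not the tfe.metrics implementation.

```python
class RunningAccuracy:
    """Accumulates correct/total counts across calls (hypothetical helper)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, labels, predictions):
        # Add one batch of integer class labels and predictions
        for y, p in zip(labels, predictions):
            self.correct += int(y == p)
            self.total += 1

    def result(self):
        # Fraction of correct predictions seen so far (0.0 if nothing was added)
        return self.correct / self.total if self.total else 0.0


metric = RunningAccuracy()
metric([0, 1, 2], [0, 1, 1])   # first batch: 2 of 3 correct
metric([2], [2])               # second batch: 1 of 1 correct
print(metric.result())         # 0.75
```

Because the object keeps its state between calls, a fresh instance is needed whenever the average should restart, for example at the beginning of each epoch.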
Let us rewrite the optimization code, this time by accumulating the training accuracy at each epoch:
In [0]:
net = SingleHiddenLayerNetwork()
opt = tf.train.AdamOptimizer(learning_rate=0.01)
# Numpy array to keep track of the accuracy
acc_history = np.zeros(50)
# Loop over the epochs
for epoch in range(50):
    # Initialize the metric
    accuracy = tfe.metrics.Accuracy()
    # For each epoch we shuffle the dataset
    for (xb, yb) in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
        opt.apply_gradients(loss_grad(net, xb, yb))
        # Save the training accuracy on the batch
        accuracy(tf.argmax(net(tf.constant(xb)), axis=1), tf.argmax(tf.constant(yb), axis=1))
    # Save the overall accuracy in our vector
    acc_history[epoch] = accuracy.result().numpy()
We can use Matplotlib to plot the resulting accuracy:
In [36]:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(acc_history)
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
Similarly, eager comes with its own implementation of the summaries. In order to work with them, we first define a writer on disk:
In [0]:
writer = tf.contrib.summary.create_file_writer('tmp')
In order to save a summary, we need to select the writer as the default one, and tell TF that we want to save summaries every time they are computed:
In [0]:
with writer.as_default():
    with tf.contrib.summary.always_record_summaries():
        tf.contrib.summary.scalar('scalar_value', 0.5)
Alternatively, we can instruct TF to save summaries only every given number of steps:
In [0]:
with writer.as_default():
    with tf.contrib.summary.record_summaries_every_n_global_steps(5):
        tf.contrib.summary.scalar('scalar_value', 0.5)  # Will only save every 5 steps
The 'global step' is a variable inside TF that keeps track of the iterations of our optimization algorithm:
In [40]:
tf.train.get_global_step()
Out[40]:
Note that the variable is currently set to 0. If we want to update it correctly, we need to provide the global step during the optimization cycle:
In [41]:
opt.apply_gradients(loss_grad(net, xb, yb), global_step=tf.train.get_or_create_global_step()) # Will apply gradients AND increase the step by one
tf.train.get_global_step()
Out[41]:
Alternatively, we can provide our own global step to the summary operation. This is particularly easy with eager execution enabled because we can work with standard int64 values:
In [0]:
with writer.as_default():
    with tf.contrib.summary.record_summaries_every_n_global_steps(5):
        tf.contrib.summary.scalar('scalar_value', 0.5, step=4)  # This will save the value on disk
In the following example, we extend again our optimization routine to save the loss value on disk at every iteration:
In [0]:
net = SingleHiddenLayerNetwork()
opt = tf.train.AdamOptimizer(learning_rate=0.01)
with writer.as_default():
    with tf.contrib.summary.always_record_summaries():
        # Loop over the epochs
        for epoch in range(50):
            # For each epoch we shuffle the dataset
            for (xb, yb) in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
                tf.contrib.summary.scalar('loss_value', loss(net, xb, yb))
                opt.minimize(lambda: loss(net, xb, yb), global_step=tf.train.get_or_create_global_step())
Now launch TensorBoard to visualize the loss:
In [21]:
!tensorboard --logdir=tmp
If you are running this notebook from a local machine, you can navigate to the address printed by the command above to inspect the training details in TensorBoard.
With eager execution enabled, computation is not placed on the GPU by default, even if one is present. We can easily check whether the system has a GPU:
In [44]:
tfe.num_gpus()
Out[44]:
In order to enable the GPU, we can specify a device explicitly for each operation that we want to run on that device:
In [0]:
with tf.device("/gpu:0"):
    net(X_train[0:1, :])
Alternatively, we can move the data to the GPU before running the computation:
In [60]:
net = net.gpu()
net(tf.constant(X_train[0:1, :]).gpu())
Out[60]:
Given a variable x on the GPU, we can perform the inverse operation similarly:
x = x.cpu()
Eager provides a simple interface for saving variables on disk:
In [45]:
checkpointer = tfe.Checkpoint(W=W)
checkpointer.save('tmp/')
checkpointer.restore('tmp/')
Out[45]:
In order to save a comprehensive snapshot, we can save all variables of the model together with the optimizer's state, e.g.:
In [47]:
checkpointer = tfe.Checkpoint(net=net,opt=opt, global_step=tf.train.get_or_create_global_step())
checkpointer.save('tmp/')
Out[47]:
We modify our training routine once again, this time saving all variables at the end of every epoch:
In [0]:
net = SingleHiddenLayerNetwork()
opt = tf.train.AdamOptimizer(learning_rate=0.01)
checkpointer = tfe.Checkpoint(net=net,opt=opt, global_step=tf.train.get_or_create_global_step())
for epoch in range(50):
    for (xb, yb) in tfe.Iterator(train_dataset.shuffle(1000).batch(32)):
        opt.minimize(lambda: loss(net, xb, yb))
    # Save all variables at the end of every epoch
    checkpointer.save('tmp/')
Eager provides a method to restore the latest checkpoint from the disk:
In [49]:
checkpointer.restore(tf.train.latest_checkpoint('tmp/'))
Out[49]:
In this tutorial, we have seen that low-level development with the new eager execution can be easier (and simpler to debug) than with the standard low-level interface. Since the module is still experimental, some advanced features (such as distributed training of graphs) are not yet available when running with eager execution enabled, and the interface is still prone to minor changes. We refer to the official guide for more information on all these aspects.