TensorBoard, basic visualization of training and the minds of models

Tensors can be logged and viewed in TensorBoard, a browser-based visualization toolkit for learning. Scalar tensors are shown as graphs, multidimensional tensors are shown as histograms and images are displayed directly. The computational graph is also displayed, and there are tools for visualizing high-dimensional data in lower dimensions.

TensorBoard generates visualizations from summary data saved to a log directory by a TensorFlow summary writer. A summary is a special TensorFlow operation that takes in a tensor from the graph and outputs protocol buffers that can be written to disk. Specifically, in a TensorFlow run, summary operations are evaluated and saved to the log directory using the summary writer, then read continuously by TensorBoard, which visualizes the information in a browser. The summary writer saves to an append-only record file that has "tfevents" in its filename.
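
As a minimal sketch of this pipeline (the path /tmp/demo and the tensor name are illustrative):

import tensorflow as tf

x = tf.constant(3.0)
tf.summary.scalar("x", x)
merged = tf.summary.merge_all()

with tf.Session() as sesh:
    writer = tf.summary.FileWriter("/tmp/demo", sesh.graph)
    # Evaluating the merged summary operation yields a serialized protocol buffer...
    summary = sesh.run(merged)
    # ...which the writer appends to an events.out.tfevents.* file in /tmp/demo.
    writer.add_summary(summary, global_step = 0)
    writer.close()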

Current supported summary operations are as follows:

  • tf.summary.scalar
  • tf.summary.histogram
  • tf.summary.image
  • tf.summary.audio
  • tf.summary.text

Upcoming is tf.summary.tensor, a summary that can write out any type of value, since everything in TensorFlow is a tensor.

scalar dashboard

The scalar dashboard visualizes scalar statistics that vary over time, such as a model's loss or learning rate.
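
For example, a scalar summary for a loss might be defined as follows (a sketch; loss is assumed to be a scalar tensor already defined in the graph):

# Display the loss tensor as a curve of its value over training steps.
tf.summary.scalar("loss", loss)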

histogram dashboard

The histogram dashboard visualizes statistical distributions of tensors that vary over time. Each plot displays temporal slices of data, where each slice is a histogram of the tensor at a given step. Earlier times are towards the back while more recent times are towards the front.
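
A sketch of a histogram summary whose distribution shifts as training progresses (the step placeholder and tensor names are illustrative):

import tensorflow as tf

step = tf.placeholder(tf.float32, shape = [], name = "step")
# A normal distribution whose mean follows the step; evaluating the summary
# at successive steps produces histogram slices that drift across the plot.
moving_normal = tf.random_normal(shape = [1000], mean = step / 100, stddev = 1)
tf.summary.histogram("moving_normal", moving_normal)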

The appearance of the histograms was designed by Shan Carter, who used to make interactive graphics at The New York Times, and was inspired by the cover of the album How Do You Feel? by Joywave.

distribution dashboard

The distribution dashboard is another way to visualize histogram data; it displays high-level statistics on distributions and is populated by the same tf.summary.histogram data as the histogram dashboard. Each line on a plot represents a percentile in the distribution over the data: for example, the bottom line shows how the minimum value has changed over time and the middle line shows how the median has changed. The lines are drawn at the maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7% and minimum percentiles, such that, read from inside to outside, they produce colored regions with widths $\sigma$, $2\sigma$ and $3\sigma$ for normally distributed data.

image dashboard

The image dashboard displays images logged with tf.summary.image. Each row corresponds to a different tag and each column corresponds to a run, with the latest image always shown for each tag. Custom visualizations can also be displayed on this dashboard (e.g. matplotlib plots), as in the sketch below.
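
One way to do this (a sketch; the figure content is arbitrary and not part of the TensorBoard API) is to render a matplotlib figure to an in-memory PNG, decode it to a tensor and log it with tf.summary.image:

import io

import matplotlib.pyplot as plt
import tensorflow as tf

# Render an arbitrary matplotlib figure to an in-memory PNG.
plt.figure()
plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
buffer = io.BytesIO()
plt.savefig(buffer, format = "png")
plt.close()

# Decode the PNG and add a batch dimension: shape [1, height, width, 4].
image = tf.expand_dims(tf.image.decode_png(buffer.getvalue(), channels = 4), 0)
tf.summary.image("matplotlib_plot", image)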

audio dashboard

The audio dashboard embeds playable widgets for audio logged with tf.summary.audio. Each row corresponds to a different tag and each column corresponds to a run, with the latest audio clip always shown for each tag.
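
A sketch of an audio summary for a generated sine tone (the tone itself is arbitrary):

import numpy as np
import tensorflow as tf

sample_rate = 44100
t = np.linspace(0, 1, sample_rate)
# A one second 440 Hz tone, shaped [batch, frames] as tf.summary.audio expects.
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)[np.newaxis, :]
tf.summary.audio("tone", tf.constant(tone), sample_rate = sample_rate)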

graph explorer

The graph explorer visualizes a TensorFlow graph. For reasonable use of the visualizer, name scopes should be used to group the graph operations hierarchically -- TensorFlow graphs can easily have many thousands of nodes, far too many to see easily all at once or to lay out using standard tools.

embedding visualizer

The embedding visualizer takes high-dimensional data and projects it down to 3 or 2 dimensions. One interesting way of using this is to take the input dataset and map it through the neural network to the final layer; that embedding is the learned representation of how the neural network processes the information. The projection, then, visualizes the input data after it has been embedded in a high-dimensional space by the model.

The embedding visualizer reads data from a model checkpoint file and can be configured with additional metadata. By default, it features the PCA and t-SNE methods and can color points by label.

It is particularly well-suited to images and vocabularies.
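
For image data, a sprite (a single image containing all thumbnails) can be associated with an embedding in the projector configuration. A sketch, with illustrative paths and a 28 × 28 thumbnail size:

from tensorflow.contrib.tensorboard.plugins import projector

configuration = projector.ProjectorConfig()
embedding = configuration.embeddings.add()
embedding.tensor_name = "embedding"
# A single image holding all thumbnails, plus the dimensions of each thumbnail.
embedding.sprite.image_path = "sprite.png"
embedding.sprite.single_image_dim.extend([28, 28])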

name scoping and nodes

Typical TensorFlow graphs can have many thousands of nodes -- far too many to see easily all at once without cortical implants, or even to lay out using standard tools. To simplify, operations can be grouped under name scopes, and the visualization uses this information to define a hierarchy on the nodes of the graph. By default, only the top of the hierarchy is shown.

TensorFlow graphs have two types of connections: data dependencies and control dependencies. Data dependencies show the flow of tensors between two operations and are displayed as solid arrows. Control dependencies are displayed as dotted lines.
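
A minimal sketch showing both edge types (the scope and tensor names are illustrative):

import tensorflow as tf

with tf.name_scope("preprocessing"):
    x = tf.placeholder(tf.float32, name = "x")
    scaled = x * 2  # data dependency: a solid arrow from x to scaled

counter = tf.Variable(0, name = "counter")
increment = tf.assign_add(counter, 1)
with tf.control_dependencies([increment]):
    # control dependency: a dotted line; result is computed only after
    # increment has run, although no tensor flows between the two operations.
    result = tf.identity(scaled, name = "result")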

usage

Some steps in using TensorBoard are as follows:

  1. Set a path for logging.
TB_SUMMARY_DIR = "/tmp/run"
  2. Set the tensors to log.
with tf.name_scope("input"):
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
tf.summary.histogram("input", x)
  3. Merge the summaries.
summary_operation = tf.summary.merge_all()
  4. Create a summary writer and add the TensorFlow graph.
writer = tf.summary.FileWriter(TB_SUMMARY_DIR)
writer.add_graph(sesh.graph)
  5. During training, run the merged summary operation and add the resulting summary to the summary writer (save to the log).
_, summary = sesh.run([train, summary_operation], feed_dict = feed_dict)
writer.add_summary(summary, step)
  6. Clear existing logs, launch TensorBoard and run the training.
rm -rf /tmp/run
tensorboard --logdir=/tmp/run

In [1]:
import subprocess
import tensorflow as tf

path_logs = "/tmp/run"
subprocess.Popen(["killall tensorboard"],                                            shell = True)
subprocess.Popen(["rm -rf {path_logs}".format(path_logs = path_logs)],               shell = True)
subprocess.Popen(["tensorboard --logdir={path_logs}".format(path_logs = path_logs)], shell = True)
subprocess.Popen(["xdg-open http://127.0.1.1:6006"],                                 shell = True)

tf.reset_default_graph()

with tf.name_scope("input"):
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
tf.summary.histogram("input", x)

with tf.name_scope("architecture"):
    W = tf.Variable([ .3], dtype = tf.float32)
    b = tf.Variable([-.3], dtype = tf.float32)
    linear_model = W * x + b
tf.summary.histogram("W", W)
tf.summary.histogram("b", b)
tf.summary.histogram("linear_model", linear_model)

with tf.name_scope("loss"):
    loss = tf.reduce_sum(tf.square(linear_model - y))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01)
    train = optimizer.minimize(loss)
tf.summary.scalar("loss", loss)

x_train = [1,  2,  3,  4]
y_train = [0, -1, -2, -3]

summary_operation = tf.summary.merge_all()
writer = tf.summary.FileWriter(path_logs)

with tf.Session() as sesh:

    writer.add_graph(sesh.graph)
    
    sesh.run(tf.global_variables_initializer())

    for i in range(1000):

        _, summary = sesh.run(
            [train, summary_operation],
            {
                x: x_train,
                y: y_train
            }
        )

        writer.add_summary(summary, i)

    current_W, current_b, current_loss = sesh.run(
        [W, b, loss],
        {
            x: x_train,
            y: y_train
        }
    )
    print("W: {W}, b: {b}, loss: {loss}".format(W = current_W, b = current_b, loss = current_loss))

subprocess.Popen(["killall tensorboard"], shell = True);


W: [-0.9999969], b: [ 0.99999082], loss: 5.699973826267524e-11

TensorBoard example graphs

TensorBoard example histograms

Here, the linear model histogram can be seen approaching the defined target values (0, -1, -2, -3).

embedding visualizer

To visualize embeddings, there are three main steps to take:

  • Create a 2D tensor to store the embeddings.
embedding_variable = tf.Variable(...)
  • Save model variables to a checkpoint periodically.
saver = tf.train.Saver()
saver.save(sesh, os.path.join(LOG_DIR, "model.ckpt"), step)
  • Optionally, associate metadata with the embedding. This could be labels or images associated with the embedding.
from tensorflow.contrib.tensorboard.plugins import projector

# Create randomly initialized embedding weights which will be trained.

N = 10000 # number of items (vocabulary size)
D = 200   # dimensionality of the embedding
embedding_variable = tf.Variable(
    tf.random_normal([N, D]),
    name = "word_embedding"
)

configuration = projector.ProjectorConfig()

# (Multiple embeddings could be added.)
embedding = configuration.embeddings.add()
embedding.tensor_name = embedding_variable.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = os.path.join(LOG_DIR, "metadata.tsv")

# Use the same LOG_DIR where the checkpoint is saved.
summary_writer = tf.summary.FileWriter(LOG_DIR)

# Save a file projector_config.pbtxt in the directory LOG_DIR for TensorBoard to read when launched.
projector.visualize_embeddings(summary_writer, configuration)

embedding visualizer example

This example involves the projection of 100 data points from a 10-dimensional embedding space.


In [2]:
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# Create a dummy embedding matrix filled with pseudorandom numbers.
embedding_variable = tf.Variable(tf.truncated_normal([100, 10]), name = "embedding")

# Create a list of 100 labels for the data points and save them to a metadata file.
labels = [str(i) for i in range(1, 101)]
with open("labels.csv", mode = "wt", encoding = "utf-8") as file_metadata:
    file_metadata.write("\n".join(labels))

with tf.Session() as sesh:

    # Create a summary writer and specify the graph.
    writer = tf.summary.FileWriter("./graphs/embedding_test", sesh.graph)

    # Initialize the embedding variable.
    sesh.run(embedding_variable.initializer)

    # Create a configuration for the projector.
    configuration = projector.ProjectorConfig()

    # Add the embedding visualizer.
    embedding = configuration.embeddings.add()

    # Set the name of the embedding to the variable name.
    embedding.tensor_name = embedding_variable.name

    # Set the path of the metadata in order to label data points.
    embedding.metadata_path = "./labels.csv"

    # Add the summary writer and the configuration to the projector.
    projector.visualize_embeddings(writer, configuration)

    # Save the model.
    saver_embed = tf.train.Saver([embedding_variable])
    saver_embed.save(sesh, "./graphs/embedding_test/embedding_test.ckpt", 1)

writer.close()

Launch TensorBoard:

tensorboard --logdir=graphs/embedding_test

The data points can be changed to display their labels instead of circles, or the pointer can be hovered over a circle to display the label of the corresponding data point.