Training Visualization

In this part we'll build a simple but deliberately broken model with Keras and then gradually improve it, debugging and understanding it step by step with TensorBoard.

Please download the log files first.

Let's start by importing all the components and layers we'll need for the CNN.


In [1]:
from keras.models import Model
from keras.layers import Convolution2D, BatchNormalization, MaxPooling2D, Flatten, Dense
from keras.layers import Input, Dropout
from keras.layers.advanced_activations import ELU
from keras.regularizers import l2
from keras.optimizers import SGD

import tensorflow as tf

from settings import *


Using TensorFlow backend.

In [2]:
import numpy as np
import os
import dataset
from dataset import MyDataset

db=MyDataset(feature_dir=os.path.join('./IRMAS-Sample', 'features', 'Training'), batch_size=8, time_context=128, step=50, 
             suffix_in='_mel_',suffix_out='_label_',floatX=np.float32,train_percent=0.8)
val_data = db()

Toy convolutional model for classification

First, we create a skeleton model with one convolutional and one dense layer.


In [3]:
def build_model(n_classes):

    input_shape = (N_MEL_BANDS, SEGMENT_DUR, 1)
    channel_axis = 3
    melgram_input = Input(shape=input_shape)

    m_size = 70
    n_size = 3
    n_filters = 64
    maxpool_const = 4

    x = Convolution2D(n_filters, (m_size, n_size),
                      padding='same',
                      kernel_initializer='zeros',
                      kernel_regularizer=l2(1e-5))(melgram_input)

    x = BatchNormalization(axis=channel_axis)(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(N_MEL_BANDS, SEGMENT_DUR/maxpool_const))(x)
    x = Flatten()(x)

    x = Dropout(0.5)(x)
    x = Dense(n_classes, kernel_initializer='zeros', kernel_regularizer=l2(1e-5), 
              activation='softmax', name='prediction')(x)

    model = Model(melgram_input, x)

    return model

model = build_model(IRMAS_N_CLASSES)

We can train the model on IRMAS data using the training procedure below.

First, we have to define the optimizer. We're using Stochastic Gradient Descent with Nesterov momentum.
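As a quick reminder (a sketch of the standard formulation, not taken from the notebook), plain momentum keeps a velocity vector $v$ and updates the parameters $\theta$ as

$$v_{t+1} = \mu v_t - \eta \nabla L(\theta_t), \qquad \theta_{t+1} = \theta_t + v_{t+1},$$

where $\mu$ is the momentum (0.9 here) and $\eta$ the learning rate (init_lr). With nesterov=True the gradient is evaluated at the look-ahead point $\theta_t + \mu v_t$; Keras implements an equivalent reparameterisation of this rule.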


In [4]:
init_lr = 0.001
optimizer = SGD(lr=init_lr, momentum=0.9, nesterov=True)

Now we can check the model structure, specify which metrics we would like to keep an eye on, and compile the model.


In [5]:
model.summary()
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 96, 128, 1)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 96, 128, 64)       13504     
_________________________________________________________________
batch_normalization_1 (Batch (None, 96, 128, 64)       256       
_________________________________________________________________
elu_1 (ELU)                  (None, 96, 128, 64)       0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 1, 4, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
prediction (Dense)           (None, 11)                2827      
=================================================================
Total params: 16,587
Trainable params: 16,459
Non-trainable params: 128
_________________________________________________________________

From the previous part, we have two generators which provide training and validation samples; we will use them during training. We also specify the number of steps per epoch, the total number of epochs, and the log verbosity level.


In [6]:
model.fit_generator(db,
                    steps_per_epoch=4,
                    epochs=4,
                    verbose=2,
                    validation_data=val_data,
                    class_weight=None,
                    workers=1)


/Users/olya/user_env/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:2289: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
  warnings.warn('\n'.join(msg))
Epoch 1/4
5s - loss: 2.3979 - acc: 0.0625 - val_loss: 2.3980 - val_acc: 0.0000e+00
Epoch 2/4
5s - loss: 2.3979 - acc: 0.0938 - val_loss: 2.3978 - val_acc: 0.0000e+00
Epoch 3/4
4s - loss: 2.3977 - acc: 0.1250 - val_loss: 2.3977 - val_acc: 0.1250
Epoch 4/4
4s - loss: 2.3974 - acc: 0.0938 - val_loss: 2.3977 - val_acc: 0.0000e+00
Out[6]:
<keras.callbacks.History at 0x11eb1bf10>

As we can see, neither the training nor the validation metrics have improved, so we need to figure out what's wrong with the model. Keras callbacks will help us with this.

Keras Callbacks

A Callback in Keras is a set of functions applied at certain events during the training process. The typical event triggers are listed below (a minimal custom callback using these hooks follows the list):

  • on_epoch_begin
  • on_epoch_end
  • on_batch_begin
  • on_batch_end
  • on_train_begin
  • on_train_end
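For illustration, here is a minimal custom callback (a sketch, not part of the original notebook; the class name BatchLossLogger is made up) that hooks into some of these events to record the loss after every batch:

from keras.callbacks import Callback

class BatchLossLogger(Callback):
    """Sketch: collect the loss after every batch and report val_loss per epoch."""
    def on_train_begin(self, logs=None):
        self.batch_losses = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.batch_losses.append(logs.get('loss'))

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print('epoch {} finished, val_loss = {}'.format(epoch, logs.get('val_loss')))

You would pass an instance of it via the callbacks argument of fit_generator, just like the built-in callbacks below.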

Keras already ships with several useful callbacks:


In [7]:
from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping, TensorBoard

early_stopping = EarlyStopping(monitor='val_loss', patience=EARLY_STOPPING_EPOCH)
save_clb = ModelCheckpoint("{weights_basepath}/".format(weights_basepath=MODEL_WEIGHT_BASEPATH) +
                           "epoch.{epoch:02d}-val_loss.{val_loss:.3f}",
                           monitor='val_loss',
                           save_best_only=True)

Let's get acquainted with the TensorBoard Callback.

The parameters are:

  • log_dir - where to store the logs, metadata, and events of the model training process
  • write_graph - whether to write the graph of data and control dependencies
  • write_grads - whether to compute and visualise gradient histograms (requires histogram_freq > 0)
  • histogram_freq - how often (in epochs) to compute activation and weight histograms on the validation data
  • write_images - whether to write the model weights and visualise them as images

In [8]:
tb = TensorBoard(log_dir='./example_1',
                 write_graph=True, write_grads=True, 
                 write_images=True, histogram_freq=1)
# if we want to compute activation and weight histograms, we need to specify the validation data for that.
tb.validation_data = val_data

Now we can add the callbacks to the training process, observe the corresponding events, and collect the resulting logs.


In [9]:
model.fit_generator(db,
                    steps_per_epoch=1, # change to STEPS_PER_EPOCH
                    epochs=1, # change to MAX_EPOCH_NUM
                    verbose=2,
                    validation_data=val_data,
                    callbacks=[save_clb, early_stopping, tb],
                    class_weight=None,
                    workers=1)


INFO:tensorflow:Summary name conv2d_1/kernel:0 is illegal; using conv2d_1/kernel_0 instead.
INFO:tensorflow:Summary name conv2d_1/kernel:0_grad is illegal; using conv2d_1/kernel_0_grad instead.
INFO:tensorflow:Summary name conv2d_1/kernel:0 is illegal; using conv2d_1/kernel_0 instead.
INFO:tensorflow:Summary name conv2d_1/bias:0 is illegal; using conv2d_1/bias_0 instead.
INFO:tensorflow:Summary name conv2d_1/bias:0_grad is illegal; using conv2d_1/bias_0_grad instead.
INFO:tensorflow:Summary name conv2d_1/bias:0 is illegal; using conv2d_1/bias_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/gamma:0 is illegal; using batch_normalization_1/gamma_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/gamma:0_grad is illegal; using batch_normalization_1/gamma_0_grad instead.
INFO:tensorflow:Summary name batch_normalization_1/gamma:0 is illegal; using batch_normalization_1/gamma_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/beta:0 is illegal; using batch_normalization_1/beta_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/beta:0_grad is illegal; using batch_normalization_1/beta_0_grad instead.
INFO:tensorflow:Summary name batch_normalization_1/beta:0 is illegal; using batch_normalization_1/beta_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_mean:0 is illegal; using batch_normalization_1/moving_mean_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_mean:0_grad is illegal; using batch_normalization_1/moving_mean_0_grad instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_mean:0 is illegal; using batch_normalization_1/moving_mean_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_variance:0 is illegal; using batch_normalization_1/moving_variance_0 instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_variance:0_grad is illegal; using batch_normalization_1/moving_variance_0_grad instead.
INFO:tensorflow:Summary name batch_normalization_1/moving_variance:0 is illegal; using batch_normalization_1/moving_variance_0 instead.
INFO:tensorflow:Summary name prediction/kernel:0 is illegal; using prediction/kernel_0 instead.
INFO:tensorflow:Summary name prediction/kernel:0_grad is illegal; using prediction/kernel_0_grad instead.
INFO:tensorflow:Summary name prediction/kernel:0 is illegal; using prediction/kernel_0 instead.
INFO:tensorflow:Summary name prediction/bias:0 is illegal; using prediction/bias_0 instead.
INFO:tensorflow:Summary name prediction/bias:0_grad is illegal; using prediction/bias_0_grad instead.
INFO:tensorflow:Summary name prediction/bias:0 is illegal; using prediction/bias_0 instead.
Epoch 1/1
4s - loss: 2.3980 - acc: 0.1250 - val_loss: 2.3977 - val_acc: 0.0000e+00
Out[9]:
<keras.callbacks.History at 0x11f358550>

You can download the event files for all runs from here.

Now extract the log archive into the ./logs directory and launch TensorBoard

tar -xvzf logs.tar.gz
cd logs
tensorboard --logdir ./example_1

and navigate to http://0.0.0.0:6006

Notice that it's almost impossible to make sense of anything on the Graphs tab, but we can clearly see on the Scalars tab that the metrics are not improving, and on the Histograms tab that the gradient values are zero.

The problem is the weight initialization, kernel_initializer='zeros', so we can fix it and define a new model.
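Before fixing it, we can confirm the diagnosis directly from the model in memory (a quick check; the layer index assumes the layer order printed by model.summary() above). With an all-zeros kernel the convolution outputs are all zero, so the gradients flowing back through the zero-initialised dense layer are zero as well and the weights never move:

# model.layers[0] is the Input layer, model.layers[1] the Conv2D layer (see the summary above)
conv_kernel = model.layers[1].get_weights()[0]
print(conv_kernel.min(), conv_kernel.max())  # expected: 0.0 0.0 -- and the gradients w.r.t. it stay zero too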


In [10]:
def build_model(n_classes):

    input_shape = (N_MEL_BANDS, SEGMENT_DUR, 1)
    channel_axis = 3
    melgram_input = Input(shape=input_shape)

    m_size = 70
    n_size = 3
    n_filters = 64
    maxpool_const = 4

    x = Convolution2D(n_filters, (m_size, n_size),
                      padding='same',
                      kernel_initializer='he_normal',
                      kernel_regularizer=l2(1e-5))(melgram_input)

    x = BatchNormalization(axis=channel_axis)(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(N_MEL_BANDS, SEGMENT_DUR/maxpool_const))(x)
    x = Flatten()(x)

    x = Dropout(0.5)(x)
    x = Dense(n_classes, kernel_initializer='he_normal', kernel_regularizer=l2(1e-5), 
              activation='softmax', name='prediction')(x)

    model = Model(melgram_input, x)

    return model

model = build_model(IRMAS_N_CLASSES)

If you repeat the training process, you will notice that the classification performance improves significantly.

Have a look at the new log files in the ./example_2 directory and restart TensorBoard to explore the new data.

cd logs
tensorboard --logdir ./example_2 --port=6002
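
To compare this run with the previous one side by side, you can also point TensorBoard at the parent directory; it treats every subdirectory (here example_1 and example_2) as a separate run:

cd logs
tensorboard --logdir .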

TensorFlow name scopes

You might have noticed the mess on the Graphs tab. That's because TensorBoard can't connect all the data nodes of the model and the operations of the training process on its own; it is smart enough to group nodes with a similar structure, but don't expect too much.

To get a better graph visualisation, we need to define name scopes for each logical layer and for each operation we want to see as an individual element.

We can do this just by wrapping the relevant code in a with tf.name_scope(name_scope): block.
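A tiny standalone example (not from the notebook) of what tf.name_scope does: every op created inside the block gets the scope name as a prefix, and TensorBoard collapses nodes that share a prefix into a single expandable group.

import tensorflow as tf

with tf.name_scope('demo'):
    a = tf.constant(1.0, name='a')
    b = tf.constant(2.0, name='b')
    c = tf.add(a, b, name='sum')

print(c.name)  # 'demo/sum:0'

With that in mind, here is the model rebuilt with name scopes: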


In [11]:
global_namescope = 'train'

def build_model(n_classes):

    with tf.name_scope('input'):
        input_shape = (N_MEL_BANDS, SEGMENT_DUR, 1)
        channel_axis = 3
        melgram_input = Input(shape=input_shape)

        m_size = [5, 5]
        n_size = [5, 5]
        n_filters = 64
        maxpool_const = 8

    with tf.name_scope('conv1'):
        x = Convolution2D(n_filters, (m_size[0], n_size[0]),
                          padding='same',
                          kernel_initializer='he_uniform')(melgram_input)
        x = BatchNormalization(axis=channel_axis)(x)
        x = ELU()(x)
        x = MaxPooling2D(pool_size=(maxpool_const, maxpool_const))(x)

    with tf.name_scope('conv2'):
        x = Convolution2D(n_filters*2, (m_size[1], n_size[1]),
                          padding='same',
                          kernel_initializer='he_uniform')(x)
        x = BatchNormalization(axis=channel_axis)(x)
        x = ELU()(x)
        x = MaxPooling2D(pool_size=(maxpool_const, maxpool_const))(x)
        x = Flatten()(x)

    with tf.name_scope('dense1'):
        x = Dropout(0.5)(x)
        x = Dense(n_filters, kernel_initializer='he_uniform', name='hidden')(x)
        x = ELU()(x)

    with tf.name_scope('dense2'):
        x = Dropout(0.5)(x)
        x = Dense(n_classes, kernel_initializer='he_uniform', activation='softmax', name='prediction')(x)

    model = Model(melgram_input, x)

    return model

model = build_model(IRMAS_N_CLASSES)

with tf.name_scope('optimizer'):
    optimizer = SGD(lr=init_lr, momentum=0.9, nesterov=True)

with tf.name_scope('model'):
    model = build_model(IRMAS_N_CLASSES)

# to save memory, we only write the graph this time
with tf.name_scope('callbacks'):
    # The TensorBoard developers strongly encourage us to use a different directory for every run
    tb = TensorBoard(log_dir='./example_3', write_graph=True)

# yes, we need to recompile the model every time
with tf.name_scope('compile'):
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# and pseudo-train the model
with tf.name_scope(global_namescope):
    model.fit_generator(db,
                        steps_per_epoch=1, # just one step
                        epochs=1, # one epoch to save the graphs
                        verbose=2,
                        validation_data=val_data,
                        callbacks=[tb],
                        workers=1)


Epoch 1/1
1s - loss: 6.1677 - acc: 0.2500 - val_loss: 2.3916 - val_acc: 0.0000e+00

Have a look at the new log files in the ./example_3 directory and restart TensorBoard to explore the new data.

cd logs
tensorboard --logdir ./example_3

Embeddings and Hidden Layers Output Visualisation

With TensorBoard we can also visualise the embeddings of the model. To do this, you can add an Embedding layer to your model.

To visualise the outputs of intermediate layers, we can write a custom callback and use it to store the outputs on the validation data during training. We will follow the notation of the TensorBoard callback, but add some functionality:

  • layer_names - a list of names of the layers to keep an eye on
  • metadata - a path to a TSV file with associated metadata (labels, notes, etc.); see the TensorBoard documentation for the format and details (a sketch of generating such a file follows the list)
  • sprite - a path to a sprite image; see the TensorBoard documentation for the format and details
  • sprite_shape - a list [M, N] with the dimensions of a single image in the sprite
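
As an aside, here is a sketch of how such a metadata file could be generated (assumptions: val_data[1] holds one-hot targets in the same order as the saved hidden outputs, and metadata_generated.tsv is just an illustrative name, since the notebook already ships a precomputed metadata.tsv). With a single column, TensorBoard expects no header row:

import numpy as np

val_labels = np.argmax(val_data[1], axis=1)  # one-hot targets -> class indices
with open('./logs_embed/metadata_generated.tsv', 'w') as f:
    for label in val_labels:
        f.write('{}\n'.format(label))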

In [12]:
from keras import backend as K
if K.backend() == 'tensorflow':
    import tensorflow as tf
    from tensorflow.contrib.tensorboard.plugins import projector

class TensorBoardHiddenOutputVis(Callback):
    """Tensorboard Intermediate Outputs visualization callback."""

    def __init__(self, log_dir='./logs_embed',
                 batch_size=32,
                 freq=0,
                 layer_names=None,
                 metadata=None,
                 sprite=None,
                 sprite_shape=None):
        super(TensorBoardHiddenOutputVis, self).__init__()
        self.log_dir = log_dir
        self.freq = freq
        self.layer_names = layer_names
        # Note that only one metadata file is supported in this callback
        self.metadata = metadata
        self.sprite = sprite
        self.sprite_shape = sprite_shape
        self.batch_size = batch_size

    def set_model(self, model):
        self.model = model
        self.sess = K.get_session()
        self.summary_writer = tf.summary.FileWriter(self.log_dir)
        self.outputs_ckpt_path = os.path.join(self.log_dir, 'keras_outputs.ckpt')

        if self.freq and self.validation_data:
            # define tensors to compute outputs on
            outputs_layers = [layer for layer in self.model.layers
                                 if layer.name in self.layer_names]
            self.output_tensors = [tf.get_default_graph().get_tensor_by_name(layer.get_output_at(0).name)
                                   for layer in outputs_layers]

            # create configuration for visualisation in the same manner as for embeddings
            config = projector.ProjectorConfig()
            for i in range(len(self.output_tensors)):
                embedding = config.embeddings.add()
                embedding.tensor_name = '{ns}/hidden_{i}'.format(ns=global_namescope, i=i)

                # Simplest metadata handler: a single file for all embeddings
                if self.metadata:
                    embedding.metadata_path = self.metadata

                # Sprite image handler
                if self.sprite and self.sprite_shape:
                    embedding.sprite.image_path = self.sprite
                    embedding.sprite.single_image_dim.extend(self.sprite_shape)

            # define TF variables to store the hidden outputs during the training
            # Note that only 1D outputs are supported
            self.hidden_vars = [tf.Variable(np.zeros((len(self.validation_data[0]),
                                                         self.output_tensors[i].shape[1]),
                                                        dtype='float32'),
                                               name='hidden_{}'.format(i))
                                   for i in range(len(self.output_tensors))]
            # add TF variables into computational graph
            for hidden_var in self.hidden_vars:
                self.sess.run(hidden_var.initializer)

            # save the config and setup TF saver for hidden variables
            projector.visualize_embeddings(self.summary_writer, config)
            self.saver = tf.train.Saver(self.hidden_vars)

    def on_epoch_end(self, epoch, logs=None):
        if self.validation_data and self.freq:
            if epoch % self.freq == 0:

                val_data = self.validation_data
                tensors = (self.model.inputs +
                           self.model.targets +
                           self.model.sample_weights)
                all_outputs = [[] for _ in self.output_tensors]  # separate lists (not [[]]*n, which aliases a single list)

                if self.model.uses_learning_phase:
                    tensors += [K.learning_phase()]

                assert len(val_data) == len(tensors)
                val_size = val_data[0].shape[0]
                i = 0
                # compute outputs batch by batch on validation data
                while i < val_size:
                    step = min(self.batch_size, val_size - i)
                    batch_val = []
                    batch_val.append(val_data[0][i:i + step])
                    batch_val.append(val_data[1][i:i + step])
                    batch_val.append(val_data[2][i:i + step])
                    if self.model.uses_learning_phase:
                        batch_val.append(val_data[3])
                    feed_dict = dict(zip(tensors, batch_val))
                    tensor_outputs = self.sess.run(self.output_tensors, feed_dict=feed_dict)
                    for output_idx, tensor_output in enumerate(tensor_outputs):
                        all_outputs[output_idx].extend(tensor_output)
                    i += self.batch_size
                
                # rewrite the current state of hidden outputs with new values
                for idx, embed in enumerate(self.hidden_vars):
                    embed.assign(np.array(all_outputs[idx])).eval(session=self.sess)
                self.saver.save(self.sess, self.outputs_ckpt_path, epoch)

        self.summary_writer.flush()

    def on_train_end(self, _):
        self.summary_writer.close()

Now we can add the new callback, recompile and retrain the model.


In [13]:
layers_to_monitor = ['hidden']
# the metadata and sprite files are precomputed in the ./logs_embed directory
metadata_file_name = 'metadata.tsv'
sprite_file_name = 'sprite.png'
sprite_shape = [N_MEL_BANDS, SEGMENT_DUR]

with tf.name_scope('callbacks'):
    tbe = TensorBoardHiddenOutputVis(log_dir='./logs_embed', freq=1,
                           layer_names=layers_to_monitor,
                           metadata=metadata_file_name,
                           sprite=sprite_file_name,
                           sprite_shape=sprite_shape)
    tbe.validation_data = val_data

with tf.name_scope('compile'):
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

with tf.name_scope(global_namescope):
    model.fit_generator(db,
                        steps_per_epoch=1, # change to STEPS_PER_EPOCH
                        epochs=1, # change to MAX_EPOCH_NUM
                        verbose=2,
                        callbacks=[tbe],
                        validation_data=val_data,
                        class_weight=None,
                        workers=1)


Epoch 1/1
7s - loss: 9.1391 - acc: 0.0000e+00 - val_loss: 2.3934 - val_acc: 0.0000e+00

For the sake of time, we're going to skip the full training process. You can find the corresponding data in the ./logs_embed directory.

Restart TensorBoard and navigate to http://0.0.0.0:6006/ to explore the visualisation.

cd logs
tensorboard --logdir ./logs_embed
