TensorFlow Tutorial #13

Visual Analysis

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube

Introduction

In some of the previous tutorials on Convolutional Neural Networks, we showed the convolutional filter weights, see e.g. Tutorial #02 and #06. But it was impossible to determine what the convolutional filters might be recognizing in the input image from merely looking at the filter-weights.

In this tutorial we will present a basic method for visually analysing the inner-workings of a neural network. The idea is to generate an image that maximizes individual features inside the neural network. The image is initialized with a little random noise and then gradually changed using the gradient of the given feature with regard to the input image.

This method for visual analysis of a neural network is also known as feature maximization or activation maximization.

This builds on the previous tutorials. You should be familiar with neural networks in general (e.g. Tutorial #01 and #02), and knowledge of the Inception model is also helpful (Tutorial #07).

Flowchart

We will use the Inception model from Tutorial #07. We want to find an input image that maximizes a given feature inside the neural network. The input image is initialized with a little noise and is then updated using the gradient of the given feature. After performing a number of these optimization iterations we get an image that this particular feature 'likes to see'.

Because the Inception model is constructed from many basic mathematical operations that have been combined, TensorFlow allows us to easily find the gradient using the chain-rule of differentiation.


In [44]:
from IPython.display import Image, display
Image('images/13_visual_analysis_flowchart.png')


Out[44]:

Imports


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Functions and classes for loading and using the Inception model.
import inception

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:


In [3]:
tf.__version__


Out[3]:
'0.11.0rc0'

Inception Model

Download the Inception model from the internet

The Inception model is downloaded from the internet. This is the default directory where you want to save the data-files. The directory will be created if it does not exist.


In [4]:
# inception.data_dir = 'inception/'

Download the data for the Inception model if it doesn't already exist in the directory. It is 85 MB.


In [5]:
inception.maybe_download()


Downloading Inception v3 Model ...
Data has apparently already been downloaded and unpacked.

Names of convolutional layers

This function returns a list of names for the convolutional layers in the Inception model.


In [6]:
def get_conv_layer_names():
    # Load the Inception model.
    model = inception.Inception()
    
    # Create a list of names for the operations in the graph
    # for the Inception model where the operator-type is 'Conv2D'.
    names = [op.name for op in model.graph.get_operations() if op.type=='Conv2D']

    # Close the TensorFlow session inside the model-object.
    model.close()

    return names

In [7]:
conv_names = get_conv_layer_names()

There are a total of 94 convolutional layers in this Inception model.


In [8]:
len(conv_names)


Out[8]:
94

Show the names of the first 5 convolutional layers.


In [9]:
conv_names[:5]


Out[9]:
['conv/Conv2D',
 'conv_1/Conv2D',
 'conv_2/Conv2D',
 'conv_3/Conv2D',
 'conv_4/Conv2D']

Show the names of the last 5 convolutional layers.


In [10]:
conv_names[-5:]


Out[10]:
['mixed_10/tower_1/conv/Conv2D',
 'mixed_10/tower_1/conv_1/Conv2D',
 'mixed_10/tower_1/mixed/conv/Conv2D',
 'mixed_10/tower_1/mixed/conv_1/Conv2D',
 'mixed_10/tower_2/conv/Conv2D']

Helper-function for finding the input image

This function finds the input image that maximizes a given feature in the network. It essentially just performs optimization with gradient ascent. The image is initialized with small random values and is then iteratively updated using the gradient for the given feature with regard to the image.


In [11]:
def optimize_image(conv_id=None, feature=0,
                   num_iterations=30, show_progress=True):
    """
    Find an image that maximizes the feature
    given by the conv_id and feature number.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last fully-connected layer
             before the softmax output.
    feature: Index into the layer for the feature to maximize.
    num_iteration: Number of optimization iterations to perform.
    show_progress: Boolean whether to show the progress.
    """

    # Load the Inception model. This is done for each call of
    # this function because we will add a lot to the graph
    # which will cause the graph to grow and eventually the
    # computer will run out of memory.
    model = inception.Inception()

    # Reference to the tensor that takes the raw input image.
    resized_image = model.resized_image

    # Reference to the tensor for the predicted classes.
    # This is the output of the final layer's softmax classifier.
    y_pred = model.y_pred

    # Create the loss-function that must be maximized.
    if conv_id is None:
        # If we want to maximize a feature on the last layer,
        # then we use the fully-connected layer prior to the
        # softmax-classifier. The feature no. is the class-number
        # and must be an integer between 1 and 1000.
        # The loss-function is just the value of that feature.
        loss = model.y_logits[0, feature]
    else:
        # If instead we want to maximize a feature of a
        # convolutional layer inside the neural network.

        # Get the name of the convolutional operator.
        conv_name = conv_names[conv_id]
        
        # Get a reference to the tensor that is output by the
        # operator. Note that ":0" is added to the name for this.
        tensor = model.graph.get_tensor_by_name(conv_name + ":0")

        # Set the Inception model's graph as the default
        # so we can add an operator to it.
        with model.graph.as_default():
            # The loss-function is the average of all the
            # tensor-values for the given feature. This
            # ensures that we generate the whole input image.
            # You can try and modify this so it only uses
            # a part of the tensor.
            loss = tf.reduce_mean(tensor[:,:,:,feature])
    
    # Get the gradient for the loss-function with regard to
    # the resized input image. This creates a mathematical
    # function for calculating the gradient.
    gradient = tf.gradients(loss, resized_image)

    # Create a TensorFlow session so we can run the graph.
    session = tf.Session(graph=model.graph)

    # Generate a random image of the same size as the raw input.
    # Each pixel is a small random value between 128 and 129,
    # which is about the middle of the colour-range.
    image_shape = resized_image.get_shape()
    image = np.random.uniform(size=image_shape) + 128.0

    # Perform a number of optimization iterations to find
    # the image that maximizes the loss-function.
    for i in range(num_iterations):
        # Create a feed-dict. This feeds the image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        feed_dict = {model.tensor_name_resized_image: image}

        # Calculate the predicted class-scores,
        # as well as the gradient and the loss-value.
        pred, grad, loss_value = session.run([y_pred, gradient, loss],
                                             feed_dict=feed_dict)
        
        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()

        # The gradient now tells us how much we need to change the
        # input image in order to maximize the given feature.

        # Calculate the step-size for updating the image.
        # This step-size was found to give fast convergence.
        # The addition of 1e-8 is to protect from div-by-zero.
        step_size = 1.0 / (grad.std() + 1e-8)

        # Update the image by adding the scaled gradient
        # This is called gradient ascent.
        image += step_size * grad

        # Ensure all pixel-values in the image are between 0 and 255.
        image = np.clip(image, 0.0, 255.0)

        if show_progress:
            print("Iteration:", i)

            # Convert the predicted class-scores to a one-dim array.
            pred = np.squeeze(pred)

            # The predicted class for the Inception model.
            pred_cls = np.argmax(pred)

            # Name of the predicted class.
            cls_name = model.name_lookup.cls_to_name(pred_cls,
                                               only_first_name=True)

            # The score (probability) for the predicted class.
            cls_score = pred[pred_cls]

            # Print the predicted score etc.
            msg = "Predicted class-name: {0} (#{1}), score: {2:>7.2%}"
            print(msg.format(cls_name, pred_cls, cls_score))

            # Print statistics for the gradient.
            msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
            print(msg.format(grad.min(), grad.max(), step_size))

            # Print the loss-value.
            print("Loss:", loss_value)

            # Newline.
            print()

    # Close the TensorFlow session inside the model-object.
    model.close()

    return image.squeeze()

Helper-function for plotting image and noise

This function normalizes an image so its pixel-values are between 0.0 and 1.0.


In [12]:
def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()

    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)

    return x_norm

This function plots a single image.


In [13]:
def plot_image(image):
    # Normalize the image so pixels are between 0.0 and 1.0
    img_norm = normalize_image(image)
    
    # Plot the image.
    plt.imshow(img_norm, interpolation='nearest')
    plt.show()

This function plots 6 images in a grid.


In [14]:
def plot_images(images, show_size=100):
    """
    The show_size is the number of pixels to show for each image.
    The max value is 299.
    """

    # Create figure with sub-plots.
    fig, axes = plt.subplots(2, 3)

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True
    
    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'

    # For each entry in the grid.
    for i, ax in enumerate(axes.flat):
        # Get the i'th image and only use the desired pixels.
        img = images[i, 0:show_size, 0:show_size, :]
        
        # Normalize the image so its pixels are between 0.0 and 1.0
        img_norm = normalize_image(img)
        
        # Plot the image.
        ax.imshow(img_norm, interpolation=interpolation)

        # Remove ticks.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Helper-function for optimizing and plotting images

This function optimizes multiple images and plots them.


In [15]:
def optimize_images(conv_id=None, num_iterations=30, show_size=100):
    """
    Find 6 images that maximize the 6 first features in the layer
    given by the conv_id.
    
    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last layer before the softmax output.
    num_iterations: Number of optimization iterations to perform.
    show_size: Number of pixels to show for each image. Max 299.
    """

    # Which layer are we using?
    if conv_id is None:
        print("Final fully-connected layer before softmax.")
    else:
        print("Layer:", conv_names[conv_id])

    # Initialize the array of images.
    images = []

    # For each feature do the following. Note that the
    # last fully-connected layer only supports numbers
    # between 1 and 1000, while the convolutional layers
    # support numbers between 0 and some other number.
    # So we just use the numbers between 1 and 7.
    for feature in range(1,7):
        print("Optimizing image for feature no.", feature)
        
        # Find the image that maximizes the given feature
        # for the network layer identified by conv_id (or None).
        image = optimize_image(conv_id=conv_id, feature=feature,
                               show_progress=False,
                               num_iterations=num_iterations)

        # Squeeze the dim of the array.
        image = image.squeeze()

        # Append to the list of images.
        images.append(image)

    # Convert to numpy-array so we can index all dimensions easily.
    images = np.array(images)

    # Plot the images.
    plot_images(images=images, show_size=show_size)

Results

Optimize a single image for an early convolutional layer

As an example, find an input image that maximizes feature no. 2 of the convolutional layer with the name conv_names[conv_id] where conv_id=5.


In [16]:
image = optimize_image(conv_id=5, feature=2,
                       num_iterations=30, show_progress=True)


Iteration: 0
Predicted class-name: dishwasher (#667), score:   5.59%
Gradient min: -0.000089, max:  0.000125, stepsize:  76178.66
Loss: 4.83987

Iteration: 1
Predicted class-name: kite (#397), score:  11.12%
Gradient min: -0.000103, max:  0.000109, stepsize:  72136.10
Loss: 5.59879

Iteration: 2
Predicted class-name: wall clock (#524), score:   5.48%
Gradient min: -0.000111, max:  0.000145, stepsize:  80405.14
Loss: 6.91441

Iteration: 3
Predicted class-name: ballpoint (#907), score:   5.42%
Gradient min: -0.000123, max:  0.000120, stepsize:  86825.41
Loss: 7.90217

Iteration: 4
Predicted class-name: syringe (#531), score:  13.64%
Gradient min: -0.000113, max:  0.000102, stepsize:  95255.84
Loss: 8.85303

Iteration: 5
Predicted class-name: syringe (#531), score:  22.57%
Gradient min: -0.000100, max:  0.000093, stepsize: 103656.49
Loss: 9.69852

Iteration: 6
Predicted class-name: syringe (#531), score:  25.47%
Gradient min: -0.000093, max:  0.000112, stepsize: 112196.24
Loss: 10.4557

Iteration: 7
Predicted class-name: syringe (#531), score:  28.49%
Gradient min: -0.000084, max:  0.000089, stepsize: 119770.54
Loss: 11.1315

Iteration: 8
Predicted class-name: syringe (#531), score:  21.78%
Gradient min: -0.000083, max:  0.000073, stepsize: 127605.42
Loss: 11.7311

Iteration: 9
Predicted class-name: syringe (#531), score:  15.24%
Gradient min: -0.000066, max:  0.000073, stepsize: 134462.88
Loss: 12.2657

Iteration: 10
Predicted class-name: syringe (#531), score:  11.62%
Gradient min: -0.000073, max:  0.000082, stepsize: 141011.21
Loss: 12.7509

Iteration: 11
Predicted class-name: binder (#835), score:   8.65%
Gradient min: -0.000061, max:  0.000068, stepsize: 146931.20
Loss: 13.1895

Iteration: 12
Predicted class-name: envelope (#879), score:  10.13%
Gradient min: -0.000067, max:  0.000077, stepsize: 151580.17
Loss: 13.5902

Iteration: 13
Predicted class-name: envelope (#879), score:  11.14%
Gradient min: -0.000064, max:  0.000072, stepsize: 157857.40
Loss: 13.9552

Iteration: 14
Predicted class-name: binder (#835), score:  10.95%
Gradient min: -0.000078, max:  0.000061, stepsize: 161917.28
Loss: 14.3037

Iteration: 15
Predicted class-name: binder (#835), score:  11.27%
Gradient min: -0.000063, max:  0.000080, stepsize: 166986.92
Loss: 14.6235

Iteration: 16
Predicted class-name: binder (#835), score:  10.49%
Gradient min: -0.000075, max:  0.000058, stepsize: 172647.02
Loss: 14.9356

Iteration: 17
Predicted class-name: binder (#835), score:   9.76%
Gradient min: -0.000052, max:  0.000070, stepsize: 176689.68
Loss: 15.2271

Iteration: 18
Predicted class-name: binder (#835), score:   8.48%
Gradient min: -0.000070, max:  0.000055, stepsize: 179760.24
Loss: 15.499

Iteration: 19
Predicted class-name: quilt (#976), score:   9.15%
Gradient min: -0.000052, max:  0.000058, stepsize: 184475.44
Loss: 15.761

Iteration: 20
Predicted class-name: quilt (#976), score:   9.73%
Gradient min: -0.000053, max:  0.000048, stepsize: 187894.86
Loss: 16.013

Iteration: 21
Predicted class-name: quilt (#976), score:  12.62%
Gradient min: -0.000051, max:  0.000049, stepsize: 190949.04
Loss: 16.2436

Iteration: 22
Predicted class-name: quilt (#976), score:  11.78%
Gradient min: -0.000055, max:  0.000052, stepsize: 196239.93
Loss: 16.4788

Iteration: 23
Predicted class-name: quilt (#976), score:  13.37%
Gradient min: -0.000060, max:  0.000055, stepsize: 198187.55
Loss: 16.692

Iteration: 24
Predicted class-name: bib (#941), score:  14.19%
Gradient min: -0.000046, max:  0.000049, stepsize: 203560.81
Loss: 16.9098

Iteration: 25
Predicted class-name: bib (#941), score:  15.99%
Gradient min: -0.000046, max:  0.000054, stepsize: 205084.38
Loss: 17.1124

Iteration: 26
Predicted class-name: bib (#941), score:  15.30%
Gradient min: -0.000055, max:  0.000048, stepsize: 209322.75
Loss: 17.3075

Iteration: 27
Predicted class-name: bib (#941), score:  16.96%
Gradient min: -0.000045, max:  0.000057, stepsize: 212571.72
Loss: 17.5013

Iteration: 28
Predicted class-name: bib (#941), score:  17.67%
Gradient min: -0.000047, max:  0.000057, stepsize: 217085.27
Loss: 17.6893

Iteration: 29
Predicted class-name: bib (#941), score:  16.67%
Gradient min: -0.000052, max:  0.000050, stepsize: 220132.07
Loss: 17.8696


In [17]:
plot_image(image)


Optimize multiple images for convolutional layers

In the following we optimize and plot multiple images for convolutional layers inside the Inception model. These images show what the layers 'like to see'. Notice how the patterns become increasingly complex for deeper layers.


In [18]:
optimize_images(conv_id=0, num_iterations=10)


Layer: conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [19]:
optimize_images(conv_id=3, num_iterations=30)


Layer: conv_3/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [20]:
optimize_images(conv_id=4, num_iterations=30)


Layer: conv_4/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [21]:
optimize_images(conv_id=5, num_iterations=30)


Layer: mixed/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [22]:
optimize_images(conv_id=6, num_iterations=30)


Layer: mixed/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [23]:
optimize_images(conv_id=7, num_iterations=30)


Layer: mixed/tower/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [24]:
optimize_images(conv_id=8, num_iterations=30)


Layer: mixed/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [25]:
optimize_images(conv_id=9, num_iterations=30)


Layer: mixed/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [26]:
optimize_images(conv_id=10, num_iterations=30)


Layer: mixed/tower_1/conv_2/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [27]:
optimize_images(conv_id=20, num_iterations=30)


Layer: mixed_2/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [28]:
optimize_images(conv_id=30, num_iterations=30)


Layer: mixed_4/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [29]:
optimize_images(conv_id=40, num_iterations=30)


Layer: mixed_5/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [30]:
optimize_images(conv_id=50, num_iterations=30)


Layer: mixed_6/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [31]:
optimize_images(conv_id=60, num_iterations=30)


Layer: mixed_7/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [32]:
optimize_images(conv_id=70, num_iterations=30)


Layer: mixed_8/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [33]:
optimize_images(conv_id=80, num_iterations=30)


Layer: mixed_9/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [34]:
optimize_images(conv_id=90, num_iterations=30)


Layer: mixed_10/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

In [35]:
optimize_images(conv_id=93, num_iterations=30)


Layer: mixed_10/tower_2/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

Final fully-connected layer before Softmax

Now we optimize and plot images for the final layer in the Inception model. This is the fully-connected layer right before the softmax-classifier. The features in this layer correspond to output classes.

We might have hoped to see recognizable patterns in these images, e.g. monkeys and birds corresponding to the output classes, but the images just show complex, abstract patterns.


In [36]:
optimize_images(conv_id=None, num_iterations=30)


Final fully-connected layer before softmax.
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

The above images only show 100x100 pixels but the images are actually 299x299 pixels. It is possible that there might be recognizable patterns if we optimize for more iterations and plot the full image. So let us optimize the first image again and plot it in full resolution.

The Inception model classifies the resulting image as a 'kit fox' with about 100% certainty, but to the human eye the image just shows abstract patterns.

If you want to try this for another feature number, note that it must be between 1 and 1000 because it must correspond to a valid class-number for the final output-layer.


In [37]:
image = optimize_image(conv_id=None, feature=1,
                       num_iterations=100, show_progress=True)


Iteration: 0
Predicted class-name: dishwasher (#667), score:   4.98%
Gradient min: -0.006252, max:  0.004451, stepsize:   3734.48
Loss: -0.837608

Iteration: 1
Predicted class-name: ballpoint (#907), score:   8.52%
Gradient min: -0.007303, max:  0.006427, stepsize:   2152.89
Loss: -0.416723

Iteration: 2
Predicted class-name: spider web (#600), score:  90.44%
Gradient min: -0.007480, max:  0.012272, stepsize:   1343.66
Loss: 2.77814

Iteration: 3
Predicted class-name: pot (#838), score:   3.01%
Gradient min: -0.009853, max:  0.007638, stepsize:   1526.98
Loss: 3.27751

Iteration: 4
Predicted class-name: American egret (#426), score:  11.55%
Gradient min: -0.008507, max:  0.006308, stepsize:   1787.96
Loss: 5.95497

Iteration: 5
Predicted class-name: spider web (#600), score:  21.98%
Gradient min: -0.010415, max:  0.009410, stepsize:   1722.27
Loss: 5.07394

Iteration: 6
Predicted class-name: kit fox (#1), score:  29.21%
Gradient min: -0.009298, max:  0.007885, stepsize:   2471.91
Loss: 7.98241

Iteration: 7
Predicted class-name: brain coral (#649), score:   8.95%
Gradient min: -0.004683, max:  0.004366, stepsize:   2876.78
Loss: 2.61856

Iteration: 8
Predicted class-name: kit fox (#1), score:  30.05%
Gradient min: -0.008918, max:  0.006374, stepsize:   2243.47
Loss: 8.18703

Iteration: 9
Predicted class-name: kit fox (#1), score:  55.10%
Gradient min: -0.041977, max:  0.025564, stepsize:   1270.96
Loss: 9.4695

Iteration: 10
Predicted class-name: kit fox (#1), score:  45.79%
Gradient min: -0.025474, max:  0.026726, stepsize:    858.88
Loss: 8.49094

Iteration: 11
Predicted class-name: kit fox (#1), score:  71.39%
Gradient min: -0.021939, max:  0.016643, stepsize:   1316.54
Loss: 12.4999

Iteration: 12
Predicted class-name: kit fox (#1), score:  82.78%
Gradient min: -0.011797, max:  0.017714, stepsize:   1763.08
Loss: 11.1421

Iteration: 13
Predicted class-name: kit fox (#1), score:  67.38%
Gradient min: -0.016686, max:  0.015832, stepsize:   1908.94
Loss: 11.186

Iteration: 14
Predicted class-name: kit fox (#1), score:  57.64%
Gradient min: -0.017312, max:  0.014563, stepsize:   1412.06
Loss: 9.67373

Iteration: 15
Predicted class-name: kit fox (#1), score:  69.09%
Gradient min: -0.005773, max:  0.005870, stepsize:   3163.64
Loss: 12.9428

Iteration: 16
Predicted class-name: kit fox (#1), score:  82.98%
Gradient min: -0.020765, max:  0.017950, stepsize:   1225.38
Loss: 10.6559

Iteration: 17
Predicted class-name: kit fox (#1), score:  99.04%
Gradient min: -0.005492, max:  0.006126, stepsize:   2520.72
Loss: 16.579

Iteration: 18
Predicted class-name: kit fox (#1), score:  86.78%
Gradient min: -0.017253, max:  0.028574, stepsize:   1280.11
Loss: 11.7084

Iteration: 19
Predicted class-name: kit fox (#1), score:  96.57%
Gradient min: -0.007056, max:  0.006660, stepsize:   1838.04
Loss: 17.8698

Iteration: 20
Predicted class-name: kit fox (#1), score:  99.04%
Gradient min: -0.008916, max:  0.008408, stepsize:   2720.73
Loss: 17.922

Iteration: 21
Predicted class-name: kit fox (#1), score:  68.39%
Gradient min: -0.012104, max:  0.013627, stepsize:   1398.73
Loss: 12.5718

Iteration: 22
Predicted class-name: kit fox (#1), score:  98.38%
Gradient min: -0.007660, max:  0.007840, stepsize:   2043.20
Loss: 14.0164

Iteration: 23
Predicted class-name: kit fox (#1), score:  98.36%
Gradient min: -0.009233, max:  0.006748, stepsize:   1951.74
Loss: 19.118

Iteration: 24
Predicted class-name: kit fox (#1), score:  99.44%
Gradient min: -0.013526, max:  0.015166, stepsize:   1557.67
Loss: 23.2171

Iteration: 25
Predicted class-name: kit fox (#1), score:  99.83%
Gradient min: -0.005306, max:  0.006063, stepsize:   2142.04
Loss: 21.0666

Iteration: 26
Predicted class-name: kit fox (#1), score:  99.85%
Gradient min: -0.005931, max:  0.005094, stepsize:   2287.80
Loss: 21.2772

Iteration: 27
Predicted class-name: kit fox (#1), score:  97.91%
Gradient min: -0.008425, max:  0.010999, stepsize:   1633.57
Loss: 23.1276

Iteration: 28
Predicted class-name: kit fox (#1), score:  99.98%
Gradient min: -0.012720, max:  0.010505, stepsize:   1749.55
Loss: 25.9384

Iteration: 29
Predicted class-name: kit fox (#1), score:  99.90%
Gradient min: -0.020819, max:  0.023275, stepsize:   1026.48
Loss: 22.4687

Iteration: 30
Predicted class-name: kit fox (#1), score:  99.87%
Gradient min: -0.005569, max:  0.007158, stepsize:   2436.42
Loss: 21.3727

Iteration: 31
Predicted class-name: kit fox (#1), score:  97.25%
Gradient min: -0.010902, max:  0.007087, stepsize:   1689.47
Loss: 13.2659

Iteration: 32
Predicted class-name: kit fox (#1), score:  99.96%
Gradient min: -0.006695, max:  0.006514, stepsize:   2277.89
Loss: 23.113

Iteration: 33
Predicted class-name: kit fox (#1), score:  99.89%
Gradient min: -0.011343, max:  0.011963, stepsize:   1713.67
Loss: 20.4645

Iteration: 34
Predicted class-name: kit fox (#1), score:  99.83%
Gradient min: -0.005129, max:  0.005226, stepsize:   2531.23
Loss: 22.9016

Iteration: 35
Predicted class-name: kit fox (#1), score:  99.96%
Gradient min: -0.004618, max:  0.005916, stepsize:   1979.85
Loss: 19.6406

Iteration: 36
Predicted class-name: kit fox (#1), score:  99.94%
Gradient min: -0.005298, max:  0.007882, stepsize:   2158.99
Loss: 26.6898

Iteration: 37
Predicted class-name: kit fox (#1), score:  99.77%
Gradient min: -0.009913, max:  0.010110, stepsize:   1643.65
Loss: 21.2908

Iteration: 38
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.005472, max:  0.004434, stepsize:   2654.99
Loss: 28.4096

Iteration: 39
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.006044, max:  0.007171, stepsize:   1646.69
Loss: 32.005

Iteration: 40
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007782, max:  0.007306, stepsize:   1853.00
Loss: 34.7635

Iteration: 41
Predicted class-name: kit fox (#1), score:  99.94%
Gradient min: -0.030789, max:  0.017443, stepsize:   1224.15
Loss: 32.2997

Iteration: 42
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005752, max:  0.006784, stepsize:   2148.87
Loss: 34.8329

Iteration: 43
Predicted class-name: kit fox (#1), score:  99.98%
Gradient min: -0.005747, max:  0.005908, stepsize:   2058.32
Loss: 33.3857

Iteration: 44
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.005644, max:  0.005296, stepsize:   2042.49
Loss: 33.5334

Iteration: 45
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008353, max:  0.008290, stepsize:   1814.68
Loss: 34.6049

Iteration: 46
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.007643, max:  0.006041, stepsize:   2002.25
Loss: 36.0055

Iteration: 47
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.008912, max:  0.009060, stepsize:   1462.12
Loss: 31.0812

Iteration: 48
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.018941, max:  0.019518, stepsize:   1859.71
Loss: 36.9466

Iteration: 49
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007313, max:  0.010175, stepsize:   1605.92
Loss: 39.6561

Iteration: 50
Predicted class-name: kit fox (#1), score:  99.99%
Gradient min: -0.005102, max:  0.005128, stepsize:   2222.82
Loss: 34.06

Iteration: 51
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.009598, max:  0.007205, stepsize:   1756.78
Loss: 26.9699

Iteration: 52
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006587, max:  0.006691, stepsize:   1967.08
Loss: 37.3345

Iteration: 53
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006564, max:  0.006621, stepsize:   2305.14
Loss: 39.8643

Iteration: 54
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.020868, max:  0.016884, stepsize:   1375.65
Loss: 37.7343

Iteration: 55
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005574, max:  0.005976, stepsize:   1908.25
Loss: 41.346

Iteration: 56
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.009256, max:  0.010253, stepsize:   1768.63
Loss: 35.9501

Iteration: 57
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.010851, max:  0.017633, stepsize:   1499.83
Loss: 42.5105

Iteration: 58
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005229, max:  0.006164, stepsize:   2135.08
Loss: 43.219

Iteration: 59
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006745, max:  0.006746, stepsize:   1642.55
Loss: 38.7929

Iteration: 60
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005743, max:  0.004990, stepsize:   2049.10
Loss: 45.1963

Iteration: 61
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007454, max:  0.006493, stepsize:   1576.57
Loss: 39.2328

Iteration: 62
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005872, max:  0.006283, stepsize:   2189.59
Loss: 42.8966

Iteration: 63
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006593, max:  0.007255, stepsize:   1561.50
Loss: 43.5881

Iteration: 64
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006010, max:  0.005091, stepsize:   1858.49
Loss: 49.4166

Iteration: 65
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.011517, max:  0.009566, stepsize:   1209.07
Loss: 41.3484

Iteration: 66
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008072, max:  0.008848, stepsize:   2159.80
Loss: 45.9205

Iteration: 67
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008305, max:  0.006268, stepsize:   1548.71
Loss: 42.3279

Iteration: 68
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006149, max:  0.009358, stepsize:   1883.51
Loss: 46.414

Iteration: 69
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006578, max:  0.006697, stepsize:   1474.94
Loss: 42.8873

Iteration: 70
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005391, max:  0.006146, stepsize:   2104.57
Loss: 49.2656

Iteration: 71
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008280, max:  0.008504, stepsize:   1563.85
Loss: 49.4512

Iteration: 72
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.009657, max:  0.011234, stepsize:   1538.54
Loss: 52.4693

Iteration: 73
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006796, max:  0.008983, stepsize:   1630.90
Loss: 47.8848

Iteration: 74
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007820, max:  0.007669, stepsize:   1989.92
Loss: 53.1304

Iteration: 75
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007080, max:  0.009358, stepsize:   1601.16
Loss: 52.802

Iteration: 76
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006778, max:  0.011433, stepsize:   1852.65
Loss: 51.9139

Iteration: 77
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007497, max:  0.007986, stepsize:   1568.70
Loss: 49.05

Iteration: 78
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005112, max:  0.006590, stepsize:   2242.10
Loss: 58.7734

Iteration: 79
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008460, max:  0.008341, stepsize:   1525.32
Loss: 60.3876

Iteration: 80
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006530, max:  0.006501, stepsize:   1898.08
Loss: 60.4282

Iteration: 81
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.009223, max:  0.006812, stepsize:   1697.45
Loss: 59.8338

Iteration: 82
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006340, max:  0.006535, stepsize:   1737.38
Loss: 60.1702

Iteration: 83
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008974, max:  0.008040, stepsize:   1453.65
Loss: 59.8085

Iteration: 84
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008100, max:  0.005810, stepsize:   1709.52
Loss: 63.3426

Iteration: 85
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.006354, max:  0.008465, stepsize:   1674.72
Loss: 62.9884

Iteration: 86
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007080, max:  0.006697, stepsize:   1713.75
Loss: 66.6128

Iteration: 87
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.015349, max:  0.009788, stepsize:   1500.05
Loss: 63.2206

Iteration: 88
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.008576, max:  0.007250, stepsize:   1536.09
Loss: 68.5237

Iteration: 89
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005157, max:  0.005224, stepsize:   1869.55
Loss: 71.2678

Iteration: 90
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007810, max:  0.007995, stepsize:   1401.19
Loss: 62.0469

Iteration: 91
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.010371, max:  0.009543, stepsize:   1559.59
Loss: 70.518

Iteration: 92
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.009110, max:  0.006689, stepsize:   1855.04
Loss: 67.8497

Iteration: 93
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005365, max:  0.006440, stepsize:   1969.30
Loss: 69.9785

Iteration: 94
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.010603, max:  0.011318, stepsize:   1475.43
Loss: 69.6375

Iteration: 95
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.004267, max:  0.005465, stepsize:   2023.68
Loss: 76.1746

Iteration: 96
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.011737, max:  0.010223, stepsize:   1207.37
Loss: 58.1862

Iteration: 97
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005620, max:  0.005410, stepsize:   1992.51
Loss: 73.5772

Iteration: 98
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007732, max:  0.010692, stepsize:   1286.44
Loss: 67.5603

Iteration: 99
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005850, max:  0.006159, stepsize:   1863.65
Loss: 75.6356


In [38]:
plot_image(image=image)


Close TensorFlow Session

The TensorFlow session was already closed in the functions above that used the Inception model. This was done to save memory so the computer would not crash when adding many gradient-functions to the computational graph.

Conclusion

This tutorial showed how to optimize input images that maximize features inside a neural network. This allows us to visually analyze what the neural network 'likes to see', because the given feature (or neuron) inside the neural network reacts most strongly to that particular image.

For the lower layers in the neural network, the images had simple patterns, e.g. different types of wavy lines. The image patterns become increasingly complex for deeper layers of the neural network. We might have expected or hoped that the image patterns would be recognizable for deeper layers, e.g. showing monkeys, foxes, cars, etc. But instead the image patterns become increasingly complex and abstract for the deeper layers.

Why is that? Recall from Tutorial #11 that the Inception model can easily be fooled with a little adversarial noise, so it classifies any input image as another target-class. So it is not surprising that the Inception model recognizes these abstract image patterns, which are unclear to the human eye. There is probably an infinite number of images that maximize the features deep inside a neural network, and the images that are also recognizable by humans are only a small fraction of all these image patterns. This may be the reason why the optimization process only found abstract image patterns.

Other Methods

There are many proposals in the research literature for guiding the optimization process so as to find image patterns that are more recognizable to humans.

This paper proposes a combination of heuristics for guiding the optimization process of the image patterns. The paper shows example images for several classes such as flamingo, pelican and black swan, all of which are somewhat recognizable to the human eye. The method is apparently implemented here (the exact line-numbers could change in the future). It requires a combination of heuristics and their parameters must be finely tuned in order to generate these images. But the parameter choice is not entirely clear from the research paper. In spite of several attempts, I could not reproduce their results. Maybe I have misunderstood their paper, or maybe the heuristics were finely tuned to their network architecture, which is a variant of the so-called AlexNet, whereas this tutorial uses the more advanced Inception model.

This paper proposes another method for producing images that are even more recognizable to the human eye. However, the method is actually cheating, because it goes through all the images in the training-set (e.g. ImageNet) and takes the images that maximally activate a given feature inside the neural network. Then it clusters and averages similar images. This produces the initial image for the optimization procedure. So it is no wonder that the method gives better results when it starts with an image that is constructed from real photos.

Exercises

These are a few suggestions for exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.

You may want to backup this Notebook and the other files before making any changes.

  • Try and run the optimization several times for features in lower layers of the network. Are the resulting images always the same?
  • Try and use fewer and more optimization iterations. How does it affect the image quality?
  • Try and change the loss-function for a convolutional feature. This can be done in different ways. How does it affect the image patterns? Why is that?
  • Do you think the optimizer also increases other features than the one we want maximized? How can you measure this? Can you ensure that the optimizer only maximizes one feature at a time?
  • Try maximizing multiple features simultaneously.
  • Try visualizing the features and layers in a smaller neural network trained on the MNIST data-set. Is it easier to see patterns in the images?
  • Try and implement the methods from the papers above.
  • Try your own ideas for improving the optimized images.
  • Explain to a friend how the program works.

License (MIT)

Copyright (c) 2016 by Magnus Erik Hvass Pedersen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.