If we have a model that takes in an image as its input and outputs class scores, i.e. probabilities that a certain object is present in the image, then we can use ELI5 to check what it is in the image that made the model predict a certain class score. We do that using a method called 'Grad-CAM' (https://arxiv.org/abs/1610.02391).
We will be using images from ImageNet (http://image-net.org/), and classifiers from keras.applications.
This has been tested with Python 3.7.3, Keras 2.2.4, and Tensorflow 1.13.1.
To start out, let's get our modules in place:
In [1]:
from PIL import Image
from IPython.display import display
import numpy as np
# you may want to keep logging enabled when doing your own work
import logging
import tensorflow as tf
tf.get_logger().setLevel(logging.ERROR) # disable Tensorflow warnings for this tutorial
import warnings
warnings.simplefilter("ignore") # disable Keras warnings for this tutorial
import keras
from keras.applications import mobilenet_v2
import eli5
And load our image classifier (a light-weight model from keras.applications).
In [2]:
model = mobilenet_v2.MobileNetV2(include_top=True, weights='imagenet', classes=1000)
# check the input format
print(model.input_shape)
dims = model.input_shape[1:3] # -> (height, width)
print(dims)
We see that we need a numpy tensor of shape (batches, height, width, channels), with the specified height and width.
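As a quick sanity check, a batch of that shape passes straight through the model (the zero tensor below is just a stand-in, not a real image):
dummy = np.zeros((1,) + dims + (3,), dtype='float32')  # -> (1, height, width, 3)
print(model.predict(dummy).shape)  # a (1, 1000) array of class scores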
Loading our sample image:
In [3]:
# we start from a path / URI.
# If you already have an image loaded, follow the subsequent steps
image_uri = 'imagenet-samples/cat_dog.jpg'
# this is the original "cat dog" image used in the Grad-CAM paper
# check the image with Pillow
im = Image.open(image_uri)
print(type(im))
display(im)
We see that this image will need some preprocessing to have the correct dimensions! Let's resize it:
In [4]:
# we could resize the image manually
# but instead let's use a utility function from `keras.preprocessing`
# we pass the required dimensions as a (height, width) tuple
im = keras.preprocessing.image.load_img(image_uri, target_size=dims) # -> PIL image
print(im)
display(im)
Looking good. Now we need to convert the image to a numpy array.
In [5]:
# we use a routine from `keras.preprocessing` for that as well
# we get a 'doc', an object almost ready to be inputted into the model
doc = keras.preprocessing.image.img_to_array(im) # -> numpy array
print(type(doc), doc.shape)
In [6]:
# dimensions are looking good
# except that we are missing one thing - the batch size
# we can use a numpy routine to create an axis in the first position
doc = np.expand_dims(doc, axis=0)
print(type(doc), doc.shape)
In [7]:
# `keras.applications` models come with their own input preprocessing function
# for best results, apply that as well
# mobilenetv2-specific preprocessing
# (this operation is in-place)
mobilenet_v2.preprocess_input(doc)
print(type(doc), doc.shape)
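Note that whether preprocess_input works in place depends on the Keras version; some versions return a new array instead of modifying the argument. If in doubt, use the version-safe form and reassign the return value (run one form or the other, not both):
# version-safe form: don't rely on in-place mutation, use the return value
doc = mobilenet_v2.preprocess_input(doc)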
Let's convert the array back to an image, just to check what we are inputting:
In [8]:
# take back the first image from our 'batch'
image = keras.preprocessing.image.array_to_img(doc[0])
print(image)
display(image)
Ready to go!
In [9]:
# make a prediction about our sample image
predictions = model.predict(doc)
print(type(predictions), predictions.shape)
In [10]:
# check the top 5 indices
# `keras.applications` contains a function for that
top = mobilenet_v2.decode_predictions(predictions)
top_indices = np.argsort(predictions)[0, ::-1][:5]
print(top)
print(top_indices)
Indeed there is a dog in that picture! The class ID (index into the output layer) 243 stands for 'bull mastiff' in ImageNet with 1000 classes (https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a).
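Each entry returned by decode_predictions is a (WordNet ID, human-readable label, score) tuple, one list per sample in the batch, so we can print the top classes more readably:
# iterate over the predictions for the first (and only) sample
for wnid, label, score in top[0]:
    print('{:<20} {:.4f}'.format(label, score))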
But how did the network know that? Let's check where the model "looked" for a dog with ELI5:
In [11]:
# we need to pass the network
# the input as a numpy array
eli5.show_prediction(model, doc)
Out[11]:
The dog region is highlighted. Makes sense!
When explaining image-based models, we can optionally pass the image associated with the input as a Pillow image object. If we don't, the image will be created from doc. This may not work with custom models or inputs, in which case it's worth passing the image explicitly.
In [12]:
eli5.show_prediction(model, doc, image=image)
Out[12]:
In [13]:
cat_idx = 282 # ImageNet ID for "tiger_cat" class, because we have a cat in the picture
eli5.show_prediction(model, doc, targets=[cat_idx]) # pass the class id
Out[13]:
The model looks at the cat now!
We have to pass the class ID as a list to the targets parameter. Currently only one class can be explained at a time.
In [14]:
window_idx = 904 # 'window screen'
turtle_idx = 35 # 'mud turtle', some nonsense
display(eli5.show_prediction(model, doc, targets=[window_idx]))
display(eli5.show_prediction(model, doc, targets=[turtle_idx]))
That's quite noisy! Perhaps the model is weak at classifying 'window screens'! On the other hand, the nonsense 'turtle' example could be excused.
Note that we need to wrap show_prediction() with IPython.display.display() to actually display the image when show_prediction() is not the last thing in a cell.
In [15]:
# we could use model.summary() here, but the model has over 100 layers.
# we will only look at the first few and last few layers
head = model.layers[:5]
tail = model.layers[-8:]
def pretty_print_layers(layers):
    for l in layers:
        info = [l.name, type(l).__name__, l.output_shape, l.count_params()]
        pretty_print(info)

def pretty_print(lst):
    s = ',\t'.join(map(str, lst))
    print(s)
pretty_print(['name', 'type', 'output shape', 'param. no'])
print('-'*100)
pretty_print([model.input.name, type(model.input), model.input_shape, 0])
pretty_print_layers(head)
print()
print('...')
print()
pretty_print_layers(tail)
Rough print but okay. Let's pick a few convolutional layers that are 'far apart' and do Grad-CAM on them:
In [16]:
for l in ['block_2_expand', 'block_9_expand', 'Conv_1']:
print(l)
display(eli5.show_prediction(model, doc, layer=l)) # we pass the layer as an argument
These results should make intuitive sense for Convolutional Neural Networks. Initial layers detect 'low level' features, ending layers detect 'high level' features!
The layer parameter accepts a layer instance, index, name, or None (pick the layer automatically) as its argument. This is the layer from which Grad-CAM builds its heatmap.
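For example, specifying the layer by name and by instance should give the same explanation (reusing the 'Conv_1' layer from above):
# these calls target the same layer, specified by name and by layer instance
display(eli5.show_prediction(model, doc, layer='Conv_1'))
display(eli5.show_prediction(model, doc, layer=model.get_layer('Conv_1')))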
This time we will use the eli5.explain_prediction() and eli5.format_as_image() functions (which are called one after the other by the convenience function eli5.show_prediction()), so we can better understand what is going on.
In [17]:
expl = eli5.explain_prediction(model, doc)
Examining the structure of the Explanation object:
In [18]:
print(expl)
We can check the score (raw value) or probability (normalized score) of the neuron for the predicted class, and get the class ID itself:
In [19]:
# we can access the various attributes of a target being explained
print((expl.targets[0].target, expl.targets[0].score, expl.targets[0].proba))
We can also access the original image and the Grad-CAM heatmap:
In [20]:
image = expl.image
heatmap = expl.targets[0].heatmap
display(image) # the .image attribute is a PIL image
print(heatmap) # the .heatmap attribute is a numpy array
Visualizing the heatmap:
In [21]:
heatmap_im = eli5.formatters.image.heatmap_to_image(heatmap)
display(heatmap_im)
That's only 7x7! These are the spatial dimensions of the activation/feature maps in the last layers of the network. What Grad-CAM produces is only a rough approximation.
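As an aside, here is a minimal numpy sketch of the core Grad-CAM computation from the paper, which shows why the heatmap inherits the layer's 7x7 spatial size (the activations and grads arrays are hypothetical stand-ins, not eli5 internals):
# activations: (7, 7, K) feature maps of the chosen conv layer
# grads:       (7, 7, K) gradients of the class score w.r.t. those maps
weights = grads.mean(axis=(0, 1))                            # global-average-pool the gradients -> (K,)
cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # weighted sum of maps, then ReLU -> (7, 7)
cam = cam / cam.max() if cam.max() > 0 else cam              # normalize to [0, 1]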
Let's resize the heatmap (we have to pass the heatmap and the image with the required dimensions as Pillow images, and the filter for resampling):
In [22]:
heatmap_im = eli5.formatters.image.expand_heatmap(heatmap, image, resampling_filter=Image.BOX)
display(heatmap_im)
Now it's clear what is being highlighted. We just need to apply some colors and overlay the heatmap over the original image, exactly what eli5.format_as_image() does!
In [23]:
I = eli5.format_as_image(expl)
display(I)
format_as_image() has a couple of parameters too:
In [24]:
import matplotlib.cm
I = eli5.format_as_image(expl, alpha_limit=1.0, colormap=matplotlib.cm.cividis)
display(I)
The alpha_limit argument controls the maximum opacity that the heatmap pixels should have. It is between 0.0 and 1.0. Low values are useful for seeing the original image.
The colormap argument is a function (callable) that does the colorisation of the heatmap. See matplotlib.cm for some options. Pick your favourite color!
Another optional argument is resampling_filter. The default is PIL.Image.LANCZOS (shown here). You have already seen PIL.Image.BOX.
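As a sketch combining these options (the particular values and colormap here are arbitrary choices):
# a more transparent overlay with a different colormap and box resampling
I = eli5.format_as_image(expl, alpha_limit=0.5,
                         colormap=matplotlib.cm.magma,
                         resampling_filter=Image.BOX)
display(I)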
The original Grad-CAM paper (https://arxiv.org/pdf/1610.02391.pdf) suggests that we should use the output of the layer before softmax when doing Grad-CAM (use raw score values, not probabilities). Currently ELI5 simply takes the model as-is. Let's try swapping the softmax activation on our model's output layer for a linear (no-op) activation, so that the model outputs raw scores (logits), and check the explanation:
In [25]:
# first check the explanation *with* softmax
print('with softmax')
display(eli5.show_prediction(model, doc))
# remove softmax
l = model.get_layer(index=-1) # get the last (output) layer
l.activation = keras.activations.linear # swap activation
# save and load back the model as a trick to reload the graph
model.save('tmp_model_save_rmsoftmax') # note that this creates a file of the model
model = keras.models.load_model('tmp_model_save_rmsoftmax')
print('without softmax')
display(eli5.show_prediction(model, doc))
We see some slight differences: the activations are brighter without softmax. Do consider swapping out softmax if explanations for your model seem off.
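If you need this often, the save/load trick above can be wrapped in a small helper (a sketch; the temporary file path is arbitrary):
def remove_softmax(model, tmp_path='tmp_model_save_rmsoftmax'):
    # swap the output activation for linear, then save/load to rebuild the graph
    model.get_layer(index=-1).activation = keras.activations.linear
    model.save(tmp_path)
    return keras.models.load_model(tmp_path)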
According to the paper at https://arxiv.org/abs/1711.06104, if an explanation method such as Grad-CAM is any good, then explaining different models should yield different results. Let's verify that by loading another model and explaining a classification of the same image:
In [26]:
from keras.applications import nasnet
model2 = nasnet.NASNetMobile(include_top=True, weights='imagenet', classes=1000)
# we reload the image array to apply nasnet-specific preprocessing
doc2 = keras.preprocessing.image.img_to_array(im)
doc2 = np.expand_dims(doc2, axis=0)
nasnet.preprocess_input(doc2)
print(model.name)
# note that this model is without softmax
display(eli5.show_prediction(model, doc))
print(model2.name)
display(eli5.show_prediction(model2, doc2))
Wow, show_prediction() is so robust!