In [1]:
%tensorflow_version 1.x
#The default TensorFlow version in Colab switched from 1.x to 2.x on the 27th of March, 2020.


`%tensorflow_version` only switches the major version: 1.x or 2.x.
You set: `1.x #https://colab.research.google.com/notebooks/tensorflow_version.ipynb`. This will be interpreted as: `1.x`.


TensorFlow 1.x selected.

In [2]:
# Clone the entire repo.
!git clone https://github.com/tensorflow/tcav.git tcav
%cd tcav
!ls


Cloning into 'tcav'...
remote: Enumerating objects: 88, done.
remote: Counting objects: 100% (88/88), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 588 (delta 44), reused 54 (delta 20), pack-reused 500
Receiving objects: 100% (588/588), 493.45 KiB | 2.48 MiB/s, done.
Resolving deltas: 100% (356/356), done.
/content/tcav
 CONTRIBUTING.md	 LICENSE     requirements.txt   setup.py
 FetchDataAndModels.sh	 README.md  'Run TCAV.ipynb'    tcav

In [3]:
%cd /content/tcav/tcav/tcav_examples/image_models/imagenet
%run download_and_make_datasets.py --source_dir=YOUR_FOLDER --number_of_images_per_folder=10 --number_of_random_folders=10


/content/tcav/tcav/tcav_examples/image_models/imagenet
Created source directory at YOUR_FOLDER
WARNING:tensorflow:From /content/tcav/tcav/tcav_examples/image_models/imagenet/imagenet_and_broden_fetcher.py:163: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:Fetching imagenet data for zebra
INFO:tensorflow:Saving images at YOUR_FOLDER/zebra
INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.featurepics.com/FI/Marked/20060909/Zebra84929.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was <urlopen error [Errno -2] Name or service not known> for URL http://bonfire.learnnc.org/zoo/week02/photos/mccrary_zootales/images/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.istockphoto.com/file_thumbview_approve/269671/2/istockphoto_269671_zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 410: Gone for URL http://www.mediastorehouse.com/image/Grevys-Zebra_463879.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.lorilamont.com/Zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.cathouse-fcc.org/gifs-jpegs/southafrica/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was <urlopen error [Errno -3] Temporary failure in name resolution> for URL http://www.kilimanjaro.com/animals/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was <urlopen error [Errno -2] Name or service not known> for URL http://www.conceptexpeditions.com/images/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://alumnus.caltech.edu/~kantner/zebras/pictures/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 403: Forbidden for URL http://www.zslprints.com/image/Chapmans-Zebra_561562.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was [Errno 104] Connection reset by peer for URL http://www.free-slideshow.com/stock-photos/lovely_animals/zebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was <urlopen error [Errno -2] Name or service not known> for URL http://www.namibia-lov.com/lovnadivljac/zebra0001p4.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.hoopoe.com/images/lioneatingzebra.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not found for URL http://www.northrup.org/Photos/zebra/low/baby-zebra-with-mother.jpg

INFO:tensorflow:Problem downloading imagenet image. Exception was HTTP Error 404: Not Found for URL http://www.kotaku.com/assets/resources/2006/09/afrika_zebra_sm.jpg

Downloaded 10 for zebra
INFO:tensorflow:Using path YOUR_FOLDER/broden1_224/broden1_224/images/dtd/ for texture: striped
INFO:tensorflow:We have 120 images for the concept striped
INFO:tensorflow:Using path YOUR_FOLDER/broden1_224/broden1_224/images/dtd/ for texture: dotted
INFO:tensorflow:We have 120 images for the concept dotted
INFO:tensorflow:Using path YOUR_FOLDER/broden1_224/broden1_224/images/dtd/ for texture: zigzagged
INFO:tensorflow:We have 120 images for the concept zigzagged
INFO:tensorflow:Downloaded 10/10 for random500_0
INFO:tensorflow:Downloaded 10/10 for random500_1
INFO:tensorflow:Downloaded 10/10 for random500_2
INFO:tensorflow:Downloaded 10/10 for random500_3
INFO:tensorflow:Downloaded 10/10 for random500_4
INFO:tensorflow:Downloaded 10/10 for random500_5
INFO:tensorflow:Downloaded 10/10 for random500_6
INFO:tensorflow:Downloaded 10/10 for random500_7
INFO:tensorflow:Downloaded 10/10 for random500_8
INFO:tensorflow:Downloaded 10/10 for random500_9
INFO:tensorflow:Downloaded 10/10 for random500_10
Successfully created data at YOUR_FOLDER

In [4]:
%cd /content/tcav


/content/tcav

Running TCAV

This notebook walks you through what you need to run TCAV.

Before running this notebook, run the following to download all the data.

cd tcav/tcav_examples/image_models/imagenet

python download_and_make_datasets.py --source_dir=YOUR_PATH --number_of_images_per_folder=50 --number_of_random_folders=3

At a high level, you need:

  1. example images in each folder (you have this if you ran the above)
    • images for each concept
    • images for the class/labels of interest
    • random images that will be negative examples when learning CAVs (images that probably don't belong to any concepts)
  2. model wrapper (below uses example from tcav/model.py)
    • an instance of the ModelWrapper abstract class (in model.py). This tells the TCAV class (tcav.py) how to communicate with your model (e.g., getting internal tensors)
  3. act_generator (below uses example from tcav/activation_generator.py)
    • an instance of ActivationGeneratorInterface that tells the TCAV class how to load example data and how to get activations from the model

Requirements

pip install the tcav and tensorflow packages (or tensorflow-gpu if using GPU)

In [0]:
%load_ext autoreload
%autoreload 2

In [0]:
import tcav.activation_generator as act_gen
import tcav.cav as cav
import tcav.model as model
import tcav.tcav as tcav
import tcav.utils as utils
import tcav.utils_plot as utils_plot # utils_plot requires matplotlib
import os 
import tensorflow as tf

Step 1. Store concept and target class images to local folders

and tell TCAV where they are.

source_dir: where images of concepts, target class and random images (negative samples when learning CAVs) live. Each should be a sub-folder within this directory.

Note that the random image directories can have any name. In this example, we arbitrarily use random500_0, random500_1, etc.

You need roughly 50-200 images per concept and target class (10-20 pictures also tend to work, but 200 is pretty safe).

cav_dir: directory to store CAVs (None if you don't want to store)

target, concept: names of the target class (that you want to investigate) and concepts (strings) - these are folder names in source_dir

bottlenecks: list of bottleneck names (intermediate layers in your model) that you want to use for TCAV. These names are defined in the model wrapper below.
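
Before running anything expensive, it can help to confirm that source_dir really contains the sub-folders described above. The following is a minimal sketch, not part of the tcav package: the helper names (count_images_per_folder, expected_folders), the random500_ prefix default, and the list of image extensions are assumptions made for illustration.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def expected_folders(target, concepts, num_random_exp, random_prefix="random500_"):
    """List the sub-folders a run would look for under source_dir (hypothetical helper)."""
    randoms = [f"{random_prefix}{i}" for i in range(num_random_exp)]
    return [target] + list(concepts) + randoms


def count_images_per_folder(source_dir, folder_names):
    """Return {folder_name: number of image files}; missing folders map to 0."""
    counts = {}
    for name in folder_names:
        folder = os.path.join(source_dir, name)
        if not os.path.isdir(folder):
            counts[name] = 0
            continue
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["zebra", "striped", "random500_0"]:
        os.makedirs(os.path.join(demo_dir, name))
    open(os.path.join(demo_dir, "zebra", "a.jpg"), "w").close()
    demo_counts = count_images_per_folder(
        demo_dir, expected_folders("zebra", ["striped", "dotted"], 2)
    )
    print(demo_counts)
```

Folders that report 0 are either missing or empty, which is worth fixing before learning CAVs.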


In [8]:
# This is the name of your model wrapper (InceptionV3 and GoogleNet are provided in model.py)
model_to_run = 'GoogleNet'
# the name of the parent directory where results are stored (only if you want to cache)
project_name = 'tcav_class_test'
working_dir = '/content/tcav/tcav'
# where activations are stored (only if your act_gen_wrapper does so)
activation_dir = working_dir + '/activations/'
# where CAVs are stored. 
# You can say None if you don't wish to store any.
cav_dir = working_dir + '/cavs/'
# where the images live. 
source_dir = '/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER'
bottlenecks = [ 'mixed4c']  # @param 
      
utils.make_dir_if_not_exists(activation_dir)
utils.make_dir_if_not_exists(working_dir)
utils.make_dir_if_not_exists(cav_dir)

# this is a regularizer penalty parameter for linear classifier to get CAVs. 
alphas = [0.1]   

target = 'zebra'  
concepts = ["dotted","striped","zigzagged"]


REMEMBER TO UPDATE YOUR_PATH (where images, models are)!

Step 2. Write your model wrapper

The next step is to tell TCAV how to communicate with your model. See model.GoogleNetWrapper_public for details.

You can define a subclass of the ModelWrapper abstract class to do this. Let me walk you through what each function does (though they are pretty self-explanatory). This wrapper includes a lot of the functions that you already have, for example, get_prediction.

1. Tensors from the graph: bottleneck tensors and ends

First, store your bottleneck tensors in self.bottlenecks_tensors as a dictionary. You only need bottlenecks that you are interested in running TCAV with. Similarly, fill in self.ends dictionary with input, logit and prediction tensors.

2. Define loss

Get your loss tensor and assign it to self.loss. This is what TCAV uses to take directional derivatives.

While doing so, you would also want to set

self.y_input

This is simply a TensorFlow placeholder for the target index in the logit layer (e.g., index 0 for a dog, 1 for a cat). For multi-class classification, something like this typically works:

self.y_input = tf.placeholder(tf.int64, shape=[None])

For example, the following would work for a multi-class classifier.

# Construct gradient ops.
    with g.as_default():
      self.y_input = tf.placeholder(tf.int64, shape=[None])

      self.pred = tf.expand_dims(self.ends['prediction'][0], 0)

      self.loss = tf.reduce_mean(
          tf.nn.softmax_cross_entropy_with_logits(
              labels=tf.one_hot(self.y_input, len(self.labels)),
              logits=self.pred))
    self._make_gradient_tensors()
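
To make the loss above concrete, here is a minimal pure-Python sketch of what a softmax cross-entropy with a one-hot label computes for a single example. It is an illustration of the arithmetic only, not the TensorFlow op itself; the function names and the example logits are made up.

```python
import math


def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy between one_hot(target_index) and softmax(logits)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    labels = one_hot(target_index, len(logits))
    return -sum(label * lp for label, lp in zip(labels, log_probs))


example_logits = [2.0, 0.5, -1.0]  # illustrative values, not model output
loss_for_class_0 = softmax_cross_entropy(example_logits, 0)
loss_for_class_2 = softmax_cross_entropy(example_logits, 2)
print(loss_for_class_0, loss_for_class_2)
```

The loss is small when the target index has the largest logit and grows as the target becomes less likely, which is why its gradient with respect to a bottleneck tells you how activations affect the target class.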

3. Call _make_gradient_tensors in __init__() of your wrapper

_make_gradient_tensors()

does what you expect - given the loss and bottleneck tensors defined above, it adds gradient tensors.
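
As a rough illustration of how such gradients are used, the sketch below computes a TCAV-style score from per-example gradients and a concept direction. It is a simplified stand-in written in plain Python, not the library's code. The sign convention (a negative directional derivative of the loss counts as the concept pushing toward the target class) is an assumption of this sketch, and the gradient values are made up.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def tcav_score(loss_gradients, cav_direction):
    """Fraction of examples for which moving along the CAV lowers the loss.

    loss_gradients: one flattened gradient of the loss with respect to the
    bottleneck activations per example (hypothetical input format).
    """
    lowered = sum(1 for g in loss_gradients if dot(g, cav_direction) < 0)
    return lowered / len(loss_gradients)


# Made-up gradients for four examples in a 2-D bottleneck.
demo_gradients = [[-1.0, 0.0], [0.5, 0.5], [-0.2, -0.3], [0.0, -1.0]]
demo_cav = [1.0, 0.0]
demo_score = tcav_score(demo_gradients, demo_cav)
print(demo_score)  # 0.5
```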

4. Fill in labels, image shapes and a model name.

Get the mapping from labels (strings) to indices in the logit layer (ints) in a dictionary format.

def id_to_label(self, idx)
def label_to_id(self, label)

Set your input image shape at self.image_shape

Set your model name to self.model_name
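
One simple way to back id_to_label and label_to_id is a label file with one class per line, ordered by logit index. The sketch below is a standalone illustration under that assumption; the LabelMap class is hypothetical and is not the wrapper class from model.py.

```python
import os
import tempfile


class LabelMap:
    """Maps between label strings and logit indices (hypothetical helper)."""

    def __init__(self, label_path):
        with open(label_path) as f:
            # Keep every line, so that line number == logit index.
            self.labels = f.read().splitlines()
        self._index = {label: i for i, label in enumerate(self.labels)}

    def id_to_label(self, idx):
        return self.labels[idx]

    def label_to_id(self, label):
        return self._index[label]


# Demo with a tiny label file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.txt")
    with open(path, "w") as f:
        f.write("dummy\nkit fox\nEnglish setter\nSiberian husky\n")
    demo_labels = LabelMap(path)

print(demo_labels.label_to_id("kit fox"), demo_labels.id_to_label(3))
```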

You are done with writing the model wrapper! I wrote two model wrappers, InceptionV3 and GoogleNet.

sess: a tensorflow session.


In [9]:
%cp -av '/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224' '/content/tcav/tcav/mobilenet_v2_1.0_224'
%rm -r '/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224'


'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224' -> '/content/tcav/tcav/mobilenet_v2_1.0_224'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_eval.pbtxt' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_eval.pbtxt'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.data-00000-of-00001' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.data-00000-of-00001'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_frozen.pb' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_frozen.pb'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.tflite' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.tflite'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_info.txt' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224_info.txt'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.meta' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.meta'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.index' -> '/content/tcav/tcav/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.ckpt.index'

In [10]:
%cp -av '/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h' '/content/tcav/tcav/inception5h'
%rm -r '/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h'


'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h' -> '/content/tcav/tcav/inception5h'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h/imagenet_comp_graph_label_strings.txt' -> '/content/tcav/tcav/inception5h/imagenet_comp_graph_label_strings.txt'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h/tensorflow_inception_graph.pb' -> '/content/tcav/tcav/inception5h/tensorflow_inception_graph.pb'
'/content/tcav/tcav/tcav_examples/image_models/imagenet/YOUR_FOLDER/inception5h/LICENSE' -> '/content/tcav/tcav/inception5h/LICENSE'

In [11]:
sess = utils.create_session()

# GRAPH_PATH is where the trained model is stored.
GRAPH_PATH = "/content/tcav/tcav/inception5h/tensorflow_inception_graph.pb"
# LABEL_PATH is where the labels are stored. Each line contains one class, and they are ordered with respect to their index in 
# the logit layer. (yes, id_to_label function in the model wrapper reads from this file.)
# For example, imagenet_comp_graph_label_strings.txt looks like:
# dummy                                                                                      
# kit fox
# English setter
# Siberian husky ...

LABEL_PATH = "/content/tcav/tcav/inception5h/imagenet_comp_graph_label_strings.txt"

mymodel = model.GoogleNetWrapper_public(sess,
                                        GRAPH_PATH,
                                        LABEL_PATH)


WARNING:tensorflow:From /content/tcav/tcav/utils.py:40: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /content/tcav/tcav/utils.py:44: The name tf.InteractiveSession is deprecated. Please use tf.compat.v1.InteractiveSession instead.

WARNING:tensorflow:From /content/tcav/tcav/model.py:304: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /content/tcav/tcav/model.py:310: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

WARNING:tensorflow:From /content/tcav/tcav/model.py:293: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /content/tcav/tcav/model.py:263: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Step 3. Implement a class that returns activations (maybe with caching!)

Lastly, you will implement a subclass of ActivationGeneratorInterface, which TCAV uses to load example data for a given concept or target, call into your model wrapper, and return activations. I pulled this logic out of mymodel because this step often takes the longest. Keeping it modular lets you cache your activations and/or parallelize your computations, as I have done in ActivationGeneratorBase.process_and_load_activations in activation_generator.py.

The process_and_load_activations method of the activation generator must return a dictionary of activations that has concept or target name as a first key, and the bottleneck name as a second key. So something like:

{concept1: {bottleneck1: [[0.2, 0.1, ....]]},
 concept2: {bottleneck1: [[0.1, 0.02, ....]]},
 target1: {bottleneck1: [[0.02, 0.99, ....]]}}
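
If you write your own activation generator, a small shape check on the returned dictionary can catch mistakes early. The following is a minimal sketch of such a check; check_activations is a hypothetical helper, and the activation values are placeholders rather than real model output.

```python
def check_activations(acts, names, bottlenecks):
    """Verify acts looks like {name: {bottleneck: non-empty list of rows}}."""
    for name in names:
        if name not in acts:
            raise KeyError(f"missing concept or target: {name}")
        for bn in bottlenecks:
            if bn not in acts[name]:
                raise KeyError(f"missing bottleneck {bn} for {name}")
            if len(acts[name][bn]) == 0:
                raise ValueError(f"no activations for {name}/{bn}")
    return True


# Placeholder activations: first key is the concept/target, second the bottleneck.
demo_acts = {
    "striped": {"mixed4c": [[0.2, 0.1]]},
    "zebra": {"mixed4c": [[0.02, 0.99]]},
}
print(check_activations(demo_acts, ["striped", "zebra"], ["mixed4c"]))
```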

In [0]:
act_generator = act_gen.ImageActivationGenerator(mymodel, source_dir, activation_dir, max_examples=100)

You are ready to run TCAV!

Let's do it.

num_random_exp: number of experiments to confirm meaningful concept direction. TCAV will search for this many folders named random500_0, random500_1, etc. You can alternatively set the random_concepts keyword to be a list of folders of random concepts. Run at least 10-20 for meaningful tests.

random_counterpart: as well as the above, you can optionally supply a single folder with random images as the "positive set" for statistical testing. Reduces computation time at the cost of less reliable random TCAV scores.
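
To see why several random experiments are needed, the sketch below compares a set of concept TCAV scores against scores obtained from random folders. It uses a two-sided permutation test on the difference of means as a simple stand-in; this is an assumption for illustration, and the test used inside the tcav package may differ. All scores here are made-up numbers, not results from this notebook.

```python
import random
from statistics import mean


def permutation_p_value(concept_scores, random_scores, num_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means (illustrative)."""
    rng = random.Random(seed)
    observed = abs(mean(concept_scores) - mean(random_scores))
    pooled = list(concept_scores) + list(random_scores)
    n = len(concept_scores)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (num_permutations + 1)


# Made-up scores from 10 runs each.
demo_concept = [0.85, 0.90, 0.80, 0.88, 0.92, 0.86, 0.83, 0.90, 0.87, 0.89]
demo_random = [0.45, 0.50, 0.55, 0.40, 0.60, 0.48, 0.52, 0.47, 0.53, 0.50]
demo_p = permutation_p_value(demo_concept, demo_random)
print(demo_p < 0.05)
```

With only a handful of runs, the smallest p-value such a test can report is limited, which is one reason to run at least 10-20 random experiments.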


In [13]:
tf.logging.set_verbosity(0)
num_random_exp=10
## Only running num_random_exp = 10 to save some time. The numbers in the paper are reported for 500 random runs.
mytcav = tcav.TCAV(sess,
                   target,
                   concepts,
                   bottlenecks,
                   act_generator,
                   alphas,
                   cav_dir=cav_dir,
                   num_random_exp=num_random_exp)
print('This may take a while... Go get coffee!')
results = mytcav.run(run_parallel=False)
print('done!')


This may take a while... Go get coffee!
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

done!

In [14]:
utils_plot.plot_results(results, num_random_exp=num_random_exp)


Class = zebra
  Concept = dotted
    Bottleneck = mixed4c. TCAV Score = 0.49 (+- 0.21), random was 0.48 (+- 0.22). p-val = 0.903 (not significant)
  Concept = striped
    Bottleneck = mixed4c. TCAV Score = 0.86 (+- 0.10), random was 0.48 (+- 0.22). p-val = 0.000 (significant)
  Concept = zigzagged
    Bottleneck = mixed4c. TCAV Score = 0.88 (+- 0.15), random was 0.48 (+- 0.22). p-val = 0.000 (significant)
{'mixed4c': {'bn_vals': [0.01, 0.8600000000000001, 0.8800000000000001], 'bn_stds': [0, 0.10198039027185571, 0.14696938456699069], 'significant': [False, True, True]}}