Session 4: Visualizing Representations

Creative Applications of Deep Learning with Google's Tensorflow Parag K. Mital Kadenze, Inc.

Learning Goals

  • Learn how to inspect deep networks by visualizing their gradients
  • Learn how to "deep dream" with different objective functions and regularization techniques
  • Learn how to "stylize" an image using content and style losses from different images

Introduction

So far, we've seen that a deep convolutional network can get very high accuracy classifying the MNIST dataset, a dataset of handwritten digits numbered 0 - 9. What happens when the number of classes grows higher than 10 possibilities? Or the images get much larger? We're going to explore a few new datasets and bigger and better models to try and find out. We'll then explore a few interesting visualization techniques to help us understand what the network is representing in its deeper layers, and how these techniques can be used for some very interesting creative applications.

Deep Convolutional Networks

Almost 30 years of computer vision and machine learning research based on images takes an approach to processing images like what we saw at the end of Session 1: you take an image, convolve it with a set of edge detectors like the Gabor filter we created, and then threshold this image to find more interesting features, such as corners, or look at histograms of the number of edges of some orientation in a particular window. In the previous session, we started to see how Deep Learning allows us to move away from hand-crafted features such as Gabor-like filters and instead lets the data discover representations. Though, how well does it scale?
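
As a quick refresher on that classic pipeline, here is a minimal sketch of building a Gabor kernel with numpy and convolving it with a test image. The parameter values are just illustrative, not the ones from Session 1:

import numpy as np
from scipy.signal import convolve2d
from skimage import data

def gabor_kernel(ksize=32, theta=0.0, sigma=0.25, wavelength=0.5):
    # A 2D Gaussian envelope modulated by a sinusoid oriented at angle theta
    lin = np.linspace(-1.0, 1.0, ksize)
    xs, ys = np.meshgrid(lin, lin)
    x_r = xs * np.cos(theta) + ys * np.sin(theta)
    gauss = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return gauss * np.sin(2.0 * np.pi * x_r / wavelength)

# Convolve a test image with a horizontal-edge Gabor filter
edges = convolve2d(data.camera().astype(np.float32),
                   gabor_kernel(theta=np.pi / 2), mode='same')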

A seminal shift in the perceived capabilities of deep neural networks occurred in 2012. A network dubbed AlexNet, after its primary author, Alex Krizhevsky, achieved remarkable performance on one of the most difficult computer vision datasets at the time, ImageNet. <TODO: Insert montage of ImageNet>. ImageNet is a dataset used in a yearly challenge called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), started in 2010. The dataset contains nearly 1.2 million images composed of 1000 different types of objects, with anywhere between 600 - 1200 images per object. <TODO: Histogram of object labels>

Up until now, the greatest number of labels we've considered is 10! The images were also very small, only 28 x 28 pixels, and they didn't even have color.

Let's look at a state-of-the-art network that has already been trained on ImageNet.

Loading a Pretrained Network

We can use an existing network that has been trained by loading the model's weights into a network definition. The network definition describes the set of operations in the Tensorflow graph: how the image is manipulated and filtered in order to get from an input image to a probability saying which 1 of 1000 possible objects the image is describing. Loading the model also restores its weights, the values of every parameter in the network learned through gradient descent. Luckily, many researchers are releasing their model definitions and weights, so we don't have to train them! We just have to load them up and then we can use the model straight away. That's very lucky for us because these models take a lot of time, cpu, memory, and money to train.
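
As an aside, when a model is released as a "frozen" protobuf file, loading it typically amounts to parsing the file into a GraphDef and importing it. A minimal sketch, assuming a hypothetical file path 'inception.pb':

import tensorflow as tf

# Parse the protobuf file into a GraphDef (the network definition,
# with the trained weights stored inside it as constants)
with open('inception.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Add the operations and parameters to the current default graph
tf.import_graph_def(graph_def, name='inception')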

To get the files required for these models, you'll need to download them from the resources page.

First, let's import some necessary libraries.


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import IPython.display as ipyd
from libs import gif, nb_utils

In [2]:
# Bit of formatting because I don't like the default inline code style:
from IPython.core.display import HTML
HTML("""<style> .rendered_html code { 
    padding: 2px 4px;
    color: #c7254e;
    background-color: #f9f2f4;
    border-radius: 4px;
} </style>""")


Out[2]:

Start an interactive session:


In [3]:
sess = tf.InteractiveSession()

Now we'll load Google's Inception model, which is a pretrained network for classification built using the ImageNet database. I've included some helper functions for getting this model loaded and set up with Tensorflow.


In [4]:
from libs import inception
net = inception.get_inception_model()

Here's a little extra that wasn't in the lecture. We can visualize the graph definition using the nb_utils module's show_graph function. This function is taken from an example in the Tensorflow repo, so I can't take credit for it! It uses Tensorboard, Tensorflow's web interface for visualizing graphs and training performance, which is very useful but which we sadly did not have enough time to discuss.


In [5]:
nb_utils.show_graph(net['graph_def'])


We'll now get the graph definition from the storage container and tell Tensorflow to use it as its own graph. This will add all the computations we need to compute the entire deep net, as well as all of the pre-trained parameters.


In [6]:
tf.import_graph_def(net['graph_def'], name='inception')
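
Now that the graph has been imported, we could also write it out for Tensorboard, which was mentioned above. A sketch (in older versions of Tensorflow the writer class is tf.train.SummaryWriter rather than tf.summary.FileWriter):

# Write the current graph to a log directory so Tensorboard can display it
writer = tf.summary.FileWriter('logs', sess.graph)
writer.close()

# Then, from a terminal, run: tensorboard --logdir=logs
# and open the reported URL in a browser.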

In [7]:
net['labels']


Out[7]:
[(0, 'dummy'),
 (1, 'kit fox'),
 (2, 'English setter'),
 (3, 'Siberian husky'),
 (4, 'Australian terrier'),
 (5, 'English springer'),
 (6, 'grey whale'),
 (7, 'lesser panda'),
 (8, 'Egyptian cat'),
 (9, 'ibex'),
 (10, 'Persian cat'),
 (11, 'cougar'),
 (12, 'gazelle'),
 (13, 'porcupine'),
 (14, 'sea lion'),
 (15, 'malamute'),
 (16, 'badger'),
 (17, 'Great Dane'),
 (18, 'Walker hound'),
 (19, 'Welsh springer spaniel'),
 (20, 'whippet'),
 (21, 'Scottish deerhound'),
 (22, 'killer whale'),
 (23, 'mink'),
 (24, 'African elephant'),
 (25, 'Weimaraner'),
 (26, 'soft-coated wheaten terrier'),
 (27, 'Dandie Dinmont'),
 (28, 'red wolf'),
 (29, 'Old English sheepdog'),
 (30, 'jaguar'),
 (31, 'otterhound'),
 (32, 'bloodhound'),
 (33, 'Airedale'),
 (34, 'hyena'),
 (35, 'meerkat'),
 (36, 'giant schnauzer'),
 (37, 'titi'),
 (38, 'three-toed sloth'),
 (39, 'sorrel'),
 (40, 'black-footed ferret'),
 (41, 'dalmatian'),
 (42, 'black-and-tan coonhound'),
 (43, 'papillon'),
 (44, 'skunk'),
 (45, 'Staffordshire bullterrier'),
 (46, 'Mexican hairless'),
 (47, 'Bouvier des Flandres'),
 (48, 'weasel'),
 (49, 'miniature poodle'),
 (50, 'Cardigan'),
 (51, 'malinois'),
 (52, 'bighorn'),
 (53, 'fox squirrel'),
 (54, 'colobus'),
 (55, 'tiger cat'),
 (56, 'Lhasa'),
 (57, 'impala'),
 (58, 'coyote'),
 (59, 'Yorkshire terrier'),
 (60, 'Newfoundland'),
 (61, 'brown bear'),
 (62, 'red fox'),
 (63, 'Norwegian elkhound'),
 (64, 'Rottweiler'),
 (65, 'hartebeest'),
 (66, 'Saluki'),
 (67, 'grey fox'),
 (68, 'schipperke'),
 (69, 'Pekinese'),
 (70, 'Brabancon griffon'),
 (71, 'West Highland white terrier'),
 (72, 'Sealyham terrier'),
 (73, 'guenon'),
 (74, 'mongoose'),
 (75, 'indri'),
 (76, 'tiger'),
 (77, 'Irish wolfhound'),
 (78, 'wild boar'),
 (79, 'EntleBucher'),
 (80, 'zebra'),
 (81, 'ram'),
 (82, 'French bulldog'),
 (83, 'orangutan'),
 (84, 'basenji'),
 (85, 'leopard'),
 (86, 'Bernese mountain dog'),
 (87, 'Maltese dog'),
 (88, 'Norfolk terrier'),
 (89, 'toy terrier'),
 (90, 'vizsla'),
 (91, 'cairn'),
 (92, 'squirrel monkey'),
 (93, 'groenendael'),
 (94, 'clumber'),
 (95, 'Siamese cat'),
 (96, 'chimpanzee'),
 (97, 'komondor'),
 (98, 'Afghan hound'),
 (99, 'Japanese spaniel'),
 (100, 'proboscis monkey'),
 (101, 'guinea pig'),
 (102, 'white wolf'),
 (103, 'ice bear'),
 (104, 'gorilla'),
 (105, 'borzoi'),
 (106, 'toy poodle'),
 (107, 'Kerry blue terrier'),
 (108, 'ox'),
 (109, 'Scotch terrier'),
 (110, 'Tibetan mastiff'),
 (111, 'spider monkey'),
 (112, 'Doberman'),
 (113, 'Boston bull'),
 (114, 'Greater Swiss Mountain dog'),
 (115, 'Appenzeller'),
 (116, 'Shih-Tzu'),
 (117, 'Irish water spaniel'),
 (118, 'Pomeranian'),
 (119, 'Bedlington terrier'),
 (120, 'warthog'),
 (121, 'Arabian camel'),
 (122, 'siamang'),
 (123, 'miniature schnauzer'),
 (124, 'collie'),
 (125, 'golden retriever'),
 (126, 'Irish terrier'),
 (127, 'affenpinscher'),
 (128, 'Border collie'),
 (129, 'hare'),
 (130, 'boxer'),
 (131, 'silky terrier'),
 (132, 'beagle'),
 (133, 'Leonberg'),
 (134, 'German short-haired pointer'),
 (135, 'patas'),
 (136, 'dhole'),
 (137, 'baboon'),
 (138, 'macaque'),
 (139, 'Chesapeake Bay retriever'),
 (140, 'bull mastiff'),
 (141, 'kuvasz'),
 (142, 'capuchin'),
 (143, 'pug'),
 (144, 'curly-coated retriever'),
 (145, 'Norwich terrier'),
 (146, 'flat-coated retriever'),
 (147, 'hog'),
 (148, 'keeshond'),
 (149, 'Eskimo dog'),
 (150, 'Brittany spaniel'),
 (151, 'standard poodle'),
 (152, 'Lakeland terrier'),
 (153, 'snow leopard'),
 (154, 'Gordon setter'),
 (155, 'dingo'),
 (156, 'standard schnauzer'),
 (157, 'hamster'),
 (158, 'Tibetan terrier'),
 (159, 'Arctic fox'),
 (160, 'wire-haired fox terrier'),
 (161, 'basset'),
 (162, 'water buffalo'),
 (163, 'American black bear'),
 (164, 'Angora'),
 (165, 'bison'),
 (166, 'howler monkey'),
 (167, 'hippopotamus'),
 (168, 'chow'),
 (169, 'giant panda'),
 (170, 'American Staffordshire terrier'),
 (171, 'Shetland sheepdog'),
 (172, 'Great Pyrenees'),
 (173, 'Chihuahua'),
 (174, 'tabby'),
 (175, 'marmoset'),
 (176, 'Labrador retriever'),
 (177, 'Saint Bernard'),
 (178, 'armadillo'),
 (179, 'Samoyed'),
 (180, 'bluetick'),
 (181, 'redbone'),
 (182, 'polecat'),
 (183, 'marmot'),
 (184, 'kelpie'),
 (185, 'gibbon'),
 (186, 'llama'),
 (187, 'miniature pinscher'),
 (188, 'wood rabbit'),
 (189, 'Italian greyhound'),
 (190, 'lion'),
 (191, 'cocker spaniel'),
 (192, 'Irish setter'),
 (193, 'dugong'),
 (194, 'Indian elephant'),
 (195, 'beaver'),
 (196, 'Sussex spaniel'),
 (197, 'Pembroke'),
 (198, 'Blenheim spaniel'),
 (199, 'Madagascar cat'),
 (200, 'Rhodesian ridgeback'),
 (201, 'lynx'),
 (202, 'African hunting dog'),
 (203, 'langur'),
 (204, 'Ibizan hound'),
 (205, 'timber wolf'),
 (206, 'cheetah'),
 (207, 'English foxhound'),
 (208, 'briard'),
 (209, 'sloth bear'),
 (210, 'Border terrier'),
 (211, 'German shepherd'),
 (212, 'otter'),
 (213, 'koala'),
 (214, 'tusker'),
 (215, 'echidna'),
 (216, 'wallaby'),
 (217, 'platypus'),
 (218, 'wombat'),
 (219, 'revolver'),
 (220, 'umbrella'),
 (221, 'schooner'),
 (222, 'soccer ball'),
 (223, 'accordion'),
 (224, 'ant'),
 (225, 'starfish'),
 (226, 'chambered nautilus'),
 (227, 'grand piano'),
 (228, 'laptop'),
 (229, 'strawberry'),
 (230, 'airliner'),
 (231, 'warplane'),
 (232, 'airship'),
 (233, 'balloon'),
 (234, 'space shuttle'),
 (235, 'fireboat'),
 (236, 'gondola'),
 (237, 'speedboat'),
 (238, 'lifeboat'),
 (239, 'canoe'),
 (240, 'yawl'),
 (241, 'catamaran'),
 (242, 'trimaran'),
 (243, 'container ship'),
 (244, 'liner'),
 (245, 'pirate'),
 (246, 'aircraft carrier'),
 (247, 'submarine'),
 (248, 'wreck'),
 (249, 'half track'),
 (250, 'tank'),
 (251, 'missile'),
 (252, 'bobsled'),
 (253, 'dogsled'),
 (254, 'bicycle-built-for-two'),
 (255, 'mountain bike'),
 (256, 'freight car'),
 (257, 'passenger car'),
 (258, 'barrow'),
 (259, 'shopping cart'),
 (260, 'motor scooter'),
 (261, 'forklift'),
 (262, 'electric locomotive'),
 (263, 'steam locomotive'),
 (264, 'amphibian'),
 (265, 'ambulance'),
 (266, 'beach wagon'),
 (267, 'cab'),
 (268, 'convertible'),
 (269, 'jeep'),
 (270, 'limousine'),
 (271, 'minivan'),
 (272, 'Model T'),
 (273, 'racer'),
 (274, 'sports car'),
 (275, 'go-kart'),
 (276, 'golfcart'),
 (277, 'moped'),
 (278, 'snowplow'),
 (279, 'fire engine'),
 (280, 'garbage truck'),
 (281, 'pickup'),
 (282, 'tow truck'),
 (283, 'trailer truck'),
 (284, 'moving van'),
 (285, 'police van'),
 (286, 'recreational vehicle'),
 (287, 'streetcar'),
 (288, 'snowmobile'),
 (289, 'tractor'),
 (290, 'mobile home'),
 (291, 'tricycle'),
 (292, 'unicycle'),
 (293, 'horse cart'),
 (294, 'jinrikisha'),
 (295, 'oxcart'),
 (296, 'bassinet'),
 (297, 'cradle'),
 (298, 'crib'),
 (299, 'four-poster'),
 (300, 'bookcase'),
 (301, 'china cabinet'),
 (302, 'medicine chest'),
 (303, 'chiffonier'),
 (304, 'table lamp'),
 (305, 'file'),
 (306, 'park bench'),
 (307, 'barber chair'),
 (308, 'throne'),
 (309, 'folding chair'),
 (310, 'rocking chair'),
 (311, 'studio couch'),
 (312, 'toilet seat'),
 (313, 'desk'),
 (314, 'pool table'),
 (315, 'dining table'),
 (316, 'entertainment center'),
 (317, 'wardrobe'),
 (318, 'Granny Smith'),
 (319, 'orange'),
 (320, 'lemon'),
 (321, 'fig'),
 (322, 'pineapple'),
 (323, 'banana'),
 (324, 'jackfruit'),
 (325, 'custard apple'),
 (326, 'pomegranate'),
 (327, 'acorn'),
 (328, 'hip'),
 (329, 'ear'),
 (330, 'rapeseed'),
 (331, 'corn'),
 (332, 'buckeye'),
 (333, 'organ'),
 (334, 'upright'),
 (335, 'chime'),
 (336, 'drum'),
 (337, 'gong'),
 (338, 'maraca'),
 (339, 'marimba'),
 (340, 'steel drum'),
 (341, 'banjo'),
 (342, 'cello'),
 (343, 'violin'),
 (344, 'harp'),
 (345, 'acoustic guitar'),
 (346, 'electric guitar'),
 (347, 'cornet'),
 (348, 'French horn'),
 (349, 'trombone'),
 (350, 'harmonica'),
 (351, 'ocarina'),
 (352, 'panpipe'),
 (353, 'bassoon'),
 (354, 'oboe'),
 (355, 'sax'),
 (356, 'flute'),
 (357, 'daisy'),
 (358, "yellow lady's slipper"),
 (359, 'cliff'),
 (360, 'valley'),
 (361, 'alp'),
 (362, 'volcano'),
 (363, 'promontory'),
 (364, 'sandbar'),
 (365, 'coral reef'),
 (366, 'lakeside'),
 (367, 'seashore'),
 (368, 'geyser'),
 (369, 'hatchet'),
 (370, 'cleaver'),
 (371, 'letter opener'),
 (372, 'plane'),
 (373, 'power drill'),
 (374, 'lawn mower'),
 (375, 'hammer'),
 (376, 'corkscrew'),
 (377, 'can opener'),
 (378, 'plunger'),
 (379, 'screwdriver'),
 (380, 'shovel'),
 (381, 'plow'),
 (382, 'chain saw'),
 (383, 'cock'),
 (384, 'hen'),
 (385, 'ostrich'),
 (386, 'brambling'),
 (387, 'goldfinch'),
 (388, 'house finch'),
 (389, 'junco'),
 (390, 'indigo bunting'),
 (391, 'robin'),
 (392, 'bulbul'),
 (393, 'jay'),
 (394, 'magpie'),
 (395, 'chickadee'),
 (396, 'water ouzel'),
 (397, 'kite'),
 (398, 'bald eagle'),
 (399, 'vulture'),
 (400, 'great grey owl'),
 (401, 'black grouse'),
 (402, 'ptarmigan'),
 (403, 'ruffed grouse'),
 (404, 'prairie chicken'),
 (405, 'peacock'),
 (406, 'quail'),
 (407, 'partridge'),
 (408, 'African grey'),
 (409, 'macaw'),
 (410, 'sulphur-crested cockatoo'),
 (411, 'lorikeet'),
 (412, 'coucal'),
 (413, 'bee eater'),
 (414, 'hornbill'),
 (415, 'hummingbird'),
 (416, 'jacamar'),
 (417, 'toucan'),
 (418, 'drake'),
 (419, 'red-breasted merganser'),
 (420, 'goose'),
 (421, 'black swan'),
 (422, 'white stork'),
 (423, 'black stork'),
 (424, 'spoonbill'),
 (425, 'flamingo'),
 (426, 'American egret'),
 (427, 'little blue heron'),
 (428, 'bittern'),
 (429, 'crane'),
 (430, 'limpkin'),
 (431, 'American coot'),
 (432, 'bustard'),
 (433, 'ruddy turnstone'),
 (434, 'red-backed sandpiper'),
 (435, 'redshank'),
 (436, 'dowitcher'),
 (437, 'oystercatcher'),
 (438, 'European gallinule'),
 (439, 'pelican'),
 (440, 'king penguin'),
 (441, 'albatross'),
 (442, 'great white shark'),
 (443, 'tiger shark'),
 (444, 'hammerhead'),
 (445, 'electric ray'),
 (446, 'stingray'),
 (447, 'barracouta'),
 (448, 'coho'),
 (449, 'tench'),
 (450, 'goldfish'),
 (451, 'eel'),
 (452, 'rock beauty'),
 (453, 'anemone fish'),
 (454, 'lionfish'),
 (455, 'puffer'),
 (456, 'sturgeon'),
 (457, 'gar'),
 (458, 'loggerhead'),
 (459, 'leatherback turtle'),
 (460, 'mud turtle'),
 (461, 'terrapin'),
 (462, 'box turtle'),
 (463, 'banded gecko'),
 (464, 'common iguana'),
 (465, 'American chameleon'),
 (466, 'whiptail'),
 (467, 'agama'),
 (468, 'frilled lizard'),
 (469, 'alligator lizard'),
 (470, 'Gila monster'),
 (471, 'green lizard'),
 (472, 'African chameleon'),
 (473, 'Komodo dragon'),
 (474, 'triceratops'),
 (475, 'African crocodile'),
 (476, 'American alligator'),
 (477, 'thunder snake'),
 (478, 'ringneck snake'),
 (479, 'hognose snake'),
 (480, 'green snake'),
 (481, 'king snake'),
 (482, 'garter snake'),
 (483, 'water snake'),
 (484, 'vine snake'),
 (485, 'night snake'),
 (486, 'boa constrictor'),
 (487, 'rock python'),
 (488, 'Indian cobra'),
 (489, 'green mamba'),
 (490, 'sea snake'),
 (491, 'horned viper'),
 (492, 'diamondback'),
 (493, 'sidewinder'),
 (494, 'European fire salamander'),
 (495, 'common newt'),
 (496, 'eft'),
 (497, 'spotted salamander'),
 (498, 'axolotl'),
 (499, 'bullfrog'),
 (500, 'tree frog'),
 (501, 'tailed frog'),
 (502, 'whistle'),
 (503, 'wing'),
 (504, 'paintbrush'),
 (505, 'hand blower'),
 (506, 'oxygen mask'),
 (507, 'snorkel'),
 (508, 'loudspeaker'),
 (509, 'microphone'),
 (510, 'screen'),
 (511, 'mouse'),
 (512, 'electric fan'),
 (513, 'oil filter'),
 (514, 'strainer'),
 (515, 'space heater'),
 (516, 'stove'),
 (517, 'guillotine'),
 (518, 'barometer'),
 (519, 'rule'),
 (520, 'odometer'),
 (521, 'scale'),
 (522, 'analog clock'),
 (523, 'digital clock'),
 (524, 'wall clock'),
 (525, 'hourglass'),
 (526, 'sundial'),
 (527, 'parking meter'),
 (528, 'stopwatch'),
 (529, 'digital watch'),
 (530, 'stethoscope'),
 (531, 'syringe'),
 (532, 'magnetic compass'),
 (533, 'binoculars'),
 (534, 'projector'),
 (535, 'sunglasses'),
 (536, 'loupe'),
 (537, 'radio telescope'),
 (538, 'bow'),
 (539, 'cannon [ground]'),
 (540, 'assault rifle'),
 (541, 'rifle'),
 (542, 'projectile'),
 (543, 'computer keyboard'),
 (544, 'typewriter keyboard'),
 (545, 'crane'),
 (546, 'lighter'),
 (547, 'abacus'),
 (548, 'cash machine'),
 (549, 'slide rule'),
 (550, 'desktop computer'),
 (551, 'hand-held computer'),
 (552, 'notebook'),
 (553, 'web site'),
 (554, 'harvester'),
 (555, 'thresher'),
 (556, 'printer'),
 (557, 'slot'),
 (558, 'vending machine'),
 (559, 'sewing machine'),
 (560, 'joystick'),
 (561, 'switch'),
 (562, 'hook'),
 (563, 'car wheel'),
 (564, 'paddlewheel'),
 (565, 'pinwheel'),
 (566, "potter's wheel"),
 (567, 'gas pump'),
 (568, 'carousel'),
 (569, 'swing'),
 (570, 'reel'),
 (571, 'radiator'),
 (572, 'puck'),
 (573, 'hard disc'),
 (574, 'sunglass'),
 (575, 'pick'),
 (576, 'car mirror'),
 (577, 'solar dish'),
 (578, 'remote control'),
 (579, 'disk brake'),
 (580, 'buckle'),
 (581, 'hair slide'),
 (582, 'knot'),
 (583, 'combination lock'),
 (584, 'padlock'),
 (585, 'nail'),
 (586, 'safety pin'),
 (587, 'screw'),
 (588, 'muzzle'),
 (589, 'seat belt'),
 (590, 'ski'),
 (591, 'candle'),
 (592, "jack-o'-lantern"),
 (593, 'spotlight'),
 (594, 'torch'),
 (595, 'neck brace'),
 (596, 'pier'),
 (597, 'tripod'),
 (598, 'maypole'),
 (599, 'mousetrap'),
 (600, 'spider web'),
 (601, 'trilobite'),
 (602, 'harvestman'),
 (603, 'scorpion'),
 (604, 'black and gold garden spider'),
 (605, 'barn spider'),
 (606, 'garden spider'),
 (607, 'black widow'),
 (608, 'tarantula'),
 (609, 'wolf spider'),
 (610, 'tick'),
 (611, 'centipede'),
 (612, 'isopod'),
 (613, 'Dungeness crab'),
 (614, 'rock crab'),
 (615, 'fiddler crab'),
 (616, 'king crab'),
 (617, 'American lobster'),
 (618, 'spiny lobster'),
 (619, 'crayfish'),
 (620, 'hermit crab'),
 (621, 'tiger beetle'),
 (622, 'ladybug'),
 (623, 'ground beetle'),
 (624, 'long-horned beetle'),
 (625, 'leaf beetle'),
 (626, 'dung beetle'),
 (627, 'rhinoceros beetle'),
 (628, 'weevil'),
 (629, 'fly'),
 (630, 'bee'),
 (631, 'grasshopper'),
 (632, 'cricket'),
 (633, 'walking stick'),
 (634, 'cockroach'),
 (635, 'mantis'),
 (636, 'cicada'),
 (637, 'leafhopper'),
 (638, 'lacewing'),
 (639, 'dragonfly'),
 (640, 'damselfly'),
 (641, 'admiral'),
 (642, 'ringlet'),
 (643, 'monarch'),
 (644, 'cabbage butterfly'),
 (645, 'sulphur butterfly'),
 (646, 'lycaenid'),
 (647, 'jellyfish'),
 (648, 'sea anemone'),
 (649, 'brain coral'),
 (650, 'flatworm'),
 (651, 'nematode'),
 (652, 'conch'),
 (653, 'snail'),
 (654, 'slug'),
 (655, 'sea slug'),
 (656, 'chiton'),
 (657, 'sea urchin'),
 (658, 'sea cucumber'),
 (659, 'iron'),
 (660, 'espresso maker'),
 (661, 'microwave'),
 (662, 'Dutch oven'),
 (663, 'rotisserie'),
 (664, 'toaster'),
 (665, 'waffle iron'),
 (666, 'vacuum'),
 (667, 'dishwasher'),
 (668, 'refrigerator'),
 (669, 'washer'),
 (670, 'Crock Pot'),
 (671, 'frying pan'),
 (672, 'wok'),
 (673, 'caldron'),
 (674, 'coffeepot'),
 (675, 'teapot'),
 (676, 'spatula'),
 (677, 'altar'),
 (678, 'triumphal arch'),
 (679, 'patio'),
 (680, 'steel arch bridge'),
 (681, 'suspension bridge'),
 (682, 'viaduct'),
 (683, 'barn'),
 (684, 'greenhouse'),
 (685, 'palace'),
 (686, 'monastery'),
 (687, 'library'),
 (688, 'apiary'),
 (689, 'boathouse'),
 (690, 'church'),
 (691, 'mosque'),
 (692, 'stupa'),
 (693, 'planetarium'),
 (694, 'restaurant'),
 (695, 'cinema'),
 (696, 'home theater'),
 (697, 'lumbermill'),
 (698, 'coil'),
 (699, 'obelisk'),
 (700, 'totem pole'),
 (701, 'castle'),
 (702, 'prison'),
 (703, 'grocery store'),
 (704, 'bakery'),
 (705, 'barbershop'),
 (706, 'bookshop'),
 (707, 'butcher shop'),
 (708, 'confectionery'),
 (709, 'shoe shop'),
 (710, 'tobacco shop'),
 (711, 'toyshop'),
 (712, 'fountain'),
 (713, 'cliff dwelling'),
 (714, 'yurt'),
 (715, 'dock'),
 (716, 'brass'),
 (717, 'megalith'),
 (718, 'bannister'),
 (719, 'breakwater'),
 (720, 'dam'),
 (721, 'chainlink fence'),
 (722, 'picket fence'),
 (723, 'worm fence'),
 (724, 'stone wall'),
 (725, 'grille'),
 (726, 'sliding door'),
 (727, 'turnstile'),
 (728, 'mountain tent'),
 (729, 'scoreboard'),
 (730, 'honeycomb'),
 (731, 'plate rack'),
 (732, 'pedestal'),
 (733, 'beacon'),
 (734, 'mashed potato'),
 (735, 'bell pepper'),
 (736, 'head cabbage'),
 (737, 'broccoli'),
 (738, 'cauliflower'),
 (739, 'zucchini'),
 (740, 'spaghetti squash'),
 (741, 'acorn squash'),
 (742, 'butternut squash'),
 (743, 'cucumber'),
 (744, 'artichoke'),
 (745, 'cardoon'),
 (746, 'mushroom'),
 (747, 'shower curtain'),
 (748, 'jean'),
 (749, 'carton'),
 (750, 'handkerchief'),
 (751, 'sandal'),
 (752, 'ashcan'),
 (753, 'safe'),
 (754, 'plate'),
 (755, 'necklace'),
 (756, 'croquet ball'),
 (757, 'fur coat'),
 (758, 'thimble'),
 (759, 'pajama'),
 (760, 'running shoe'),
 (761, 'cocktail shaker'),
 (762, 'chest'),
 (763, 'manhole cover'),
 (764, 'modem'),
 (765, 'tub'),
 (766, 'tray'),
 (767, 'balance beam'),
 (768, 'bagel'),
 (769, 'prayer rug'),
 (770, 'kimono'),
 (771, 'hot pot'),
 (772, 'whiskey jug'),
 (773, 'knee pad'),
 (774, 'book jacket'),
 (775, 'spindle'),
 (776, 'ski mask'),
 (777, 'beer bottle'),
 (778, 'crash helmet'),
 (779, 'bottlecap'),
 (780, 'tile roof'),
 (781, 'mask'),
 (782, 'maillot'),
 (783, 'Petri dish'),
 (784, 'football helmet'),
 (785, 'bathing cap'),
 (786, 'teddy bear'),
 (787, 'holster'),
 (788, 'pop bottle'),
 (789, 'photocopier'),
 (790, 'vestment'),
 (791, 'crossword puzzle'),
 (792, 'golf ball'),
 (793, 'trifle'),
 (794, 'suit'),
 (795, 'water tower'),
 (796, 'feather boa'),
 (797, 'cloak'),
 (798, 'red wine'),
 (799, 'drumstick'),
 (800, 'shield'),
 (801, 'Christmas stocking'),
 (802, 'hoopskirt'),
 (803, 'menu'),
 (804, 'stage'),
 (805, 'bonnet'),
 (806, 'meat loaf'),
 (807, 'baseball'),
 (808, 'face powder'),
 (809, 'scabbard'),
 (810, 'sunscreen'),
 (811, 'beer glass'),
 (812, 'hen-of-the-woods'),
 (813, 'guacamole'),
 (814, 'lampshade'),
 (815, 'wool'),
 (816, 'hay'),
 (817, 'bow tie'),
 (818, 'mailbag'),
 (819, 'water jug'),
 (820, 'bucket'),
 (821, 'dishrag'),
 (822, 'soup bowl'),
 (823, 'eggnog'),
 (824, 'mortar'),
 (825, 'trench coat'),
 (826, 'paddle'),
 (827, 'chain'),
 (828, 'swab'),
 (829, 'mixing bowl'),
 (830, 'potpie'),
 (831, 'wine bottle'),
 (832, 'shoji'),
 (833, 'bulletproof vest'),
 (834, 'drilling platform'),
 (835, 'binder'),
 (836, 'cardigan'),
 (837, 'sweatshirt'),
 (838, 'pot'),
 (839, 'birdhouse'),
 (840, 'hamper'),
 (841, 'ping-pong ball'),
 (842, 'pencil box'),
 (843, 'pay-phone'),
 (844, 'consomme'),
 (845, 'apron'),
 (846, 'punching bag'),
 (847, 'backpack'),
 (848, 'groom'),
 (849, 'bearskin'),
 (850, 'pencil sharpener'),
 (851, 'broom'),
 (852, 'mosquito net'),
 (853, 'abaya'),
 (854, 'mortarboard'),
 (855, 'poncho'),
 (856, 'crutch'),
 (857, 'Polaroid camera'),
 (858, 'space bar'),
 (859, 'cup'),
 (860, 'racket'),
 (861, 'traffic light'),
 (862, 'quill'),
 (863, 'radio'),
 (864, 'dough'),
 (865, 'cuirass'),
 (866, 'military uniform'),
 (867, 'lipstick'),
 (868, 'shower cap'),
 (869, 'monitor'),
 (870, 'oscilloscope'),
 (871, 'mitten'),
 (872, 'brassiere'),
 (873, 'French loaf'),
 (874, 'vase'),
 (875, 'milk can'),
 (876, 'rugby ball'),
 (877, 'paper towel'),
 (878, 'earthstar'),
 (879, 'envelope'),
 (880, 'miniskirt'),
 (881, 'cowboy hat'),
 (882, 'trolleybus'),
 (883, 'perfume'),
 (884, 'bathtub'),
 (885, 'hotdog'),
 (886, 'coral fungus'),
 (887, 'bullet train'),
 (888, 'pillow'),
 (889, 'toilet tissue'),
 (890, 'cassette'),
 (891, "carpenter's kit"),
 (892, 'ladle'),
 (893, 'stinkhorn'),
 (894, 'lotion'),
 (895, 'hair spray'),
 (896, 'academic gown'),
 (897, 'dome'),
 (898, 'crate'),
 (899, 'wig'),
 (900, 'burrito'),
 (901, 'pill bottle'),
 (902, 'chain mail'),
 (903, 'theater curtain'),
 (904, 'window shade'),
 (905, 'barrel'),
 (906, 'washbasin'),
 (907, 'ballpoint'),
 (908, 'basketball'),
 (909, 'bath towel'),
 (910, 'cowboy boot'),
 (911, 'gown'),
 (912, 'window screen'),
 (913, 'agaric'),
 (914, 'cellular telephone'),
 (915, 'nipple'),
 (916, 'barbell'),
 (917, 'mailbox'),
 (918, 'lab coat'),
 (919, 'fire screen'),
 (920, 'minibus'),
 (921, 'packet'),
 (922, 'maze'),
 (923, 'pole'),
 (924, 'horizontal bar'),
 (925, 'sombrero'),
 (926, 'pickelhaube'),
 (927, 'rain barrel'),
 (928, 'wallet'),
 (929, 'cassette player'),
 (930, 'comic book'),
 (931, 'piggy bank'),
 (932, 'street sign'),
 (933, 'bell cote'),
 (934, 'fountain pen'),
 (935, 'Windsor tie'),
 (936, 'volleyball'),
 (937, 'overskirt'),
 (938, 'sarong'),
 (939, 'purse'),
 (940, 'bolo tie'),
 (941, 'bib'),
 (942, 'parachute'),
 (943, 'sleeping bag'),
 (944, 'television'),
 (945, 'swimming trunks'),
 (946, 'measuring cup'),
 (947, 'espresso'),
 (948, 'pizza'),
 (949, 'breastplate'),
 (950, 'shopping basket'),
 (951, 'wooden spoon'),
 (952, 'saltshaker'),
 (953, 'chocolate sauce'),
 (954, 'ballplayer'),
 (955, 'goblet'),
 (956, 'gyromitra'),
 (957, 'stretcher'),
 (958, 'water bottle'),
 (959, 'dial telephone'),
 (960, 'soap dispenser'),
 (961, 'jersey'),
 (962, 'school bus'),
 (963, 'jigsaw puzzle'),
 (964, 'plastic bag'),
 (965, 'reflex camera'),
 (966, 'diaper'),
 (967, 'Band Aid'),
 (968, 'ice lolly'),
 (969, 'velvet'),
 (970, 'tennis ball'),
 (971, 'gasmask'),
 (972, 'doormat'),
 (973, 'Loafer'),
 (974, 'ice cream'),
 (975, 'pretzel'),
 (976, 'quilt'),
 (977, 'maillot'),
 (978, 'tape player'),
 (979, 'clog'),
 (980, 'iPod'),
 (981, 'bolete'),
 (982, 'scuba diver'),
 (983, 'pitcher'),
 (984, 'matchstick'),
 (985, 'bikini'),
 (986, 'sock'),
 (987, 'CD player'),
 (988, 'lens cap'),
 (989, 'thatch'),
 (990, 'vault'),
 (991, 'beaker'),
 (992, 'bubble'),
 (993, 'cheeseburger'),
 (994, 'parallel bars'),
 (995, 'flagpole'),
 (996, 'coffee mug'),
 (997, 'rubber eraser'),
 (998, 'stole'),
 (999, 'carbonara'),
 ...]

<TODO: visual of graph>

Let's have a look at the graph:


In [8]:
g = tf.get_default_graph()
names = [op.name for op in g.get_operations()]
print(names)


['inception/input', 'inception/conv2d0_w', 'inception/conv2d0_b', 'inception/conv2d1_w', 'inception/conv2d1_b', 'inception/conv2d2_w', 'inception/conv2d2_b', 'inception/mixed3a_1x1_w', 'inception/mixed3a_1x1_b', 'inception/mixed3a_3x3_bottleneck_w', 'inception/mixed3a_3x3_bottleneck_b', 'inception/mixed3a_3x3_w', 'inception/mixed3a_3x3_b', 'inception/mixed3a_5x5_bottleneck_w', 'inception/mixed3a_5x5_bottleneck_b', 'inception/mixed3a_5x5_w', 'inception/mixed3a_5x5_b', 'inception/mixed3a_pool_reduce_w', 'inception/mixed3a_pool_reduce_b', 'inception/mixed3b_1x1_w', 'inception/mixed3b_1x1_b', 'inception/mixed3b_3x3_bottleneck_w', 'inception/mixed3b_3x3_bottleneck_b', 'inception/mixed3b_3x3_w', 'inception/mixed3b_3x3_b', 'inception/mixed3b_5x5_bottleneck_w', 'inception/mixed3b_5x5_bottleneck_b', 'inception/mixed3b_5x5_w', 'inception/mixed3b_5x5_b', 'inception/mixed3b_pool_reduce_w', 'inception/mixed3b_pool_reduce_b', 'inception/mixed4a_1x1_w', 'inception/mixed4a_1x1_b', 'inception/mixed4a_3x3_bottleneck_w', 'inception/mixed4a_3x3_bottleneck_b', 'inception/mixed4a_3x3_w', 'inception/mixed4a_3x3_b', 'inception/mixed4a_5x5_bottleneck_w', 'inception/mixed4a_5x5_bottleneck_b', 'inception/mixed4a_5x5_w', 'inception/mixed4a_5x5_b', 'inception/mixed4a_pool_reduce_w', 'inception/mixed4a_pool_reduce_b', 'inception/mixed4b_1x1_w', 'inception/mixed4b_1x1_b', 'inception/mixed4b_3x3_bottleneck_w', 'inception/mixed4b_3x3_bottleneck_b', 'inception/mixed4b_3x3_w', 'inception/mixed4b_3x3_b', 'inception/mixed4b_5x5_bottleneck_w', 'inception/mixed4b_5x5_bottleneck_b', 'inception/mixed4b_5x5_w', 'inception/mixed4b_5x5_b', 'inception/mixed4b_pool_reduce_w', 'inception/mixed4b_pool_reduce_b', 'inception/mixed4c_1x1_w', 'inception/mixed4c_1x1_b', 'inception/mixed4c_3x3_bottleneck_w', 'inception/mixed4c_3x3_bottleneck_b', 'inception/mixed4c_3x3_w', 'inception/mixed4c_3x3_b', 'inception/mixed4c_5x5_bottleneck_w', 'inception/mixed4c_5x5_bottleneck_b', 'inception/mixed4c_5x5_w', 'inception/mixed4c_5x5_b', 'inception/mixed4c_pool_reduce_w', 'inception/mixed4c_pool_reduce_b', 'inception/mixed4d_1x1_w', 'inception/mixed4d_1x1_b', 'inception/mixed4d_3x3_bottleneck_w', 'inception/mixed4d_3x3_bottleneck_b', 'inception/mixed4d_3x3_w', 'inception/mixed4d_3x3_b', 'inception/mixed4d_5x5_bottleneck_w', 'inception/mixed4d_5x5_bottleneck_b', 'inception/mixed4d_5x5_w', 'inception/mixed4d_5x5_b', 'inception/mixed4d_pool_reduce_w', 'inception/mixed4d_pool_reduce_b', 'inception/mixed4e_1x1_w', 'inception/mixed4e_1x1_b', 'inception/mixed4e_3x3_bottleneck_w', 'inception/mixed4e_3x3_bottleneck_b', 'inception/mixed4e_3x3_w', 'inception/mixed4e_3x3_b', 'inception/mixed4e_5x5_bottleneck_w', 'inception/mixed4e_5x5_bottleneck_b', 'inception/mixed4e_5x5_w', 'inception/mixed4e_5x5_b', 'inception/mixed4e_pool_reduce_w', 'inception/mixed4e_pool_reduce_b', 'inception/mixed5a_1x1_w', 'inception/mixed5a_1x1_b', 'inception/mixed5a_3x3_bottleneck_w', 'inception/mixed5a_3x3_bottleneck_b', 'inception/mixed5a_3x3_w', 'inception/mixed5a_3x3_b', 'inception/mixed5a_5x5_bottleneck_w', 'inception/mixed5a_5x5_bottleneck_b', 'inception/mixed5a_5x5_w', 'inception/mixed5a_5x5_b', 'inception/mixed5a_pool_reduce_w', 'inception/mixed5a_pool_reduce_b', 'inception/mixed5b_1x1_w', 'inception/mixed5b_1x1_b', 'inception/mixed5b_3x3_bottleneck_w', 'inception/mixed5b_3x3_bottleneck_b', 'inception/mixed5b_3x3_w', 'inception/mixed5b_3x3_b', 'inception/mixed5b_5x5_bottleneck_w', 'inception/mixed5b_5x5_bottleneck_b', 'inception/mixed5b_5x5_w', 'inception/mixed5b_5x5_b', 
'inception/mixed5b_pool_reduce_w', 'inception/mixed5b_pool_reduce_b', 'inception/head0_bottleneck_w', 'inception/head0_bottleneck_b', 'inception/nn0_w', 'inception/nn0_b', 'inception/softmax0_w', 'inception/softmax0_b', 'inception/head1_bottleneck_w', 'inception/head1_bottleneck_b', 'inception/nn1_w', 'inception/nn1_b', 'inception/softmax1_w', 'inception/softmax1_b', 'inception/softmax2_w', 'inception/softmax2_b', 'inception/conv2d0_pre_relu/conv', 'inception/conv2d0_pre_relu', 'inception/conv2d0', 'inception/maxpool0', 'inception/localresponsenorm0', 'inception/conv2d1_pre_relu/conv', 'inception/conv2d1_pre_relu', 'inception/conv2d1', 'inception/conv2d2_pre_relu/conv', 'inception/conv2d2_pre_relu', 'inception/conv2d2', 'inception/localresponsenorm1', 'inception/maxpool1', 'inception/mixed3a_1x1_pre_relu/conv', 'inception/mixed3a_1x1_pre_relu', 'inception/mixed3a_1x1', 'inception/mixed3a_3x3_bottleneck_pre_relu/conv', 'inception/mixed3a_3x3_bottleneck_pre_relu', 'inception/mixed3a_3x3_bottleneck', 'inception/mixed3a_3x3_pre_relu/conv', 'inception/mixed3a_3x3_pre_relu', 'inception/mixed3a_3x3', 'inception/mixed3a_5x5_bottleneck_pre_relu/conv', 'inception/mixed3a_5x5_bottleneck_pre_relu', 'inception/mixed3a_5x5_bottleneck', 'inception/mixed3a_5x5_pre_relu/conv', 'inception/mixed3a_5x5_pre_relu', 'inception/mixed3a_5x5', 'inception/mixed3a_pool', 'inception/mixed3a_pool_reduce_pre_relu/conv', 'inception/mixed3a_pool_reduce_pre_relu', 'inception/mixed3a_pool_reduce', 'inception/mixed3a/concat_dim', 'inception/mixed3a', 'inception/mixed3b_1x1_pre_relu/conv', 'inception/mixed3b_1x1_pre_relu', 'inception/mixed3b_1x1', 'inception/mixed3b_3x3_bottleneck_pre_relu/conv', 'inception/mixed3b_3x3_bottleneck_pre_relu', 'inception/mixed3b_3x3_bottleneck', 'inception/mixed3b_3x3_pre_relu/conv', 'inception/mixed3b_3x3_pre_relu', 'inception/mixed3b_3x3', 'inception/mixed3b_5x5_bottleneck_pre_relu/conv', 'inception/mixed3b_5x5_bottleneck_pre_relu', 'inception/mixed3b_5x5_bottleneck', 'inception/mixed3b_5x5_pre_relu/conv', 'inception/mixed3b_5x5_pre_relu', 'inception/mixed3b_5x5', 'inception/mixed3b_pool', 'inception/mixed3b_pool_reduce_pre_relu/conv', 'inception/mixed3b_pool_reduce_pre_relu', 'inception/mixed3b_pool_reduce', 'inception/mixed3b/concat_dim', 'inception/mixed3b', 'inception/maxpool4', 'inception/mixed4a_1x1_pre_relu/conv', 'inception/mixed4a_1x1_pre_relu', 'inception/mixed4a_1x1', 'inception/mixed4a_3x3_bottleneck_pre_relu/conv', 'inception/mixed4a_3x3_bottleneck_pre_relu', 'inception/mixed4a_3x3_bottleneck', 'inception/mixed4a_3x3_pre_relu/conv', 'inception/mixed4a_3x3_pre_relu', 'inception/mixed4a_3x3', 'inception/mixed4a_5x5_bottleneck_pre_relu/conv', 'inception/mixed4a_5x5_bottleneck_pre_relu', 'inception/mixed4a_5x5_bottleneck', 'inception/mixed4a_5x5_pre_relu/conv', 'inception/mixed4a_5x5_pre_relu', 'inception/mixed4a_5x5', 'inception/mixed4a_pool', 'inception/mixed4a_pool_reduce_pre_relu/conv', 'inception/mixed4a_pool_reduce_pre_relu', 'inception/mixed4a_pool_reduce', 'inception/mixed4a/concat_dim', 'inception/mixed4a', 'inception/mixed4b_1x1_pre_relu/conv', 'inception/mixed4b_1x1_pre_relu', 'inception/mixed4b_1x1', 'inception/mixed4b_3x3_bottleneck_pre_relu/conv', 'inception/mixed4b_3x3_bottleneck_pre_relu', 'inception/mixed4b_3x3_bottleneck', 'inception/mixed4b_3x3_pre_relu/conv', 'inception/mixed4b_3x3_pre_relu', 'inception/mixed4b_3x3', 'inception/mixed4b_5x5_bottleneck_pre_relu/conv', 'inception/mixed4b_5x5_bottleneck_pre_relu', 'inception/mixed4b_5x5_bottleneck', 
'inception/mixed4b_5x5_pre_relu/conv', 'inception/mixed4b_5x5_pre_relu', 'inception/mixed4b_5x5', 'inception/mixed4b_pool', 'inception/mixed4b_pool_reduce_pre_relu/conv', 'inception/mixed4b_pool_reduce_pre_relu', 'inception/mixed4b_pool_reduce', 'inception/mixed4b/concat_dim', 'inception/mixed4b', 'inception/mixed4c_1x1_pre_relu/conv', 'inception/mixed4c_1x1_pre_relu', 'inception/mixed4c_1x1', 'inception/mixed4c_3x3_bottleneck_pre_relu/conv', 'inception/mixed4c_3x3_bottleneck_pre_relu', 'inception/mixed4c_3x3_bottleneck', 'inception/mixed4c_3x3_pre_relu/conv', 'inception/mixed4c_3x3_pre_relu', 'inception/mixed4c_3x3', 'inception/mixed4c_5x5_bottleneck_pre_relu/conv', 'inception/mixed4c_5x5_bottleneck_pre_relu', 'inception/mixed4c_5x5_bottleneck', 'inception/mixed4c_5x5_pre_relu/conv', 'inception/mixed4c_5x5_pre_relu', 'inception/mixed4c_5x5', 'inception/mixed4c_pool', 'inception/mixed4c_pool_reduce_pre_relu/conv', 'inception/mixed4c_pool_reduce_pre_relu', 'inception/mixed4c_pool_reduce', 'inception/mixed4c/concat_dim', 'inception/mixed4c', 'inception/mixed4d_1x1_pre_relu/conv', 'inception/mixed4d_1x1_pre_relu', 'inception/mixed4d_1x1', 'inception/mixed4d_3x3_bottleneck_pre_relu/conv', 'inception/mixed4d_3x3_bottleneck_pre_relu', 'inception/mixed4d_3x3_bottleneck', 'inception/mixed4d_3x3_pre_relu/conv', 'inception/mixed4d_3x3_pre_relu', 'inception/mixed4d_3x3', 'inception/mixed4d_5x5_bottleneck_pre_relu/conv', 'inception/mixed4d_5x5_bottleneck_pre_relu', 'inception/mixed4d_5x5_bottleneck', 'inception/mixed4d_5x5_pre_relu/conv', 'inception/mixed4d_5x5_pre_relu', 'inception/mixed4d_5x5', 'inception/mixed4d_pool', 'inception/mixed4d_pool_reduce_pre_relu/conv', 'inception/mixed4d_pool_reduce_pre_relu', 'inception/mixed4d_pool_reduce', 'inception/mixed4d/concat_dim', 'inception/mixed4d', 'inception/mixed4e_1x1_pre_relu/conv', 'inception/mixed4e_1x1_pre_relu', 'inception/mixed4e_1x1', 'inception/mixed4e_3x3_bottleneck_pre_relu/conv', 'inception/mixed4e_3x3_bottleneck_pre_relu', 'inception/mixed4e_3x3_bottleneck', 'inception/mixed4e_3x3_pre_relu/conv', 'inception/mixed4e_3x3_pre_relu', 'inception/mixed4e_3x3', 'inception/mixed4e_5x5_bottleneck_pre_relu/conv', 'inception/mixed4e_5x5_bottleneck_pre_relu', 'inception/mixed4e_5x5_bottleneck', 'inception/mixed4e_5x5_pre_relu/conv', 'inception/mixed4e_5x5_pre_relu', 'inception/mixed4e_5x5', 'inception/mixed4e_pool', 'inception/mixed4e_pool_reduce_pre_relu/conv', 'inception/mixed4e_pool_reduce_pre_relu', 'inception/mixed4e_pool_reduce', 'inception/mixed4e/concat_dim', 'inception/mixed4e', 'inception/maxpool10', 'inception/mixed5a_1x1_pre_relu/conv', 'inception/mixed5a_1x1_pre_relu', 'inception/mixed5a_1x1', 'inception/mixed5a_3x3_bottleneck_pre_relu/conv', 'inception/mixed5a_3x3_bottleneck_pre_relu', 'inception/mixed5a_3x3_bottleneck', 'inception/mixed5a_3x3_pre_relu/conv', 'inception/mixed5a_3x3_pre_relu', 'inception/mixed5a_3x3', 'inception/mixed5a_5x5_bottleneck_pre_relu/conv', 'inception/mixed5a_5x5_bottleneck_pre_relu', 'inception/mixed5a_5x5_bottleneck', 'inception/mixed5a_5x5_pre_relu/conv', 'inception/mixed5a_5x5_pre_relu', 'inception/mixed5a_5x5', 'inception/mixed5a_pool', 'inception/mixed5a_pool_reduce_pre_relu/conv', 'inception/mixed5a_pool_reduce_pre_relu', 'inception/mixed5a_pool_reduce', 'inception/mixed5a/concat_dim', 'inception/mixed5a', 'inception/mixed5b_1x1_pre_relu/conv', 'inception/mixed5b_1x1_pre_relu', 'inception/mixed5b_1x1', 'inception/mixed5b_3x3_bottleneck_pre_relu/conv', 'inception/mixed5b_3x3_bottleneck_pre_relu', 
'inception/mixed5b_3x3_bottleneck', 'inception/mixed5b_3x3_pre_relu/conv', 'inception/mixed5b_3x3_pre_relu', 'inception/mixed5b_3x3', 'inception/mixed5b_5x5_bottleneck_pre_relu/conv', 'inception/mixed5b_5x5_bottleneck_pre_relu', 'inception/mixed5b_5x5_bottleneck', 'inception/mixed5b_5x5_pre_relu/conv', 'inception/mixed5b_5x5_pre_relu', 'inception/mixed5b_5x5', 'inception/mixed5b_pool', 'inception/mixed5b_pool_reduce_pre_relu/conv', 'inception/mixed5b_pool_reduce_pre_relu', 'inception/mixed5b_pool_reduce', 'inception/mixed5b/concat_dim', 'inception/mixed5b', 'inception/avgpool0', 'inception/head0_pool', 'inception/head0_bottleneck_pre_relu/conv', 'inception/head0_bottleneck_pre_relu', 'inception/head0_bottleneck', 'inception/head0_bottleneck/reshape/shape', 'inception/head0_bottleneck/reshape', 'inception/nn0_pre_relu/matmul', 'inception/nn0_pre_relu', 'inception/nn0', 'inception/nn0/reshape/shape', 'inception/nn0/reshape', 'inception/softmax0_pre_activation/matmul', 'inception/softmax0_pre_activation', 'inception/softmax0', 'inception/head1_pool', 'inception/head1_bottleneck_pre_relu/conv', 'inception/head1_bottleneck_pre_relu', 'inception/head1_bottleneck', 'inception/head1_bottleneck/reshape/shape', 'inception/head1_bottleneck/reshape', 'inception/nn1_pre_relu/matmul', 'inception/nn1_pre_relu', 'inception/nn1', 'inception/nn1/reshape/shape', 'inception/nn1/reshape', 'inception/softmax1_pre_activation/matmul', 'inception/softmax1_pre_activation', 'inception/softmax1', 'inception/avgpool0/reshape/shape', 'inception/avgpool0/reshape', 'inception/softmax2_pre_activation/matmul', 'inception/softmax2_pre_activation', 'inception/softmax2', 'inception/output', 'inception/output1', 'inception/output2']

The input to the graph is the first operation's output tensor, and the probabilities of the 1000 possible objects are in the last layer:


In [9]:
input_name = names[0] + ':0'
x = g.get_tensor_by_name(input_name)

In [10]:
softmax = g.get_tensor_by_name(names[-1] + ':0')

Predicting with the Inception Network

Let's try to use the network to predict now:


In [11]:
from skimage.data import coffee
og = coffee()
plt.imshow(og)
print(og.min(), og.max())


0 255

We'll crop and resize the image to 299 x 299 pixels. I've provided a simple helper function which will do this for us:


In [12]:
# Note that in the lecture, I used a slightly different inception
# model, and this one requires us to subtract the mean from the input image.
# The preprocess function will also crop/resize the image to 299x299
img = inception.preprocess(og)
print(og.shape), print(img.shape)


(400, 600, 3)
(299, 299, 3)
Out[12]:
(None, None)

In [13]:
# So this will now be a different range than what we had in the lecture:
print(img.min(), img.max())


-117.0 138.0
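
If you're curious, here is a minimal sketch of what a preprocess function like this might do, assuming a center-crop, a resize to 299 x 299, and subtraction of a mean value of roughly 117 (inferred from the range above; this is not the library's exact code):

from skimage.transform import resize

def my_preprocess(img, size=299, mean=117.0):
    # Center-crop the image to a square
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    # Resize to the network's expected input size, keeping the 0-255 range
    rsz = resize(crop, (size, size), preserve_range=True)
    # Subtract the approximate mean pixel value used during training
    return (rsz - mean).astype(np.float32)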

As we've seen in the last session, our images must be shaped as a 4-dimensional array describing the number of images, height, width, and number of channels. So our original 3-dimensional image of height, width, and channels needs an additional dimension on the 0th axis.


In [14]:
img_4d = img[np.newaxis]
print(img_4d.shape)


(1, 299, 299, 3)

In [15]:
fig, axs = plt.subplots(1, 2)
axs[0].imshow(og)

# Note that unlike the lecture, we have to call the `inception.deprocess` function
# so that it adds back the mean!
axs[1].imshow(inception.deprocess(img))


Out[15]:
<matplotlib.image.AxesImage at 0x13ef9fcc0>

In [16]:
res = np.squeeze(softmax.eval(feed_dict={x: img_4d}))

In [17]:
# Note that this network is slightly different than the one used in the lecture.
# Instead of just 1 output, there will be 16 outputs of 1008 probabilities.
# We only use the first 1000 probabilities (the extra ones are for negative/unseen labels)
res.shape


Out[17]:
(16, 1008)

The result of the network is a vector of 1008 probabilities, one per class (or rather, 16 such vectors, which we'll average into a single one). Inside our net dictionary are the labels for every element. We can sort the probabilities and use the labels of the first 1000 classes to see what the top 5 predicted probabilities and labels are:


In [18]:
# Note that this is one way to aggregate the different probabilities.  We could also
# take the argmax.
res = np.mean(res, 0)
res = res / np.sum(res)

In [19]:
print([(res[idx], net['labels'][idx])
       for idx in res.argsort()[-5:][::-1]])


[(0.99849206, (947, 'espresso')), (0.000631253, (859, 'cup')), (0.00050241494, (953, 'chocolate sauce')), (0.00019483207, (844, 'consomme')), (0.00013370356, (822, 'soup bowl'))]

Visualizing Filters

Wow so it works! But how!? Well that's an ongoing research question. There have been a lot of great developments in the last few years to help us understand what might be happening. Let's first try to visualize the weights of the convolution filters, like we've done with our MNIST network before.


In [20]:
W = g.get_tensor_by_name('inception/conv2d0_w:0')
W_eval = W.eval()
print(W_eval.shape)


(7, 7, 3, 64)

With MNIST, each convolution filter had just 1 input channel, since MNIST is grayscale. But in this case, our input has 3 color channels, and so each convolution filter also has 3 input channels. We can try to see every single individual filter using the library tool I've provided:


In [21]:
from libs import utils
W_montage = utils.montage_filters(W_eval)
plt.figure(figsize=(10,10))
plt.imshow(W_montage, interpolation='nearest')


Out[21]:
<matplotlib.image.AxesImage at 0x1379950b8>

Or, we can also try to look at them as RGB filters, showing the influence of each color channel, for each neuron or output filter.


In [22]:
Ws = [utils.montage_filters(W_eval[:, :, [i], :]) for i in range(3)]
Ws = np.rollaxis(np.array(Ws), 0, 3)
plt.figure(figsize=(10,10))
plt.imshow(Ws, interpolation='nearest')


Out[22]:
<matplotlib.image.AxesImage at 0x143c37550>

In order to better see what these are doing, let's normalize the filters' range:


In [23]:
np.min(Ws), np.max(Ws)
Ws = (Ws / np.max(np.abs(Ws)) * 128 + 128).astype(np.uint8)
plt.figure(figsize=(10,10))
plt.imshow(Ws, interpolation='nearest')


Out[23]:
<matplotlib.image.AxesImage at 0x14408d710>

Like with our MNIST example, we can probably guess what some of these are doing. They are responding to edges, corners, and center-surround contrasts between two colors, like red/green and blue/yellow. Interestingly, this is also what the neuroscience of vision tells us about how humans perceive color: through the opponency of red/green and blue/yellow. To get a better sense, we can try to look at the output of the convolution:


In [24]:
feature = g.get_tensor_by_name('inception/conv2d0_pre_relu:0')

Let's look at the shape:


In [25]:
layer_shape = tf.shape(feature).eval(feed_dict={x:img_4d})
print(layer_shape)


[  1 150 150  64]

So our original image, which was 1 x 299 x 299 x 3 color channels, now has 64 new channels of information. The image's height and width are also roughly halved (ceil(299 / 2) = 150), because of the stride of 2 in the convolution. We've just seen what each of the convolution filters looks like. Let's try to see how they filter the image now by looking at the resulting convolution.


In [26]:
f = feature.eval(feed_dict={x: img_4d})
montage = utils.montage_filters(np.rollaxis(np.expand_dims(f[0], 3), 3, 2))
fig, axs = plt.subplots(1, 3, figsize=(20, 10))
axs[0].imshow(inception.deprocess(img))
axs[0].set_title('Original Image')
axs[1].imshow(Ws, interpolation='nearest')
axs[1].set_title('Convolution Filters')
axs[2].imshow(montage, cmap='gray')
axs[2].set_title('Convolution Outputs')


Out[26]:
<matplotlib.text.Text at 0x1406f3cc0>

It's a little hard to see what's happening here, but let's try. The third filter, for instance, looks a lot like the Gabor filter we created in the first session. It responds to horizontal edges, since it has a bright component at the top and a dark component on the bottom. Looking at the output of the convolution, we can see that the horizontal edges really pop out.

Visualizing the Gradient

So this is a pretty useful technique for the first convolution layer. But when we get to the next layer, all of a sudden we have 64 different channels of information being fed to more convolution filters of very high dimensionality. It's very hard to conceptualize that many dimensions, let alone try to figure out what the layer could be doing given all the possible combinations it has with other neurons in other layers.

If we want to understand what the deeper layers are really doing, we're going to have to use backprop to show us the gradients of a particular neuron with respect to our input image. Let's visualize the network's gradient activation when backpropagated to the original input image. This effectively tells us which pixels contribute to the predicted class or to a given neuron's activation.

We use a forward pass up to the layer that we are interested in, and then a backprop to help us understand what pixels in particular contributed to the final activation of that layer. We will need to create an operation which will find the max neuron of all activations in a layer, and then calculate the gradient of that objective with respect to the input image.


In [27]:
feature = g.get_tensor_by_name('inception/conv2d0_pre_relu:0')
gradient = tf.gradients(tf.reduce_max(feature, 3), x)

When we run this network now, we will specify the gradient operation we've created, instead of the softmax layer of the network. This will run a forward prop up to the layer we asked to find the gradient with, and then run a back prop all the way to the input image.


In [28]:
res = sess.run(gradient, feed_dict={x: img_4d})[0]

Let's visualize the original image and the output of the backpropagated gradient:


In [29]:
fig, axs = plt.subplots(1, 2)
axs[0].imshow(inception.deprocess(img))
axs[1].imshow(res[0])


Out[29]:
<matplotlib.image.AxesImage at 0x146bc2940>

Well, that looks like a complete mess! What we can do is normalize the activations in a way that lets us see them in terms of the normal range of color values.


In [30]:
def normalize(img, s=0.1):
    '''Normalize the image range for visualization'''
    z = img / np.std(img)
    return np.uint8(np.clip(
        (z - z.mean()) / max(z.std(), 1e-4) * s + 0.5,
        0, 1) * 255)

In [31]:
r = normalize(res)
fig, axs = plt.subplots(1, 2)
axs[0].imshow(inception.deprocess(img))
axs[1].imshow(r[0])


Out[31]:
<matplotlib.image.AxesImage at 0x14fc042b0>

Much better! This sort of makes sense! There are some strong edges and we can really see what colors are changing along those edges.

We can try this within individual layers as well, pulling out individual neurons to see what each of them is responding to. Let's first create a few functions which will help us visualize a single neuron in a layer, and every neuron of a layer:


In [32]:
def compute_gradient(input_placeholder, img, layer_name, neuron_i):
    feature = g.get_tensor_by_name(layer_name)
    gradient = tf.gradients(tf.reduce_mean(feature[:, :, :, neuron_i]), input_placeholder)
    res = sess.run(gradient, feed_dict={input_placeholder: img})[0]
    return res

def compute_gradients(input_placeholder, img, layer_name):
    feature = g.get_tensor_by_name(layer_name)
    layer_shape = tf.shape(feature).eval(feed_dict={input_placeholder: img})
    gradients = []
    for neuron_i in range(layer_shape[-1]):
        gradients.append(compute_gradient(input_placeholder, img, layer_name, neuron_i))
    return gradients

Now we can pass in a layer name and see the gradient of every neuron in that layer with respect to the input image as a montage. Let's try the second convolutional layer. This can take a while depending on your computer:


In [33]:
gradients = compute_gradients(x, img_4d, 'inception/conv2d1_pre_relu:0')
gradients_norm = [normalize(gradient_i[0]) for gradient_i in gradients]
montage = utils.montage(np.array(gradients_norm))

In [34]:
plt.figure(figsize=(12, 12))
plt.imshow(montage)


Out[34]:
<matplotlib.image.AxesImage at 0x14139f2e8>

So it's clear that each neuron is responding to some type of feature. It looks like a lot of them are interested in the texture of the cup, and seem to respond in different ways across the image. Some seem to be more interested in the shape of the cup, responding pretty strongly to the circular opening, while others seem to catch the liquid in the cup more. There even seems to be one that just responds to the spoon, and another which responds to only the plate.

Let's try to get a sense of how the activations in each layer progress. We can get every max pooling layer like so:


In [35]:
features = [name for name in names if 'maxpool' in name.split('/')[-1]]
print(features)


['inception/maxpool0', 'inception/maxpool1', 'inception/maxpool4', 'inception/maxpool10']

So I didn't mention what max pooling is. But it's a simple operation. You can think of it like a convolution, except instead of using a learned kernel, it just takes the maximum value in each window (for "max pooling"), or the average value (for "average pooling").
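
To make this concrete, here's a tiny numpy sketch of 2 x 2 max pooling with a stride of 2 (just an illustration, not how Tensorflow implements it):

def max_pool_2x2(img):
    # Trim to even dimensions so the 2x2 windows tile the image exactly
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    windows = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
    # Keep only the maximum value within each 2x2 window
    return windows.max(axis=(1, 3))

a = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 8, 3, 2],
              [7, 6, 1, 0]], dtype=np.float32)
print(max_pool_2x2(a))
# [[ 4.  8.]
#  [ 9.  3.]]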

We'll now loop over every max pooling layer and create an operation that first finds the maximally activated neuron at each pixel. It will then take the sum of these activations across every pixel, and calculate the gradient of that sum with respect to the input image.


In [36]:
n_plots = len(features) + 1
fig, axs = plt.subplots(1, n_plots, figsize=(20, 5))
base = img_4d
axs[0].imshow(inception.deprocess(img))
for feature_i, featurename in enumerate(features):
    feature = g.get_tensor_by_name(featurename + ':0')
    neuron = tf.reduce_max(feature, len(feature.get_shape())-1)
    gradient = tf.gradients(tf.reduce_sum(neuron), x)
    this_res = sess.run(gradient[0], feed_dict={x: base})[0]
    axs[feature_i+1].imshow(normalize(this_res))
    axs[feature_i+1].set_title(featurename)


To really understand what's happening in these later layers, we're going to have to experiment with some other visualization techniques.

Deep Dreaming

Sometime in May of 2015, a researcher at Google, Alexander Mordvintsev, took a deep network meant to recognize objects in an image and instead used it to generate new objects in an image. The internet quickly exploded after seeing one of the images it produced. Soon after, Google posted a blog entry on how to perform the technique, which they dubbed "Inceptionism", <TODO: cut to blog and scroll> and tons of interesting outputs were soon created. Somehow the name Deep Dreaming caught on, and tons of new creative applications came out, from twitter bots (DeepForger) to streaming television (twitch.tv) to apps; it was soon everywhere.

What Deep Dreaming does is take the backpropagated gradient activations and simply add them back to the image, running the same process again and again in a loop. I think "dreaming" is a great description of what's going on: we're pushing the network in a direction and seeing what happens when it's left to its own devices. What it is effectively doing is amplifying whatever our objective is, but we get to see how that objective is optimized in the input space, rather than deep in the network in some arbitrarily high dimensional space that no one can understand.

There are many tricks one can add to this idea, such as blurring, adding constraints on the total activations, decaying the gradient, infinitely zooming into the image by cropping and scaling, adding jitter by randomly moving the image around, or plenty of other ideas waiting to be explored.
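
As a taste of what some of those tricks look like in code, here's a hedged sketch of a single dream step with jitter (randomly rolling the image before the step and rolling it back after) and a decaying step size. The function name and values are illustrative, not from the lecture:

def dream_step(img_copy, gradient, it_i, step=1.0, decay=0.95, jitter=8):
    # Jitter: randomly shift the image so features don't lock onto fixed pixels
    ox, oy = np.random.randint(-jitter, jitter + 1, 2)
    img_copy = np.roll(np.roll(img_copy, ox, 1), oy, 2)
    # One normalized gradient ascent step, with a decaying step size
    this_res = sess.run(gradient[0], feed_dict={x: img_copy})[0]
    this_res /= (np.max(np.abs(this_res)) + 1e-8)
    img_copy += this_res * step * (decay ** it_i)
    # Undo the jitter so the image stays aligned across iterations
    return np.roll(np.roll(img_copy, -ox, 1), -oy, 2)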

Simplest Approach

Let's try the simplest approach to deep dream using a few of these layers. We're going to begin with the first max pooling layer. We'll specify our objective, which is to follow the gradient of the mean of the selected layer's activation. What we should see is that same objective being amplified, so that we can start to understand, in terms of the input image, what the mean activation of that layer tends to like, or respond to. We'll also produce a gif of every few frames. For the remainder of this section, we'll need to rescale our 0-255 range image to 0-1, as it will speed things up:


In [37]:
# Rescale to 0-1 range
img_4d = img_4d / np.max(img_4d)

# Get the max pool layer
layer = g.get_tensor_by_name('inception/maxpool0:0')

# Find the gradient of this layer's mean activation with respect to the input image
gradient = tf.gradients(tf.reduce_mean(layer), x)

# Copy the input image as we'll add the gradient to it in a loop
img_copy = img_4d.copy()

# We'll run it for 50 iterations
n_iterations = 50

# Think of this as our learning rate.  This is how much of the gradient we'll add to the input image
step = 1.0

# Every 10 iterations, we'll add an image to a GIF
gif_step = 10

# Storage for our GIF
imgs = []
for it_i in range(n_iterations):
    print(it_i, end=', ')

    # This will calculate the gradient of the layer we chose with respect to the input image.
    this_res = sess.run(gradient[0], feed_dict={x: img_copy})[0]

    # Let's normalize it by the maximum activation
    this_res /= (np.max(np.abs(this_res)) + 1e-8)

    # Then add it to the input image
    img_copy += this_res * step

    # And add to our gif
    if it_i % gif_step == 0:
        imgs.append(normalize(img_copy[0]))

# Build the gif
gif.build_gif(imgs, saveto='1-simplest-mean-layer.gif')


0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
Out[37]:
<matplotlib.animation.ArtistAnimation at 0x152121128>

In [38]:
ipyd.Image(url='1-simplest-mean-layer.gif', height=200, width=200)


Out[38]:

What we can see is that pretty quickly, the activations tend to pick up the fine detailed edges of the cup, plate, and spoon. Their structure is very local, meaning they are really describing information at a very small scale.

We could also specify the maximal neuron's mean activation, instead of the mean of the entire layer:


In [39]:
# Find the maximal neuron in a layer
neuron = tf.reduce_max(layer, len(layer.get_shape())-1)
# Then find the mean over this neuron
gradient = tf.gradients(tf.reduce_mean(neuron), x)

The rest is exactly the same as before:


In [40]:
img_copy = img_4d.copy()
imgs = []
for it_i in range(n_iterations):
    print(it_i, end=', ')
    this_res = sess.run(gradient[0], feed_dict={x: img_copy})[0]
    this_res /= (np.max(np.abs(this_res)) + 1e-8)
    img_copy += this_res * step
    if it_i % gif_step == 0:
        imgs.append(normalize(img_copy[0]))
gif.build_gif(imgs, saveto='1-simplest-max-neuron.gif')


0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
Out[40]:
<matplotlib.animation.ArtistAnimation at 0x13fdb3dd8>

In [41]:
ipyd.Image(url='1-simplest-max-neuron.gif', height=200, width=200)


Out[41]:

What we should see here is how the activation of the maximal neuron in the layer is slowly maximized through gradient ascent. So over time, we're increasing the overall activation of the neuron we asked for.

Let's try doing this for each of our max pool layers, in increasing depth, and let it run a little longer. This will take a long time depending on your machine!


In [42]:
# For each max pooling feature, we'll produce a GIF
for feature_i in features:
    layer = g.get_tensor_by_name(feature_i + ':0')
    gradient = tf.gradients(tf.reduce_mean(layer), x)
    img_copy = img_4d.copy()
    imgs = []
    for it_i in range(n_iterations):
        print(it_i, end=', ')
        this_res = sess.run(gradient[0], feed_dict={x: img_copy})[0]
        this_res /= (np.max(np.abs(this_res)) + 1e-8)
        img_copy += this_res * step
        if it_i % gif_step == 0:
            imgs.append(normalize(img_copy[0]))
    gif.build_gif(
        imgs, saveto='1-simplest-' + feature_i.split('/')[-1] + '.gif')


0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,