Lesson 8 Code Along

Lesson 8 Wiki | Notebook: pascal.ipynb



In [1]:

    
%matplotlib inline
%reload_ext autoreload
%autoreload 2



In [2]:

    
from fastai.conv_learner import *
from fastai.dataset import *

from pathlib import Path
import json
from PIL import ImageDraw, ImageFont
from matplotlib import patches, patheffects

1. Pascal VOC

We'll be looking at the Pascal VOC dataset. It's quite slow, so you may prefer to download from this mirror. There are two different competition/research datasets, from 2007 and 2012. We'll use the 2007 version. You can use the larger 2012 for better results, or even combine them (but be careful to avoid data leakage between the validation sets if you do this).

Unlike previous lessons, we're using the Python 3 standard library pathlib for our paths and file access. Note that it returns an OS-specific class (on Unix: PosixPath) so your output may look a little different. Most libraries that take paths as input can take a pathlib object - although some (like cv2) can't, in which case you can use str() to convert to a string.



In [6]:

    
PATH = Path('data/pascal/')
list(PATH.iterdir())









    Out[6]:





[PosixPath('data/pascal/tmp'),
 PosixPath('data/pascal/VOCdevkit'),
 PosixPath('data/pascal/pascal_train2012.json'),
 PosixPath('data/pascal/pascal_val2012.json'),
 PosixPath('data/pascal/models'),
 PosixPath('data/pascal/pascal_val2007.json'),
 PosixPath('data/pascal/pascal_train2007.json'),
 PosixPath('data/pascal/pascal_test2007.json')]

As well as images, there're also annotations - bounding boxes showing where each object is. These were hand labeled. The original versions were in XML, which is a little hard to work with nowadas, so we use the more recent JSON version.

pathlib includes the ability to open files, and much more.

Here we want to open JSON files that contain the bounding boxes and object classes. The fastest way to do this in Python is with the JSON library -- although there are Google versions for super-large files.



In [7]:

    
trn_j = json.load((PATH/'pascal_train2007.json').open())
trn_j.keys()









    Out[7]:





dict_keys(['images', 'type', 'annotations', 'categories'])

JSON - JavaScript Object Notation - is kind of a standard way to pass around hierarchical structured data now.



In [8]:

    
IMAGES, ANNOTATIONS, CATEGORIES = ['images', 'annotations', 'categories']
trn_j[IMAGES][:5]









    Out[8]:





[{'file_name': '000012.jpg', 'height': 333, 'width': 500, 'id': 12},
 {'file_name': '000017.jpg', 'height': 364, 'width': 480, 'id': 17},
 {'file_name': '000023.jpg', 'height': 500, 'width': 334, 'id': 23},
 {'file_name': '000026.jpg', 'height': 333, 'width': 500, 'id': 26},
 {'file_name': '000032.jpg', 'height': 281, 'width': 500, 'id': 32}]



In [9]:

    
trn_j[ANNOTATIONS][:2]









    Out[9]:





[{'segmentation': [[155, 96, 155, 270, 351, 270, 351, 96]],
  'area': 34104,
  'iscrowd': 0,
  'image_id': 12,
  'bbox': [155, 96, 196, 174],
  'category_id': 7,
  'id': 1,
  'ignore': 0},
 {'segmentation': [[184, 61, 184, 199, 279, 199, 279, 61]],
  'area': 13110,
  'iscrowd': 0,
  'image_id': 17,
  'bbox': [184, 61, 95, 138],
  'category_id': 15,
  'id': 2,
  'ignore': 0}]

Segmentation is Polygon Segmentation. We'll use bounding box.



In [10]:

    
trn_j[CATEGORIES][:4]









    Out[10]:





[{'supercategory': 'none', 'id': 1, 'name': 'aeroplane'},
 {'supercategory': 'none', 'id': 2, 'name': 'bicycle'},
 {'supercategory': 'none', 'id': 3, 'name': 'bird'},
 {'supercategory': 'none', 'id': 4, 'name': 'boat'}]

It's helpful to use constants instead of strings, since we get tab-completion and don't mistype.

We can turn this categories list from a dictionary of id -> name:



In [11]:

    
FILE_NAME, ID, IMG_ID, CAT_ID, BBOX = 'file_name', 'id', 'image_id', 'category_id', 'bbox'
cats = {o[ID]:o['name'] for o in trn_j[CATEGORIES]}
trn_fns = {o[ID]:o[FILE_NAME] for o in trn_j[IMAGES]}
trn_ids = [o[ID] for o in trn_j[IMAGES]]



In [12]:

    
list((PATH/'VOCdevkit'/'VOC2007').iterdir())









    Out[12]:





[PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/SegmentationObject'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/ImageSets'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/SegmentationClass'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/Annotations')]



In [13]:

    
JPEGS = 'VOCdevkit/VOC2007/JPEGImages'



In [14]:

    
IMG_PATH = PATH/JPEGS
list(IMG_PATH.iterdir())[:5]









    Out[14]:





[PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/007594.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/005682.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/005016.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/001930.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/007666.jpg')]

Each image has a unique ID



In [15]:

    
im0_d = trn_j[IMAGES][0]
im0_d[FILE_NAME], im0_d[ID]









    Out[15]:





('000012.jpg', 12)

A defaultdict is useful any time you want to have a default dictionary entry for new keys. Here we create a dict from image IDs to a list of annotations (tuple of bounding box and class id).

We convert VOC's height/width into top-left/bot-right, and witch x/y coords to be consistent with NumPy.

The idea here is to create a dictionary where the key is the image id, and the value is the list of all its annotations. So: go through each of the annotations; if it doesn't say to ignore it: append its bounding-box and class to the appropriate dictionary item (where that dictionary item is a list).

But if that dictionary item doesnt exist yet then there's no list to append to. collections.defaultdict which behaves just like a regular dictionary, except that if you try to access a key that does not exist, it will make that key exist with the default value of a function you specify -- in this case: lambda: []

NOTE that the dimensions are reversed in hw_bb. This is because CV usually uses W,H whereas Mathematics uses R,C. Width x Height vs Rows x Columns. FastAI follows the NumPy/PyTorch way of RxC. FastAI also uses Top-Left;Bottom-Right coordinates, instead of Top-Left;Heigh,Width.



In [16]:

    
def hw_bb(bb): return np.array([bb[1], bb[0], bb[3]+bb[1]-1, bb[2]+bb[0]-1])

trn_anno = collections.defaultdict(lambda:[])
for o in trn_j[ANNOTATIONS]:
    if not o['ignore']:
        bb = o[BBOX]
        bb = hw_bb(bb)
        trn_anno[o[IMG_ID]].append((bb, o[CAT_ID]))
        
len(trn_anno)









    Out[16]:





2501

Now we have a dictionary of filenames -> tuple(bounding_box_coords, class_id)



In [17]:

    
im_a = trn_anno[im0_d[ID]]; im_a









    Out[17]:





[(array([ 96, 155, 269, 350]), 7)]



In [18]:

    
im0_a = im_a[0]; im0_a









    Out[18]:





(array([ 96, 155, 269, 350]), 7)



In [19]:

    
cats[7]









    Out[19]:





'car'



In [20]:

    
trn_anno[17]









    Out[20]:





[(array([ 61, 184, 198, 278]), 15), (array([ 77,  89, 335, 402]), 13)]



In [21]:

    
cats[15], cats[13]









    Out[21]:





('person', 'horse')

Some libs take VOC format bounding boxes, so this let's us convert back when required:



In [22]:

    
def bb_hw(a): return np.array([a[1], a[0], a[3]-a[1]+1, a[2]-a[0]+1])



In [23]:

    
bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)



In [24]:

    
f'expected: {bb_voc}, actual: {bb_hw(bb_fastai)}'









    Out[24]:





'expected: [155, 96, 196, 174], actual: [155  96 196 174]'

YOu can use Visual Studio Code (vscode - open source editor that comes with recent versions of Anaconda, or can be installed seperately), or mose editors and IDEs, to find out all about the open_image function. csvode things to know:

Command palette (Ctrl-shift-p)
Select interpreter (for fastai env)
Select terminal shell
Go to symbol (Ctrl-t)
Find references (Shift-F12)
Go to definition (F12)
Go back (alt-left)
View documentation
Hide sidebar (Ctrl-b)
Zen mode (Ctrl-k,z)



In [25]:

    
im = open_image(IMG_PATH/im0_d[FILE_NAME])

Matplotlib's plt.subplots is a really sueful wrapper for creating plots, regardless of whether you have more than one subplot. NOTE that Matplotlib has an optional object-oriented API which is much easier to understand and use (although few examples online use it).



In [26]:

    
def show_img(im, figsize=None, ax=None):
    if not ax: fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax

A simple but rarely used trick to making text visible regardless of background is to use white text with blackoutline, or vice versa. Here's how to do it in matplotlib:



In [27]:

    
def draw_outline(o, lw):
    o.set_path_effects([patheffects.Stroke(
        linewidth=lw, foreground='black'), patheffects.Normal()])

Note that * in argument lists is the splat operator. In this case it's a little shortcut compared to writing out b[-2], b[-1].



In [28]:

    
def draw_rect(ax, b):
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False, edgecolor='white', lw=2))
    draw_outline(patch, 4)



In [29]:

    
def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, 
                   verticalalignment='top', color='white', fontsize=sz, weight='bold')
    draw_outline(text, 1)



In [30]:

    
ax = show_img(im)
b = bb_hw(im0_a[0])
draw_rect(ax, b)
draw_text(ax, b[:2], cats[im0_a[1]]) # b[:2] is top-left; im0_a[1] is class

So because Matplotlib has an OO API, we can just create an axis object in draw_text, and pass that off to draw_outline to draw an outline around it. Same for the bounding box: draw_rect creates an axis object called patch that it sends to draw_outline which puts a black outline around the white rectangle.

Matplotlib calls .add_patch on an axis object to draw a rectangle via a patches.Rectangle argument.

What's great is now that we have all that set up, we can use it for all our Object Detection work going forward! So let's package that all up a bit for quick use later.



In [34]:

    
# draw image with annotations
def draw_im(im, ann):
    ax = show_img(im, figsize=(16,8))
    for b,c in ann:  # destructuring assignment
        b = bb_hw(b)
        draw_rect(ax, b)
        draw_text(ax, b[:2], cats[c], sz=16)

# draw image at a particular index
def draw_idx(i):
    im_a = trn_anno[i]  # grab img ID
    im = open_image(IMG_PATH/trn_fns[i]) # open image
    print(im.shape)
    draw_im(im, im_a)



In [35]:

    
draw_idx(17)









    



(364, 480, 3)

2. Largest Item Classifier

A λambda function is simply a way to define an anonymous function inline. Here we use it to describe how to sort the annotation for each image - by bounding box size (descending).



In [36]:

    
def get_lrg(b):
    if not b: raise Exception()
    b = sorted(b, key=lambda x: np.product(x[0][-2:] - x[0][:2]), reverse=True)
    return b[0]



In [37]:

    
trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}

Here's something cool -- J.Howard started with the second line, above, then wrote the first. He started with the API he wanted to work with -- then implemented it.

Something that takes all of the bounding boxes for a particular image and finds the largest.

He does that by sorting the bounding boxes via: the product of the difference of the last two items of the bounding-box lis (bottom-right corner) and the first two items (top-left corner). (bot-right) minus (top-left) = width and height; product of that = size of the bounding box. Cool.

Now we have a dictionary from image id to a single bounding box - the largest for that image.



In [38]:

    
b, c = trn_lrg_anno[23]
b = bb_hw(b)
ax = show_img(open_image(IMG_PATH/trn_fns[23]), figsize=(5,10))
draw_rect(ax, b)
draw_text(ax, b[:2], cats[c], sz=16)

It's very important to look at your work at every stage in the pipeline.



In [39]:

    
(PATH/'tmp').mkdir(exist_ok=True) # making a new folder in our directory
CSV = PATH/'tmp/lrg.csv'  # path to large-objects csv file

Often it's easiest to simply create a CSV of the data you want to model, rather than trying to create a custom dataset. here we use Pandas to help us create a CSV of the image filename and class. Basically: we already have a ImageClassifierData.from_csv method, there's no reason to build a custom dataloader; just put the labels & ids into a CSV file.

--> this is actually exactly what I did for my GLoC Detector.

Below: easiest way to create CSV: Pandas dataframe. Create as dictionary of 'name of column' : 'list of things in that column'. columns is specified even though columns are already given becaues dictionaries are unordered -- and order matters here.

--> Learned that the hard way in my GLoC Detector.



In [40]:

    
df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids], 
                   'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids]}, columns=['fn','cat'])
df.to_csv(CSV, index=False)



In [41]:

    
f_model = resnet34
sz = 224
bs = 64

From here on it's jus tlike Dogs vs Cats! We have a CSV file containing a bunch of file names, and for each one: the class and bounding box.



In [42]:

    
tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_side_on, crop_type=CropType.NO)
md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms)



In [43]:

    
x, y = next(iter(md.val_dl))



In [44]:

    
show_img(md.val_ds.denorm(to_np(x))[0]);









    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Some differences with how things were done in Part 1. crop_type is different - to resize an image to 224x224 the image is resized to 224 along its smallest axis, then a random square crop is taken; during validation take a center crop (multiple random crops if using data augmentation). We don't want to do that for Object Detection because objects can be anywhere in an image -- in Image Classification the object is usually in the center -- we don't want to crop out the object we want to detect.

crop_type=CropType.NO means no crop -- the image is just resized to a square. Most CV models work better if you crop rather than squish, but the still work well nonetheless.

md is a ModelData object. It's .trn_dl is a train dataloader iterator that returns a the next minibatch. TO use it manually: iter(md.val_dl) returns an iterator from which you can call next(.) to get the net minibatch.

However, we can't take X and Y from the next minibatch and send it straight to show_img because:

It's not a NumPy array
It's not on the CPU
Its shape is wrong
Numbers are not between 0 and 1

All standard ImageNet-pretrained models expect data to've been normalized to a zero-mean and a one-standard-deviation.

So you use the method denorm via md.val_ds.denorm(.) on the dataset which denormalizes the image and reorders its dimensions. The image norms are hardcoded in Fastai from ImageNet, Inception, etc. statistics. The denormalization depends on the transform - and the dataset knows which transform was used to create it.



In [41]:

    
# x[0] # x : minibatch of 64x3x224x224



In [41]:

    
learn = ConvLearner.pretrained(f_model, md, metrics=[accuracy])
learn.opt_fn = optim.Adam



In [53]:

    
lrf = learn.lr_find(1e-5, 100)









    





 
 










    



 25%|██▌       | 8/32 [00:09<00:27,  1.16s/it, loss=3.61] 
 78%|███████▊  | 25/32 [00:22<00:06,  1.12it/s, loss=10.8]

When your LR finder graph looks like this, you can ask for more points on each end:



In [54]:

    
learn.sched.plot()



In [55]:

    
learn.sched.plot(n_skip=5, n_skip_end=1)



In [56]:

    
# NB: disabling monitor thread to fix annoying tqdm errors - https://github.com/tqdm/tqdm/issues/481
# also: https://github.com/tqdm/tqdm/issues/481#issuecomment-378067008
# tqdm.monitor_interval = 0 ## <-- doesn't seem to change anything



In [57]:

    
lr = 2e-2



In [58]:

    
learn.fit(lr, 1, cycle_len=1)









    





 
 










    



  6%|▋         | 2/32 [00:05<01:27,  2.93s/it, loss=3.8]  
epoch      trn_loss   val_loss   accuracy                 
    0      1.334422   0.621988   0.824519  







    Out[58]:





[0.6219884678721428, 0.8245192319154739]



In [59]:

    
lrs = np.array([lr/1000, lr/100, lr])



In [60]:

    
learn.freeze_to(-2)



In [61]:

    
lrf = learn.lr_find(lrs/1000)
learn.sched.plot(1)









    





 
 










    



 84%|████████▍ | 27/32 [00:24<00:04,  1.09it/s, loss=3.72]



In [62]:

    
learn.fit(lrs/5, 1, cycle_len=1)









    





 
 










    



  0%|          | 0/32 [00:00<?, ?it/s]                    
epoch      trn_loss   val_loss   accuracy                  
    0      0.786478   0.609383   0.79372   







    Out[62]:





[0.6093826554715633, 0.7937199547886848]



In [63]:

    
learn.unfreeze()

Accuracy isn't improving much - since many images have multiple different objects, it's going to be impossible to be that accurate.



In [64]:

    
learn.fit(lrs/5, 1, cycle_len=2)









    





 
 










    



epoch      trn_loss   val_loss   accuracy                  
    0      0.612344   0.559828   0.820613  
    1      0.426435   0.551404   0.836689                  







    Out[64]:





[0.5514039918780327, 0.8366887047886848]



In [65]:

    
learn.save('class_one')



In [42]:

    
learn.load('class_one')



In [43]:

    
x,y = next(iter(md.val_dl))
probs = F.softmax(predict_batch(learn.model, x), -1)
x,preds = to_np(x),to_np(probs)
preds = np.argmax(preds, -1)

You can use the python debugger pdb to step through code.

pdb.set_trace() to set a breakpoint
%debug magic to trace an error

Commands you need to know:

s / n / c
u / d
p
l



In [56]:

    
fix, axes = plt.subplots(3, 4, figsize=(12, 8))
for i,ax in enumerate(axes.flat):
    ima = md.val_ds.denorm(x)[i]
    b = md.classes[preds[i]]
    ax = show_img(ima, ax=ax)
    draw_text(ax, (0,0), b)
plt.tight_layout()









    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

A way to see what the code above does is to take the contents of the loop, outdent them, set i = 0, put each line in a seperate cell, and run each cell, printing its output.



In [ ]:

    
i = 0



In [ ]:

    
ima = md.val_ds.denorm(x)[i]



In [ ]:

    
b = md.classes[preds[i]]



In [ ]:

    
ax = show_img(ima, ax=ax)



In [ ]:

    
draw_text(ax, (0,0), b)

Python debugger is also very useful. If you know an issue is happening at a specific minibatch/iteration: you can just set a breakpoint: pdb.set_trace() to trigger conditionally. h for help. Pdb will show you the line it's about to run. If you want to print out something, you can write any python expression - hit enter - and it'll display it: in this case: md.val_ds.denorm(x)

Then to see what comes after that piece of code: l for list displays where in the code/loop you are. It'll point an arrow to the line you are about to run.

To run that line and go to the next: n. We enter n again to go to the next line -- also if you just hit enter, pdb will do the last thing you entered. At this point, if we want to see the b -- b is also a pdb command, so to force pdb to print the b variable: p b. Then we enter n for the next line.

At this point the code is about to draw the image. We don't want to draw it - but we want to see how it's drawn -- so we want to step into the function with s

s takes us into draw_text. We can enter n to go to the next line inside draw_text, and l to see where we are inside the function.

If we want to continue on the next break point we enter c

Example case: say we step into denorm: n -> s -> l

What'll often happen is your debugging something in your PyTorch module, and it's hit an exception and you're trying to debug: you'll find yourself 6 layers deep inside PyTorch, and you want to see back up to where you called it from.

In this case we're inside a @property but we want to know what was going on up the call stack: we hit u - which doesn't run anything, but changes the context of the debugger, to show us what called it -- at which point we can enter things to find out about that environment, like p i to print the value of i.

After that if we want to go back down again: d.

ipdb is the IPython debugger and it's prettier.



In [57]:

    
fix, axes = plt.subplots(3, 4, figsize=(12, 8))
for i,ax in enumerate(axes.flat):
    pdb.set_trace()  # <-- pdb breakpoint
    ima = md.val_ds.denorm(x)[i]
    b = md.classes[preds[i]]
    ax = show_img(ima, ax=ax)
    draw_text(ax, (0,0), b)
plt.tight_layout()









    



> <ipython-input-57-a39fe248546d>(4)<module>()
-> ima = md.val_ds.denorm(x)[i]
(Pdb) h

Documented commands (type help <topic>):
========================================
EOF    c          d        h         list      q        rv       undisplay
a      cl         debug    help      ll        quit     s        unt      
alias  clear      disable  ignore    longlist  r        source   until    
args   commands   display  interact  n         restart  step     up       
b      condition  down     j         next      return   tbreak   w        
break  cont       enable   jump      p         retval   u        whatis   
bt     continue   exit     l         pp        run      unalias  where    

Miscellaneous help topics:
==========================
exec  pdb

(Pdb) l
  1  	fix, axes = plt.subplots(3, 4, figsize=(12, 8))
  2  	for i,ax in enumerate(axes.flat):
  3  	    pdb.set_trace()  # <-- pdb breakpoint
  4  ->	    ima = md.val_ds.denorm(x)[i]
  5  	    b = md.classes[preds[i]]
  6  	    ax = show_img(ima, ax=ax)
  7  	    draw_text(ax, (0,0), b)
  8  	plt.tight_layout()
[EOF]
(Pdb) n
> <ipython-input-57-a39fe248546d>(5)<module>()
-> b = md.classes[preds[i]]
(Pdb) 
> <ipython-input-57-a39fe248546d>(6)<module>()
-> ax = show_img(ima, ax=ax)
(Pdb) p b
'person'
(Pdb) n






    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).






    



> <ipython-input-57-a39fe248546d>(7)<module>()
-> draw_text(ax, (0,0), b)
(Pdb) s
--Call--
> <ipython-input-26-70200bcfd910>(1)draw_text()
-> def draw_text(ax, xy, txt, sz=14):
(Pdb) n
> <ipython-input-26-70200bcfd910>(2)draw_text()
-> text = ax.text(*xy, txt,
(Pdb) l
  1  	def draw_text(ax, xy, txt, sz=14):
  2  ->	    text = ax.text(*xy, txt,
  3  	                   verticalalignment='top', color='white', fontsize=sz, weight='bold')
  4  	    draw_outline(text, 1)
[EOF]
(Pdb) c
> <ipython-input-57-a39fe248546d>(3)<module>()
-> pdb.set_trace()  # <-- pdb breakpoint
(Pdb) n
> <ipython-input-57-a39fe248546d>(4)<module>()
-> ima = md.val_ds.denorm(x)[i]
(Pdb) s
--Call--
> /home/ubuntu/Kaukasos/FADL2/fastai/dataset.py(306)val_ds()
-> @property
(Pdb) l
301  	
302  	    @property
303  	    def is_reg(self): return self.trn_ds.is_reg
304  	    @property
305  	    def trn_ds(self): return self.trn_dl.dataset
306  ->	    @property
307  	    def val_ds(self): return self.val_dl.dataset
308  	    @property
309  	    def test_ds(self): return self.test_dl.dataset
310  	    @property
311  	    def trn_y(self): return self.trn_ds.y
(Pdb) u
> <ipython-input-57-a39fe248546d>(4)<module>()
-> ima = md.val_ds.denorm(x)[i]
(Pdb) p i
1
(Pdb) d
> /home/ubuntu/Kaukasos/FADL2/fastai/dataset.py(306)val_ds()
-> @property
(Pdb) q






    



---------------------------------------------------------------------------
BdbQuit                                   Traceback (most recent call last)
<ipython-input-57-a39fe248546d> in <module>()
      2 for i,ax in enumerate(axes.flat):
      3     pdb.set_trace()  # <-- pdb breakpoint
----> 4     ima = md.val_ds.denorm(x)[i]
      5     b = md.classes[preds[i]]
      6     ax = show_img(ima, ax=ax)

~/Kaukasos/FADL2/fastai/dataset.py in val_ds(self)
    304     @property
    305     def trn_ds(self): return self.trn_dl.dataset
--> 306     @property
    307     def val_ds(self): return self.val_dl.dataset
    308     @property

~/src/anaconda3/envs/fastai/lib/python3.6/bdb.py in trace_dispatch(self, frame, event, arg)
     51             return self.dispatch_line(frame)
     52         if event == 'call':
---> 53             return self.dispatch_call(frame, arg)
     54         if event == 'return':
     55             return self.dispatch_return(frame, arg)

~/src/anaconda3/envs/fastai/lib/python3.6/bdb.py in dispatch_call(self, frame, arg)
     84             return self.trace_dispatch
     85         self.user_call(frame, arg)
---> 86         if self.quitting: raise BdbQuit
     87         return self.trace_dispatch
     88 

BdbQuit:

h - ~~md.val_ds.denorm(x)~~ - l - n - n - p b - n - s - n - l - c -- n - s - l - u - p i - d - exit

The other place that the debugger comes in particularly hadny is if you've got an Exception - particularly if that's deep inside PyTorch.

Imagine we set preds[i*100] instead of preds[i]. In this case it's easy to see what's wrong but often it isn't so here's what we do.



In [58]:

    
fix, axes = plt.subplots(3, 4, figsize=(12, 8))
for i,ax in enumerate(axes.flat):
    ima = md.val_ds.denorm(x)[i]
    b = md.classes[preds[i*100]]
    ax = show_img(ima, ax=ax)
    draw_text(ax, (0,0), b)
plt.tight_layout()









    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).






    



---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-58-fa3693042431> in <module>()
      2 for i,ax in enumerate(axes.flat):
      3     ima = md.val_ds.denorm(x)[i]
----> 4     b = md.classes[preds[i*100]]
      5     ax = show_img(ima, ax=ax)
      6     draw_text(ax, (0,0), b)

IndexError: index 100 is out of bounds for axis 0 with size 64

%debug pops open the debuggger at the point the exception happened. So now we can check what happened. Try len(preds). Try p i*100 to print i*100; and you can do down, up the list, and etc. J.Howard does all of the Fastai Library and Course development in Jupyter notebooks interactively, and uses %debug all the time - along with copying out functions into indiv.cells and running them piecemeal.



In [59]:

    
%debug









    



> <ipython-input-58-fa3693042431>(4)<module>()
      2 for i,ax in enumerate(axes.flat):
      3     ima = md.val_ds.denorm(x)[i]
----> 4     b = md.classes[preds[i*100]]
      5     ax = show_img(ima, ax=ax)
      6     draw_text(ax, (0,0), b)

ipdb> len(preds)
64
ipdb> i*100
100
ipdb> q

Next from here we want to create the bounding box. We can createa a Regression instead of a Classification Neural Network. A Classification Neural Network is one that has a Sigmoid or Softmax output, and a Cross Entropy, Binary Cross Entropy, or Negative Log Likelihood loss function. If we don't have the Softmax or Sigmoid at the end and use Mean Squared Error as a loss function: it's now a regression model. So we can use it to predict a continuous number rather than a category.

We also know that we can have multiple outputs -- we did multiple object classification in the Planet competition.

So we can combine those two ideas and do a multiple column regression. In this case we have 4 numbers (top-left x,y; bot-right x,y) - and we could create a neural net with 4 activations and no softmax/sigmoid, and with an MSE loss function.

Here is were you think in terms of Differentiable Programming. You're not thinking "how do I create a bounding box model" -- instead it's "I need 4 numbers, therefore I need a neural network with 4 activations. That's half of what i need to know. The other half is the loss function. What's a loss function that, when it is lower, it means that the 4 numbers are better? If I can do those 2 things: I'm done.

Well, if the X is close to the first activation, and the Y to the second and so forth .. then I'm done! So that's it. I just need to create a model with 4 activations with an MSE loss function, and that should be it.

3. Bbox only

Now we'll try to find the bounding box of the largest object. This is simply a regression with 4 outputs. So we can use a CSV with multiple 'labels'.



In [45]:

    
BB_CSV = PATH/'tmp/bb.csv'



In [46]:

    
bb = np.array([trn_lrg_anno[o][0] for o in trn_ids]) # largest item dictionary
bbs = [' '.join(str(p) for p in o) for o in bb] # bbxs separated by spaces via list comprehension

df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids], 'bbox': bbs}, columns=['fn','bbox'])
df.to_csv(BB_CSV, index=False) # turn dataframe to csv

From Part 1: to do a multiple-labels classification: the multiple labels have to be space-separated, and the filename is comma-separated.



In [47]:

    
BB_CSV.open().readlines()[:5]









    Out[47]:





['fn,bbox\n',
 '000012.jpg,96 155 269 350\n',
 '000017.jpg,77 89 335 402\n',
 '000023.jpg,1 2 461 242\n',
 '000026.jpg,124 89 211 336\n']



In [50]:

    
f_model = resnet34
sz = 224
bs = 64

Set continuous=True to tell fastai this is a regression problem, which means it won't one-hot encode the labels, and will use MSE as the default crit.

NOTE that we have to tell the transforms constructor that our labels are coordinates, so that it can handle the transforms correctly.

Also, we use CropType.NO because we want to squish the rectangular images into squares, rather than center cropping, so that we don't accidentally crop out some of the objects (This is less of an issue in something like ImageNet, where there's a single object to classify, and it's generally large and centrally located).

NOTE that when we're doing scaling and data augmentation - that has to be applied to the bounding boxes as well as the images $\longrightarrow$ tfm_y=TfmType.COORD

The transforms are defined inside the fastai transforms module as just a list. You can always create your own list of augmentations:



In [51]:

    
augs = [RandomFlip(), 
        RandomRotate(30), 
        RandomLighting(0.1, 0.1)]



In [52]:

    
tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True)

Now we can grab a minibatch of data:



In [57]:

    
x,y = next(iter(md.val_dl))



In [58]:

    
ima = md.val_ds.denorm(to_np(x))[0] # denormalize
b = bb_hw(to_np(y[0])); b # cvt bb -> hw to display









    Out[58]:





array([ 73.,   0., 199., 462.])

Let's go through and rerun the iterator a few times with the new model data object & augmentations:



In [59]:

    
idx = 3
fig,axes = plt.subplots(3, 3, figsize=(9,9))
for i,ax in enumerate(axes.flat):
    x,y = next(iter(md.aug_dl))
    ima = md.val_ds.denorm(to_np(x))[idx]
    b = bb_hw(to_np(y[idx]))
    print(b)
    show_img(ima, ax=ax)
    draw_rect(ax, b)









    



[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]
[115.  63. 241. 312.]

This is the problem with Data Augmentation when your Dependent Variable is pixel values or in some way connected to your Independent Variable. The 2 need to be Augmented together.

Looking at the arrays above: the image is larger than 224 but we're asking for 224 without any scaling or cropping. Our Dependent Variable needs to go through all the same Gemoetric Transformations as our Dependent Variable.

So to do that: every transformation has an optional 'transform y' parameter. It takes a transform type Enum with a few options. the COORD option says the y vals represent coordinates. So if you flip or rotate, you need to change those coordinates to match. So we just add TfmType.COORD to all our augmentations.

We also have to add the same thing to our Transforms from Model function bc it does the cropping, zooming, padding, and resizing -- and all that needs to happen to the Dependent Variable as well.



In [53]:

    
augs = [RandomFlip(tfm_y=TfmType.COORD), 
        RandomRotate(30, tfm_y=TfmType.COORD), 
        RandomLighting(0.1, 0.1, tfm_y=TfmType.COORD)]



In [54]:

    
tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=TfmType.COORD, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True, bs=4)



In [64]:

    
idx = 3
fig,axes = plt.subplots(3, 3, figsize=(9,9))
for i,ax in enumerate(axes.flat):
    x,y = next(iter(md.aug_dl))
    ima = md.val_ds.denorm(to_np(x))[idx]
    b = bb_hw(to_np(y[idx]))
    print(b)
    show_img(ima, ax=ax)
    draw_rect(ax, b)









    



[ 52.  38. 106. 183.]
[ 43.  28. 130. 195.]
[ 37.  22. 145. 201.]
[ 40.  21. 147. 202.]
[ 40.  21. 147. 202.]
[ 48.  33. 113. 190.]
[ 34.  18. 156. 205.]
[ 66.  38. 105. 183.]
[ 33.  19. 137. 204.]

Now you'll see the bounding box changes each time, and also matches the transform of the picture.

You need to be careful not to do too much rotation with bounding boxes - since there's not enough information for them to stay accurate. Polygons or Segmentations would work fine.

We'll use a maximum of 3° rotations, and only half the time.



In [55]:

    
tfm_y = TfmType.COORD
augs  = [RandomFlip(tfm_y=tfm_y), 
         RandomRotate(3, p=0.5, tfm_y=tfm_y), 
         RandomLighting(0.05, 0.05, tfm_y=tfm_y)]

tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=tfm_y, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True)

fastai let's you use a custom_head to add your own module on top of a ConvNet, instead of the Adaptive Pooling and Fully Connected Net which is added by default. In this case, we don't want to do any pooling, since we need to know the activations of each grid cell.

The final layer has 4 activations, one per bounding box coordinate. Our target is continous, not categorical, so the MSE loss function used does not do any Sigmoid or Softmax to the module outputs.

We want to create a ConvNet based on ResNet34, but we don't want to add the standard set of Fully-Connected layers that create a Classifier. We'll add a single Linear Layer with 4 outputs. L1 loss vs MSE: instead of taking mean of squared errors, add up absolute errors.



In [68]:

    
512*7*7









    Out[68]:





25088



In [56]:

    
head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4)) # flatten prev layer, add linear layer
learn = ConvLearner.pretrained(f_model, md, custom_head=head_reg4) # add custom head to resnet34 model
learn.opt_fn = optim.Adam
learn.crit = nn.L1Loss() # L1 loss instead of MSE

.summary runs a small batch of data through the model and prints out how big it is at every layer.

We can see that at the end of the Convolutional section before we hit the Flatten its 512 7 7. So a Rank-3 Tensor sized 51277 flattened out into a Rank-1 Tensor (a Vector) would be 25,088 long.

That's why we have that line nn.Linear(25088, 4) $\longrightarrow$ inputting the flattened Tensor and outputting 4 numbers for our bounding box coordinates.

So now we just stick that on top of a pretrained ResNet



In [70]:

    
learn.summary()









    Out[70]:





OrderedDict([('Conv2d-1',
              OrderedDict([('input_shape', [-1, 3, 224, 224]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', False),
                           ('nb_params', 9408)])),
             ('BatchNorm2d-2',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-3',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('nb_params', 0)])),
             ('MaxPool2d-4',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-5',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-6',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-7',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-8',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-9',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-10',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('BasicBlock-11',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-12',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-13',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-14',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-15',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-16',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-17',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('BasicBlock-18',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-19',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-20',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-21',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-22',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 36864)])),
             ('BatchNorm2d-23',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('trainable', False),
                           ('nb_params', 128)])),
             ('ReLU-24',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('BasicBlock-25',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-26',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 73728)])),
             ('BatchNorm2d-27',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-28',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-29',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-30',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('Conv2d-31',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 8192)])),
             ('BatchNorm2d-32',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-33',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('BasicBlock-34',
              OrderedDict([('input_shape', [-1, 64, 56, 56]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-35',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-36',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-37',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-38',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-39',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-40',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('BasicBlock-41',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-42',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-43',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-44',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-45',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-46',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-47',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('BasicBlock-48',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-49',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-50',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-51',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-52',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 147456)])),
             ('BatchNorm2d-53',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('trainable', False),
                           ('nb_params', 256)])),
             ('ReLU-54',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('BasicBlock-55',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 128, 28, 28]),
                           ('nb_params', 0)])),
             ('Conv2d-56',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 294912)])),
             ('BatchNorm2d-57',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-58',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-59',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-60',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('Conv2d-61',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 32768)])),
             ('BatchNorm2d-62',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-63',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-64',
              OrderedDict([('input_shape', [-1, 128, 28, 28]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-65',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-66',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-67',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-68',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-69',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-70',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-71',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-72',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-73',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-74',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-75',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-76',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-77',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-78',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-79',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-80',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-81',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-82',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-83',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-84',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-85',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-86',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-87',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-88',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-89',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-90',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-91',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-92',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-93',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-94',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-95',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-96',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 589824)])),
             ('BatchNorm2d-97',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('trainable', False),
                           ('nb_params', 512)])),
             ('ReLU-98',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('BasicBlock-99',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 256, 14, 14]),
                           ('nb_params', 0)])),
             ('Conv2d-100',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1179648)])),
             ('BatchNorm2d-101',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-102',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Conv2d-103',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 2359296)])),
             ('BatchNorm2d-104',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('Conv2d-105',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 131072)])),
             ('BatchNorm2d-106',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-107',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('BasicBlock-108',
              OrderedDict([('input_shape', [-1, 256, 14, 14]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Conv2d-109',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 2359296)])),
             ('BatchNorm2d-110',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-111',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Conv2d-112',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 2359296)])),
             ('BatchNorm2d-113',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-114',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('BasicBlock-115',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Conv2d-116',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 2359296)])),
             ('BatchNorm2d-117',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-118',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Conv2d-119',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 2359296)])),
             ('BatchNorm2d-120',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('trainable', False),
                           ('nb_params', 1024)])),
             ('ReLU-121',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('BasicBlock-122',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 512, 7, 7]),
                           ('nb_params', 0)])),
             ('Flatten-123',
              OrderedDict([('input_shape', [-1, 512, 7, 7]),
                           ('output_shape', [-1, 25088]),
                           ('nb_params', 0)])),
             ('Linear-124',
              OrderedDict([('input_shape', [-1, 25088]),
                           ('output_shape', [-1, 4]),
                           ('trainable', True),
                           ('nb_params', 100356)]))])



In [71]:

    
learn.lr_find(1e-5, 100)
learn.sched.plot(5)









    





 
 










    



 78%|███████▊  | 25/32 [00:25<00:07,  1.02s/it, loss=484]



In [72]:

    
lr = 2e-3



In [73]:

    
learn.fit(lr, 2, cycle_len=1, cycle_mult=2)









    





 
 










    



  0%|          | 0/32 [00:00<?, ?it/s]                   
epoch      trn_loss   val_loss                            
    0      50.772287  35.129805 
    1      38.535283  28.261074                           
    2      32.358841  27.802977                           







    Out[73]:





[27.802977323532104]



In [74]:

    
lrs = np.array([lr/100, lr/10, lr])



In [75]:

    
learn.freeze_to(-2)



In [76]:

    
lrf = learn.lr_find(lrs/1000)
learn.sched.plot(1)









    





 
 










    



epoch      trn_loss   val_loss                            
    0      77.061368  3180939047862272.0



In [77]:

    
learn.fit(lrs, 2, cycle_len=1, cycle_mult=2)









    





 
 










    



epoch      trn_loss   val_loss                            
    0      26.232766  22.649825 
    1      23.470803  23.043481                           
    2      19.954377  20.096373                           







    Out[77]:





[20.096372604370117]



In [78]:

    
learn.freeze_to(-3)



In [79]:

    
learn.fit(lrs, 1, cycle_len=2)









    





 
 










    



epoch      trn_loss   val_loss                            
    0      19.141457  21.559981 
    1      16.852875  19.628677                           







    Out[79]:





[19.628676652908325]



In [80]:

    
learn.save('reg4')



In [81]:

    
learn.load('reg4')



In [82]:

    
x,y = next(iter(md.val_dl))
learn.model.eval()
preds = to_np(learn.model(VV(x)))



In [83]:

    
fig, axes = plt.subplots(3, 4, figsize=(12,8))
for i,ax in enumerate(axes.flat):
    ima = md.val_ds.denorm(to_np(x))[i]
    b = bb_hw(preds[i])
    ax = show_img(ima, ax=ax)
    draw_rect(ax, b)
plt.tight_layout()









    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

4. Single Object Detection

Now let's put those pieces together and get something that classifies and does bounding boxes.

There are 3 things we need to do whenever we want to train a Neural Network:

Provide Data
Pick Architecture
Loss Function

Loss Function: "anything that gives a lower number here is a better network, using this Data and Architecture.



In [57]:

    
f_model = resnet34
sz = 224
bs = 64

val_idxs = get_cv_idxs(len(trn_fns))

We need to create those 3 things for our Classification + Bounding Box Regression. We need a ModelData object with Independents: the images; Dependents a tuple of the bounding box coordinates, and class.

There are many ways to do this. The way J.Howard chose is to simply create 2 ModelData objects representing the 2 Dependent Variables, using CSVs as before.



In [58]:

    
tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO, tfm_y=TfmType.COORD, aug_tfms=augs)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms,
        continuous=True, val_idxs=val_idxs)



In [59]:

    
md2 = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms_from_model(f_model, sz))

And we'll just create a class to merge them together.

A dataset can be anything with __len__ and __getitem__. Here's a dataset that adds a 2nd label to an existing dataset:



In [60]:

    
class ConcatLblDataset(Dataset):
    def __init__(self, ds, y2): self.ds, self.y2 = ds,y2
    def __len__(self): return len(self.ds)
    
    def __getitem__(self, i): # indexer - lets you use []
        x,y = self.ds[i]
        return (x, (y, self.y2[i]))

We'll use it to add the classes to the bounding box labels.



In [61]:

    
trn_ds2 = ConcatLblDataset(md.trn_ds, md2.trn_y)
val_ds2 = ConcatLblDataset(md.val_ds, md2.val_y)



In [63]:

    
val_ds2[0][1]









    Out[63]:





(array([  0.,  49., 205., 180.], dtype=float32), 14)

We can replace the dataloader's datasets with these new ones:



In [64]:

    
md.trn_dl.dataset = trn_ds2
md.val_dl.dataset = val_ds2

We can test it by grabbing a minibatch of data.

We have to denormalize the images from the dataloader before they can be plotted.



In [66]:

    
x,y = next(iter(md.val_dl))
idx = 3
ima = md.val_ds.ds.denorm(to_np(x))[idx]
b   = bb_hw(to_np(y[0][idx])); b









    Out[66]:





array([ 52.,  38., 107., 185.])



In [67]:

    
ax = show_img(ima)
draw_rect(ax, b)
draw_text(ax, b[:2], md2.classes[y[1][idx]])

That's one way to customize the dataset.

So we have our Data. Now we need the Architecture. The architectures will be the same as those we used for the classifier and bounding box regression -- we're just going to combine them.

If there are C classes, then the num actvns we need in the final layer is 4 + C.

We need 1 output actvn foreach class (for its probaility) plus 1 foreach bounding box coordinate. We'll use an extra linear layer this time, plus some Dropout, to help us train a more flexibly model. That's why we have the Flatten layer at the start of this head (for the new linear layer). There's no batchnorm to start because the ResNet backbone already has batchnorm in its final layer.



In [68]:

    
head_reg4 = nn.Sequential(
    Flatten(),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(25088, 256),
    nn.ReLU(),
    nn.BatchNorm1d(256),
    nn.Dropout(0.5),
    nn.Linear(256, 4+len(cats)), # final layer: 4+C actvns
)
models = ConvnetBuilder(f_model, 0, 0, 0, custom_head=head_reg4)

learn = ConvLearner(md, models)
learn.opt_fn = optim.Adam

We got Data, Architecture, now we need a Loss Function.

The loss function needs to look at those 4+C actvns and decide how good they are. For the first 4 we use L1 loss just as in bounding box regression before (L1 loss is like MSE: instead of sum of squares, it's the sum of absolute values). For the rest of the actvns we can use Cross Entropy loss.

BN -> ReLU cannot create negative numbers; ReLU -> BN can & works a bit better. The bb_i = F.sigmoid(bb_i)*224 helps force our data into the right range -- this helps our network train.

A great thing about Dropout is it has a parameter -- parameters are great esp for regularization: bc it lets you build a great big overparamaterized model and then decide how much to regularize it.

Finally, with detn_loss (Detection Loss): now that we got our inputs and targets, we can just calculate the L1 Loss and add to it the Cross Entropy:

F.l1_loss(bb_i, bb_t) + F.cross_entropy(c_i, c_t)*20

And that's our Loss Function. The CE and L1L may be of (wildly) different scales, and in the LossFn the larger one will dominate. So J.Howard ran them in the debugger, found how big each was, and found multiply the CE by 20 made them both about the same scale.

As you're training it's nice to print out information as you go. So J.Howard grabbed the L1 part of the LossFn in the function detn_l1, and also created a fn for accuracy, so he could make them into metrics to print out.



In [70]:

    
def detn_loss(input, target): # input: actvns; target: grnd truth
    bb_t,c_t = target # destructuring asnmt to grab bbs & classes
    bb_i,c_i = input[:, :4], input[:, 4:] # batch dim; 1st 4 (bbx); 4 onwards (classes)
    bb_i = F.sigmoid(bb_i)*224 # we know bbxs between 0:224 (img size)
    # these quantities were looked at sepearately first, then a 
    # multiplier was chosen to make them approximately equal
    return F.l1_loss(bb_i, bb_t) + F.cross_entropy(c_i, c_t)*20

def detn_l1(input, target):
    bb_t,_ = target
    bb_i = input[:, :4]
    bb_i = F.sigmoid(bb_i)*224
    return F.l1_loss(V(bb_i),V(bb_t)).data

def detn_acc(input, target):
    _,c_t = target
    c_i = input[:, 4:]
    return accuracy(c_i, c_t)

learn.crit = detn_loss
learn.metrics = [detn_acc, detn_l1]



In [71]:

    
learn.lr_find()
learn.sched.plot()









    





 
 










    



 97%|█████████▋| 31/32 [00:27<00:00,  1.13it/s, loss=683]



In [88]:

    
lr = 1e-2

Now we have something that's printing out our Object Detection Loss, Accuracy, and Detection L1:



In [73]:

    
learn.fit(lr, 1, cycle_len=3, use_clr=(32,5))









    





 
 










    



  0%|          | 0/32 [00:00<?, ?it/s]                   
epoch      trn_loss   val_loss   detn_acc   detn_l1       
    0      73.176612  44.366713  0.809345   32.001707 
    1      52.393672  36.935222  0.804988   25.659293     
    2      43.470237  35.287855  0.830228   24.564463     







    Out[73]:





[35.28785514831543, 0.830228365957737, 24.564463138580322]



In [74]:

    
learn.save('reg1_0')



In [75]:

    
learn.freeze_to(-2)



In [76]:

    
lrs = np.array([lr/100, lr/10, lr])



In [77]:

    
learn.lr_find(lrs/1000)
learn.sched.plot(0)









    





 
 










    



 91%|█████████ | 29/32 [00:29<00:03,  1.01s/it, loss=221]



In [78]:

    
learn.fit(lrs/5, 1, cycle_len=5, use_clr=(32,10))









    





 
 










    



  0%|          | 0/32 [00:00<?, ?it/s]                   
epoch      trn_loss   val_loss   detn_acc   detn_l1       
    0      37.609015  38.193221  0.771635   23.343077 
    1      32.696741  33.438532  0.800631   21.385442     
    2      27.560341  31.346839  0.823618   20.193344     
    3      23.907904  30.862823  0.836238   19.943489     
    4      21.603783  30.55057   0.835787   19.524885     







    Out[78]:





[30.550570011138916, 0.835787259042263, 19.524885416030884]



In [79]:

    
learn.save('reg1_1')



In [80]:

    
learn.load('reg1_1')



In [81]:

    
learn.unfreeze()



In [82]:

    
learn.fit(lrs/10, 1, cycle_len=10, use_clr=(32,10))









    





 
 










    



epoch      trn_loss   val_loss   detn_acc   detn_l1       
    0      18.72655   31.53353   0.813401   19.960532 
    1      18.660238  30.159864  0.847055   19.484958     
    2      17.797316  30.981511  0.827524   19.371871     
    3      17.041492  30.29471   0.827073   18.924679     
    4      16.199508  30.725677  0.82512    18.962325     
    5      15.334077  30.009115  0.827073   18.492353     
    6      14.451973  30.234323  0.821214   18.437176     
    7      13.944366  29.614287  0.829928   18.197707     
    8      13.433998  29.723704  0.833834   18.123147     
    9      12.8435    29.678114  0.821214   18.090298     







    Out[82]:





[29.678114414215088, 0.8212139457464218, 18.090298175811768]



In [83]:

    
learn.save('reg1')



In [84]:

    
learn.load('reg1')



In [85]:

    
y = learn.predict()
x,_ = next(iter(md.val_dl))

Detection accuracy is still in the low 80s, which isn't surprising since ResNet was designed for classification -- so we're unlikely to improve it with smth so simple.

ResNet wasn't designed to do bbx regression -- it was explicitly design to not care about geometry: it takes that last 7x7 grid of activations and averages them all together: throwing away a bunch of information.

When we only train the last layer the detection accuracy is very bad ( 24.564463 in the first epoch set ), and it improves a lot - although detection accuracy doesn't improve.

Interestingly, the Detection L1, when we do accuracy and bounding box at the same time ( 18.090298 ) seems a lot better than when we just do bounding box regression ( 19.628677 in section 3).

Figuring out what the main object of an image is is the 'hard' part, and where its bounding box is is the 'easy' part. If you have a single Network that's supposed to tell what and where an object is, it's going to share all the computation about finding the object -- so all that shared computation is very efficient.

So when we backprop the errors in the class and place -- all that computation is going to help in finding the main object.

>> anytime you have multiple tasks that share some concept of what those tasks would need to do to complete their work, its very likely they should share some layers of the network.



In [86]:

    
from scipy.special import expit



In [87]:

    
fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for i,ax in enumerate(axes.flat):
    ima = md.val_ds.ds.denorm(to_np(x))[i]
    bb = expit(y[i][:4])*224
    b = bb_hw(bb)
    c = np.argmax(y[i][4:])
    ax = show_img(ima, ax=ax)
    draw_rect(ax, b)
    draw_text(ax, b[:2], md2.classes[c])
plt.tight_layout()









    



Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

End



In [ ]: