This approach follows the ideas described in Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv, 2013.
First of all, we'll need a little Python script to run the Matlab Selective Search code.
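As a rough sketch of what such a wrapper can look like, the snippet below shells out to Matlab and reads the proposals back with scipy. The Matlab entry point selective_search(image_filenames, output_filename) and the all_boxes variable name are assumptions for illustration only, not the interface actually shipped with this example.

import os
import subprocess
import tempfile

import scipy.io

def get_selective_search_windows(image_filenames):
    """Return a list of (num_windows, 4) proposal arrays, one per image.
    Assumes a Matlab function selective_search(image_list, output_mat) is on
    the Matlab path; the name and output format are assumptions here."""
    output_filename = tempfile.mktemp(suffix='.mat')
    image_list = '{' + ','.join("'{}'".format(os.path.abspath(f))
                                for f in image_filenames) + '}'
    matlab_cmd = "selective_search({}, '{}')".format(image_list, output_filename)
    # -r runs the given command; exit explicitly so Matlab does not wait at the prompt.
    subprocess.check_call(['matlab', '-nodisplay', '-r',
                           "try, {}, catch, exit(1), end, exit(0)".format(matlab_cmd)])
    all_boxes = scipy.io.loadmat(output_filename)['all_boxes'][0]
    os.remove(output_filename)
    return list(all_boxes)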
Let's run detection on an image of a couple of cats frolicking (one of the ImageNet detection challenge pictures), which we will download from the web. You'll need a prototxt specifying the network and a trained model. We will use examples/imagenet_deploy.prototxt and alexnet_train_iter_470000 or caffe_reference_imagenet_model: you'll need to download the model yourself! Note that caffe_reference_imagenet_model may give slightly different results than this example, which uses alexnet_train_iter_470000.
wget http://farm1.static.flickr.com/220/512450093_7717fb8ce8.jpg
echo `pwd`/"512450093_7717fb8ce8.jpg" > image_cat.txt
python detector.py --images_file=image_cat.txt --crop_mode=selective_search --model_def=<path to imagenet_deploy.prototxt> --pretrained_model=<path to alexnet_train_iter_470000> --output_file=selective_cat.h5
Running this outputs an HDF5 file with the filenames, selected windows, and their ImageNet scores. Of course, we only ran on one image, so the filenames will all be the same.
In general, detector is most efficient when running on many images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list one image per line in the images_file, and detector will process all of them.
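For example, to prepare such a list for a handful of images (the filenames here are just placeholders), a few lines of Python are enough; pass the resulting file to detector.py via --images_file as in the command above.

import os

# Placeholder filenames: replace these with your own images.
images = ['cat1.jpg', 'cat2.jpg', 'dog.jpg']
with open('images_to_detect.txt', 'w') as f:
    for name in images:
        f.write(os.path.abspath(name) + '\n')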
Although this guide gives an example of ImageNet detection, detector is clever enough to adapt to different Caffe models' input dimensions, batch size, and output categories. Refer to python detector.py --help and the images_dim and images_mean_file parameters to describe your data set. No need for hardcoding.
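For instance, a run against your own model might look like the line below. The exact spellings of the image-dimension and mean-file flags are assumptions on our part, so check python detector.py --help for the authoritative names and defaults.

python detector.py --images_file=images_to_detect.txt --crop_mode=selective_search --model_def=<path to your deploy prototxt> --pretrained_model=<path to your trained model> --images_dim=256 --images_mean_file=<path to your mean file> --output_file=my_detections.h5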
In [2]:
import pandas as pd

# Load the detector output written by the command above.
df = pd.read_hdf('selective_cat.h5', 'df')
print(df)
print('')
print(df.iloc[0])
Load ImageNet class names and make a DataFrame of features.
In [3]:
import numpy as np

# Keep only the first human-readable name for each synset,
# e.g. "n02123159 tiger cat" -> "tiger cat".
with open('../../../examples/synset_words.txt') as f:
    labels = [' '.join(l.strip().split(' ')[1:]).split(',')[0] for l in f.readlines()]
feats_df = pd.DataFrame(np.vstack(df.feat.values), columns=labels)
feats_df
Out[3]:
Let's look at the activations.
In [30]:
# Pylab-style plotting calls; this assumes the notebook was started with
# %pylab inline (otherwise import these from matplotlib.pyplot).
gray()
matshow(feats_df.values)
xlabel('Classes')
ylabel('Windows')
Out[30]:
Now let's take the max across all windows and print the top classes.
In [33]:
max_s = feats_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
Okay, there are indeed cats in there. Let's see what the top detection and the 10th top detection are.
In [43]:
# Rank windows by their maximum class score and look at the top and 10th-top detections.
window_order = pd.Series(feats_df.values.max(1)).order(ascending=False)
i = window_order.index[0]
j = window_order.index[10]
# Show top predictions for top detection.
f = pd.Series(df['feat'].iloc[i], index=labels)
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')
# Show top predictions for 10th top detection.
f = pd.Series(df['feat'].iloc[j], index=labels)
print('10th detection:')
print(f.order(ascending=False)[:5])
# Show top detection in red, 10th top detection in blue.
im = imread('512450093_7717fb8ce8.jpg')
imshow(im)
currentAxis = plt.gca()
window = df.iloc[i]['window']
coords = (window[1], window[0]), window[3], window[2]
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))
window = df.iloc[j]['window']
coords = (window[1], window[0]), window[3], window[2]
currentAxis.add_patch(Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Out[43]:
That's cool. Both of these detections are tiger cats. Let's take all 'tiger cat' detections and NMS them to get rid of overlapping windows.
In [18]:
def nms_detections(dets, overlap=0.5):
    """
    Non-maximum suppression: greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.
    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Input:
        dets: ndarray where each row is ['y', 'x', 'h', 'w', 'score']
        overlap: overlap ratio above which a detection is suppressed
            (0.5 by default)

    Output:
        dets that remain after suppression.
    """
    if np.shape(dets)[0] < 1:
        return dets
    x1 = dets[:, 1]
    y1 = dets[:, 0]
    w = dets[:, 3]
    h = dets[:, 2]
    x2 = w + x1 - 1
    y2 = h + y1 - 1
    area = w * h
    s = dets[:, 4]
    ind = np.argsort(s)
    pick = []
    while len(ind) > 0:
        last = len(ind) - 1
        i = ind[last]
        pick.append(i)
        # Compute the overlap of all remaining windows with the picked one.
        xx1 = np.maximum(x1[i], x1[ind[:last]])
        yy1 = np.maximum(y1[i], y1[ind[:last]])
        xx2 = np.minimum(x2[i], x2[ind[:last]])
        yy2 = np.minimum(y2[i], y2[ind[:last]])
        w = np.maximum(0., xx2 - xx1 + 1)
        h = np.maximum(0., yy2 - yy1 + 1)
        o = w * h / area[ind[:last]]
        # Suppress windows that overlap the picked one too much.
        to_delete = np.concatenate(
            (np.nonzero(o > overlap)[0], np.array([last])))
        ind = np.delete(ind, to_delete)
    return dets[pick, :]
In [44]:
scores = feats_df['tiger cat']
windows = np.vstack(df['window'].values)
# Append the 'tiger cat' score as a fifth column so each row is [window, score].
dets = np.hstack((windows, scores.values[:, np.newaxis]))
nms_dets = nms_detections(dets)
Show top 3 NMS'd detections for 'tiger cat' in the image.
In [47]:
im = imread('512450093_7717fb8ce8.jpg')
imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
# nms_detections returns windows in decreasing score order, so these are the top 3.
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(Rectangle((det[1], det[0]), det[3], det[2],
                                    fill=False, edgecolor=c, linewidth=5))