Kevin J. Walchko created 17 Aug 2017
Feature detection uses various methods to find unique regions within an image. These features can then be used in tracking, stereo vision, image stitching, etc. High frequency locations like corners (think edge detection) are typically good candidates for features. This lesson will start off talking about features and move on to matching and homography. We will use homography in camera calibration and show an example of a virtual billboard (like in baseball or soccer).
In [1]:
%matplotlib inline
from __future__ import print_function
from __future__ import division
import cv2 # opencv itself
import numpy as np # matrix manipulations
from matplotlib import pyplot as plt # this lets you draw inline pictures in the notebooks
import pylab # this allows you to control figure size
pylab.rcParams['figure.figsize'] = (10.0, 8.0) # this controls figure size in the notebook
The image is very simple. At the top of the image, six small image patches are shown. The question for you is to find the exact location of these patches in the original image. How many correct results can you find?
Features are image locations that are "easy" to find in an image. Indeed, one of the early feature detection techniques, the Shi-Tomasi corner detector used by the Kanade-Lucas-Tomasi (KLT) tracker, comes from a seminal paper called "Good Features to Track". I point this out because there is an OpenCV function called cv2.goodFeaturesToTrack() which does exactly this!
Edge detection finds brightness discontinuities in an image, while feature detection finds distinctive regions. There are a bunch of different feature detectors, and they all have some characteristics in common: they should be quick, and things that are close in image-space should be close in feature-space (that is, image regions that look alike should have feature representations that are close together).
Some of the feature detectors available in OpenCV 3.3:
Note: some of the very famous feature detectors (SIFT, SURF and so on) exist, but aren't in OpenCV by default due to patent issues. You can build them for OpenCV if you want (they live in the opencv_contrib xfeatures2d module), or you can find other implementations (David Lowe's SIFT implementation works just fine). Just google for instructions. For the purposes of this lesson (and to save time), we're only going to look at those which are in OpenCV by default.
If you think of edges as being lines, then corners are an obvious choice for features as they represent the intersection of two lines. One of the earlier corner detectors was introduced by Harris, and it is still a very effective corner detector that gets used quite a lot: it's reliable and it's fast.
In [2]:
input_image = cv2.imread('dnd.jpg')
input_image = cv2.cvtColor(input_image,cv2.COLOR_BGR2RGB)
harris_test=input_image.copy()
# greyscale it
gray = cv2.cvtColor(harris_test,cv2.COLOR_RGB2GRAY)
gray = np.float32(gray)
blocksize=4 # size of the neighbourhood considered around each pixel
kernel_size=3 # sobel kernel: must be odd and fairly small
k=0.05 # Harris detector free parameter, typically 0.04 to 0.06
# run the harris corner detector
# parameters are blocksize, Sobel kernel size and the Harris k parameter
dst = cv2.cornerHarris(gray,blocksize,kernel_size,k)
# result is dilated for marking the corners, this is
# visualisation related and just makes them bigger
dst = cv2.dilate(dst,None)
# we then plot these on the input image for visualisation
# purposes, using bright red (the image is RGB at this point)
harris_test[dst>0.01*dst.max()]=[255,0,0]
plt.subplot(1,2,1)
plt.imshow(harris_test);
plt.subplot(1,2,2)
plt.imshow(dst,cmap = 'gray');
Properly speaking, the Harris corner detector is more like a filter, such as the Sobel operator, than a feature detector: it doesn't really return a set of features. Instead, it gives a strong response on corner-like regions of the image. We can see this more clearly if we look at the Harris output from the cell above (dst is the Harris response before thresholding). Well, we can kind of see it: there are slightly lighter places in the image where there are corner-like features, and really light regions around the black and white corners of the writing.
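To make the "filter" idea concrete, here is a sketch of the Harris response written directly in NumPy. It assumes the standard formulation R = det(M) - k * trace(M)^2, where M sums products of image gradients over a small window. It uses np.gradient and a plain box window instead of OpenCV's Sobel kernel, so its values will not match cv2.cornerHarris exactly; the helper names window_sum and harris_response are hypothetical.

```python
import numpy as np

def window_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window around every pixel (zero padded)."""
    padded = np.pad(a, r)
    h, w = a.shape
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(gray, r=2, k=0.05):
    gray = np.asarray(gray, dtype=np.float64)
    Iy, Ix = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    # entries of the structure tensor M, summed over the window
    Sxx = window_sum(Ix * Ix, r)
    Syy = window_sum(Iy * Iy, r)
    Sxy = window_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

On a synthetic square, flat regions give R = 0, straight edges give R < 0 (one strong gradient direction), and the corners give the largest positive values, which is the kind of pattern visible in the dst image above.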
When we consider modern feature detectors there are a few things we need to mention. What makes a good feature includes the following:
In [3]:
# make a copy to play with
orbimg=input_image.copy()
orb = cv2.ORB_create()
# find the keypoints with ORB
kp = orb.detect(orbimg,None)
# compute the descriptors with ORB
kp, des = orb.compute(orbimg, kp)
# draw keypoints
print('number key points found:', len(kp))
print('point: {} size: {} angle: {}'.format(kp[1].pt, kp[1].size, kp[1].angle))
cv2.drawKeypoints(orbimg,kp,orbimg)
plt.imshow(orbimg);
In [4]:
# create an image with all zeros (black), use the input image as a template for
# size and the color depth (3 for color)
img2match=np.zeros(input_image.shape,np.uint8)
# grab the goblin's face
face=input_image[60:250, 70:350] # copy out a bit
face=cv2.flip(face,0) # flip the copy upside down (flip code 0 flips around the x-axis)
img2match[200:200+face.shape[0], 200:200+face.shape[1]]=face # paste it back somewhere else
plt.imshow(img2match);
The feature detector (in this case ORB) first detects keypoints and then computes a descriptor for each one. These descriptors are a higher-dimensional representation of the image region immediately around a point of interest (keypoints are sometimes literally called "interest points").
These higher-dimensional representations can then be matched. The advantage of matching descriptors rather than raw image regions is that descriptors have a certain invariance to transformations (like rotation or scaling). OpenCV provides matcher routines to do this, in which you can specify the distance measure to use.
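ORB descriptors are binary strings (32 bytes, i.e. 256 bits, per keypoint), which is why the matcher below uses cv2.NORM_HAMMING: the distance between two descriptors is the number of bits that differ. The sketch below spells out that distance and a brute-force match with cross-checking in NumPy. It is meant to show the idea behind cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True), not OpenCV's actual implementation, and the function names are made up for this example.

```python
import numpy as np

def hamming_matrix(des1, des2):
    """Pairwise Hamming distances between two sets of uint8 binary descriptors."""
    # XOR every descriptor in des1 with every descriptor in des2, then count the 1 bits
    diff = np.bitwise_xor(des1[:, None, :], des2[None, :, :])
    return np.unpackbits(diff, axis=2).sum(axis=2)

def cross_check_match(des1, des2):
    """Keep (i, j) only if j is i's nearest neighbour AND i is j's nearest neighbour."""
    d = hamming_matrix(des1, des2)
    best12 = d.argmin(axis=1)  # best match in des2 for each row of des1
    best21 = d.argmin(axis=0)  # best match in des1 for each row of des2
    matches = [(i, int(j), int(d[i, j])) for i, j in enumerate(best12) if best21[j] == i]
    return sorted(matches, key=lambda m: m[2])  # best (smallest distance) first
```

Cross-checking throws away one-sided matches, which removes many of the false matches you would otherwise get from a plain nearest-neighbour search.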
In [5]:
kp2 = orb.detect(img2match,None)
# compute the descriptors with ORB
kp2, des2 = orb.compute(img2match, kp2)
# create BFMatcher object: this is a Brute Force matching object
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des,des2)
# Sort them by distance between matches in feature space - so the best matches are first.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 50 matches.
oimg = cv2.drawMatches(orbimg,kp,img2match,kp2,matches[:50], None) # None: let OpenCV allocate the output image
plt.imshow(oimg);
As you can see there are some false matches, but it's fairly clear that most of the matched keypoints found are actual matches between image regions on the face.
To be more precise about our matching, we could enforce a homography constraint, which keeps only the matches that are consistent with a single plane-to-plane mapping (that is, features that lie on the same plane).
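A homography is a transformation (a 3×3 matrix) that maps the points in one image to the corresponding points in the other image.
Since we are dealing with 2D images, we work with points in homogeneous coordinates $(x, y, 1)$:
$$ s \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = H_{3 \times 3} \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix}, \qquad H_{3 \times 3} = \begin{bmatrix} A_{2 \times 2} & t_{2 \times 1} \\ p^T & 1 \end{bmatrix} $$
where $A_{2 \times 2}$ combines rotation, scale and shear, $t_{2 \times 1} = (t_x, t_y)^T$ is the translation, $p^T$ holds the two perspective terms, and $s$ is a scale factor: the mapped pixel is found by dividing the first two entries of $H [x_2, y_2, 1]^T$ by the third. Because of this scale, $H$ is only defined up to a constant multiple, so it has 8 degrees of freedom.
Notice the 1 in the bottom-right corner; hopefully this reminds you of the homogeneous transforms back in block 1 (forward kinematics), but in 2D instead of 3D.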
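A small NumPy sketch of the equation above: lift each point to homogeneous coordinates, multiply by H, then divide by the third component. The matrices are made-up examples chosen so the results are easy to check by hand, and apply_homography is a hypothetical helper name. It maps whatever points you pass through H ($x_2 \to x_1$ in the notation above).

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of (x, y) points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=np.float64)
    # lift to homogeneous coordinates by appending a 1 to every point
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T  # each row is H @ [x, y, 1]
    # divide by the third coordinate to get back to pixel coordinates
    return mapped[:, :2] / mapped[:, 2:3]

# scale by 2 and translate by (10, 20): the bottom row is [0, 0, 1], so the third entry stays 1
H_affine = np.array([[2.0, 0, 10], [0, 2.0, 20], [0, 0, 1]])
print(apply_homography(H_affine, [[1, 1]]))  # [[12. 22.]]

# a non-zero bottom row changes the third entry, and dividing by it gives the perspective effect
H_persp = np.array([[1.0, 0, 0], [0, 1.0, 0], [0.5, 0, 1]])
print(apply_homography(H_persp, [[2, 4]]))  # [[1. 2.]]

# H is only defined up to scale: 3 * H maps points to the same place
print(apply_homography(3 * H_persp, [[2, 4]]))  # [[1. 2.]]
```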
Homography has many uses:
By matching features between two images, we can estimate the relationship between them and warp one image so that the two can be joined together into a larger panoramic view.
Just like our little goblin above, we know the points on the QR code and where they are in real space. If we match those points in the image, we can estimate the orientation of the camera relative to the QR code. Once we have that pose (remember that term from earlier?), we can draw a 3D object onto our 2D image and give it the correct orientation and lighting.
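For a flat target such as the QR code, taking its points to lie in the plane Z = 0, the homography from the target plane to the image is, up to scale, K [r1 r2 t], where K is the camera's intrinsic matrix and r1, r2 are the first two columns of the camera rotation. The sketch below recovers the pose from that relationship. It assumes K is already known (from camera calibration) and that H maps target-plane coordinates to pixels; in practice OpenCV's cv2.solvePnP or cv2.decomposeHomographyMat is the usual tool, and pose_from_homography is a hypothetical name.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R and translation t from H ~ K [r1 r2 t] (planar target at Z = 0)."""
    A = np.linalg.inv(K) @ H
    # H is only known up to scale; r1 must be a unit vector, which fixes the scale
    lam = 1.0 / np.linalg.norm(A[:, 0])
    r1, r2, t = lam * A[:, 0], lam * A[:, 1], lam * A[:, 2]
    if t[2] < 0:
        # the overall sign is also ambiguous: pick the one that puts the target in front of the camera
        r1, r2, t = -r1, -r2, -t
    r3 = np.cross(r1, r2)  # third axis is perpendicular to the target plane
    R = np.column_stack([r1, r2, r3])
    # with noisy data R is only approximately orthonormal; snap it to the nearest rotation
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

Once R and t are known, any 3D point on or above the target can be projected into the image with K (R X + t), which is what lets you draw a virtual object with the right orientation.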
Much like the augmented reality example above, we can match a known target pattern in an image to its real-world coordinates and estimate the distortion the camera introduces. Later we will talk about camera calibration and how to do this.
We don't always get the orientation we want in an image. That's okay, because in various applications we can warp (or reorient) the image perspective so we can get what we want. You see stuff like this in the movies all the time and some of it is real.
To calculate a homography between two images, you need at least 4 point correspondences between them. Having more than 4 is even better: OpenCV will estimate the homography that best fits all the corresponding points, and if you pass a robust method such as cv2.RANSAC, it can also reject bad correspondences (outliers). Usually, these point correspondences are found automatically by matching features such as SIFT, SURF or ORB between the images.
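import cv2
import numpy as np

# pts_src and pts_dst are numpy arrays of corresponding (x, y) points
# in the source and destination images. We need at least 4 of them.
# (the point values and the blank im_src below are placeholders for illustration)
pts_src = np.float32([[0, 0], [99, 0], [99, 99], [0, 99]])
pts_dst = np.float32([[10, 20], [120, 15], [130, 110], [5, 100]])
im_src = np.zeros((100, 100, 3), np.uint8)

h, status = cv2.findHomography(pts_src, pts_dst)

# The calculated homography can be used to warp the source image
# to the destination. size is the (width, height) of im_dst.
size = (200, 150)
im_dst = cv2.warpPerspective(im_src, h, size)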
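Under the hood, a homography can be estimated from 4 or more correspondences with the direct linear transform (DLT): each pair (x, y) → (u, v) gives two linear equations in the nine entries of H, and the solution is the right singular vector of the stacked system with the smallest singular value. The NumPy sketch below is the textbook DLT, not OpenCV's implementation, and homography_dlt is a hypothetical name. It skips the coordinate normalization (Hartley normalization) that a production version would use to improve numerical conditioning.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H with dst ~ H @ src from N >= 4 (x, y) correspondences (plain DLT)."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if len(src) < 4 or len(src) != len(dst):
        raise ValueError('need at least 4 matching point pairs')
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h3 . p) - (h1 . p) = 0 and v * (h3 . p) - (h2 . p) = 0, with p = (x, y, 1)
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # the least-squares solution of A h = 0 with |h| = 1 is the last right singular vector
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so the bottom-right entry is 1
```

With exactly 4 points (no three collinear) the system has an exact solution; with more points the SVD gives the least-squares fit, which is where RANSAC-style outlier rejection becomes important.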
In many televised sports events, advertisements are virtually inserted into live video feeds. For example, in both soccer and baseball, ads are placed on small advertisement boards just outside the boundary of the field, and these can be changed virtually. Instead of displaying the same ad to everybody, advertisers can choose which ads to show based on the person’s demographics, location, etc. In these applications, the four corners of the advertisement board are detected in the video and serve as the destination points. The four corners of the ad serve as the source points. A homography is calculated from these four corresponding points and used to warp the ad into the video frame.
In [6]:
advert = cv2.imread('afa.jpg')
advert = cv2.cvtColor(advert,cv2.COLOR_BGR2RGB)
print('advert size:', advert.shape)
ts = cv2.imread('times-square.jpg')
ts = cv2.cvtColor(ts,cv2.COLOR_BGR2RGB)
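# original image corners as (x, y): top-left, bottom-left, bottom-right, top-right
# note: shape is (rows, cols) = (height, width), so x uses shape[1] and y uses shape[0]
pts_src = np.array([
    [0, 0],
    [0, advert.shape[0]],
    [advert.shape[1], advert.shape[0]],
    [advert.shape[1], 0]
], dtype=np.float32)
# destination to map to
# note: I just eyeballed this ... not perfect!
pts_dst = np.array([
    [100, 100],
    [0, 200],
    [150, 330],
    [200, 200]
], dtype=np.float32)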
h, status = cv2.findHomography(pts_src, pts_dst)
print('Homography matrix:', h)
# warp advert image
im_dst = cv2.warpPerspective(advert, h, (ts.shape[1], ts.shape[0]))
# this section of code will create a mask, then replace the masked
# area with our advert:
# 1. convert copy to grayscale (im_dst is RGB)
# 2. threshold image so it is just black and white (binary)
# 3. invert mask so the billboard area is black
# 4. cut out existing current colors there, anything and'ed with 0 is 0
# 5. now add the two images together and fill in the cut out
img2gray = cv2.cvtColor(im_dst,cv2.COLOR_RGB2GRAY)
ret, mask = cv2.threshold(img2gray, 10, 255, cv2.THRESH_BINARY)
mask_inv = cv2.bitwise_not(mask)
img1_bg = cv2.bitwise_and(ts,ts,mask = mask_inv)
final = cv2.add(img1_bg, im_dst) # saturating add, avoids uint8 wrap-around
plt.subplot(2,2,1)
plt.imshow(advert)
plt.title('Advertisement')
plt.subplot(2,2,2)
plt.imshow(ts)
plt.title('Times Square')
plt.subplot(2,2,3)
plt.imshow(im_dst)
plt.title('Warped image')
plt.subplot(2,2,4)
plt.imshow(final)
plt.title('Virtual Billboard');
Since we are talking about homography and warping images, I thought I would throw in something about omnidirectional (omni) cameras. There is more than one way to create a camera that can see 360 degrees.
A catadioptric sensor is a camera made up of a mirror (catoptrics) and lenses (dioptrics). Simplifying things a lot, one can think of a catadioptric sensor as a normal camera viewing the world reflected in a parabolic mirror. To be useful, the mirror must be perfectly aligned with, and placed at a precise distance from, the camera. The main advantage of a catadioptric sensor is its panoramic view of the world: it can view a full hemisphere (360 degrees by 90 degrees).
For robotics, this offers the opportunity for a sensor with a 360$^\circ$ FOV. There are also examples of this being done in stereo.
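For reference, the Dewarper class below builds its look-up table from the following polar-to-Cartesian mapping, where the output (panoramic) pixel at row $i$ and column $j$ samples the input image at $(x_{ij}, y_{ij})$. The symbols mirror the class parameters ($R_{max}$, $R_{min}$, $C_x$, $C_y$); the output width is the circumference at the mean radius and the height is the thickness of the ring:
$$ W_d = \left\lfloor \pi (R_{max} + R_{min}) \right\rfloor, \qquad H_d = R_{max} - R_{min} $$
$$ \theta_j = -\frac{2 \pi j}{W_d}, \qquad \rho_i = R_{min} + i $$
$$ x_{ij} = C_x + \rho_i \cos\theta_j, \qquad y_{ij} = C_y + \rho_i \sin\theta_j $$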
In [7]:
class Dewarper:
    def __init__(self, Ws, Hs, Rmax, Rmin, Cx, Cy, interpolation=cv2.INTER_CUBIC):
        """
        Ws - width src (not used)
        Hs - height src (not used)
        Rmax, Rmin - outer and inner ring of image
        Cx, Cy - camera center
        """
        self.interpolation = interpolation
        # determine the destination image size: the width is the
        # circumference at the mean radius, the height is the ring thickness
        Wd = int(2.0*(float(Rmax+Rmin)/2.0)*np.pi)
        Hd = Rmax-Rmin
        print('Unwrapped image size:',Wd,Hd)
        self.buildLUT(Wd, Hd, Rmax, Rmin, Cx, Cy)

    def buildLUT(self, Wd, Hd, Rmax, Rmin, Cx, Cy):
        """
        Creates a polar map look up table (LUT)
        in:
        Wd - width destination
        Hd - height destination
        Rmin - inner ring of image
        Rmax - outer ring of image
        Cx - camera center x
        Cy - camera center y
        out: stores the mapping in self.map1 and self.map2
        """
        map_x = np.zeros((Hd, Wd), np.float32)
        map_y = np.zeros((Hd, Wd), np.float32)
        # polar to Cartesian
        # x = r*cos(t)
        # y = r*sin(t)
        for i in range(0,int(Hd)):
            for j in range(0,int(Wd)):
                theta = -float(j)/float(Wd)*2.0*np.pi
                rho = float(Rmin + i)
                # plain indexing (ndarray.itemset was removed in NumPy 2.0)
                map_x[i, j] = Cx + rho*np.cos(theta)
                map_y[i, j] = Cy + rho*np.sin(theta)
        (self.map1, self.map2) = cv2.convertMaps(map_x, map_y, cv2.CV_16SC2)

    def unwarp(self, img):
        """
        Takes the original image and unwarps it, note the new image is much smaller
        in: raw image needing to be unwarped
        out: panoramic image
        """
        output = cv2.remap(img, self.map1, self.map2, self.interpolation)
        return output
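The double loop in buildLUT evaluates one pixel at a time, which gets slow for large images. Here is a sketch of the same mapping vectorised with NumPy broadcasting (build_polar_maps is a hypothetical name). It returns float32 maps that should match the loop's output; they could be passed to cv2.convertMaps exactly as in buildLUT, or straight to cv2.remap, which also accepts a pair of float32 maps.

```python
import numpy as np

def build_polar_maps(Wd, Hd, Rmin, Cx, Cy):
    """Vectorised version of the Dewarper.buildLUT loops (sketch)."""
    j = np.arange(Wd, dtype=np.float64)
    i = np.arange(Hd, dtype=np.float64)
    theta = -j / Wd * 2.0 * np.pi  # one angle per output column
    rho = Rmin + i                 # one radius per output row
    # broadcasting (Hd, 1) against (1, Wd) fills the whole (Hd, Wd) grid at once
    map_x = (Cx + rho[:, None] * np.cos(theta)[None, :]).astype(np.float32)
    map_y = (Cy + rho[:, None] * np.sin(theta)[None, :]).astype(np.float32)
    return map_x, map_y
```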
In [8]:
def process(file, rmin):
    # read in the image grayscale
    frame = cv2.imread(file,0)
    h,w = frame.shape
    print('Image size:',w,h)
    # the center is assumed to be the image center and rmin is picked by hand
    # ... had trouble automating it reliably
    cx = int(w/2)
    cy = int(h/2)
    rmax = cy - rmin
    print('Parameters: center(x,y) %d,%d radius(max,min) %d,%d'%(cx,cy,rmax,rmin))
    dewarp = Dewarper(w,h,rmax,rmin,cx,cy)
    im = dewarp.unwarp(frame)
    return im, frame
In [9]:
im, org = process('image2.png',30)
plt.imshow(org, cmap='gray');
In [10]:
plt.imshow(im, cmap='gray');
In [11]:
im, orig = process('football_donut.jpg', 150)
plt.imshow(orig, cmap='gray');
In [12]:
plt.imshow(cv2.flip(im,0), cmap='gray');
Notice that in the virtual billboard example of Times Square (the bottom-right picture), we were able to insert the advertisement onto the billboard by warping the image. With a little more programming and finesse, you could start your own business selling advertisements on virtual billboards.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.