One of the most common and important data types in computer vision is, of course, the image. The most straightforward way to represent an image is probably to use the grayscale value of each pixel. Usually, grayscale values are not very indicative of the data they describe. For example, if we saw a single pixel with grayscale value 128, could we tell what object this pixel belonged to? Probably not. Therefore, raw grayscale values do not make very effective image features.
Alternatively, we might find that colors contain information that raw grayscale values cannot capture. Most often, images come in the conventional RGB color space, where every pixel in the image gets an intensity value for its apparent redness (R), greenness (G), and blueness (B). However, OpenCV offers a whole range of other color spaces, such as Hue Saturation Value (HSV), Hue Lightness Saturation (HLS), and the Lab color space. Let's have a quick look at them.
I am sure that you are already familiar with the RGB color space, which uses additive mixing of different shades of red, green, and blue to produce different composite colors. The RGB color space is useful in everyday life, because it covers a large part of the color space that the human eye can see. This is why color television sets or color computer monitors only need to care about producing mixtures of red, green, and blue light.
We can load a sample image in BGR format using cv2.imread:
In [1]:
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
img_bgr = cv2.imread('data/lena.jpg')
If you have ever tried to display a BGR image using Matplotlib or similar libraries, you might have noticed a weird blue tint to the image. This is because Matplotlib expects an RGB image. To fix this, we have to permute the color channels using cv2.cvtColor:
In [3]:
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
Then we can use Matplotlib to plot the images (BGR on the left, RGB on the right):
In [4]:
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(img_bgr)
plt.subplot(122)
plt.imshow(img_rgb);
However, ever since the RGB color space was created, people have realized that it is actually quite a poor representation of human vision. Therefore, researchers have developed two alternative representations. One of them is called HSV, which stands for hue, saturation, and value, and the other one is called HLS, which stands for hue, lightness, and saturation. You might have seen these color spaces in color pickers and common image editing software. In these color spaces, the hue of the color is captured by a single hue channel, the colorfulness is captured by a saturation channel, and the lightness or brightness is captured by a lightness or value channel.
In OpenCV, a BGR image can easily be converted to the HSV color space using cv2.cvtColor:
In [5]:
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
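To get a feel for what the individual channels encode, we can split the converted image into its planes and display them side by side. The following is just a quick sketch (the channel order produced by cv2.COLOR_BGR2HSV is H, S, V):
# split the HSV image into its hue, saturation, and value planes
h, s, v = cv2.split(img_hsv)
plt.figure(figsize=(12, 4))
plt.subplot(131)
plt.imshow(h, cmap='gray')
plt.title('hue')
plt.subplot(132)
plt.imshow(s, cmap='gray')
plt.title('saturation')
plt.subplot(133)
plt.imshow(v, cmap='gray')
plt.title('value');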
The same is true for the HLS color space. In fact, OpenCV provides a whole range of additional color spaces, which are available via cv2.cvtColor. All we need to do is to replace the color flag with one of the following:
cv2.COLOR_BGR2HLS
cv2.COLOR_BGR2LAB
cv2.COLOR_BGR2YUV
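Each of these works exactly like the HSV conversion above. As a quick sketch (the variable names below are just for illustration):
# convert the BGR image to HLS, Lab, and YUV
img_hls = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HLS)
img_lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
img_yuv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YUV)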
Among the most straightforward features to find in an image are corners.
OpenCV provides at least two different algorithms to find corners in an image:
cv2.cornerHarris
cv2.goodFeaturesToTrack
Harris corner detection works only on grayscale images, so we first want to convert our BGR image to grayscale:
In [6]:
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
We specify the pixel neighborhood size considered for corner detection (blockSize), an aperture parameter for the edge detection (ksize), and the so-called Harris detector free parameter (k):
In [7]:
corners = cv2.cornerHarris(img_gray, 2, 3, 0.04)
Let's have a look at the result:
In [8]:
plt.figure(figsize=(12, 6))
plt.imshow(corners, cmap='gray');
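The second corner detector listed above, cv2.goodFeaturesToTrack, is just as easy to use. Here is a minimal sketch; the parameter values (at most 100 corners, a quality level of 0.01, and a minimum distance of 10 pixels between corners) are illustrative choices, not prescribed ones:
# Shi-Tomasi corners: strongest corners first, subject to quality and distance constraints
corners_st = cv2.goodFeaturesToTrack(img_gray, 100, 0.01, 10)
corners_st = corners_st.reshape(-1, 2)  # one (x, y) pair per detected corner
plt.figure(figsize=(12, 6))
plt.imshow(img_rgb)
plt.scatter(corners_st[:, 0], corners_st[:, 1], s=30, c='r');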
However, corner detection is not sufficient when the scale of an image changes. To this end, David Lowe came up with a method to describe interesting points in an image independently of their orientation and size - hence the name scale-invariant feature transform (SIFT). In OpenCV 3, this functionality is part of the xfeatures2d module:
In [9]:
sift = cv2.xfeatures2d.SIFT_create()
The algorithm works in two steps: it first detects keypoints (interesting points in the image) and then computes a feature descriptor for each of them.
Keypoints can be detected with a single line of code:
In [10]:
kp = sift.detect(img_bgr)
We can use the cv2.drawKeypoints function to visualize the identified keypoints. We can also pass an optional flag, cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS, to surround every keypoint with a circle whose size denotes its importance, and a radial line that indicates the orientation of the keypoint:
In [11]:
import numpy as np
img_kp = np.zeros_like(img_bgr)
img_kp = cv2.drawKeypoints(img_rgb, kp, img_kp, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
In [12]:
plt.figure(figsize=(12, 6))
plt.imshow(img_kp)
Out[12]:
Feature descriptors can be computed with the compute method:
In [13]:
kp, des = sift.compute(img_bgr, kp)
Typically you get 128 feature values for every keypoint found:
In [14]:
des.shape
Out[14]:
You can do these two steps in one go, too:
In [15]:
kp2, des2 = sift.detectAndCompute(img_bgr, None)
And the result should be the same:
In [16]:
np.allclose(des, des2)
Out[16]:
SIFT has proved to be really good, but it is not fast enough for most applications. This is where speeded up robust features (SURF) come in, replacing the computationally expensive steps of SIFT with box filter approximations. In OpenCV, SURF works in the exact same way as SIFT:
In [17]:
surf = cv2.xfeatures2d.SURF_create()
In [18]:
kp = surf.detect(img_bgr)
In [19]:
img_kp = np.zeros_like(img_bgr)
img_kp = cv2.drawKeypoints(img_rgb, kp, img_kp,
flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
In [20]:
plt.figure(figsize=(12, 6))
plt.imshow(img_kp)
Out[20]:
In [21]:
kp, des = surf.compute(img_bgr, kp)
SURF finds more keypoints, and usually returns 64 feature values per keypoint:
In [22]:
des.shape
Out[22]:
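If SURF returns more keypoints than you need, you can raise the Hessian threshold at construction time so that only the strongest keypoints are kept; the threshold value below is just an illustrative starting point. Setting extended=True instead would switch SURF to 128-dimensional descriptors:
# keep only keypoints with a strong Hessian response (the threshold value is illustrative)
surf_strict = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_strict, des_strict = surf_strict.detectAndCompute(img_bgr, None)
len(kp_strict), des_strict.shape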