This notebook contains an excerpt from the book Machine Learning for OpenCV by Michael Beyeler. The code is released under the MIT license, and is available on GitHub.

Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!

Representing images

One of the most common and important data types in computer vision is, of course, the image. The most straightforward way to represent an image is probably to use the grayscale value of each pixel. Usually, however, grayscale values are not very indicative of the data they describe. For example, if we saw a single pixel with grayscale value 128, could we tell what object this pixel belonged to? Probably not. Therefore, raw grayscale values do not make very effective image features.

Using color spaces

Alternatively, we might find that colors contain information that raw grayscale values cannot capture. Most often, images come in the conventional RGB color space, where every pixel gets an intensity value for its apparent redness (R), greenness (G), and blueness (B). However, OpenCV offers a whole range of other color spaces, such as Hue Saturation Value (HSV), Hue Lightness Saturation (HLS), and the Lab color space. Let's have a quick look at them.

Encoding images in RGB space

I am sure that you are already familiar with the RGB color space, which uses additive mixing of different shades of red, green, and blue to produce different composite colors. The RGB color space is useful in everyday life, because it covers a large part of the color space that the human eye can see. This is why color television sets or color computer monitors only need to care about producing mixtures of red, green, and blue light.

We can load a sample image using cv2.imread, which returns color images with their channels in BGR order:


In [1]:
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
img_bgr = cv2.imread('data/lena.jpg')

If you have ever tried to display a BGR image using Matplotlib or similar libraries, you might have noticed a weird blue tint to the image. This is because Matplotlib expects an RGB image. To display the image correctly, we have to permute the color channels using cv2.cvtColor:


In [3]:
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

Then we can use Matplotlib to plot the images (BGR on the left, RGB on the right):


In [4]:
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(img_bgr)
plt.subplot(122)
plt.imshow(img_rgb);
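
As an aside, the same channel permutation can also be done with plain NumPy slicing, since the image is just a NumPy array whose last axis holds the three color channels. This is only a minimal sketch (it assumes NumPy is imported as np, and the variable name img_rgb_alt is an illustrative choice); cv2.cvtColor remains the conversion used throughout this notebook:

import numpy as np

# Reverse the last (channel) axis: BGR -> RGB
img_rgb_alt = img_bgr[:, :, ::-1]

# The result should match the cv2.cvtColor output
print(np.array_equal(img_rgb_alt, img_rgb))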


Encoding images in HSV and HLS space

However, ever since the RGB color space was created, people have realized that it is actually quite a poor match for how humans perceive color. Therefore, researchers have developed two alternative representations. One of them is called HSV, which stands for hue, saturation, and value, and the other one is called HLS, which stands for hue, lightness, and saturation. You might have seen these color spaces in color pickers and common image editing software. In these color spaces, the hue of a color is captured by a single hue channel, its colorfulness by a saturation channel, and its lightness or brightness by a lightness or value channel.

In OpenCV, an RGB image can easily be converted to HSV color space using cv2.cvtColor:


In [5]:
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
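
To get a feeling for what this conversion produces, here is a minimal sketch that splits the result into its three channels and plots each one in grayscale (the variable names h, s, and v are illustrative choices, not from the book):

# Split the HSV image into its hue, saturation, and value channels
h, s, v = cv2.split(img_hsv)

plt.figure(figsize=(12, 4))
plt.subplot(131); plt.imshow(h, cmap='gray'); plt.title('hue')
plt.subplot(132); plt.imshow(s, cmap='gray'); plt.title('saturation')
plt.subplot(133); plt.imshow(v, cmap='gray'); plt.title('value');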

The same is true for the HLS color space. In fact, OpenCV provides a whole range of additional color spaces, all available via cv2.cvtColor. All we need to do is replace the color flag with one of the following (a short sketch follows the list below):

  • HLS (hue, lightness, saturation) using cv2.COLOR_BGR2HLS
  • LAB (lightness, green-red, and blue-yellow) using cv2.COLOR_BGR2LAB
  • YUV (overall luminance, blue-luminance, red-luminance) using cv2.COLOR_BGR2YUV
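
As a minimal sketch, the conversions look like this (the variable names img_hls, img_lab, and img_yuv are illustrative choices, not from the book):

img_hls = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HLS)  # hue, lightness, saturation
img_lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)  # lightness, green-red, blue-yellow
img_yuv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YUV)  # luminance plus two chrominance channels

# Each conversion preserves the image shape; only the meaning of the
# three channels changes
print(img_hls.shape, img_lab.shape, img_yuv.shape)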

Detecting corners in images

One of the most straightforward features to find in an image is the corner.

OpenCV provides at least two different algorithms to find corners in an image:

  • Harris corner detection: Knowing that corners are regions with high intensity changes in all directions, Harris and Stephens came up with a fast way of finding such regions. This algorithm is implemented as cv2.cornerHarris in OpenCV.
  • Shi-Tomasi corner detection: Shi and Tomasi proposed their own criterion for what makes good features to track, and it usually does better than Harris corner detection because it returns only the N strongest corners. This algorithm is implemented as cv2.goodFeaturesToTrack in OpenCV (a short sketch appears after the Harris example below).

Harris corner detection works only on grayscale images, so we first want to convert our BGR image to grayscale:


In [6]:
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)

We specify the pixel neighborhood size considered for corner detection (blockSize), the aperture parameter of the Sobel operator used for the underlying edge detection (ksize), and the so-called free parameter of the Harris detector (k):


In [7]:
corners = cv2.cornerHarris(img_gray, 2, 3, 0.04)

Let's have a look at the result:


In [8]:
plt.figure(figsize=(12, 6))
plt.imshow(corners, cmap='gray');
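
The response map itself is hard to interpret; a common follow-up step is to threshold it and mark the strong responses in the original image. The following is a minimal sketch (the 1% threshold is an arbitrary illustrative choice, not a value from the book):

# Mark pixels whose corner response exceeds 1% of the maximum response
img_harris = img_rgb.copy()
img_harris[corners > 0.01 * corners.max()] = [255, 0, 0]

plt.figure(figsize=(12, 6))
plt.imshow(img_harris);

The Shi-Tomasi detector mentioned above can be tried in much the same way. Here is a minimal sketch of cv2.goodFeaturesToTrack; the choice of 50 corners, a quality level of 0.01, and a minimum distance of 10 pixels are illustrative values, not taken from the book:

# goodFeaturesToTrack also operates on the grayscale image and returns
# the N strongest corners as (x, y) coordinates
shi_corners = cv2.goodFeaturesToTrack(img_gray, maxCorners=50,
                                      qualityLevel=0.01, minDistance=10)

img_shi = img_rgb.copy()
for corner in shi_corners:
    x, y = corner.ravel()
    cv2.circle(img_shi, (int(x), int(y)), 5, (255, 0, 0), -1)

plt.figure(figsize=(12, 6))
plt.imshow(img_shi);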


Using the Scale-Invariant Feature Transform (SIFT)

However, corner detection is not sufficient when the scale of an image changes. To this end, David Lowe came up with a method to describe interesting points in an image independently of their orientation and size - hence the name scale-invariant feature transform (SIFT). In OpenCV 3, this algorithm is part of the xfeatures2d module:


In [9]:
sift = cv2.xfeatures2d.SIFT_create()

The algorithm works in two steps:

  • detect: This step identifies interesting points in an image (also known as keypoints).
  • compute: This step computes the actual feature values for every keypoint.

Keypoints can be detected with a single line of code:


In [10]:
kp = sift.detect(img_bgr)

We can use the cv2.drawKeypoints function to visualize the identified keypoints. We can also pass the optional flag cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS to surround every keypoint with a circle whose size denotes its importance and a radial line that indicates the orientation of the keypoint:


In [11]:
import numpy as np
img_kp = np.zeros_like(img_bgr)
img_kp = cv2.drawKeypoints(img_rgb, kp, img_kp, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

In [12]:
plt.figure(figsize=(12, 6))
plt.imshow(img_kp)


Out[12]:
<matplotlib.image.AxesImage at 0x7fa2b4c0bfd0>

Feature descriptors can be computed with compute:


In [13]:
kp, des = sift.compute(img_bgr, kp)

Typically you get 128 feature values for every keypoint found:


In [14]:
des.shape


Out[14]:
(238, 128)

You can do these two steps in one go, too:


In [15]:
kp2, des2 = sift.detectAndCompute(img_bgr, None)

And the result should be the same:


In [16]:
np.allclose(des, des2)


Out[16]:
True
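
The second argument to detectAndCompute (None above) is an optional mask: a uint8 image whose nonzero pixels mark the region in which keypoints should be detected. As a minimal sketch, assuming we only want keypoints from the upper-left quarter of the image (an arbitrary illustrative choice):

# Build a mask that is nonzero only in the upper-left quarter of the image
mask = np.zeros(img_gray.shape, dtype=np.uint8)
mask[:img_gray.shape[0] // 2, :img_gray.shape[1] // 2] = 255

kp_masked, des_masked = sift.detectAndCompute(img_bgr, mask)
len(kp_masked)  # typically fewer keypoints than before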

Using Speeded Up Robust Features (SURF)

SIFT has proved to work really well, but it is not fast enough for most applications. This is where speeded up robust features (SURF) come in, which replace the computationally expensive steps of SIFT with a box filter approximation. In OpenCV, SURF works in exactly the same way as SIFT:


In [17]:
surf = cv2.xfeatures2d.SURF_create()

In [18]:
kp = surf.detect(img_bgr)

In [19]:
img_kp = np.zeros_like(img_bgr)
img_kp = cv2.drawKeypoints(img_rgb, kp, img_kp,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

In [20]:
plt.figure(figsize=(12, 6))
plt.imshow(img_kp)


Out[20]:
<matplotlib.image.AxesImage at 0x7fa2b4b906a0>

In [21]:
kp, des = surf.compute(img_bgr, kp)

SURF finds more keypoints and, by default, returns 64 feature values per keypoint:


In [22]:
des.shape


Out[22]:
(351, 64)
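
Both the number of keypoints and the descriptor size can be controlled when the detector is created. As a minimal sketch (the threshold value 400 is an arbitrary illustrative choice): raising hessianThreshold keeps only stronger keypoints, and extended=True switches to 128-dimensional descriptors:

# A stricter Hessian threshold yields fewer (but stronger) keypoints;
# extended=True produces 128-dimensional descriptors instead of 64
surf_strict = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=True)
kp3, des3 = surf_strict.detectAndCompute(img_bgr, None)
des3.shape  # expect (number of keypoints, 128)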