Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!
So far, we have focused our attention exclusively on supervised learning problems, where every data point in the dataset had a known label or target value. However, what do we do when there is no known output, or no teacher to supervise the learning algorithm?
This is what unsupervised learning is all about. In unsupervised learning, the learning algorithm is shown only the input data and is asked to extract knowledge from this data without further instruction. We have already talked about one of the many forms that unsupervised learning comes in: dimensionality reduction. Another popular domain is cluster analysis, which aims to partition data into distinct groups of similar items.
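To make the idea of cluster analysis more concrete, here is a minimal sketch (not taken from the book; it assumes NumPy and scikit-learn are available) that asks k-means to partition a handful of unlabeled 2D points into two groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# A tiny unlabeled dataset: two loose groups of 2D points.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.3, 7.7], [7.8, 8.4]])

# Ask k-means for two clusters; no labels are provided -
# the algorithm finds the groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # coordinates of the two cluster centers
```

Note that the algorithm never sees a label or target value; it discovers the two groups purely from the geometry of the data.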
In this chapter, we want to understand how different clustering algorithms can be used to extract hidden structures in simple, unlabeled datasets. Such hidden structures can be put to many uses, whether for feature extraction, image processing, or even as a preprocessing step for supervised learning tasks. As a concrete example, we will learn how to apply clustering to images in order to reduce their color spaces to 16 bits.
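As a taste of what that looks like in practice, the following is a minimal sketch of color quantization with OpenCV's cv2.kmeans. It is not the book's implementation; the placeholder file name and the choice of 16 clusters (that is, 16 representative colors) are assumptions made purely for illustration:

```python
import numpy as np
import cv2

# Load an image and reshape it into a flat list of pixels
# (one BGR triplet per row). 'lena.png' is a placeholder path.
img = cv2.imread('lena.png')
pixels = np.float32(img.reshape(-1, 3))

# Cluster all pixel colors into 16 groups with k-means.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 16, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with the center of its cluster, so the output
# image contains only 16 distinct colors.
quantized = np.uint8(centers)[labels.flatten()].reshape(img.shape)
cv2.imwrite('lena_16colors.png', quantized)
```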
More specifically, we want to answer the following questions:
Let's get started!
The book features a detailed treatment of different unsupervised learning algorithms, including a walk-through of k-means and expectation maximization. For more information on the topic, please refer to the book.
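That walk-through is not reproduced in this excerpt, but as a rough sketch of how expectation maximization is commonly exercised in practice (assuming scikit-learn, which is not necessarily the library the book uses here), a Gaussian mixture model can be fit to unlabeled data like this:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two unlabeled blobs of 2D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(6, 1, size=(100, 2))])

# GaussianMixture fits a mixture of Gaussians via expectation maximization:
# the E-step assigns soft cluster responsibilities, the M-step updates
# the means and covariances until convergence.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_)          # estimated cluster means
print(gmm.predict(X[:5]))  # hard cluster assignments for the first few points
```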