Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!
So far, we have focused our attention exclusively on supervised learning problems, where every data point in the dataset had a known label or target value. However, what do we do when there is no known output, or no teacher to supervise the learning algorithm?
This is what unsupervised learning is all about. In unsupervised learning, the learning algorithm is shown only the input data and is asked to extract knowledge from this data without further instruction. We have already talked about one of the many forms that unsupervised learning comes in: dimensionality reduction. Another popular domain is cluster analysis, which aims to partition data into distinct groups of similar items.
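To make the idea of cluster analysis more concrete, here is a minimal sketch (not taken from the book; it assumes NumPy and scikit-learn are available) that asks k-means to partition a handful of unlabeled 2D points into two groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# A tiny unlabeled dataset: two loose groups of 2D points.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.3, 7.7], [7.8, 8.4]])

# Ask k-means for two clusters; no labels are provided -
# the algorithm finds the groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # coordinates of the two cluster centers
```

Note that the algorithm never sees a label or target value; it discovers the two groups purely from the geometry of the data.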
In this chapter, we want to understand how different clustering algorithms can be used to extract hidden structures in simple, unlabeled datasets. Such hidden structures can be put to many uses, whether for feature extraction, image processing, or even as a preprocessing step for supervised learning tasks. As a concrete example, we will learn how to apply clustering to images in order to reduce their color spaces to 16 bits.
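As a taste of what that looks like in practice, the following is a minimal sketch of color quantization with OpenCV's cv2.kmeans. It is not the book's implementation; the placeholder file name and the choice of 16 clusters (that is, 16 representative colors) are assumptions made purely for illustration:

```python
import numpy as np
import cv2

# Load an image and reshape it into a flat list of pixels
# (one BGR triplet per row). 'lena.png' is a placeholder path.
img = cv2.imread('lena.png')
pixels = np.float32(img.reshape(-1, 3))

# Cluster all pixel colors into 16 groups with k-means.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 16, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with the center of its cluster, so the output
# image contains only 16 distinct colors.
quantized = np.uint8(centers)[labels.flatten()].reshape(img.shape)
cv2.imwrite('lena_16colors.png', quantized)
```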
More specifically, we want to answer the following questions:
Let's get started!
The book features a detailed treatment of different unsupervised learning algorithms, including a walk-through of k-means and expectation maximization. For more information on the topic, please refer to the book.
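That walk-through is not reproduced in this excerpt, but as a rough sketch of how expectation maximization is commonly exercised in practice (assuming scikit-learn, which is not necessarily the library the book uses here), a Gaussian mixture model can be fit to unlabeled data like this:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two unlabeled blobs of 2D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(6, 1, size=(100, 2))])

# GaussianMixture fits a mixture of Gaussians via expectation maximization:
# the E-step assigns soft cluster responsibilities, the M-step updates
# the means and covariances until convergence.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_)          # estimated cluster means
print(gmm.predict(X[:5]))  # hard cluster assignments for the first few points
```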