This notebook contains an excerpt from the book Machine Learning for OpenCV by Michael Beyeler. The code is released under the MIT license, and is available on GitHub.

Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!

Implementing Agglomerative Hierarchical Clustering

Although OpenCV does not provide an implementation of agglomerative hierarchical clustering, it is a popular algorithm that certainly belongs in our machine learning repertoire.

We start out by generating 10 random data points, just like in the previous figure:


In [1]:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=10, random_state=100)
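Here X is a 10 x 2 array of point coordinates and y holds the ground-truth blob membership of each point. A quick sanity check, as a minimal sketch (the exact values depend on the random seed):

print(X.shape)  # (10, 2): ten points with two features each
print(y)        # ground-truth blob index of each point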

Using scikit-learn's familiar estimator API, we import the AgglomerativeClustering algorithm from the cluster module and specify the desired number of clusters:


In [2]:
from sklearn import cluster
agg = cluster.AgglomerativeClustering(n_clusters=3)
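By default, AgglomerativeClustering merges clusters using Ward linkage, which minimizes within-cluster variance. If a different merge criterion suits the data better, the linkage parameter can be changed; a minimal sketch (the variable names here are purely illustrative):

# Same estimator, different merge criteria:
agg_avg = cluster.AgglomerativeClustering(n_clusters=3, linkage='average')   # mean pairwise distance
agg_cpl = cluster.AgglomerativeClustering(n_clusters=3, linkage='complete')  # maximum pairwise distance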

Fitting the model to the data works, as usual, via the fit_predict method:


In [3]:
labels = agg.fit_predict(X)
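Since make_blobs also returned the ground-truth assignments in y, we can check how well the clustering recovered them, for instance with the adjusted Rand index from sklearn.metrics. A minimal sketch (a score of 1.0 means a perfect match up to a relabeling of the clusters):

from sklearn.metrics import adjusted_rand_score

# Compare the predicted labels to the ground truth from make_blobs.
# The score ignores the actual label numbers, so a permuted but otherwise
# identical assignment still scores 1.0.
print(adjusted_rand_score(y, labels))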

We can generate a scatter plot where every data point is colored according to the predicted label:


In [4]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, s=100)


Out[4]:
<matplotlib.collections.PathCollection at 0x23811a45908>
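The scatter plot only shows the final flat partition. To inspect the full merge hierarchy, we can also build a dendrogram of the same data with SciPy's scipy.cluster.hierarchy module; a minimal sketch using Ward linkage, matching scikit-learn's default:

from scipy.cluster.hierarchy import linkage, dendrogram

# Build the complete merge tree with Ward linkage and draw it.
# The height of each merge shows the distance at which two clusters were
# joined; cutting the tree at a chosen height yields a flat clustering.
Z = linkage(X, method='ward')
plt.figure(figsize=(10, 6))
dendrogram(Z)
plt.xlabel('data point index')
plt.ylabel('merge distance')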

That's it! This marks the end of another wonderful adventure.