Title: Agglomerative Clustering
Slug: agglomerative_clustering
Summary: How to conduct agglomerative clustering in scikit-learn.
Date: 2017-09-22 12:00
Category: Machine Learning
Tags: Clustering
Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

Load Iris Flower Data


In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data

Standardize Features


In [3]:
# Standarize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

Conduct Agglomerative Clustering

In scikit-learn, AgglomerativeClustering uses the linkage parameter to determine the merging strategy to minimize the 1) variance of merged clusters (ward), 2) average of distance between observations from pairs of clusters (average), or 3) maximum distance between observations from pairs of clusters (complete).

Two other parameters are useful to know. First, the affinity parameter determines the distance metric used for linkage (minkowski, euclidean, etc.). Second, n_clusters sets the number of clusters the clustering algorithm will attempt to find. That is, clusters are successively merged until there are only n_clusters remaining.


In [4]:
# Create meanshift object
clt = AgglomerativeClustering(linkage='complete', 
                              affinity='euclidean', 
                              n_clusters=3)

# Train model
model = clt.fit(X_std)

Show Cluster Membership


In [5]:
# Show cluster membership
model.labels_


Out[5]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 0, 2, 2,
       2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 0, 2, 2, 0,
       2, 2, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])