Using dstoolbox.cluster

Table of contents

Imports


In [1]:
import numpy as np

In [2]:
from dstoolbox.cluster import HierarchicalClustering

In [3]:
np.random.seed(0)

HierarchicalClustering

A variant of sklearn.cluster.AgglomerativeClustering that returns a dynamic number of labels.

HierarchicalClustering uses the same scipy algorithms as sklearn, but sklearn requires you to decide beforehand how many clusters you want. With HierarchicalClustering, we instead set the max_dist parameter and let the data decide how many clusters emerge. In this respect, HierarchicalClustering is similar to sklearn.cluster.DBSCAN, which also returns a variable number of clusters.


In [4]:
X = np.random.random((100, 5))

In [5]:
labels = HierarchicalClustering(max_dist=0.5).fit_predict(X)

In [6]:
len(set(labels))


Out[6]:
74

In [7]:
labels = HierarchicalClustering(max_dist=0.9).fit_predict(X)

In [8]:
len(set(labels))


Out[8]:
38

In [9]:
labels = HierarchicalClustering(max_dist=1.1).fit_predict(X)

In [10]:
len(set(labels))


Out[10]:
23
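The distance-threshold behavior shown above can be reproduced with scipy directly: build the linkage tree once, then cut it at different heights with scipy.cluster.hierarchy.fcluster using criterion='distance'. This is a minimal sketch; the linkage method ('ward' here, sklearn's default) and metric are assumptions and may differ from what HierarchicalClustering uses internally, so the exact cluster counts need not match the outputs above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

np.random.seed(0)
X = np.random.random((100, 5))

# Build the merge tree once; each threshold then yields its own labeling.
# method='ward' is an assumption (sklearn's default), not necessarily
# what dstoolbox's HierarchicalClustering uses.
Z = linkage(X, method="ward")

for max_dist in (0.5, 0.9, 1.1):
    labels = fcluster(Z, t=max_dist, criterion="distance")
    print(max_dist, len(set(labels)))
```

As with max_dist above, raising the cut height can only merge clusters, so the number of clusters decreases monotonically as the threshold grows.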