Cluster Validity

SSE (sum of squared errors)

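Use the elbow method to choose the number of clusters: plot SSE against the number of clusters $k$ and pick the $k$ after which the decrease in SSE levels off.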
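
The elbow method can be sketched without external libraries. The snippet below is a minimal illustration, not a reference implementation: the toy data set, the helper names (`squared_distance`, `farthest_first_init`, `kmeans_sse`), and the deterministic farthest-first initialization are all assumptions made for this example. In practice a library k-means with several random restarts would be the usual choice.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means and return the final SSE."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: three tight groups of four points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# SSE for each candidate number of clusters.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.3f}")
```

On this toy data the SSE falls steeply up to $k = 3$ and changes very little afterwards, which is the "elbow" one looks for in the plot.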

Statistical Framework for Cluster Validity

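  • Randomly generate data and apply your clustering solution to it. The validity scores obtained on this structureless data give a baseline for judging the score obtained on the real data.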
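
One way to make this concrete is a small Monte Carlo comparison. The sketch below rests on several assumptions that are not part of the notes: the toy data set, SSE as the validity score, uniform sampling inside the bounding box of the data as the "random" reference, and the helper names (`sq_dist`, `kmeans_sse`, `random_baseline_sse`). The k-means routine is a bare-bones Lloyd's algorithm with deterministic farthest-first seeding, used only to keep the example self-contained.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, k, max_iter=100):
    """Minimal Lloyd's k-means with farthest-first seeding; returns the SSE."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def random_baseline_sse(points, k, n_trials=200, seed=0):
    """SSE of the same clustering procedure on uniformly random data drawn
    from the bounding box of `points` (same number of points each trial)."""
    rng = random.Random(seed)
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    sses = []
    for _ in range(n_trials):
        fake = [
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in points
        ]
        sses.append(kmeans_sse(fake, k))
    return sses


# Hypothetical toy data: three tight groups of four points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

observed = kmeans_sse(points, 3)
baseline = random_baseline_sse(points, 3)

# Fraction of random data sets that cluster at least as tightly as the real one.
fraction = sum(s <= observed for s in baseline) / len(baseline)
print(f"observed SSE: {observed:.3f}")
print(f"smallest SSE on random data: {min(baseline):.3f}")
print(f"fraction of random runs with SSE <= observed: {fraction:.3f}")
```

A small fraction suggests that the observed SSE is unlikely to arise from structureless data, so the clusters probably reflect real structure. The choice of reference distribution matters, and uniform sampling is only one option.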

External Methods

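External methods require ground-truth class labels. In the formulas below, $m_{ij}$ is the number of points in cluster $i$ that belong to class $j$, and $C$ is the number of classes.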

  • Purity of cluster $i$: $\displaystyle \max_{j}\left[\frac{m_{ij}}{\sum_k m_{ik}}\right]$
  • Entropy of cluster $i$: $-\displaystyle \sum_{j=1}^C\frac{m_{ij}}{\sum_k m_{ik}}\log_2 \frac{m_{ij}}{\sum_k m_{ik}}$

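    Lower entropy indicates higher purity: an entropy of 0 means every point in the cluster belongs to the same class, which corresponds to a purity of 1.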
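
Both measures follow directly from the cluster-by-class count matrix. The sketch below is illustrative only: the matrix `m` is made-up data, the function names are hypothetical, and terms with $m_{ij} = 0$ are skipped under the usual convention $0 \log_2 0 = 0$.

```python
import math


def cluster_purity(row):
    """Purity of one cluster: share of its points in the most common class."""
    total = sum(row)
    return max(row) / total


def cluster_entropy(row):
    """Entropy (in bits) of the class distribution inside one cluster.
    Zero counts are skipped, using the convention 0 * log2(0) = 0."""
    total = sum(row)
    probs = [count / total for count in row if count > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical counts: m[i][j] = number of points in cluster i with class j.
m = [
    [8, 0, 0],  # all points from one class
    [2, 2, 0],  # evenly split between two classes
    [1, 1, 2],  # spread over three classes
]

purities = [cluster_purity(row) for row in m]
entropies = [cluster_entropy(row) for row in m]

for i, (p, e) in enumerate(zip(purities, entropies)):
    print(f"cluster {i}: purity={p:.2f}, entropy={e:.2f} bits")
```

Clusters 1 and 2 have the same purity but different entropy, which shows that entropy also reflects how the minority points are spread across classes.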

