Constrained Clustering

Forms of constraints:

  • Must-Link $ML(a,b)$
    • also referred to as a positive constraint
  • Cannot-Link $CL(a,b)$
    • also known as negative constraint

Properties of constraints:

  • Transitive property: $ML(a,b)\ \&\ ML(b,c)\ \rightarrow\ ML(a,c)$
  • Symmetric: $ML(a,b)\ \rightarrow\ ML(b,a)$
  • Reflexive: $ML(a,a)$ holds trivially

Entailment: $CL(a,b) \ \&\ ML(b,c) \ \rightarrow \ CL(a,c)$
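
These properties mean constraints can be propagated before clustering: take the transitive closure of the must-links, then derive the entailed cannot-links. Below is a minimal sketch of such a preprocessing step; the function name and the pair-list representation are illustrative assumptions, not part of any specific algorithm.

```python
def propagate_constraints(ml, cl, n):
    """Transitive closure of must-links plus the entailed cannot-links."""
    parent = list(range(n))            # union-find over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Transitive property: ML(a,b) & ML(b,c) -> ML(a,c)
    for a, b in ml:
        parent[find(a)] = find(b)

    # group points into must-link components
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    # Entailment: CL(a,b) & ML(b,c) -> CL(a,c), lifted to whole components
    cl_closed = set()
    for a, b in cl:
        for x in components[find(a)]:
            for y in components[find(b)]:
                cl_closed.add((min(x, y), max(x, y)))
    return components, cl_closed

components, cl_closed = propagate_constraints(ml=[(0, 1), (1, 2)],
                                              cl=[(2, 3)], n=5)
print(cl_closed)   # {(0, 3), (1, 3), (2, 3)}
```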

Applications and Advantages

  • Easy to account for domain knowledge and different kinds of clues
  • No need for a large number of constraints; even a small number can be effective

Obtaining (extracting) constraints

  • manual approach: asking a human expert whether pairs of items must be linked or not
  • automatic approach: using internal or external clues

Robustness of constrained clustering algorithms

  • how sensitive the clustering result is to the particular set of constraints provided

Constraint feasibility

  • whether any clustering with $k$ clusters can satisfy all constraints simultaneously
  • Example of infeasible constraints:
    • $X=\{x_1, x_2, x_3\}$ with $k=2$
    • Constraints: $\{CL(x_1,x_2),\ CL(x_2,x_3),\ CL(x_1,x_3)\}$: three mutually cannot-linked points cannot be split across only two clusters
  • Testing the feasibility
    • positive (must-link) constraints: feasibility is decidable in polynomial time ($P$)
    • negative (cannot-link) constraints: feasibility is equivalent to graph $k$-coloring, $NP$-complete in general
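
For the special case $k=2$, the coloring view makes the test easy: the cannot-link graph must be bipartite, which BFS 2-coloring checks in polynomial time. A minimal sketch under that assumption (function name and pair-list representation are illustrative); it flags the infeasible triangle example above:

```python
from collections import deque

def cl_feasible_k2(n, cl):
    """BFS 2-coloring: do the cannot-links admit any 2-clustering?"""
    adj = [[] for _ in range(n)]
    for a, b in cl:
        adj[a].append(b)
        adj[b].append(a)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False   # odd cycle: infeasible for k = 2
    return True

# The infeasible example above: a cannot-link triangle with k = 2
print(cl_feasible_k2(3, [(0, 1), (1, 2), (0, 2)]))   # False
```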

Characterizing constraint

  • Informativeness: measures the amount of information gained from constraints that could not be gained by clustering alone
    • low informativeness indicates little value for constraints
  • Coherence: measures the agreement among constraints w.r.t. a distance measure; e.g., nearby points should not be part of a Cannot-Link constraint

    • low coherence causes confusion for the clustering algorithm
  • High informativeness and high coherence $\rightarrow$ high gain

Constrained Clustering Algorithms

  • constraint-based methods: directly alter the clustering process to enforce the constraints
  • distance-based methods: learn a distance metric that accommodates the given constraints

Constrained K-Means (COP-KMeans) by Wagstaff et al. (2001)

In the cluster assignment phase of the K-Means algorithm, each point is checked for constraint violations before being assigned to its nearest feasible cluster.

  • Positive constraints can be easily satisfied
  • Blindly assigns points corresponding to a positive (must-link) constraint to the same cluster
  • A set of negative (cannot-link) constraints can make the assignment infeasible, in which case the algorithm halts with failure
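
A minimal sketch of this assignment step, assuming `ml` and `cl` are dictionaries mapping each point index to its constrained partners (these names and the overall structure are illustrative, not the authors' reference code):

```python
import numpy as np

def violates(i, cluster, labels, ml, cl):
    for j in ml.get(i, []):
        if labels[j] is not None and labels[j] != cluster:
            return True   # must-link partner already in another cluster
    for j in cl.get(i, []):
        if labels[j] is not None and labels[j] == cluster:
            return True   # cannot-link partner already in this cluster
    return False

def assign(X, centroids, ml, cl):
    labels = [None] * len(X)
    for i, x in enumerate(X):
        # try clusters in order of increasing distance to their centroid
        order = np.argsort([np.linalg.norm(x - c) for c in centroids])
        for cluster in order:
            if not violates(i, cluster, labels, ml, cl):
                labels[i] = cluster
                break
        if labels[i] is None:
            raise ValueError(f"no feasible assignment for point {i}")
    return labels
```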

Pairwise Constrained K-Means (PCKM) by Basu et al. (2004)

  • A new objective function $\phi_{pckm}$

    $$\phi_{pckm}(\Omega) = SSE(\Omega) + \text{penalty}_{ML}(\Omega) + \text{penalty}_{CL} (\Omega)$$

  • $SSE(\Omega)=\displaystyle\sum_{j=1}^{k} \sum_{x_i \in \omega_j} \lVert x_i - \mu_j\rVert^2$: compactness of clusters, where $\mu_j$ is the centroid of cluster $\omega_j$
  • $\text{penalty}_{ML}=\displaystyle\sum_{(x_i,x_j)\in ML} w_{ij}\, 1[l_i \ne l_j]$: cost of violated must-link constraints
  • $\text{penalty}_{CL}=\displaystyle\sum_{(x_i,x_j)\in CL} \bar{w}_{ij}\, 1[l_i = l_j]$: cost of violated cannot-link constraints
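
A direct transcription of this objective into code, simplified by assuming uniform violation weights `w` and `w_bar` in place of the per-pair $w_{ij}$, $\bar{w}_{ij}$ (all names are illustrative):

```python
import numpy as np

def phi_pckm(X, labels, centroids, ml, cl, w=1.0, w_bar=1.0):
    labels = np.asarray(labels)
    # SSE: squared distance of every point to its cluster centroid
    sse = sum(np.sum((X[labels == j] - centroids[j]) ** 2)
              for j in range(len(centroids)))
    pen_ml = sum(w for i, j in ml if labels[i] != labels[j])      # violated must-links
    pen_cl = sum(w_bar for i, j in cl if labels[i] == labels[j])  # violated cannot-links
    return sse + pen_ml + pen_cl
```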

HMRF K-Means

HMRF = Hidden Markov Random Field

A hybrid approach, between constraint-based and distance-based

$$\phi_{hmrfkm}(\Omega) = \sum_{j=1}^{k}\sum_{x_i\in\omega_j} D(x_i,\omega_j) + \sum_{(x_i,x_j)\in ML} w_{ij}\, \xi_D(x_i,x_j)\, 1[l_i \ne l_j] + \\ \sum_{(x_i,x_j)\in CL} \bar{w}_{ij} \left(\xi_{D_{\max}} - \xi_D(x_i,x_j)\right) 1[l_i = l_j] + \\ \log Z$$

Here $D$ is a learnable distortion measure, $\xi_D$ is a penalty function that grows with distance (violating a must-link between distant points costs more, and the $\xi_{D_{\max}} - \xi_D$ term makes violating a cannot-link between nearby points cost more), and $\log Z$ is a normalization term.

Constrained Normalized Cut

Spectral clustering

Transform the data into a weighted graph $G=(V,E,W)$ where the set of vertices $V=\{v_1, v_2, \ldots, v_n\}$ represents the original data points, and the weights of the edges represent the similarity between pairs of points.

Graph-cut problem: partition the graph into $k$ connected components such that the total weight of the edges between components, $Cut(C_a, C_b)=\displaystyle \sum_{i\in C_a, j\in C_b} w_{ij}$, is minimized.

Normalized Cut: for a clustering $C=\{C_1, C_2, \ldots, C_k\}$, the NCut value is defined by $NCut(C) = \displaystyle \sum_{j=1}^k \frac{Cut(C_j,\bar{C}_j)}{Vol(C_j)}$

Volume of a partition: $Vol(C_a)=\displaystyle \sum_{i\in C_a} \sum_{j=1}^{n} w_{ij}$

  • Membership Matrix $H=[h_{ij}]_{n\times k}$ where $h_{ij} = \left\{\begin{array}{cc}\displaystyle \frac{1}{\sqrt{Vol(C_j)}} & \text{if }i\in C_j\\ 0 & \text{otherwise}\end{array}\right.$
  • Diagonal Matrix $D=[d_{ij}]_{n\times n}$ where $d_{ii} = \text{degree}(i)=\sum_{j=1}^n w_{ij}$
  • Laplacian Matrix $L = D - W$

    $$\displaystyle \min_{C_1, C_2, ..C_k} \text{Trace}\left(H^T L H \right) \text{ s.t. } H^T D H = I$$

  • Requiring discrete membership values makes the problem NP-hard
  • Relax the discrete membership, allowing the rows of $H$ to lie in $R^k$
  • $Y = D^{1/2} H$ and solve the following $$\displaystyle \min_{Y\in R^{n\times k}} \text{Trace}\left(Y^T \left[ D^{-1/2} L D^{-1/2}\right] Y\right) \text{ s.t. } Y^T Y = I$$
  • Then, use K-Means to get discrete cluster memberships
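
A compact sketch of this relaxed pipeline, assuming a symmetric affinity matrix `W` with positive degrees; SciPy and scikit-learn are used here for convenience, not as a reference implementation:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def ncut_spectral(W, k):
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # Laplacian L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # D^{-1/2} L D^{-1/2}
    # the k eigenvectors with smallest eigenvalues solve the relaxed problem
    _, Y = eigh(L_sym, subset_by_index=[0, k - 1])
    # discretize the relaxed solution with K-Means, as described above
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```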

Constrained Complete-Link Clustering

Distance between two clusters in complete link: $$\text{dist}(\omega_i, \omega_j) = \max\{\text{dist}(x, x') \ | \ x \in \omega_i, x' \in \omega_j\}$$

  • Distance-based
  • Modifying the distance between clusters as follows
    • If two clusters are affected by a positive constraint $\rightarrow$ set their distance to be $0$.
    • If two clusters are affected by a negative constraint $\rightarrow$ set their distance to be $\infty$
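
A minimal sketch of this distance override, assuming `ml` and `cl` are sets of point-id pairs and `dist` is a base point-to-point distance (names are illustrative):

```python
import math

def constrained_dist(ci, cj, dist, ml, cl):
    """Complete-link distance between clusters ci, cj (sets of point ids)."""
    pairs = [(a, b) for a in ci for b in cj]
    if any((a, b) in cl or (b, a) in cl for a, b in pairs):
        return math.inf   # a cannot-link forbids ever merging these clusters
    if any((a, b) in ml or (b, a) in ml for a, b in pairs):
        return 0.0        # a must-link forces the earliest possible merge
    return max(dist(a, b) for a, b in pairs)
```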

Constrained Spectral Clustering

  • $A$ affinity matrix:
    • $a_{ij}$ is related to the similarity between points $i$ and $j$
  • $D$ is a diagonal matrix
    • $d_{ii}= \sum_{j=1}^n a_{ij}$
    • $d_{\max} = \max_i d_{ii}$
  • $N$: additive normalization of $A$
    • $N = \frac{1}{d_{max}} \left(A + d_{max} I - D \right)$
  • Impose Constraints:
    • ML: set $a_{ij}$ and $a_{ji}$ to 1
    • CL: set $a_{ij}$ and $a_{ji}$ to 0
    • No further propagation
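
A small sketch of both steps: overwriting the affinity entries for constrained pairs, then applying the additive normalization defined above (function names are illustrative assumptions):

```python
import numpy as np

def constrain_affinity(A, ml, cl):
    A = A.copy()
    for i, j in ml:
        A[i, j] = A[j, i] = 1.0   # must-link: maximal similarity
    for i, j in cl:
        A[i, j] = A[j, i] = 0.0   # cannot-link: no similarity
    return A                      # no further propagation, as noted above

def additive_normalize(A):
    d = A.sum(axis=1)
    d_max = d.max()
    # N = (A + d_max * I - D) / d_max, as defined above
    return (A + d_max * np.eye(len(A)) - np.diag(d)) / d_max
```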

Avoiding Bias using Constrained Clustering

Coordinated Conditional Information Bottleneck (CCIB) [Gondek and Hofmann, 2004]

  • Conditional Mutual Information $$MI(A;B|C) = MI(A;B,C) - MI(A;C)$$

  • Optimization problem $$P^*_{C|X} = \displaystyle \underset{P_{C|X}\in \mathcal{P}}{\mathrm{argmax}}\ MI(C; Y|Z)$$ i.e., find the clustering $C$ that is most informative about the relevant variables $Y$ beyond what the known structure $Z$ already explains

  • Hierarchical clustering with average-link
  • Choose two candidate pairs of clusters to merge:
    • the closest pair $q_1, q_2$
    • the most dissimilar pair $o_1, o_2$
    • decide based on the ratio of distances $\frac{d(q_1,q_2)}{d(o_1,o_2)}$:
      • if $\frac{d(q_1,q_2)}{d(o_1,o_2)} < \text{threshold}$, merge only $q_1, q_2$ (do not merge $o_1, o_2$)
      • otherwise merge both pairs
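
The decision rule as a short sketch (`d`, `q1`, `o1`, `threshold`, and the return convention are illustrative assumptions):

```python
def merge_decision(d, q1, q2, o1, o2, threshold):
    """Return the cluster pairs to merge in this step."""
    if d(q1, q2) / d(o1, o2) < threshold:
        return [(q1, q2)]             # merge only the closest pair
    return [(q1, q2), (o1, o2)]       # otherwise merge both pairs
```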

Alternative Clustering

  • Given an existing partitioning of the data $\Omega_{avoid}$
  • Generate a new partitioning $\Omega_{alt}$ that is different from $\Omega_{avoid}$

Soft Constrained K-means (SCKM)

Ares et al. 2009

  • Constraints:
    • May-link $MayL(x,y)$: points $x, y$ are likely to be in the same cluster
    • May-not-link $MayNL(x,y)$: points $x, y$ are unlikely to be in the same cluster
  • Constraint properties:

    • Not symmetrical
    • Not transitive
  • Cluster assignment

    • Calculate a score for assigning a point to each cluster
    • The score is initialized as the similarity of the point to the cluster centroid
    • The score is then updated based on the number of $MayL$ and $MayNL$ links that each candidate assignment obeys or infringes:
      • increase the score by $\beta$ if it obeys a $MayL$ constraint
      • decrease the score by $\beta$ if it infringes a $MayNL$ constraint
    • Finally, the point is assigned to the cluster with the highest score
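
A minimal sketch of this scoring rule, assuming `may_l` and `may_nl` map a point id to its soft-constrained partners and `sim` is a similarity function (all names are illustrative):

```python
import numpy as np

def sckm_assign(x_id, x, centroids, labels, may_l, may_nl, beta, sim):
    # initialize scores with the similarity to each cluster centroid
    scores = np.array([sim(x, c) for c in centroids], dtype=float)
    for j in may_l.get(x_id, []):
        if labels[j] is not None:
            scores[labels[j]] += beta   # reward obeying a may-link
    for j in may_nl.get(x_id, []):
        if labels[j] is not None:
            scores[labels[j]] -= beta   # penalize infringing a may-not-link
    return int(np.argmax(scores))       # cluster with the highest score
```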

Extracting constraints

Automatic constraint extraction

  • Random selection of pairs, with user feedback
  • Using social tags
