Constrained Clustering

Forms of constraints:

  • Must-Link $ML(a,b)$
    • also referred to as a positive constraint
  • Cannot-Link $CL(a,b)$
    • also known as negative constraint

Properties of constraints:

  • Transitive property: $ML(a,b)\ \&\ ML(b,c)\ \rightarrow\ ML(a,c)$
  • Symmetric: $ML(a,b)\ \rightarrow\ ML(b,a)$
  • Reflexive: $ML(a,a)$ holds trivially

Entailment: $CL(a,b) \ \&\ ML(b,c) \ \rightarrow \ CL(a,c)$
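
These properties mean constraints can be propagated before clustering: take the transitive closure of the must-links, then derive the entailed cannot-links. Below is a minimal sketch of such a preprocessing step; the function name and the pair-list representation are illustrative assumptions, not part of any specific algorithm.

```python
def propagate_constraints(ml, cl, n):
    """Transitive closure of must-links plus the entailed cannot-links."""
    parent = list(range(n))            # union-find over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Transitive property: ML(a,b) & ML(b,c) -> ML(a,c)
    for a, b in ml:
        parent[find(a)] = find(b)

    # group points into must-link components
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    # Entailment: CL(a,b) & ML(b,c) -> CL(a,c), lifted to whole components
    cl_closed = set()
    for a, b in cl:
        for x in components[find(a)]:
            for y in components[find(b)]:
                cl_closed.add((min(x, y), max(x, y)))
    return components, cl_closed

components, cl_closed = propagate_constraints(ml=[(0, 1), (1, 2)],
                                              cl=[(2, 3)], n=5)
print(cl_closed)   # {(0, 3), (1, 3), (2, 3)}
```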

Applications and Advantages

  • Easy to account for domain knowledge and different kinds of clues
  • No need for a large number of constraints; even a small number can be effective

Obtaining (extracting) constraints

  • manual approach: asking a human expert whether pairs of items must be linked or not
  • automatic approach: using internal or external clues

Robustness of constrained clustering algorithms

  • how sensitive the clustering result is to the particular set of constraints provided

Constraint feasibility

  • whether any clustering with $k$ clusters can satisfy all constraints simultaneously
  • Example of infeasible constraints:
    • $X=\{x_1, x_2, x_3\}$ with $k=2$
    • Constraints: $\{CL(x_1,x_2),\ CL(x_2,x_3),\ CL(x_1,x_3)\}$: three mutually cannot-linked points cannot be split across only two clusters
  • Testing the feasibility
    • positive (must-link) constraints: feasibility is decidable in polynomial time ($P$)
    • negative (cannot-link) constraints: feasibility is equivalent to graph $k$-coloring, $NP$-complete in general
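
For the special case $k=2$, the coloring view makes the test easy: the cannot-link graph must be bipartite, which BFS 2-coloring checks in polynomial time. A minimal sketch under that assumption (function name and pair-list representation are illustrative); it flags the infeasible triangle example above:

```python
from collections import deque

def cl_feasible_k2(n, cl):
    """BFS 2-coloring: do the cannot-links admit any 2-clustering?"""
    adj = [[] for _ in range(n)]
    for a, b in cl:
        adj[a].append(b)
        adj[b].append(a)
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False   # odd cycle: infeasible for k = 2
    return True

# The infeasible example above: a cannot-link triangle with k = 2
print(cl_feasible_k2(3, [(0, 1), (1, 2), (0, 2)]))   # False
```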

Characterizing constraint

  • Informativeness: measures the amount of information gained from constraints that could not be gained by clustering alone
    • low informativeness indicates little value for constraints
  • Coherence: measures the agreement among constraints w.r.t. a distance measure; e.g., nearby points should not be part of a Cannot-Link constraint

    • low coherence causes confusion for the clustering algorithm
  • High informativeness and high coherence $\rightarrow$ high gain

Constrained Clustering Algorithms

  • constraint-based methods: directly alter the clustering process to enforce the constraints
  • distance-based methods: learn a distance metric that accommodates the given constraints

Constrained K-Means (COP-KMeans) by Wagstaff et al. (2001)

In the cluster assignment phase of the K-Means algorithm, each point is checked for constraint violations before being assigned to its nearest feasible cluster.

  • Positive constraints can be easily satisfied
  • Blindly assigns points corresponding to a positive (must-link) constraint to the same cluster
  • A set of negative (cannot-link) constraints can make the assignment infeasible, in which case the algorithm halts with failure
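
A minimal sketch of this assignment step, assuming `ml` and `cl` are dictionaries mapping each point index to its constrained partners (these names and the overall structure are illustrative, not the authors' reference code):

```python
import numpy as np

def violates(i, cluster, labels, ml, cl):
    for j in ml.get(i, []):
        if labels[j] is not None and labels[j] != cluster:
            return True   # must-link partner already in another cluster
    for j in cl.get(i, []):
        if labels[j] is not None and labels[j] == cluster:
            return True   # cannot-link partner already in this cluster
    return False

def assign(X, centroids, ml, cl):
    labels = [None] * len(X)
    for i, x in enumerate(X):
        # try clusters in order of increasing distance to their centroid
        order = np.argsort([np.linalg.norm(x - c) for c in centroids])
        for cluster in order:
            if not violates(i, cluster, labels, ml, cl):
                labels[i] = cluster
                break
        if labels[i] is None:
            raise ValueError(f"no feasible assignment for point {i}")
    return labels
```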

Pairwise Constrained K-Means (PCKM) by Basu et al. (2004)

  • A new objective function $\phi_{pckm}$

    $$\phi_{pckm}(\Omega) = SSE(\Omega) + \text{penalty}_{ML}(\Omega) + \text{penalty}_{CL} (\Omega)$$

  • $SSE(\Omega)=\displaystyle\sum_{j=1}^{k} \sum_{x_i \in \omega_j} \lVert x_i - \mu_j\rVert^2$: compactness of clusters, where $\mu_j$ is the centroid of cluster $\omega_j$
  • $\text{penalty}_{ML}=\displaystyle\sum_{(x_i,x_j)\in ML} w_{ij}\, 1[l_i \ne l_j]$: cost of violated must-link constraints
  • $\text{penalty}_{CL}=\displaystyle\sum_{(x_i,x_j)\in CL} \bar{w}_{ij}\, 1[l_i = l_j]$: cost of violated cannot-link constraints
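
A direct transcription of this objective into code, simplified by assuming uniform violation weights `w` and `w_bar` in place of the per-pair $w_{ij}$, $\bar{w}_{ij}$ (all names are illustrative):

```python
import numpy as np

def phi_pckm(X, labels, centroids, ml, cl, w=1.0, w_bar=1.0):
    labels = np.asarray(labels)
    # SSE: squared distance of every point to its cluster centroid
    sse = sum(np.sum((X[labels == j] - centroids[j]) ** 2)
              for j in range(len(centroids)))
    pen_ml = sum(w for i, j in ml if labels[i] != labels[j])      # violated must-links
    pen_cl = sum(w_bar for i, j in cl if labels[i] == labels[j])  # violated cannot-links
    return sse + pen_ml + pen_cl
```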

HMRF K-Means

HMRF = Hidden Markov Random Field

A hybrid approach, between constraint-based and distance-based

$$\phi_{hmrfkm}(\Omega) = \sum_{j=1}^{k}\sum_{x_i\in\omega_j} D(x_i,\omega_j) + \sum_{(x_i,x_j)\in ML} w_{ij}\, \xi_D(x_i,x_j)\, 1[l_i \ne l_j] + \\ \sum_{(x_i,x_j)\in CL} \bar{w}_{ij} \left(\xi_{D_{\max}} - \xi_D(x_i,x_j)\right) 1[l_i = l_j] + \\ \log Z$$

Here $D$ is a learnable distortion measure, $\xi_D$ is a penalty function that grows with distance (violating a must-link between distant points costs more, and the $\xi_{D_{\max}} - \xi_D$ term makes violating a cannot-link between nearby points cost more), and $\log Z$ is a normalization term.

Constrained Normalized Cut

Spectral clustering

Transform the data into a weighted graph $G=(V,E,W)$ where the set of vertices $V=\{v_1, v_2, \ldots, v_n\}$ represents the original data points, and the weights of the edges represent the similarity between pairs of points.

Graph-cut problem: partition the graph into $k$ connected components such that the total weight of the edges between components, $Cut(C_a, C_b)=\displaystyle \sum_{i\in C_a, j\in C_b} w_{ij}$, is minimized.

Normalized Cut: for a clustering $C=\{C_1, C_2, \ldots, C_k\}$, the NCut value is defined by $NCut(C) = \displaystyle \sum_{j=1}^k \frac{Cut(C_j,\bar{C}_j)}{Vol(C_j)}$

Volume of a partition: $Vol(C_a)=\displaystyle \sum_{i\in C_a} \sum_{j=1}^{n} w_{ij}$

  • Membership Matrix $H=[h_{ij}]_{n\times k}$ where $h_{ij} = \left\{\begin{array}{cc}\displaystyle \frac{1}{\sqrt{Vol(C_j)}} & \text{if }i\in C_j\\ 0 & \text{otherwise}\end{array}\right.$
  • Diagonal Matrix $D=[d_{ij}]_{n\times n}$ where $d_{ii} = \text{degree}(i)=\sum_{j=1}^n w_{ij}$
  • Laplacian Matrix $L = D - W$

    $$\displaystyle \min_{C_1, C_2, ..C_k} \text{Trace}\left(H^T L H \right) \text{ s.t. } H^T D H = I$$

  • Requiring discrete membership values makes the problem NP-hard
  • Relax the discrete membership, allowing the rows of $H$ to lie in $R^k$
  • $Y = D^{1/2} H$ and solve the following $$\displaystyle \min_{Y\in R^{n\times k}} \text{Trace}\left(Y^T \left[ D^{-1/2} L D^{-1/2}\right] Y\right) \text{ s.t. } Y^T Y = I$$
  • Then, use K-Means to get discrete cluster memberships
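
A compact sketch of this relaxed pipeline, assuming a symmetric affinity matrix `W` with positive degrees; SciPy and scikit-learn are used here for convenience, not as a reference implementation:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def ncut_spectral(W, k):
    d = W.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - W                      # Laplacian L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # D^{-1/2} L D^{-1/2}
    # the k eigenvectors with smallest eigenvalues solve the relaxed problem
    _, Y = eigh(L_sym, subset_by_index=[0, k - 1])
    # discretize the relaxed solution with K-Means, as described above
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```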

Constrained Complete-Link Clustering

Distance between two clusters in complete link: $$\text{dist}(\omega_i, \omega_j) = \max\{\text{dist}(x, x') \ | \ x \in \omega_i, x' \in \omega_j\}$$

  • Distance-based
  • Modifying the distance between clusters as follows
    • If two clusters are affected by a positive constraint $\rightarrow$ set their distance to be $0$.
    • If two clusters are affected by a negative constraint $\rightarrow$ set their distance to be $\infty$
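
A minimal sketch of this distance override, assuming `ml` and `cl` are sets of point-id pairs and `dist` is a base point-to-point distance (names are illustrative):

```python
import math

def constrained_dist(ci, cj, dist, ml, cl):
    """Complete-link distance between clusters ci, cj (sets of point ids)."""
    pairs = [(a, b) for a in ci for b in cj]
    if any((a, b) in cl or (b, a) in cl for a, b in pairs):
        return math.inf   # a cannot-link forbids ever merging these clusters
    if any((a, b) in ml or (b, a) in ml for a, b in pairs):
        return 0.0        # a must-link forces the earliest possible merge
    return max(dist(a, b) for a, b in pairs)
```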

Constrained Spectral Clustering

  • $A$ affinity matrix:
    • $a_{ij}$ is related to the similarity between points $i$ and $j$
  • $D$ is a diagonal matrix
    • $d_{ii}= \sum_{j=1}^n a_{ij}$
    • $d_{\max} = \max_i d_{ii}$
  • $N$: additive normalization of $A$
    • $N = \frac{1}{d_{max}} \left(A + d_{max} I - D \right)$
  • Impose Constraints:
    • ML: set $a_{ij}$ and $a_{ji}$ to 1
    • CL: set $a_{ij}$ and $a_{ji}$ to 0
    • No further propagation
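
A small sketch of both steps: overwriting the affinity entries for constrained pairs, then applying the additive normalization defined above (function names are illustrative assumptions):

```python
import numpy as np

def constrain_affinity(A, ml, cl):
    A = A.copy()
    for i, j in ml:
        A[i, j] = A[j, i] = 1.0   # must-link: maximal similarity
    for i, j in cl:
        A[i, j] = A[j, i] = 0.0   # cannot-link: no similarity
    return A                      # no further propagation, as noted above

def additive_normalize(A):
    d = A.sum(axis=1)
    d_max = d.max()
    # N = (A + d_max * I - D) / d_max, as defined above
    return (A + d_max * np.eye(len(A)) - np.diag(d)) / d_max
```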

Avoiding Bias using Constrained Clustering

Coordinated Conditional Information Bottleneck (CCIB) [Gondek and Hofmann, 2004]

  • Conditional Mutual Information $$MI(A;B|C) = MI(A;B,C) - MI(A;C)$$

  • Optimization problem $$P^*_{C|X} = \displaystyle \underset{P_{C|X}\in \mathcal{P}}{\mathrm{argmax}}\ MI(C; Y|Z)$$ i.e., find the clustering $C$ that is most informative about the relevant variables $Y$ beyond what the known structure $Z$ already explains

  • Hierarchical clustering with average-link
  • Choose two candidate pairs of clusters to merge:
    • the closest pair $q_1, q_2$
    • the most dissimilar pair $o_1, o_2$
    • decide based on the ratio of distances $\frac{d(q_1,q_2)}{d(o_1,o_2)}$:
      • if $\frac{d(q_1,q_2)}{d(o_1,o_2)} < \text{threshold}$, merge only $q_1, q_2$ (do not merge $o_1, o_2$)
      • otherwise merge both pairs
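
The decision rule as a short sketch (`d`, `q1`, `o1`, `threshold`, and the return convention are illustrative assumptions):

```python
def merge_decision(d, q1, q2, o1, o2, threshold):
    """Return the cluster pairs to merge in this step."""
    if d(q1, q2) / d(o1, o2) < threshold:
        return [(q1, q2)]             # merge only the closest pair
    return [(q1, q2), (o1, o2)]       # otherwise merge both pairs
```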

Alternative Clustering

  • Given an existing partitioning of the data $\Omega_{avoid}$
  • Generate a new partitioning $\Omega_{alt}$ that is different from $\Omega_{avoid}$

Soft Constrained K-means (SCKM)

Ares et al. 2009

  • Constraints:
    • May-link $MayL(x,y)$: points $x, y$ are likely to be in the same cluster
    • May-not-link $MayNL(x,y)$: points $x, y$ are unlikely to be in the same cluster
  • Constraint properties:

    • Not symmetrical
    • Not transitive
  • Cluster assignment

    • Calculate a score for assigning a point to each cluster
    • The score is initialized as the similarity of the point to the cluster centroid
    • The score is then updated based on the number of $MayL$ and $MayNL$ links that each candidate assignment obeys or infringes:
      • increase the score by $\beta$ if it obeys a $MayL$ constraint
      • decrease the score by $\beta$ if it infringes a $MayNL$ constraint
    • Finally, the point is assigned to the cluster with the highest score
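
A minimal sketch of this scoring rule, assuming `may_l` and `may_nl` map a point id to its soft-constrained partners and `sim` is a similarity function (all names are illustrative):

```python
import numpy as np

def sckm_assign(x_id, x, centroids, labels, may_l, may_nl, beta, sim):
    # initialize scores with the similarity to each cluster centroid
    scores = np.array([sim(x, c) for c in centroids], dtype=float)
    for j in may_l.get(x_id, []):
        if labels[j] is not None:
            scores[labels[j]] += beta   # reward obeying a may-link
    for j in may_nl.get(x_id, []):
        if labels[j] is not None:
            scores[labels[j]] -= beta   # penalize infringing a may-not-link
    return int(np.argmax(scores))       # cluster with the highest score
```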

Extracting constraints

Automatic constraint extraction

  • Random selection of pairs, with user feedback
  • Using social tags
