Constrained KMeans with Background Knowledge

Modified k-means to account for background knowledge in form of instance-level constraints
Applying COBWEB to real-world dataset

COP K-means Algorithm

COPKmeans(Dataset $D$, $ML$ constraints, $CL$ constraints)

Let $c_1, .. c_k$ be initial centroids
Assign each data point to its closest centroid such that
- it does not violate ML constraints
- it does not violate CL constraints
- if no such cluster is found, clustering has failed
Update the new cluster centroids by averaging ffeature vectors of the cluster members

Rand index
Held-out: to see if contraints generalize well or not
- 10-fold cross validation
- Generate random constraints in 9 folds
- Cluster the entire dataset
- Evaluate the clustering solution on 10th fold

$\mathcal{M}$: set of must-link constraints
$\mathcal{C}$: set of cannot-link constraints
$W$: the weight of each must-link constraint
$\bar{W}$: the weight of each cannot constraint
defining the cost:
- if $(x_i,x_j)\in \mathcal{M}$ then the cost is $w_{ij}\mathbb{1}(l_i \ne l_j)$
- if $(x_i,x_j)\in \mathcal{C}$ then the cost is $\bar{w}_{ij}\mathbb{1}(l_i = l_j)$ $$\displaystyle \mathcal{J}_{pckm} = \frac{1}{2} \sum_{x_i\in \mathcal{X}} \|x_i - \mu_{l_i}\|^2 \\ + \sum_{(x_i,x_j)\in \mathcal{M}}w_{ij}\mathbb{1}(l_i \ne l_j) \\ +\sum_{(x_i,x_j)\in \mathcal{C}}\bar{w}_{ij}\mathbb{1}(l_i = l_j)$$

PCKmeans
- Greedy optimization of $\mathcal{J}_{pckm}$
- Iterations similar to Kmeans algorithm, with modified cluster assignment
- For text documents, used a Spherical Kmeans (SPKmeans)
  - SPKmeans: a varient of kmeans more efficient for sparse data vectors
  - Kmeans: $\mathcal{O}(nkd)$
  - SPKmeans: $\mathcal{O}(nl)$ where $l$ is the number of non-zeor elements in input data matrix

PCKmeans algorithm
- Initialization
  - Using transitive closure must-link constraints, create $\lambda$ neighborhoods ${N_p}_{p=1}^\lambda$
  - Find the initial centroids based on $\lambda$
    - if $\lambda>k$, choose the centroid k largest neighborhoods
    - if $\lambda<k$:
      - look for point $x$ that cannot-link to the neighborhood sets
      - still need more, choose randomly from $\mathcal{X}$
- Repeat until convergence
  - cluster_assignment step:
    - assign each data point to the cluster $h^*$ that minimizes $\mathcal{J}_{pckm}$
    - $h^* = \displaystyle \underset{h}{\mathrm{argmin}}(J_{pckm}(x,h))$
  - estimate_means step



In [ ]: