Introduction

While spectral clustering is usually an unsupervised operation, there are circumstances in which we have prior belief that pairs of samples should (or should not) be assigned with the same cluster. Constrained spectral clustering aims to exploit this prior belief as constraint (or weak supervision) to influence the cluster formation so as to obtain a structure more closely resembling human perception. Two important issues remain open: (1) how to propagate sparse constraints effectively, (2) how to handle ill-conditioned/noisy constraints generated by imperfect oracles. In this paper we present a unified framework to address the above issues. Specifically, in contrast to existing constrained spectral clustering approaches that blindly rely on all features for constructing the spectral, our approach searches for neighbours driven by discriminative feature selection for more effective constraint diffusion. Crucially, we formulate a novel data-driven filtering approach to handle the noisy constraint problem, which has been unrealistically ignored in constrained spectral clustering literature.

Contribution Highlights

  • We formulate a novel discriminative-feature driven approach for effective sparse constraint propagation. Existing methods fundamentally ignore the role of feature selection in this problem.
  • We propose a data-driven method to filter potentially noisy constraints, a problem that is largely unaddressed by existing constrained clustering algorithms.

Citation

  1. Constrained Clustering with Imperfect Oracles
    X. Zhu, C. C. Loy, S. Gong
    IEEE Transactions on Neural Networks and Learning Systems, 2015 (TNNLS)
    [PDF]  
  2. Constrained Clustering: Effective Constraint Propagation with Imperfect Oracles
    X. Zhu, C. C. Loy, S. Gong
    in Proceedings of IEEE International Conference on Data Mining, 2013 (ICDM)
    [PDF]  

Images

Overview:

Our model, COP-RF, is capable of discovering data partitions close to the ground truth clusters despite it is provided only with sparse and noisy constraints. (a) Ground truth cluster formation, with invalid pairwise constraints highlighted in red colour; must- and cannot-links are represented by solid and dashed lines respectively; (b) the clustering result obtained using unsupervised clustering; (c) the result obtained using our method.

Overview of our approach:

We propose a model that can flexibly generate a constraint-aware affinity matrix, which is directly employed by existing spectral clustering methods as input for constrained clustering.

Dataset:

Example frames from the ERCe dataset. It contain six event classes including (a) Student Orientation, (b) Cleaning, (c) Career Fair, (d) Group Study, (e) Gun Forum, and (f) Scholarship Competition.

Comparison using Adjusted Rand Index (ARI):

Comparison of clustering performance between different methods given a varying number of perfect pairwise constraints.

Datasets