Constrained Coclustering for Textual Documents
Citation of Original Publication
Song, Yangqiu, Shimei Pan, Shixia Liu, Furu Wei, Michelle Zhou, and Weihong Qian. “Constrained Coclustering for Textual Documents.” Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 3, 2010): 581–86. https://doi.org/10.1609/aaai.v24i1.7680.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
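
The abstract mentions deriving document and word constraints from human-provided category labels. As a rough illustration of that general idea, the following Python sketch builds must-link and cannot-link pairs from a partially labeled set and counts how many of them a given cluster assignment violates. It is not the paper's model: the function names, the constant weights `w_ml` and `w_cl`, and the toy labels are all assumptions made for this example, and the penalty is a simplified stand-in for the kind of term an HMRF-based constrained clustering objective would include.

```python
from itertools import combinations


def derive_pairwise_constraints(labels):
    """Build pairwise constraints from category labels.

    labels: dict mapping an item id to its label, or None if unlabeled.
    Returns (must_link, cannot_link) as sets of ordered id pairs.
    Hypothetical helper for illustration only.
    """
    must_link, cannot_link = set(), set()
    labeled = sorted(i for i, y in labels.items() if y is not None)
    for a, b in combinations(labeled, 2):
        if labels[a] == labels[b]:
            must_link.add((a, b))      # same label: should share a cluster
        else:
            cannot_link.add((a, b))    # different labels: should be separated
    return must_link, cannot_link


def constraint_penalty(assignment, must_link, cannot_link, w_ml=1.0, w_cl=1.0):
    """Weighted count of violated constraints (assumed constant weights)."""
    penalty = 0.0
    for a, b in must_link:
        if assignment[a] != assignment[b]:
            penalty += w_ml
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:
            penalty += w_cl
    return penalty


# Toy example: three labeled documents and one unlabeled document.
labels = {"d1": "sports", "d2": "sports", "d3": "politics", "d4": None}
must_link, cannot_link = derive_pairwise_constraints(labels)

# A candidate clustering that splits d1/d2 and merges d2/d3.
assignment = {"d1": 0, "d2": 1, "d3": 1, "d4": 0}
penalty = constraint_penalty(assignment, must_link, cannot_link)
```

The same two helpers could be applied to words as well as documents, which is the sense in which a constraint model can be two-sided; how such penalties are weighted and combined with the co-clustering objective is specified in the paper itself, not here.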
