Constrained Coclustering for Textual Documents

dc.contributor.author: Song, Yangqiu
dc.contributor.author: Pan, Shimei
dc.contributor.author: Liu, Shixia
dc.contributor.author: Wei, Furu
dc.contributor.author: Zhou, Michelle
dc.contributor.author: Qian, Weihong
dc.date.accessioned: 2025-06-05T14:03:45Z
dc.date.available: 2025-06-05T14:03:45Z
dc.date.issued: 2010-07-03
dc.description: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010
dc.description.abstract: In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
dc.description.uri: https://ojs.aaai.org/index.php/AAAI/article/view/7680
dc.format.extent: 6 pages
dc.genre: conference papers and proceedings
dc.genre: postprints
dc.identifier: doi:10.13016/m2dv6n-2amh
dc.identifier.citation: Song, Yangqiu, Shimei Pan, Shixia Liu, Furu Wei, Michelle Zhou, and Weihong Qian. “Constrained Coclustering for Textual Documents.” Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 3, 2010): 581–86. https://doi.org/10.1609/aaai.v24i1.7680.
dc.identifier.uri: https://doi.org/10.1609/aaai.v24i1.7680
dc.identifier.uri: http://hdl.handle.net/11603/38750
dc.language.iso: en_US
dc.publisher: AAAI
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: semi-supervised learning
dc.title: Constrained Coclustering for Textual Documents
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5989-8543

Files

Original bundle

Name: ConstrainedCoclusteringforTextualDocuments.pdf
Size: 642.23 KB
Format: Adobe Portable Document Format