The Stability and Usability of Statistical Topic Models

Yang, Yi; Pan, Shimei; Lu, Jie; Topkara, Mercan; Song, Yangqiu

dc.contributor.author	Yang, Yi
dc.contributor.author	Pan, Shimei
dc.contributor.author	Lu, Jie
dc.contributor.author	Topkara, Mercan
dc.contributor.author	Song, Yangqiu
dc.date.accessioned	2025-01-08T15:08:55Z
dc.date.available	2025-01-08T15:08:55Z
dc.date.issued	2016-07-20
dc.description.abstract	Statistical topic models have become a useful and ubiquitous tool for analyzing large text corpora. One common application of statistical topic models is to support topic-centric navigation and exploration of document collections. Existing work on topic modeling focuses on the inference of model parameters so the resulting model fits the input data. Since the exact inference is intractable, statistical inference methods, such as Gibbs Sampling, are commonly used to solve the problem. However, most of the existing work ignores an important aspect that is closely related to the end user experience: topic model stability. When the model is either re-trained with the same input data or updated with new documents, the topic previously assigned to a document may change under the new model, which may result in a disruption of end users’ mental maps about the relations between documents and topics, thus undermining the usability of the applications. In this article, we propose a novel user-directed non-disruptive topic model update method that balances the tradeoff between finding the model that fits the data and maintaining the stability of the model from end users’ perspective. It employs a novel constrained LDA algorithm to incorporate pairwise document constraints, which are converted from user feedback about topics, to achieve topic model stability. Evaluation results demonstrate the advantages of our approach over previous methods.
dc.description.sponsorship	The authors thank Doug Downey for his kind support and insightful discussions. The authors also thank the editors and anonymous reviewers for their comments and suggestions, which have significantly helped to improve the quality of this article.
dc.description.uri	https://dl.acm.org/doi/10.1145/2954002
dc.format.extent	23 pages
dc.genre	journal articles
dc.identifier	doi:10.13016/m21ft2-nazc
dc.identifier.citation	Yang, Yi, Shimei Pan, Jie Lu, Mercan Topkara, and Yangqiu Song. “The Stability and Usability of Statistical Topic Models.” ACM Trans. Interact. Intell. Syst. 6, no. 2 (July 20, 2016): 14:1-14:23. https://doi.org/10.1145/2954002.
dc.identifier.uri	https://doi.org/10.1145/2954002
dc.identifier.uri	http://hdl.handle.net/11603/37207
dc.language.iso	en_US
dc.publisher	ACM
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Information Systems Department
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject	Mental Map Disruption
dc.subject	Non-Disruptive Topic Model Update (nTMU)
dc.subject	Latent Dirichlet Allocation (LDA)
dc.subject	Document Clustering
dc.subject	Statistical Topic Models
dc.title	The Stability and Usability of Statistical Topic Models
dc.type	Text
dcterms.creator	https://orcid.org/0000-0002-5989-8543
