Scalable Collapsed Inference for High-Dimensional Topic Models

dc.contributor.author: Islam, Rashidul
dc.contributor.author: Foulds, James
dc.date.accessioned: 2019-10-22T15:13:21Z
dc.date.available: 2019-10-22T15:13:21Z
dc.date.issued: 2019-06
dc.description: Proceedings of NAACL-HLT 2019, Minneapolis, Minnesota, June 2 - June 7, 2019.
dc.description.abstract: The bigger the corpus, the more topics it can potentially support. To truly make full use of massive text corpora, a topic model inference algorithm must therefore scale efficiently in 1) documents and 2) topics, while 3) achieving accurate inference. Previous methods have achieved two out of three of these criteria simultaneously, but never all three at once. In this paper, we develop an online inference algorithm for topic models which leverages stochasticity to scale well in the number of documents, sparsity to scale well in the number of topics, and which operates in the collapsed representation of the topic model for improved accuracy and run-time performance. We use a Monte Carlo inner loop in the online setting to approximate the collapsed variational Bayes updates in a sparse and efficient way, which we accomplish via the Metropolis-Hastings-Walker method. We showcase our algorithm on LDA and the recently proposed mixed membership skip-gram topic model. Our method requires only amortized O(k_d) computation per word token instead of O(K) operations, where the number of topics occurring for a particular document k_d ≪ the total number of topics in the corpus K, to converge to a high-quality solution.
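The Metropolis-Hastings-Walker idea the abstract refers to can be illustrated in miniature: Walker's alias method turns a K-outcome distribution into a table that supports O(1) draws, and a Metropolis-Hastings acceptance step lets those draws come from a slightly stale table while still targeting the current distribution, so the table need only be rebuilt occasionally. The sketch below is illustrative only; the function names, the uniform-proposal usage, and the two-step default are assumptions, not the paper's implementation.

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(K) setup enabling O(1) draws from a
    discrete distribution over K outcomes (here, topics)."""
    K = len(probs)
    scaled = [p * K for p in probs]       # rescale so the mean bin mass is 1
    prob = [0.0] * K                      # acceptance threshold per bin
    alias = [0] * K                       # fallback outcome per bin
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]      # donate mass from l to fill bin s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:               # leftovers are (numerically) full bins
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """O(1) sample from a prebuilt alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def mh_topic_draw(current, target, proposal_probs, prob, alias, rng, steps=2):
    """Metropolis-Hastings correction: propose topics from a (possibly stale)
    alias table and accept against the true unnormalized target `target(k)`,
    amortizing the O(K) table rebuild across many O(1) proposals."""
    for _ in range(steps):
        cand = alias_draw(prob, alias, rng)
        ratio = (target(cand) * proposal_probs[current]) / \
                (target(current) * proposal_probs[cand])
        if rng.random() < min(1.0, ratio):
            current = cand
    return current
```

In this toy form, the acceptance step is what keeps proposals cheap yet asymptotically correct: only the handful of topics active in a document need exact bookkeeping, which is the intuition behind the amortized O(k_d) per-token cost claimed in the abstract.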
dc.description.uri: https://www.aclweb.org/anthology/N19-1291
dc.format.extent: 10 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2js9h-ztxk
dc.identifier.citation: Rashidul Islam, James Foulds, Scalable Collapsed Inference for High-Dimensional Topic Models, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), http://dx.doi.org/10.18653/v1/N19-1291
dc.identifier.uri: http://dx.doi.org/10.18653/v1/N19-1291
dc.identifier.uri: http://hdl.handle.net/11603/15949
dc.language.iso: en_US
dc.publisher: Association for Computational Linguistics
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: documents
dc.subject: LDA
dc.subject: high-quality solution
dc.subject: high-dimensional topic models
dc.title: Scalable Collapsed Inference for High-Dimensional Topic Models
dc.type: Text

Files

Original bundle
Name: N19-1291.pdf
Size: 747.75 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon at submission