The Keyword Explorer Suite: A Toolkit for Understanding Online Populations

Citation of Original Publication

Philip G Feldman, Shimei Pan, and James Foulds. 2023. The Keyword Explorer Suite: A Toolkit for Understanding Online Populations. In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces (IUI '23 Companion). Association for Computing Machinery, New York, NY, USA, 21–24. https://doi.org/10.1145/3581754.3584122

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless it is released under a Creative Commons license, contact the copyright holder or the author for uses protected by Copyright Law.

Abstract

We have developed a set of Python applications that use large language models to identify and analyze social media data relevant to a population of interest. The pipeline begins by using OpenAI’s GPT-3 to generate candidate keywords for finding relevant text content from the target population. The keywords are then validated, and the matching content is downloaded and analyzed using GPT-3 embeddings and manifold reduction. Corpora built from this content are then used to fine-tune GPT-2 models, which can be explored for latent information via prompt-based queries. These tools allow researchers and practitioners to gain insight into population subgroups online.
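
The keyword validation step in the pipeline can be illustrated with a minimal sketch. This is not the suite's own code: the abstract does not say how validation is performed, so the sketch assumes one plausible reading, namely keeping only those candidate keywords that match enough sample posts. The function name `validate_keywords`, the `min_hits` threshold, and the sample candidates and posts are all hypothetical.

```python
import re
from collections import Counter


def validate_keywords(candidates, posts, min_hits=2):
    """Return {keyword: hit_count} for keywords found in at least min_hits posts.

    Assumption: "validation" here means checking that a keyword actually
    occurs in sample content. The real toolkit may use a different test.
    """
    hits = Counter()
    for keyword in candidates:
        # Whole-word, case-insensitive match; re.escape guards any punctuation.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        hits[keyword] = sum(1 for post in posts if pattern.search(post))
    return {kw: n for kw, n in hits.items() if n >= min_hits}


# Hypothetical candidates, standing in for keywords suggested by a language model.
candidates = ["sourdough", "starter", "proofing", "kombucha"]

# Hypothetical posts, standing in for content sampled from a platform.
posts = [
    "My sourdough starter finally doubled overnight",
    "Proofing times are way longer in a cold kitchen",
    "Feeding the starter twice a day now",
    "Sourdough crumb shot, pretty happy with this one",
]

validated = validate_keywords(candidates, posts)
print(validated)
```

In this sketch, keywords that rarely or never appear in the sample are dropped before any larger download is attempted, which keeps later embedding and fine-tuning steps focused on content the target population actually produces.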