INFORMATION RETRIEVAL IN LARGE LANGUAGE MODELS

Author/Creator

Author/Creator ORCID

Date

2023-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Subjects

Abstract

Large Language Models (LLMs) have been used to retrieve information. While this is an exciting opportunity to reduce the cost of user studies and speed up social science research (which is often bottlenecked by user study costs), it also opens up opportunities for harm. In particular, LLMs have been shown to generate skewed and inconsistent outputs. In this project, we investigate how skewed LLMs are in their demographic predictions of the US population by comparing them against Pew Research Center surveys. For this research, we use one of the largest and most widely used LLMs, GPT-3.5 (text-davinci-003). We hope these results shed light on the gaps that exist in information retrieval with LLMs and point to future work on improving their use, in this case with ChatGPT.