Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context

dc.contributor.author: Das, Nilanjana
dc.contributor.author: Raff, Edward
dc.contributor.author: Gaur, Manas
dc.date.accessioned: 2025-01-31T18:24:27Z
dc.date.available: 2025-01-31T18:24:27Z
dc.date.issued: 2024-12-20
dc.description.abstract: Previous research on LLM vulnerabilities often relied on nonsensical adversarial prompts, which automated methods could easily detect. We address this gap by focusing on human-readable adversarial prompts, a more realistic and potent threat. Our key contributions are: (1) situation-driven attacks that leverage movie scripts to create contextually relevant, human-readable prompts that successfully deceive LLMs; (2) adversarial suffix conversion, which transforms nonsensical adversarial suffixes into meaningful text; and (3) AdvPrompter with p-nucleus sampling, a method for generating diverse, human-readable adversarial suffixes that improves attack efficacy against models such as GPT-3.5 and Gemma 7B. Our findings demonstrate that sophisticated adversaries can trick LLMs into producing harmful responses using human-readable adversarial prompts, and that considerable room remains for improving LLM robustness.
dc.description.uri: http://arxiv.org/abs/2412.16359
dc.format.extent: 18 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2lz3f-h1gx
dc.identifier.uri: https://doi.org/10.48550/arXiv.2412.16359
dc.identifier.uri: http://hdl.handle.net/11603/37607
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Computer Science - Artificial Intelligence
dc.subject: UMBC Ebiquity Research Group
dc.subject: UMBC Discovery, Research, and Experimental Analysis of Malware Lab (DREAM Lab)
dc.subject: UMBC Interactive Robotics and Language Lab (IRAL Lab)
dc.subject: Computer Science - Computation and Language
dc.title: Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9900-1972
dcterms.creator: https://orcid.org/0000-0002-5411-2230
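The third contribution in the abstract pairs AdvPrompter with p-nucleus (top-p) sampling. As a point of reference, here is a minimal, generic sketch of nucleus sampling (Holtzman et al., 2020) in Python. The record does not describe how the paper integrates it with AdvPrompter, so the function name `nucleus_sample`, its parameters, and the numpy-based implementation below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of p-nucleus (top-p) sampling, the decoding strategy the
# abstract pairs with AdvPrompter to diversify generated adversarial suffixes.
# Generic nucleus sampling only; the paper's exact integration is not
# described in this record, so treat this as illustrative.
import numpy as np

def nucleus_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability mass is at least p."""
    rng = rng or np.random.default_rng()
    # Softmax over the vocabulary (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    # Keep the smallest prefix whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus and sample.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Lower values of p restrict sampling to a few high-probability continuations, while higher values widen the candidate pool; presumably this trade-off is what lets the method generate suffixes that are both diverse and fluent enough to remain human-readable.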

Files

Original bundle

Name: 2412.16359v1.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format