Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context

dc.contributor.author: Das, Nilanjana
dc.contributor.author: Raff, Edward
dc.contributor.author: Gaur, Manas
dc.date.accessioned: 2025-01-31T18:24:27Z
dc.date.available: 2025-01-31T18:24:27Z
dc.date.issued: 2024-12-20
dc.description.abstract: Previous research on LLM vulnerabilities often relied on nonsensical adversarial prompts, which automated methods could easily detect. We address this gap by focusing on human-readable adversarial prompts, a more realistic and potent threat. Our key contributions are: (1) situation-driven attacks that leverage movie scripts to create contextually relevant, human-readable prompts that successfully deceive LLMs; (2) adversarial suffix conversion, which transforms nonsensical adversarial suffixes into meaningful text; and (3) AdvPrompter with p-nucleus sampling, a method for generating diverse, human-readable adversarial suffixes that improves attack efficacy against models such as GPT-3.5 and Gemma 7B. Our findings demonstrate that sophisticated adversaries can trick LLMs into producing harmful responses using human-readable adversarial prompts, and that considerable room remains for improving LLM robustness.
dc.description.uri: http://arxiv.org/abs/2412.16359
dc.format.extent: 18 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2lz3f-h1gx
dc.identifier.uri: https://doi.org/10.48550/arXiv.2412.16359
dc.identifier.uri: http://hdl.handle.net/11603/37607
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Data Science
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Computer Science - Artificial Intelligence
dc.subject: UMBC Ebiquity Research Group
dc.subject: UMBC Discovery, Research, and Experimental Analysis of Malware Lab (DREAM Lab)
dc.subject: UMBC Interactive Robotics and Language Lab (IRAL Lab)
dc.subject: Computer Science - Computation and Language
dc.title: Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-9900-1972
dcterms.creator: https://orcid.org/0000-0002-5411-2230
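The third contribution in the abstract pairs AdvPrompter with p-nucleus (top-p) sampling. As a point of reference, here is a minimal, generic sketch of nucleus sampling (Holtzman et al., 2020) in Python. The record does not describe how the paper integrates it with AdvPrompter, so the function name `nucleus_sample`, its parameters, and the numpy-based implementation below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of p-nucleus (top-p) sampling, the decoding strategy the
# abstract pairs with AdvPrompter to diversify generated adversarial suffixes.
# Generic nucleus sampling only; the paper's exact integration is not
# described in this record, so treat this as illustrative.
import numpy as np

def nucleus_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability mass is at least p."""
    rng = rng or np.random.default_rng()
    # Softmax over the vocabulary (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    # Keep the smallest prefix whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus and sample.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Lower values of p restrict sampling to a few high-probability continuations, while higher values widen the candidate pool; presumably this trade-off is what lets the method generate suffixes that are both diverse and fluent enough to remain human-readable.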

Files

Original bundle

Name: 2412.16359v1.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format