Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
dc.contributor.author | Das, Nilanjana | |
dc.contributor.author | Raff, Edward | |
dc.contributor.author | Gaur, Manas | |
dc.date.accessioned | 2025-01-31T18:24:27Z | |
dc.date.available | 2025-01-31T18:24:27Z | |
dc.date.issued | 2024-12-20 | |
dc.description.abstract | Previous research on LLM vulnerabilities often relied on nonsensical adversarial prompts, which were easily detectable by automated methods. We address this gap by focusing on human-readable adversarial prompts, a more realistic and potent threat. Our key contributions are (1) situation-driven attacks that leverage movie scripts to create contextually relevant, human-readable prompts that successfully deceive LLMs; (2) adversarial suffix conversion, which transforms nonsensical adversarial suffixes into meaningful text; and (3) AdvPrompter with p-nucleus sampling, a method for generating diverse, human-readable adversarial suffixes that improves attack efficacy on models such as GPT-3.5 and Gemma 7B. Our findings demonstrate that sophisticated adversaries can trick LLMs into producing harmful responses using human-readable adversarial prompts, and that there remains substantial room for improvement in LLM robustness. | |
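The "p-nucleus sampling" named in the abstract refers to nucleus (top-p) sampling applied during AdvPrompter's suffix generation to diversify candidate suffixes. The following is a minimal, generic sketch of top-p filtering over next-token logits, not the paper's implementation; the function name, the temperature parameter, and the default p value are illustrative assumptions.

import torch

def nucleus_sample(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> int:
    """Sample one token id from the smallest set of tokens whose
    cumulative probability mass exceeds p (nucleus / top-p sampling)."""
    # Convert logits to a probability distribution (temperature-scaled).
    probs = torch.softmax(logits / temperature, dim=-1)
    # Sort tokens by probability, highest first.
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the minimal prefix of tokens whose cumulative mass reaches p.
    cutoff = int((cumulative < p).sum().item()) + 1
    nucleus_probs = sorted_probs[:cutoff]
    nucleus_probs = nucleus_probs / nucleus_probs.sum()  # renormalize within the nucleus
    # Draw one token from the renormalized nucleus.
    choice = torch.multinomial(nucleus_probs, num_samples=1)
    return int(sorted_ids[choice].item())

Sampling only from the nucleus (rather than greedy or full-vocabulary sampling) trades a small amount of per-token likelihood for diversity, which is why it is a natural fit for generating varied yet fluent adversarial suffixes.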
dc.description.uri | http://arxiv.org/abs/2412.16359 | |
dc.format.extent | 18 pages | |
dc.genre | journal articles | |
dc.genre | preprints | |
dc.identifier | doi:10.13016/m2lz3f-h1gx | |
dc.identifier.uri | https://doi.org/10.48550/arXiv.2412.16359 | |
dc.identifier.uri | http://hdl.handle.net/11603/37607 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
dc.relation.ispartof | UMBC Data Science | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Computer Science - Artificial Intelligence | |
dc.subject | UMBC Ebiquity Research Group | |
dc.subject | UMBC Discovery, Research, and Experimental Analysis of Malware Lab (DREAM Lab) | |
dc.subject | UMBC Interactive Robotics and Language Lab (IRAL Lab) | |
dc.subject | Computer Science - Computation and Language | |
dc.title | Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context | |
dc.type | Text | |
dcterms.creator | https://orcid.org/0000-0002-9900-1972 | |
dcterms.creator | https://orcid.org/0000-0002-5411-2230 | |