Memorization Over Reasoning? Exposing and Mitigating Verbatim Memorization in Large Language Models' Character Understanding Evaluation

Jiang, Yuxuan; Ferraro, Francis

dc.contributor.author	Jiang, Yuxuan
dc.contributor.author	Ferraro, Francis
dc.date.accessioned	2025-01-22T21:25:27Z
dc.date.available	2025-01-22T21:25:27Z
dc.date.issued	2024-12-30
dc.description.abstract	Recently, Large Language Models (LLMs) have shown impressive performance in character understanding tasks, such as analyzing the roles, personalities, and relationships of fictional characters. However, the extensive pre-training corpora used by LLMs raise concerns that they may rely on memorizing popular fictional works rather than genuinely understanding and reasoning about them. In this work, we argue that 'gist memory' (capturing essential meaning) should be the primary mechanism for character understanding tasks, as opposed to 'verbatim memory' (exact match of a string). We introduce a simple yet effective method to mitigate mechanized memorization in character understanding evaluations while preserving the essential implicit cues needed for comprehension and reasoning. Our approach reduces memorization-driven performance on popular fictional works from 96% accuracy to 72% and results in up to an 18% drop in accuracy across various character understanding tasks. These findings underscore the issue of data contamination in existing benchmarks, which often measure memorization rather than true character understanding.
dc.description.uri	http://arxiv.org/abs/2412.14368
dc.format.extent	17 pages
dc.genre	journal articles
dc.genre	preprints
dc.identifier	doi:10.13016/m24iq1-qjfx
dc.identifier.uri	https://doi.org/10.48550/arXiv.2412.14368
dc.identifier.uri	http://hdl.handle.net/11603/37488
dc.language.iso	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Computer Science - Computation and Language
dc.title	Memorization Over Reasoning? Exposing and Mitigating Verbatim Memorization in Large Language Models' Character Understanding Evaluation
dc.type	Text
dcterms.creator	https://orcid.org/0000-0003-2413-9368
dcterms.creator	https://orcid.org/0009-0007-8488-3056

Files

Original bundle

Name: 2412.14368v3.pdf
Size: 438.1 KB
Format: Adobe Portable Document Format
