Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution

Saxena, Yash; Bommireddy, Raviteja; Padia, Ankur; Gaur, Manas

dc.contributor.author	Saxena, Yash
dc.contributor.author	Bommireddy, Raviteja
dc.contributor.author	Padia, Ankur
dc.contributor.author	Gaur, Manas
dc.date.accessioned	2026-02-12T16:44:15Z
dc.date.issued	2025-12-18
dc.description	NeurIPS 2025 LLM Evaluation Workshop, December 7th, 2025, San Diego, CA
dc.description.abstract	Trustworthy Large Language Models (LLMs) must cite human-verifiable sources in high-stakes domains such as healthcare, law, academia, and finance, where even small errors can have severe consequences. Practitioners and researchers face a choice: let models generate citations during decoding, or let models draft answers first and then attach appropriate citations. To clarify this choice, we introduce two paradigms: Generation-Time Citation (G-Cite), which produces the answer and citations in one pass, and Post-hoc Citation (P-Cite), which adds or verifies citations after drafting. We conduct a comprehensive evaluation from zero-shot to advanced retrieval-augmented methods across four popular attribution datasets and provide evidence-based recommendations that weigh trade-offs across use cases. Our results show a consistent trade-off between coverage and citation correctness, with retrieval as the main driver of attribution quality in both paradigms. P-Cite methods achieve high coverage with competitive correctness and moderate latency, whereas G-Cite methods prioritize precision at the cost of coverage and speed. We recommend a retrieval-centric, P-Cite-first approach for high-stakes applications, reserving G-Cite for precision-critical settings such as strict claim verification. Our code and human evaluation results are available at this https URL.
dc.description.sponsorship	We gratefully acknowledge support from the UMBC Faculty Start-up, Cybersecurity Leadership-Exploratory Grant and the USISTEF Award. The opinions, conclusions, and recommendations expressed here are solely those of the authors and do not necessarily reflect the views of USISTEF or UMBC.
dc.description.uri	https://arxiv.org/abs/2509.21557
dc.format.extent	11 pages
dc.genre	conference papers and proceedings
dc.genre	preprints
dc.identifier	doi:10.13016/m2l19h-mtre
dc.identifier.uri	https://doi.org/10.48550/arXiv.2509.21557
dc.identifier.uri	http://hdl.handle.net/11603/41872
dc.language.iso	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	UMBC Accelerated Cognitive Cybersecurity Laboratory
dc.subject	UMBC Ebiquity Research Group
dc.subject	UMBC KAI2 Knowledge-infused AI and Inference lab
dc.title	Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution
dc.type	Text
dcterms.creator	https://orcid.org/0000-0002-5411-2230

Files

Original bundle

Name: 2509.21557v2.pdf
Size: 605.7 KB
Format: Adobe Portable Document Format
