Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution

dc.contributor.author: Saxena, Yash
dc.contributor.author: Bommireddy, Raviteja
dc.contributor.author: Padia, Ankur
dc.contributor.author: Gaur, Manas
dc.date.accessioned: 2026-02-12T16:44:15Z
dc.date.issued: 2025-12-18
dc.description: NeurIPS 2025 LLM Evaluation Workshop, December 7th, 2025, San Diego, CA
dc.description.abstract: Trustworthy Large Language Models (LLMs) must cite human-verifiable sources in high-stakes domains such as healthcare, law, academia, and finance, where even small errors can have severe consequences. Practitioners and researchers face a choice: let models generate citations during decoding, or let models draft answers first and then attach appropriate citations. To clarify this choice, we introduce two paradigms: Generation-Time Citation (G-Cite), which produces the answer and citations in one pass, and Post-hoc Citation (P-Cite), which adds or verifies citations after drafting. We conduct a comprehensive evaluation from zero-shot to advanced retrieval-augmented methods across four popular attribution datasets and provide evidence-based recommendations that weigh trade-offs across use cases. Our results show a consistent trade-off between coverage and citation correctness, with retrieval as the main driver of attribution quality in both paradigms. P-Cite methods achieve high coverage with competitive correctness and moderate latency, whereas G-Cite methods prioritize precision at the cost of coverage and speed. We recommend a retrieval-centric, P-Cite-first approach for high-stakes applications, reserving G-Cite for precision-critical settings such as strict claim verification. Our codes and human evaluation results are available at this https URL
dc.description.sponsorship: We gratefully acknowledge support from the UMBC Faculty Start-up, Cybersecurity Leadership-Exploratory Grant and the USISTEF Award. The opinions, conclusions, and recommendations expressed here are solely those of the authors and do not necessarily reflect the views of USISTEF or UMBC.
dc.description.uri: https://arxiv.org/abs/2509.21557
dc.format.extent: 11 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m2l19h-mtre
dc.identifier.uri: https://doi.org/10.48550/arXiv.2509.21557
dc.identifier.uri: http://hdl.handle.net/11603/41872
dc.language.iso: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: UMBC Accelerated Cognitive Cybersecurity Laboratory
dc.subject: UMBC Ebiquity Research Group
dc.subject: UMBC KAI2 Knowledge-infused AI and Inference lab
dc.title: Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5411-2230

Files

Original bundle

Name: 2509.21557v2.pdf
Size: 605.7 KB
Format: Adobe Portable Document Format