Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution
| dc.contributor.author | Saxena, Yash | |
| dc.contributor.author | Bommireddy, Raviteja | |
| dc.contributor.author | Padia, Ankur | |
| dc.contributor.author | Gaur, Manas | |
| dc.date.accessioned | 2026-02-12T16:44:15Z | |
| dc.date.issued | 2025-12-18 | |
| dc.description | NeurIPS 2025 LLM Evaluation Workshop, December 7th, 2025, San Diego, CA | |
| dc.description.abstract | Trustworthy Large Language Models (LLMs) must cite human-verifiable sources in high-stakes domains such as healthcare, law, academia, and finance, where even small errors can have severe consequences. Practitioners and researchers face a choice: let models generate citations during decoding, or let models draft answers first and then attach appropriate citations. To clarify this choice, we introduce two paradigms: Generation-Time Citation (G-Cite), which produces the answer and citations in one pass, and Post-hoc Citation (P-Cite), which adds or verifies citations after drafting. We conduct a comprehensive evaluation, from zero-shot to advanced retrieval-augmented methods, across four popular attribution datasets and provide evidence-based recommendations that weigh trade-offs across use cases. Our results show a consistent trade-off between coverage and citation correctness, with retrieval as the main driver of attribution quality in both paradigms. P-Cite methods achieve high coverage with competitive correctness and moderate latency, whereas G-Cite methods prioritize precision at the cost of coverage and speed. We recommend a retrieval-centric, P-Cite-first approach for high-stakes applications, reserving G-Cite for precision-critical settings such as strict claim verification. Our code and human evaluation results are available at this https URL | |
| dc.description.sponsorship | We gratefully acknowledge support from the UMBC Faculty Start-up, Cybersecurity Leadership-Exploratory Grant and the USISTEF Award. The opinions, conclusions, and recommendations expressed here are solely those of the authors and do not necessarily reflect the views of USISTEF or UMBC. | |
| dc.description.uri | https://arxiv.org/abs/2509.21557 | |
| dc.format.extent | 11 pages | |
| dc.genre | conference papers and proceedings | |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m2l19h-mtre | |
| dc.identifier.uri | https://doi.org/10.48550/arXiv.2509.21557 | |
| dc.identifier.uri | http://hdl.handle.net/11603/41872 | |
| dc.language.iso | en | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.relation.ispartof | UMBC Student Collection | |
| dc.rights | Attribution 4.0 International | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | UMBC Accelerated Cognitive Cybersecurity Laboratory | |
| dc.subject | UMBC Ebiquity Research Group | |
| dc.subject | UMBC KAI2 Knowledge-infused AI and Inference lab | |
| dc.title | Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0000-0002-5411-2230 |