MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

dc.contributor.author: Gokhale, Tejas
dc.contributor.author: Banerjee, Pratyay
dc.contributor.author: Baral, Chitta
dc.contributor.author: Yang, Yezhou
dc.date.accessioned: 2025-06-05T14:02:45Z
dc.date.available: 2025-06-05T14:02:45Z
dc.date.issued: 2020-11
dc.description: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
dc.description.abstract: While progress has been made on visual question answering leaderboards, models often exploit spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct, mutations of the input to improve OOD generalization on benchmarks such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in the input (question-image pair) on the output (answer). Unlike existing methods for VQA-CP, MUTANT does not rely on knowledge of the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
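
The record itself contains no code, but the consistency-constrained objective the abstract mentions can be sketched concretely. Below is a minimal PyTorch sketch under stated assumptions: the names (vqa_model, consistency_constrained_loss) and the specific penalty (the absolute difference between the probabilities each sample assigns to its own ground-truth answer) are illustrative choices, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def consistency_constrained_loss(vqa_model, image, question, answer,
                                 mut_image, mut_question, mut_answer,
                                 lam=1.0):
    # Illustrative sketch, not the paper's exact loss.
    logits = vqa_model(image, question)              # (batch, num_answers)
    mut_logits = vqa_model(mut_image, mut_question)  # mutant (image, question)

    # Standard answer classification on both the original and mutant inputs.
    ce = F.cross_entropy(logits, answer) + F.cross_entropy(mut_logits, mut_answer)

    # Consistency term (assumed form): the confidence assigned to each
    # sample's own ground-truth answer should not diverge across the
    # mutation, so the model must attribute any change in the answer to
    # the semantic edit itself rather than to dataset priors.
    p_true = F.softmax(logits, dim=-1).gather(1, answer.unsqueeze(1))
    q_true = F.softmax(mut_logits, dim=-1).gather(1, mut_answer.unsqueeze(1))
    consistency = (p_true - q_true).abs().mean()

    return ce + lam * consistency

A training step would call this once per (original, mutant) pair produced by the input-mutation procedure; lam trades off answer classification accuracy against cross-mutation consistency.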
dc.description.sponsorship: The authors acknowledge support from the NSF Robust Intelligence Program project #1816039, the DARPA KAIROS program (LESTAT project), the DARPA SAIL-ON program, and ONR award N00014-20-1-2332.
dc.description.uri: https://aclanthology.org/2020.emnlp-main.63/
dc.format.extent: 15 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2wipi-badt
dc.identifier.citation: Gokhale, Tejas, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. “MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering.” Edited by Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), November 2020, 878–92. https://doi.org/10.18653/v1/2020.emnlp-main.63.
dc.identifier.uri: https://doi.org/10.18653/v1/2020.emnlp-main.63
dc.identifier.uri: http://hdl.handle.net/11603/38587
dc.language.iso: en_US
dc.publisher: ACL
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/deed.en
dc.title: MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5593-2804

Files

Original bundle

Name: 2020.emnlp-main.63.pdf
Size: 5.05 MB
Format: Adobe Portable Document Format