Semantically Distributed Robust Optimization for Vision-and-Language Inference

dc.contributor.author: Gokhale, Tejas
dc.contributor.author: Chaudhary, Abhishek
dc.contributor.author: Banerjee, Pratyay
dc.contributor.author: Baral, Chitta
dc.contributor.author: Yang, Yezhou
dc.date.accessioned: 2025-06-05T14:02:57Z
dc.date.available: 2025-06-05T14:02:57Z
dc.date.issued: 2022-05
dc.description.abstract: Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms. While data augmentation techniques have been designed to mitigate these failure modes, methods that can integrate this knowledge into the training pipeline remain under-explored. In this paper, we present SDRO, a model-agnostic method that utilizes a set of linguistic transformations in a distributed robust optimization setting, along with an ensembling technique to leverage these transformations during inference. Experiments on benchmark datasets with images (NLVR²) and video (VIOLIN) demonstrate performance improvements as well as robustness to adversarial attacks. Experiments on binary VQA explore the generalizability of this method to other V&L tasks.
dc.description.sponsorship: This work was funded in part by National Science Foundation grants 2132724, 1816039 and 1750082, DARPA SAIL-ON program (W911NF2020006), and DARPA CHESS program (FA875019C0003).
dc.description.uri: https://aclanthology.org/2022.findings-acl.118/
dc.format.extent: 21 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2xc6r-lurk
dc.identifier.citation: Gokhale, Tejas, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. “Semantically Distributed Robust Optimization for Vision-and-Language Inference.” Edited by Smaranda Muresan, Preslav Nakov, and Aline Villavicencio. Findings of the Association for Computational Linguistics: ACL 2022, May 2022, 1493–1513. https://doi.org/10.18653/v1/2022.findings-acl.118.
dc.identifier.uri: https://doi.org/10.18653/v1/2022.findings-acl.118
dc.identifier.uri: http://hdl.handle.net/11603/38631
dc.language.iso: en_US
dc.publisher: ACL
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/deed.en
dc.title: Semantically Distributed Robust Optimization for Vision-and-Language Inference
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5593-2804

Files

Original bundle

Name: 2022.findingsacl.118.pdf
Size: 3.18 MB
Format: Adobe Portable Document Format