CRIPP-VQA: Counterfactual reasoning about implicit physical properties via video question answering

dc.contributor.author: Patel, Maitreya
dc.contributor.author: Gokhale, Tejas
dc.contributor.author: Baral, Chitta
dc.contributor.author: Yang, Yezhou
dc.date.accessioned: 2024-02-27T22:51:17Z
dc.date.available: 2024-02-27T22:51:17Z
dc.date.issued: 2022-12
dc.description: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, December 7-11, 2022
dc.description.abstract: Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings – videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).
dc.description.sponsorship: This work was supported by NSF RI grants #1750082, #1816039 and #2132724, and the DARPA GAILA ADAM project.
dc.description.uri: https://aclanthology.org/2022.emnlp-main.670/
dc.format.extent: 15 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2yrda-veot
dc.identifier.citation: Maitreya Patel, Tejas Gokhale, Chitta Baral, and Yezhou Yang. 2022. CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9856–9870, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
dc.identifier.uri: https://doi.org/10.18653/v1/2022.emnlp-main.670
dc.identifier.uri: http://hdl.handle.net/11603/31730
dc.publisher: ACL
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: CC BY 4.0 DEED Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: CRIPP-VQA: Counterfactual reasoning about implicit physical properties via video question answering
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5593-2804

Files

Original bundle

Name: 2022.emnlp-main.670.pdf
Size: 3.5 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon at submission