Cooking With Blocks : A Recipe for Visual Reasoning on Image-Pairs

dc.contributor.author: Gokhale, Tejas
dc.contributor.author: Sampat, Shailaja
dc.contributor.author: Fang, Zhiyuan
dc.contributor.author: Yang, Yezhou
dc.contributor.author: Baral, Chitta
dc.date.accessioned: 2025-06-05T14:03:46Z
dc.date.available: 2025-06-05T14:03:46Z
dc.date.issued: 2019
dc.description: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019
dc.description.abstract: The ability to identify changes or transformations in a scene and to reason about their causes and effects is a key aspect of intelligence. In this work, we go beyond recent advances in computational perception and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence from cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations and the sequence of moves to rearrange one configuration into the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods underperform in inferring temporal event sequences and fail at inductive generalization. We propose a modular two-step approach, Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach to natural images, we seek to pave the way for future research on event sequencing for real-world scenes.
dc.description.sponsorship: We acknowledge support from NSF Grant 1816039.
dc.description.uri: https://openaccess.thecvf.com/content_CVPRW_2019/html/Vision_Meets_Cognition_Camera_Ready/Gokhale_Cooking_With_Blocks__A_Recipe_for_Visual_Reasoning_on_CVPRW_2019_paper.html
dc.format.extent: 4 pages
dc.genre: conference papers and proceedings
dc.genre: postprints
dc.identifier: doi:10.13016/m2fcs9-up2u
dc.identifier.citation: Gokhale, Tejas, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, and Chitta Baral. “Cooking With Blocks : A Recipe for Visual Reasoning on Image-Pairs,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 5-8. https://openaccess.thecvf.com/content_CVPRW_2019/html/Vision_Meets_Cognition_Camera_Ready/Gokhale_Cooking_With_Blocks__A_Recipe_for_Visual_Reasoning_on_CVPRW_2019_paper.html
dc.identifier.uri: http://hdl.handle.net/11603/38753
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: Cooking With Blocks : A Recipe for Visual Reasoning on Image-Pairs
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5593-2804

Files

Original bundle

Name: GokhaleCookingWithBlock.pdf
Size: 422.69 KB
Format: Adobe Portable Document Format