Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Date
2021-09-27
Citation of Original Publication
Tang, Xuejiao, et al. "Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory." The 23rd International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2021), September 27, 2021.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Visual Commonsense Reasoning (VCR) predicts an answer with a corresponding rationale, given a question-image pair as input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to the VCR task generally rely on pre-training or on memory models that encode long-range dependencies; however, these approaches suffer from limited generalizability and a lack of prior knowledge. In this paper, we propose a dynamic working memory based cognitive VCR network, which stores commonsense accumulated across sentences to provide prior knowledge for inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, the proposed model offers an intuitive interpretation of visual commonsense reasoning. A Python implementation of our mechanism is publicly available at https://github.com/tanjatang/DMVCR
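To make the idea concrete, below is a minimal PyTorch sketch of a dynamic working memory in the spirit the abstract describes: a bank of memory slots is read by attention to supply prior knowledge for the current inference step, then updated with the new sentence representation so commonsense accumulates across sentences. This is an illustrative reading, not the authors' DMVCR implementation; the class name, slot count, and gated write rule are all assumptions.

# Hypothetical sketch of a dynamic working memory (not the DMVCR code):
# attention reads prior knowledge from a slot memory, and a gated write
# folds each new sentence representation back into the slots.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicWorkingMemory(nn.Module):
    def __init__(self, hidden_dim: int, num_slots: int = 32):
        super().__init__()
        # Learnable initial memory slots, one row per slot.
        self.init_slots = nn.Parameter(0.02 * torch.randn(num_slots, hidden_dim))
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)
        self.write_gate = nn.Linear(2 * hidden_dim, 1)
        self.scale = hidden_dim ** 0.5

    def initial_memory(self, batch_size: int) -> torch.Tensor:
        # (batch, slots, hidden) copy of the learned initial slots.
        return self.init_slots.unsqueeze(0).expand(batch_size, -1, -1).contiguous()

    def forward(self, sentence: torch.Tensor, memory: torch.Tensor):
        """sentence: (batch, hidden); memory: (batch, slots, hidden)."""
        # Read: attend over memory slots with the sentence as the query.
        query = self.query_proj(sentence).unsqueeze(1)                # (B, 1, D)
        attn = F.softmax(query @ memory.transpose(1, 2) / self.scale, dim=-1)
        read = (attn @ memory).squeeze(1)                             # (B, D)
        # Write: blend the sentence into each slot in proportion to its
        # attention weight, scaled by a learned write gate.
        gate = torch.sigmoid(self.write_gate(torch.cat([read, sentence], -1)))
        write = attn.transpose(1, 2) * sentence.unsqueeze(1)          # (B, S, D)
        new_memory = memory + gate.unsqueeze(-1) * write
        return read, new_memory

# Usage: accumulate commonsense over a sequence of sentence encodings.
dwm = DynamicWorkingMemory(hidden_dim=512)
memory = dwm.initial_memory(batch_size=4)
for sentence in torch.randn(3, 4, 512):           # three sentences in order
    context, memory = dwm(sentence, memory)       # context feeds the reasoner

In a full model, the read vector would condition the answer and rationale predictions at each step, while the updated memory carries the accumulated commonsense forward to the next sentence.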