SAM-VQA: Supervised Attention-Based Visual Question Answering Model for Post-Disaster Damage Assessment on Remote Sensing Imagery
Date
2023-05-15
Citation of Original Publication
A. Sarkar, T. Chowdhury, R. Murphy, A. Gangopadhyay and M. Rahnemoonfar, "SAM-VQA: Supervised Attention-Based Visual Question Answering Model for Post-Disaster Damage Assessment on Remote Sensing Imagery," in IEEE Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2023.3276293.
Rights
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract
Each natural disaster leaves a trail of destruction and damage that must be managed effectively to reduce its negative impact on human life. Any delay in making proper decisions at the post-disaster managerial level can increase human suffering and waste resources. Sound managerial decisions after a natural disaster rely on an appropriate assessment of damage using data-driven approaches, which need to be efficient, fast, and interactive. The goal of this study is to develop a deep, interactive, data-driven framework for damage assessment that speeds up the response and recovery phases after a natural disaster. This article therefore introduces and implements a visual question answering (VQA) framework for post-disaster damage assessment based on drone imagery, namely supervised attention-based VQA (SAM-VQA). In VQA, query-based answers drawn from images of disaster-affected areas can provide valuable information for decision-making. Unlike other computer vision tasks, VQA is more interactive and allows one to obtain instant and effective scene information by asking natural-language questions about images. In this work, we present a VQA dataset and propose a novel SAM-VQA framework for post-disaster damage assessment on remote sensing images. Our model outperforms state-of-the-art attention-based VQA techniques, including stacked attention networks (SANs) and multimodal factorized bilinear (MFB) pooling with co-attention. Furthermore, our proposed model can derive appropriate visual attention based on the question to predict answers, making our approach trustworthy.
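To illustrate the idea of question-guided visual attention with an explicit supervision signal described in the abstract, the sketch below shows a minimal attention-based VQA model in PyTorch. It is not the authors' SAM-VQA implementation; the class name, layer sizes, encoders, and the use of a KL-divergence auxiliary loss on the attention map are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' SAM-VQA code) of question-guided visual
# attention with an auxiliary attention-supervision loss. All names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedAttentionVQA(nn.Module):
    def __init__(self, vocab_size, num_answers,
                 img_dim=2048, q_dim=512, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        self.q_rnn = nn.GRU(300, q_dim, batch_first=True)   # question encoder
        self.img_proj = nn.Linear(img_dim, hidden)           # project region features
        self.q_proj = nn.Linear(q_dim, hidden)
        self.att_score = nn.Linear(hidden, 1)                # one score per region
        self.classifier = nn.Linear(img_dim + q_dim, num_answers)

    def forward(self, img_feats, question, gt_attention=None):
        # img_feats: (B, R, img_dim) region features; question: (B, T) token ids
        _, q_hidden = self.q_rnn(self.embed(question))
        q = q_hidden.squeeze(0)                               # (B, q_dim)

        # Question-guided attention over image regions.
        joint = torch.tanh(self.img_proj(img_feats) + self.q_proj(q).unsqueeze(1))
        att = F.softmax(self.att_score(joint).squeeze(-1), dim=1)   # (B, R)

        # Attend to the image and fuse with the question to predict an answer.
        attended_img = (att.unsqueeze(-1) * img_feats).sum(dim=1)   # (B, img_dim)
        logits = self.classifier(torch.cat([attended_img, q], dim=1))

        # Supervised attention: an auxiliary loss pulls the predicted attention
        # toward a ground-truth attention map when one is provided.
        att_loss = None
        if gt_attention is not None:
            att_loss = F.kl_div(att.clamp_min(1e-8).log(), gt_attention,
                                reduction="batchmean")
        return logits, att, att_loss
```

In a setup like this, the answer-classification loss and the attention-supervision loss would be combined (e.g., as a weighted sum) during training, so the model learns to place its visual attention where the question is answered rather than on spurious regions.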