Towards Explainable Machine Learning Models for Remote Sensing: Multi-modal and Uni-modal Applications for Natural Disaster
Department
Information Systems
Program
Information Systems
Rights
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Abstract
Natural disasters leave a path of devastation that must be managed efficiently to minimize their impact on human lives. Estimating the damage and acting on those assessments form the most important two-step process in post-disaster management. With the recent progress of Artificial Intelligence (AI), many machine learning algorithms are being used to assess damage. However, existing methods have limited efficiency and provide limited scene information. This thesis proposes a vision-language multi-modal task, namely Visual Question Answering (VQA), for efficient and comprehensive damage assessment. VQA enables the extraction of diverse information from images through natural language queries. This high-level scene information has the potential to optimize decision support systems, increasing efficiency and reducing the time required for search and rescue operations. At the same time, when machine learning models are incorporated into smart decision support systems, the explainability of model outcomes becomes significant. In remote sensing, visual content is complex, and the available contextual information is often limited relative to the overall size of the images. In such scenarios, a model may be susceptible to shortcut learning and produce misleading results, so proper explanations for model outputs become crucial. Motivated by these issues, this thesis addresses two aspects of remote sensing applications. First, it develops an image-based question-answering framework for efficient damage assessment on remote sensing imagery. Second, it enhances the trustworthiness of model outcomes by developing novel machine learning frameworks designed for remote sensing applications in multi-modal and uni-modal contexts.
To achieve the first goal, two large-scale benchmark visual question-answering datasets for damage assessment, namely FloodNet-VQA and RescueNet-VQA, are proposed. For proper visual explanations of model outcomes, this thesis proposes novel supervised attention modules for VQA. These modules provide auxiliary supervision while attention is computed, so that the model learns where to focus in the image for a given question and can provide a rational answer. The proposed approaches produce improved explanations and achieve higher accuracy than state-of-the-art VQA algorithms. Finally, a novel strategy is proposed for consistent and robust visual explanations in uni-modal remote sensing tasks (e.g., image classification). Within this framework, two distinct losses are proposed to ensure consistency and robustness in visual explanations.
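
As a rough illustration of what auxiliary supervision on attention can look like, the sketch below adds an attention term to the usual answer loss. It is not the thesis's formulation, which the abstract does not specify: the choice of KL divergence, the `weight` parameter, the region-level relevance mask, and every function name here are assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mask_to_target(mask):
    """Normalize a non-negative relevance mask into a target distribution.

    Assumption: each entry marks how relevant one image region is to the question.
    An all-zero mask falls back to a uniform target.
    """
    total = sum(mask)
    if total <= 0:
        return [1.0 / len(mask)] * len(mask)
    return [m / total for m in mask]


def attention_kl(target, attention, eps=1e-8):
    """KL(target || attention): penalizes attention that misses relevant regions."""
    return sum(
        t * math.log((t + eps) / (a + eps))
        for t, a in zip(target, attention)
        if t > 0
    )


def cross_entropy(probs, index, eps=1e-8):
    """Standard answer-classification loss for the correct answer index."""
    return -math.log(probs[index] + eps)


def supervised_attention_loss(attn_scores, region_mask, answer_logits,
                              answer_index, weight=0.5):
    """Answer loss plus a weighted attention-supervision term (hypothetical form)."""
    attention = softmax(attn_scores)
    target = mask_to_target(region_mask)
    answer_probs = softmax(answer_logits)
    return (cross_entropy(answer_probs, answer_index)
            + weight * attention_kl(target, attention))


# Same answer prediction, different attention: only region 0 is relevant.
focused = supervised_attention_loss([5.0, 0.0, 0.0, 0.0], [1, 0, 0, 0], [2.0, 0.0], 0)
misplaced = supervised_attention_loss([0.0, 0.0, 0.0, 5.0], [1, 0, 0, 0], [2.0, 0.0], 0)
```

With identical answer logits, the attention term alone separates the two cases: attention concentrated on the annotated region gives a lower total loss than attention concentrated elsewhere, which is the kind of training signal the abstract describes in general terms.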
