Learning Explainable Models using Self-Supervised Learning

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Subjects

Explainability
Explainable AI
Explainable models
Interpretability
Interpretation consistency
Self-supervised learning

Abstract

The last decade has witnessed an exponential rise in the research and deployment of Deep Neural Networks (DNNs) for widespread applications spanning various domains such as computer vision, natural language processing, speech recognition, statistical analysis, and most recently generative AI applications. Given such wide-ranging deployments impacting the day-to-day lives of people across the globe, it is imperative to develop mechanisms to understand the decision-making process of the underlying DNNs. Moreover, safety-critical deployments such as medical diagnosis, self-driving cars, and law enforcement make it crucial to be able to understand and explain each individual decision, rather than relying on DNNs as black-box algorithms. For computer vision applications such as image classification, various explainability algorithms have been introduced in the last few years for attributing DNN decisions back to the input image regions. In this dissertation, we scrutinize the reliability of existing explanation algorithms and advance the state of the art by introducing novel methods for learning explainable models such that they are not only accurate, but also explainable by design. We first study the reliability of existing explanation algorithms and observe that they might not always explain the true cause of a network's prediction. Although DNN decisions have been shown to be vulnerable to adversarial attacks, we show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. We introduce our attack as a controlled setting to measure the accuracy of interpretation algorithms and benchmark the resiliency of explanation algorithms on the ImageNet and PASCAL-VOC datasets. We then explore methods towards improving the interpretability of DNNs by learning explainable models.
Obtaining annotations for explanations to train explanation algorithms is not trivial since the explanation depends on both the input and the model under consideration. To this end, we introduce an algorithm to improve the interpretability of deep neural networks for a given explanation method. Our method encourages the network to learn consistent interpretations together with maximizing the log-likelihood of the correct class. We also introduce new evaluation metrics to benchmark the quality of explanation heatmaps obtained by explanation algorithms and show that our method outperforms the baseline on the ImageNet and MS-COCO datasets. Building upon this work, we introduce another novel method to train models to produce consistent explanations across image transformations. Self-supervised training has emerged as a viable alternative to supervised training in the absence of ground truth by leveraging large-scale unlabeled data to learn features that generalize across various tasks. Since obtaining the ground truth for a desired model explanation is not a well-defined task, we adopt ideas from contrastive self-supervised learning and apply them to the interpretations of the model rather than its embeddings. We perform extensive experiments and show that our method results in models with improved interpretability, while also acting as a regularizer and improving accuracy in limited-data, fine-grained classification settings. We believe our methods will serve as a strong foundation and will encourage the community to develop models such that they are not just accurate, but also explainable by design.
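
The idea of applying a contrastive objective to interpretations rather than embeddings can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names (`cosine`, `heatmap_contrastive_loss`), the InfoNCE-style loss form, the cosine similarity measure, and the `temperature` value are all assumptions chosen for illustration. Heatmaps are assumed to be flattened lists of non-negative scores that have already been aligned to a common frame (that is, any spatial transformation has been undone).

```python
import math


def cosine(u, v):
    """Cosine similarity between two flattened heatmaps (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def heatmap_contrastive_loss(maps_a, maps_b, temperature=0.5):
    """InfoNCE-style loss over explanation heatmaps (illustrative assumption).

    maps_a[i] and maps_b[i] are heatmaps of the same image under two
    different transformations (the positive pair). Heatmaps of the other
    images in maps_b serve as negatives for maps_a[i].
    """
    n = len(maps_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(maps_a[i], maps_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching heatmap.
        total += log_denominator - logits[i]
    return total / n


# Toy 2x2 heatmaps, flattened: each image highlights a different region.
view_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
consistent_b = [list(m) for m in view_a]            # same regions highlighted
inconsistent_b = view_a[1:] + view_a[:1]            # regions shifted across images

print(heatmap_contrastive_loss(view_a, consistent_b))
print(heatmap_contrastive_loss(view_a, inconsistent_b))
```

In this toy setting the loss is lower when the two views of each image highlight the same region, which is the behavior a consistency objective of this kind would reward during training. In practice such a term would be combined with the usual classification loss and computed with a differentiable explanation method, details the abstract does not specify.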
