NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial Attacks
Links to Files
Author/Creator ORCID
Date
2023-03-09
Type of Work
Department
Program
Citation of Original Publication
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
"CC0 1.0 Universal (CC0 1.0) Public Domain Dedication"
"CC0 1.0 Universal (CC0 1.0) Public Domain Dedication"
Subjects
Abstract
Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in various domains. However, adversarial attacks can easily mislead a neural network into wrong decisions, so defense mechanisms are highly desirable in safety-critical applications. In this paper, we first use the gradient-weighted class activation map (GradCAM) to analyze the behavior deviation of the VGG-16 network when its inputs are mixed with adversarial perturbation or Gaussian noise. In particular, our method can locate vulnerable layers that are sensitive to adversarial perturbation and Gaussian noise, and we show that the behavior deviation of these vulnerable layers can be used to detect adversarial examples. Second, we propose a novel NoiseCAM algorithm that integrates information from globally weighted and pixel-level weighted class activation maps. The algorithm is highly sensitive to adversarial perturbations and does not respond to Gaussian random noise mixed into the inputs. Third, we compare detecting adversarial examples with behavior deviation and with NoiseCAM, and show that NoiseCAM outperforms behavior-deviation modeling in overall performance. Our work could provide a useful tool to defend against certain types of adversarial attacks on deep neural networks.
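
For readers who want to experiment with the kind of layer-wise attention analysis the abstract describes, the sketch below computes a Grad-CAM heatmap for a pretrained VGG-16 in PyTorch and an illustrative deviation score between the heatmaps of a clean and a perturbed input. It is a minimal sketch under assumed choices (an ImageNet-pretrained torchvision model, the last convolutional layer as the hooked layer, and a mean-absolute-difference deviation measure); it does not reproduce the paper's exact configuration or its NoiseCAM weighting.

# Minimal Grad-CAM sketch for VGG-16 in PyTorch (torchvision >= 0.13).
# The hooked layer, preprocessing, and deviation score are illustrative
# assumptions, not the paper's exact method or its NoiseCAM algorithm.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

# Capture activations and gradients of one convolutional layer via hooks.
activations, gradients = {}, {}
target_layer = model.features[28]  # last conv layer of VGG-16 (assumed choice)
target_layer.register_forward_hook(
    lambda m, i, o: activations.update(value=o.detach()))
target_layer.register_full_backward_hook(
    lambda m, gi, go: gradients.update(value=go[0].detach()))

def grad_cam(image, class_idx=None):
    """Return a normalized Grad-CAM heatmap for a (1, 3, 224, 224) input."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    acts, grads = activations["value"][0], gradients["value"][0]  # (C, h, w)
    weights = grads.mean(dim=(1, 2))          # globally averaged gradients
    cam = F.relu((weights[:, None, None] * acts).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)

def deviation_score(clean_image, perturbed_image):
    """Illustrative behavior-deviation measure: mean absolute difference
    between the heatmaps of a clean input and a perturbed input."""
    return (grad_cam(clean_image) - grad_cam(perturbed_image)).abs().mean().item()

Here, perturbed_image could be the clean image with added Gaussian noise or an adversarial example produced by any attack library; the abstract's claim is that the deviation at vulnerable layers is much larger for adversarial perturbations than for Gaussian noise.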