Towards Hiding Adversarial Examples from Network Interpretation

Files
Links to Files
https://arxiv.org/abs/1812.02843
Permanent Link
http://hdl.handle.net/11603/14342
Date
2018-12-06
Type of Work
10 pages
Text
conference papers and proceedings preprints
Citation of Original Publication
Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash, Towards Hiding Adversarial Examples from Network Interpretation, Computer Vision and Pattern Recognition, 2018, https://arxiv.org/abs/1812.02843
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Deep networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches can be highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of prediction. We show that our algorithms can empower adversarial patches, by hiding them from network interpretation tools. We believe our algorithms can facilitate developing more robust network interpretation tools that truly explain the network’s underlying decision making process.