BACKDOOR ATTACKS IN COMPUTER VISION: TOWARDS ADVERSARIALLY ROBUST MACHINE LEARNING MODELS
Author/Creator
Author/Creator ORCID
Date
2022-01-01
Type of Work
Department
Computer Science and Electrical Engineering
Program
Computer Science
Citation of Original Publication
Rights
Distribution Rights granted to UMBC by the author.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Abstract
Deep Neural Networks (DNNs) have become the standard building block in numerous machine learning applications, including computer vision, speech recognition, machine translation, and robotic manipulation, achieving state-of-the-art performance on complex tasks. The widespread success of these networks has driven their deployment in sensitive domains like health care, finance, autonomous driving, and defense-related applications. However, DNNs are vulnerable to adversarial attacks. An adversary is a person with malicious intent whose goal is to disrupt the normal functioning of a machine learning pipeline. Research has shown that an adversary can tamper with the training process of a model by injecting misrepresentative data (poisons) into the training set. The manipulation is done in such a way that the victim's model malfunctions only when a trigger modifies a test input. These are called backdoor attacks. For instance, a backdoored model in a self-driving car might work accurately for days before it suddenly fails to detect a pedestrian when the adversary decides to exploit the backdoor. Vulnerability to backdoor attacks is dangerous when deep learning models are deployed in safety-critical applications. This dissertation studies ways in which state-of-the-art deep learning methods for computer vision are vulnerable to backdoor attacks and proposes defense methods to remedy the vulnerabilities. We push the limits of our current understanding of backdoors and address the following research questions. Can we design practical backdoor attacks? We propose the Hidden Trigger Backdoor Attack, a novel clean-label backdoor attack in which the poisoned images do not contain a visible trigger. This enables the attacker to keep the trigger hidden until its use at test time. Is it secure to train models on large-scale public data? Self-supervised learning (SSL) methods for vision have utilized large-scale unlabeled public data to learn rich visual representations. We show that if a small part of the unlabeled training data is poisoned, SSL methods are vulnerable to backdoor attacks. Backdoor attacks are more practical in self-supervised learning, since the use of large unlabeled data makes data inspection to remove poisons prohibitive. Can we design efficient and generalizable backdoor detection methods? We propose a backdoor detection method that optimizes for a set of images which, when forwarded through any model, successfully indicates whether the model contains a backdoor. Our "litmus" test for backdoored models improves on state-of-the-art methods without requiring access to clean data during detection. It is computationally efficient and generalizes to new triggers as well as new architectures.
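To make the notion of a clean-label, hidden-trigger poison concrete, the sketch below shows one possible formulation rather than the exact procedure developed in this dissertation: a poison image is optimized to stay within a small pixel-space budget of a target-class image while its features match those of a source image carrying a visible trigger patch. The stand-in feature extractor, the paste_trigger helper, and the epsilon and step-size values are all illustrative assumptions.

# Minimal, illustrative sketch of a clean-label "hidden trigger" poison.
# Assumptions: an untrained stand-in feature extractor (a real attack would
# use the victim's backbone), a hypothetical paste_trigger helper, and
# arbitrary budget/step-size values.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in feature extractor mapping a 3x32x32 image to a 32-d feature vector.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).eval()

def paste_trigger(img: torch.Tensor, trigger: torch.Tensor, x: int, y: int) -> torch.Tensor:
    """Paste a small square trigger patch onto a copy of the image."""
    out = img.clone()
    h, w = trigger.shape[-2:]
    out[..., y:y + h, x:x + w] = trigger
    return out

# Toy data: one source-class image (receives the trigger) and one target-class image.
source = torch.rand(1, 3, 32, 32)
target = torch.rand(1, 3, 32, 32)
trigger = torch.rand(3, 8, 8)
patched_source = paste_trigger(source, trigger, x=20, y=20)

eps = 16 / 255            # pixel-space budget keeping the poison close to `target`
poison = target.clone()

for _ in range(200):
    poison.requires_grad_(True)
    # Pull the poison toward the trigger-patched source in feature space.
    loss = (feature_extractor(poison) - feature_extractor(patched_source)).pow(2).sum()
    grad, = torch.autograd.grad(loss, poison)
    with torch.no_grad():
        poison = poison - 0.01 * grad.sign()                  # signed gradient step
        poison = target + (poison - target).clamp(-eps, eps)  # stay near the target image
        poison = poison.clamp(0, 1)                           # keep a valid image

print("max pixel distance to target image:", (poison - target).abs().max().item())
print("feature distance to patched source:",
      (feature_extractor(poison) - feature_extractor(patched_source)).pow(2).sum().item())

In such a formulation, the poison looks like an ordinary target-class image and keeps its clean label at training time; only at test time, when the adversary pastes the same trigger patch onto a source-class input, could a model trained on the poisoned data be steered toward the target label, matching the abstract's description of the trigger remaining hidden until its use.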