A Robust Federated Learning against Cyber Intrusions to Ensure Data Confidentiality and Model Integrity

dc.contributor.advisor: Gangopadhyay, Aryya
dc.contributor.author: Ovi, Pretom Roy
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2024-09-06T14:30:50Z
dc.date.available: 2024-09-06T14:30:50Z
dc.date.issued: 2024-01-01
dc.description.abstract: Federated Learning (FL), a type of distributed machine learning, enables collaborative model building among a large number of participants without revealing sensitive data to the central server. In FL, data remains on workers' devices, and each participating worker shares only gradient or weight updates with the server after every round of local training. Because of its distributed architecture, FL has limited control over the local data and the corresponding training processes. It is therefore susceptible to attacks such as targeted data poisoning, in which compromised workers train their local models on poisoned samples and their contributions (weight or gradient updates) poison the global model, resulting in incorrect classifications. Safeguarding local workers from data poisoning attacks, and detecting workers that have already been poisoned, are thus crucial to building a robust federated learning framework. To address this, we first propose a prevention strategy, named `Confident Federated Learning', to protect workers from such poisoning attacks. Our approach involves a stratified verification step that validates the label quality of local training data by characterizing and identifying label errors in the local dataset. Experimental results on the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that the proposed method detects potentially mislabeled training samples with above 85% accuracy and excludes them from local training to prevent data poisoning attacks. However, this strategy is effective only up to a certain percentage of poisoned local data. In addition to the prevention strategy, we therefore propose a novel detection method that builds a class-wise cluster representation for every participating worker from the neuron activation maps of the local model and analyzes the resulting clusters to filter out attacked workers before model aggregation. We experimentally demonstrate the efficacy of the proposed detection strategy in identifying workers affected by data poisoning attacks, along with the attack type, e.g., label flipping or dirty labeling. Secondly, because gradient sharing is part of the training protocol, FL is at heightened risk of gradient inversion attacks that compromise data confidentiality. The most alarming characteristic of this attack is that it operates covertly, without degrading training performance. Such attacks allow attackers to work backward from the shared gradients and eventually reconstruct the private training data. As a countermeasure, we propose a mixed-quantization-enabled FL scheme built on scalar quantization and dequantization to defend against the attack. We experimentally demonstrate the applicability and generalizability of the proposed defense across the computer vision, natural language processing, and audio domains, using eight datasets of image, audio, and text modalities and covering both iteration-driven and recursion-driven gradient inversion attacks. Evaluation on these benchmark datasets demonstrates the superiority of our method over existing baseline defense approaches.
In addition, our approach can be considered a communication-efficient federated learning framework, because it transforms high-precision gradients into low-bit precision, resulting in faster training, lower transmission bandwidth, and lower communication costs. This dissertation contributes effective defense strategies against the attacks described above in federated learning, maintaining the purity and integrity of the machine learning models with the aim of safeguarding data, models, and systems.
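The prevention step described in the abstract validates label quality before local training. As a rough illustration of that idea (not the dissertation's exact procedure), the sketch below flags likely mislabeled samples by comparing out-of-sample predicted probabilities against per-class confidence thresholds, in the spirit of confident learning; the probe model, threshold rule, and the helper name find_suspect_labels are assumptions introduced here for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def find_suspect_labels(X, y, n_classes):
    """Return indices of samples whose given label looks unreliable.

    Assumes labels are integers 0..n_classes-1.
    """
    y = np.asarray(y)
    # Out-of-sample class probabilities from a simple probe model.
    probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )
    # Per-class confidence threshold: mean probability the probe assigns
    # to class c on samples that are actually labeled c.
    thresholds = np.array([probs[y == c, c].mean() for c in range(n_classes)])
    # Flag a sample when the probability of its given label falls below
    # that label's confidence threshold.
    given_conf = probs[np.arange(len(y)), y]
    return np.where(given_conf < thresholds[y])[0]

# Usage: exclude suspect samples from local training before an FL round.
# clean_idx = np.setdiff1d(np.arange(len(y)), find_suspect_labels(X, y, 10))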
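The detection method builds class-wise cluster representations from the local models' activation maps and screens out attacked workers before aggregation. The sketch below is a simplified stand-in for that analysis: it assumes the server already holds a per-class mean activation vector for every worker and flags workers whose vectors are robust outliers for some class. The outlier rule, array layout, and function name are illustrative assumptions, not the dissertation's algorithm.

import numpy as np

def flag_suspicious_workers(class_activations, z_thresh=2.5):
    """
    class_activations: array of shape (n_workers, n_classes, n_features)
    holding each worker's per-class mean activation vector.
    Returns indices of workers whose representation for any class deviates
    strongly from the cross-worker majority.
    """
    n_workers, n_classes, _ = class_activations.shape
    flagged = set()
    for c in range(n_classes):
        acts = class_activations[:, c, :]               # (n_workers, n_features)
        center = np.median(acts, axis=0)                # robust per-class center
        dists = np.linalg.norm(acts - center, axis=1)   # each worker's distance
        med = np.median(dists)
        mad = np.median(np.abs(dists - med)) + 1e-12    # robust spread estimate
        z = (dists - med) / (1.4826 * mad)              # robust z-scores
        flagged.update(np.where(z > z_thresh)[0].tolist())
    return sorted(flagged)

# Workers returned here would be excluded from the aggregation step.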
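The gradient inversion defense rests on scalar quantization and dequantization of the shared gradients, which also yields the communication savings noted above. The sketch below shows one plausible uniform (affine) scalar quantizer of that kind; the 4-bit width, uint8 payload, and per-tensor min/max scaling are assumptions for illustration, not the dissertation's exact scheme.

import numpy as np

def quantize(grad, n_bits=4):
    """Uniform affine quantization of a gradient tensor to n_bits integers."""
    qmax = 2 ** n_bits - 1
    g_min, g_max = float(grad.min()), float(grad.max())
    scale = (g_max - g_min) / qmax if g_max > g_min else 1.0
    q = np.round((grad - g_min) / scale).astype(np.uint8)  # low-bit payload
    return q, scale, g_min                                  # metadata needed to recover

def dequantize(q, scale, g_min):
    """Approximate reconstruction of the gradient from its low-bit form."""
    return q.astype(np.float32) * scale + g_min

# A worker would share (q, scale, g_min) instead of raw float32 gradients,
# and the server would dequantize before aggregation:
# q, s, m = quantize(local_gradient, n_bits=4)
# approx_gradient = dequantize(q, s, m)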
dc.format: application/pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2g6e5-bvbq
dc.identifier.other: 12927
dc.identifier.uri: http://hdl.handle.net/11603/36084
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Ovi_umbc_0434D_12927.pdf
dc.title: A Robust Federated Learning against Cyber Intrusions to Ensure Data Confidentiality and Model Integrity
dc.type: Text
dcterms.accessRights: Distribution rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. The item may be obtainable via Interlibrary Loan through a local library, pending the author/copyright holder's permission.

Files

Original bundle

Name: Ovi_umbc_0434D_12927.pdf
Size: 25.5 MB
Format: Adobe Portable Document Format

License bundle

Name: Ovi-Pretom_Roy_Open.pdf
Size: 242.06 KB
Format: Adobe Portable Document Format