A Robust Federated Learning against Cyber Intrusions to Ensure Data Confidentiality and Model Integrity

dc.contributor.advisor: Gangopadhyay, Aryya
dc.contributor.author: Ovi, Pretom Roy
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2024-09-06T14:30:50Z
dc.date.available: 2024-09-06T14:30:50Z
dc.date.issued: 2024-01-01
dc.description.abstract: Federated Learning (FL), a type of distributed machine learning, enables collaborative model building among a large number of participants without revealing sensitive data to the central server. In FL, data remains on workers' devices, and each participating worker shares only gradient or weight updates with the server after every round of local training. Because of its distributed architecture, FL has limited control over the local data and the corresponding training processes. It is therefore susceptible to attacks such as targeted data poisoning, in which compromised workers train their local models on poisoned samples and their contributions (weight or gradient updates) poison the global model, resulting in incorrect classifications. Safeguarding local workers from data poisoning attacks, and detecting workers that have already been poisoned, are thus crucial to building a robust federated learning framework. To address this, we first propose a prevention strategy, named `Confident Federated Learning', to protect workers from such poisoning attacks. Our approach involves a stratified verification step that validates the label quality of local training data by characterizing and identifying label errors in the local dataset. Experimental results on the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that the proposed method detects potentially mislabeled training samples with above 85% accuracy and excludes them from local training to prevent data poisoning attacks. However, this strategy is effective only up to a certain percentage of poisoned local data. In addition to the prevention strategy, we therefore propose a novel detection method that builds a class-wise cluster representation for every participating worker from the neuron activation maps of the local model and analyzes the resulting clusters to filter out attacked workers before model aggregation. We experimentally demonstrate the efficacy of the proposed detection strategy in identifying workers affected by data poisoning attacks, along with the attack type, e.g., label flipping or dirty labeling. Secondly, because gradient sharing is part of the training protocol, FL is at heightened risk of gradient inversion attacks that compromise data confidentiality. The most alarming characteristic of this attack is that it operates covertly, without degrading training performance. Such attacks allow attackers to work backward from the shared gradients and eventually reconstruct the private training data. As a countermeasure, we propose a mixed-quantization-enabled FL scheme built on scalar quantization and dequantization to defend against the attack. We experimentally demonstrate the applicability and generalizability of the proposed defense across the computer vision, natural language processing, and audio domains, using eight datasets of image, audio, and text modalities and covering both iteration-driven and recursion-driven gradient inversion attacks. Evaluation on these benchmark datasets demonstrates the superiority of our method over existing baseline defense approaches.
In addition, our approach can be considered a communication-efficient federated learning framework, because it transforms high-precision gradients into low-bit precision, resulting in faster training, lower transmission bandwidth, and lower communication costs. This dissertation contributes effective defense strategies against the attacks described above in federated learning, maintaining the purity and integrity of the machine learning models with the aim of safeguarding data, models, and systems.
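The prevention step described in the abstract validates label quality before local training. As a rough illustration of that idea (not the dissertation's exact procedure), the sketch below flags likely mislabeled samples by comparing out-of-sample predicted probabilities against per-class confidence thresholds, in the spirit of confident learning; the probe model, threshold rule, and the helper name find_suspect_labels are assumptions introduced here for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def find_suspect_labels(X, y, n_classes):
    """Return indices of samples whose given label looks unreliable.

    Assumes labels are integers 0..n_classes-1.
    """
    y = np.asarray(y)
    # Out-of-sample class probabilities from a simple probe model.
    probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )
    # Per-class confidence threshold: mean probability the probe assigns
    # to class c on samples that are actually labeled c.
    thresholds = np.array([probs[y == c, c].mean() for c in range(n_classes)])
    # Flag a sample when the probability of its given label falls below
    # that label's confidence threshold.
    given_conf = probs[np.arange(len(y)), y]
    return np.where(given_conf < thresholds[y])[0]

# Usage: exclude suspect samples from local training before an FL round.
# clean_idx = np.setdiff1d(np.arange(len(y)), find_suspect_labels(X, y, 10))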
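The detection method builds class-wise cluster representations from the local models' activation maps and screens out attacked workers before aggregation. The sketch below is a simplified stand-in for that analysis: it assumes the server already holds a per-class mean activation vector for every worker and flags workers whose vectors are robust outliers for some class. The outlier rule, array layout, and function name are illustrative assumptions, not the dissertation's algorithm.

import numpy as np

def flag_suspicious_workers(class_activations, z_thresh=2.5):
    """
    class_activations: array of shape (n_workers, n_classes, n_features)
    holding each worker's per-class mean activation vector.
    Returns indices of workers whose representation for any class deviates
    strongly from the cross-worker majority.
    """
    n_workers, n_classes, _ = class_activations.shape
    flagged = set()
    for c in range(n_classes):
        acts = class_activations[:, c, :]               # (n_workers, n_features)
        center = np.median(acts, axis=0)                # robust per-class center
        dists = np.linalg.norm(acts - center, axis=1)   # each worker's distance
        med = np.median(dists)
        mad = np.median(np.abs(dists - med)) + 1e-12    # robust spread estimate
        z = (dists - med) / (1.4826 * mad)              # robust z-scores
        flagged.update(np.where(z > z_thresh)[0].tolist())
    return sorted(flagged)

# Workers returned here would be excluded from the aggregation step.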
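The gradient inversion defense rests on scalar quantization and dequantization of the shared gradients, which also yields the communication savings noted above. The sketch below shows one plausible uniform (affine) scalar quantizer of that kind; the 4-bit width, uint8 payload, and per-tensor min/max scaling are assumptions for illustration, not the dissertation's exact scheme.

import numpy as np

def quantize(grad, n_bits=4):
    """Uniform affine quantization of a gradient tensor to n_bits integers."""
    qmax = 2 ** n_bits - 1
    g_min, g_max = float(grad.min()), float(grad.max())
    scale = (g_max - g_min) / qmax if g_max > g_min else 1.0
    q = np.round((grad - g_min) / scale).astype(np.uint8)  # low-bit payload
    return q, scale, g_min                                  # metadata needed to recover

def dequantize(q, scale, g_min):
    """Approximate reconstruction of the gradient from its low-bit form."""
    return q.astype(np.float32) * scale + g_min

# A worker would share (q, scale, g_min) instead of raw float32 gradients,
# and the server would dequantize before aggregation:
# q, s, m = quantize(local_gradient, n_bits=4)
# approx_gradient = dequantize(q, s, m)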
dc.format: application/pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2g6e5-bvbq
dc.identifier.other: 12927
dc.identifier.uri: http://hdl.handle.net/11603/36084
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Ovi_umbc_0434D_12927.pdf
dc.title: A Robust Federated Learning against Cyber Intrusions to Ensure Data Confidentiality and Model Integrity
dc.type: Text
dcterms.accessRights: Distribution rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. The item may be obtainable via Interlibrary Loan through a local library, pending the author/copyright holder's permission.

Files

Original bundle

Name: Ovi_umbc_0434D_12927.pdf
Size: 25.5 MB
Format: Adobe Portable Document Format

License bundle

Name: Ovi-Pretom_Roy_Open.pdf
Size: 242.06 KB
Format: Adobe Portable Document Format