Privacy-Preserving Data Sharing Using Generative Models

dc.contributor.advisor: Joshi, Anupam
dc.contributor.author: Kotal, Anantaa
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2024-09-06T14:27:58Z
dc.date.available: 2024-09-06T14:27:58Z
dc.date.issued: 2024/01/01
dc.description.abstract: The modern era heavily relies on data for decision-making in areas like cybersecurity, healthcare, and social sciences. However, this abundance of data raises significant privacy issues, leading to regulations such as GDPR and COPPA that are designed to protect individual privacy but also create barriers to data access for researchers. Organizations, both private and governmental, collect vast amounts of data, but sharing this data is often restricted, especially in specialized fields like cybersecurity and healthcare. These privacy concerns hinder collaboration, limit data sharing, and ultimately impede research progress, making it challenging for researchers to leverage the full potential of available data. Consequently, there is a critical need to find ways to make privately collected data available for public use without compromising individual privacy. Existing methods to balance privacy and data utility, like cryptographic techniques, noise addition, and distributed modeling, often fall short, either by failing to provide strong privacy guarantees or by sacrificing data usability. Synthetic data generation offers a promising solution by creating artificial datasets that mirror the statistical properties of original data without exposing sensitive information. Generative models like GANs can produce such synthetic data while preserving underlying patterns and protecting privacy. However, generating realistic tabular data in specialized domains like cybersecurity and healthcare remains challenging due to the complexity of the data and the scarcity of diverse training samples. This work advances the development of privacy-preserving data generation using GANs, which are bound by privacy constraints to produce shareable datasets that protect privacy.
Our technique has demonstrated that data generated in this manner can effectively replace original datasets for training machine learning models, with minimal accuracy loss when tested on the original data. We have also developed a novel model for data generation enhanced with domain-specific knowledge to improve the realism and accuracy of synthetic data. Furthermore, our prior research has shown how organizational policies constrain data sharing and how policy ambiguity complicates automatic enforcement. We developed synthetic data generation models that enforce privacy policies during the data generation process, ensuring compliance with regulations and reducing the risk of privacy breaches. By applying and validating these models across various domains, including cybersecurity and healthcare, we demonstrate their effectiveness in addressing privacy concerns while maintaining data utility. This research offers practical solutions for secure data sharing across different fields, advancing privacy-preserving data practices and supporting collaborative, data-driven research.
dc.format: application:pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2lcpg-jcxw
dc.identifier.other: 12922
dc.identifier.uri: http://hdl.handle.net/11603/36069
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Kotal_umbc_0434D_12922.pdf
dc.subject: Artificial Intelligence
dc.subject: Generative Model
dc.subject: Privacy
dc.title: Privacy-Preserving Data Sharing Using Generative Models
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Files

Original bundle

Name:
Kotal_umbc_0434D_12922.pdf
Size:
2.1 MB
Format:
Adobe Portable Document Format

License bundle

Name:
Kotal-Anantaa_Open.pdf
Size:
271.58 KB
Format:
Adobe Portable Document Format