Mitigate: An Adaptive Network Data Anonymization Tool Using Condensation-Based Differential Privacy

dc.contributor.author: Karabatis, George
dc.contributor.author: Chen, Zhiyuan
dc.contributor.author: Aleroud, Ahmed
dc.date.accessioned: 2024-02-24T22:41:18Z
dc.date.available: 2024-02-24T22:41:18Z
dc.date.issued: 2022-03-14
dc.description.abstract: Modern network devices collect large amounts of data that can be analyzed to identify bottlenecks, anomalies, cyber-attacks, etc. Such collections of network data therefore often need to be analyzed by an external expert or by the research community. However, these collections contain sensitive, proprietary information, so the network data must be anonymized before it can be shared. The overall objective of this project is to develop an innovative privacy management tool that anonymizes network data and achieves sufficient privacy, acceptable data utility, and efficient data analysis at the same time. No existing anonymization method achieves all three at once. The core of this technology is a differentially private clustering algorithm that provides strong privacy protection, preserves data properties important for subsequent analysis, and allows the party receiving the anonymized data to conduct analysis directly on it without decryption or any extra processing. The research designed, implemented, and verified a solution to this problem by completing the following tasks: 1) developing the core technology; 2) developing a context-based method that automatically recommends fields that must be anonymized; 3) conducting experiments showing superior results for our approach compared to existing tools; and 4) developing an intuitive but basic user interface. The research generated novel algorithmic techniques that utilize state-of-the-art methods such as condensation, differential privacy preservation, clustering, automated tuning based on contextual awareness, and recommendation techniques that suggest to users which columns to anonymize, leading to optimal privacy while still allowing research analysis on the dataset.
Experiments were conducted to evaluate the efficacy of these novel algorithmic techniques by performing analysis on the original non-anonymized datasets, then conducting the same analysis on the anonymized versions of those datasets and comparing the results. Overall, the results on anonymized data were within 1% of the original results, verifying that the generated technology not only guarantees a high level of privacy but also enables research analysis as if it were conducted on the original dataset. Potential applications of this technology include anonymization of any type of structured network dataset that contains sensitive identifiers, such as IP addresses, and that can be used in multiple applications, for example to create an AI or machine learning model for cyber security (e.g., to detect attacks) or for performance analysis (e.g., to identify bottlenecks or predict performance). In addition, a market analysis conducted for this technology identified a broader range of applications beyond the network sector, including healthcare, banking, insurance, securities, finance (FISB), data brokering, cloud services, ad sales, and government.
dc.description.sponsorship: USDOE Office of Science (SC)
dc.description.uri: https://www.osti.gov/biblio/1854575
dc.format.extent: 16 pages
dc.genre: Technical Reports
dc.genrep
dc.identifier: doi:10.13016/m2iwyf-2ljd
dc.identifier.uri: https://doi.org/10.2172/1854575
dc.identifier.uri: http://hdl.handle.net/11603/31695
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.title: Mitigate: An Adaptive Network Data Anonymization Tool Using Condensation-Based Differential Privacy
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-2208-0801
dcterms.creator: https://orcid.org/0000-0002-6984-7248
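
Illustrative note: the abstract names condensation, clustering, and differential privacy as building blocks but does not spell out the algorithm. The sketch below is not the report's method. It is a minimal, generic illustration of the condensation idea under stated assumptions: records are grouped into clusters of at least k, per-group statistics are perturbed with Laplace noise, and synthetic records are drawn from the noisy statistics so that analysis can run directly on the output. The function names, the sort-and-cut grouping, the `value_range` bound, and the noise scale are all assumptions made for illustration, and the noise is not calibrated to give a formal privacy guarantee.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def condense(records, k, epsilon, value_range, seed=0):
    """Return synthetic records (same count as the input).

    records     -- list of equal-length tuples of floats (hypothetical flow features)
    k           -- minimum group size (assumed parameter)
    epsilon     -- privacy parameter controlling the noise (assumed usage)
    value_range -- assumed bound on each field's range, used to scale the noise
    """
    rng = random.Random(seed)
    # Crude stand-in for clustering: sort, then cut into consecutive groups of k.
    ordered = sorted(records)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge an undersized tail group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    synthetic = []
    for group in groups:
        n = len(group)
        dims = len(group[0])
        scale = value_range / (n * epsilon)  # illustrative noise scale for a bounded mean
        noisy_means, spreads = [], []
        for d in range(dims):
            column = [row[d] for row in group]
            noisy_means.append(statistics.fmean(column) + laplace_noise(scale, rng))
            spreads.append(statistics.pstdev(column))
        # Release n synthetic rows drawn from the group's noisy statistics.
        for _ in range(n):
            synthetic.append(tuple(rng.gauss(m, s) for m, s in zip(noisy_means, spreads)))
    return synthetic


if __name__ == "__main__":
    flows = [(float(i), float(2 * i)) for i in range(10)]  # made-up (bytes, duration) pairs
    anonymized = condense(flows, k=4, epsilon=1.0, value_range=10.0)
    print(len(anonymized), "synthetic records")
```

Because the output has the same shape as the input, the same analysis code can be run on both and the results compared, which mirrors the evaluation approach the abstract describes.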

Files

Original bundle

Name: 1854575.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission