Elluri, Lavanya; Joshi, Karuna Pande; Kotal, Anantaa
2020-12-14
2020-12-14
2020-12-13
L. Elluri, K. Pande Joshi and A. Kotal, "Measuring Semantic Similarity across EU GDPR Regulation and Cloud Privacy Policies," 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 3963-3978, doi: 10.1109/BigData50022.2020.9377864.
http://hdl.handle.net/11603/20257
https://doi.org/10.1109/BigData50022.2020.9377864
7th International Workshop on Privacy and Security of Big Data (PSBD 2020), in conjunction with 2020 IEEE International Conference on Big Data (IEEE BigData 2020)
2020 IEEE International Conference on Big Data (Big Data), 10-13 December 2020, Atlanta, GA, USA
Data protection authorities formulate policies and rules that service providers must comply with to ensure security and privacy when they perform Big Data analytics using users' Personally Identifiable Information (PII). The knowledge contained in data regulations and organizational privacy policies is typically maintained as short unstructured text in HTML or PDF formats. Hence, it is an open challenge to determine the specific regulation rules that are being addressed by a provider’s privacy policies. We have developed a semantically rich framework, using techniques from Semantic Web and Natural Language Processing, to extract and compare the context of a short text in real time. This framework allows automated incremental text comparison and identification of context from short text policy documents by determining the semantic similarity score and extracting semantically similar key terms. Additionally, we created a knowledge graph to store the semantically similar comparison results while evaluating our framework across the EU GDPR and the privacy policies of 20 organizations complying with this regulation, associated with various categories that apply to Big Data stored in the cloud. Our approach can be utilized by Big Data practitioners to update their referential documents regularly based on the authority documents.
10 pages
en-US
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
UMBC Ebiquity Research Group
Measuring Semantic Similarity across EU GDPR Regulation and Cloud Privacy Policies
Text