The Effect of Text Ambiguity on creating Policy Knowledge Graphs
| dc.contributor.author | Kotal, Anantaa | |
| dc.contributor.author | Joshi, Anupam | |
| dc.contributor.author | Joshi, Karuna | |
| dc.date.accessioned | 2021-08-24T18:51:36Z | |
| dc.date.available | 2021-08-24T18:51:36Z | |
| dc.date.issued | 2021-09-30 | |
| dc.description | IEEE International Conference on Big Data and Cloud Computing (BDCloud 2021) | en_US |
| dc.description.abstract | A growing number of web and cloud-based products and services rely on data sharing between consumers, service providers, and their subsidiaries and third parties. There is a growing concern around the security and privacy of data in such large-scale shared architectures. Most organizations have a human-written privacy policy that discloses all the ways that data is shared, stored, and used. The organizational privacy policies must also be compliant with government and administrative regulations. This raises a major challenge for providers as they try to launch new services. Thus, they are moving towards a system of automatic policy maintenance and regulatory compliance. This requires extracting policy from text documents and representing it in a semi-structured, machine-processable framework. The most popular method to this end is extracting policy information into a Knowledge Graph (KG). A significant body of work uses NLP approaches to convert text descriptions of regulations into policies that are expressed in languages such as OWL and XACML and grounded in the control-based schema. In this paper, we show that NLP-based approaches to extracting knowledge from written policy documents and representing it in enforceable Knowledge Graphs fail when the text policies are ambiguous. Ambiguity can arise from lack of clarity, misuse of syntax, and/or the use of complex language. We describe a system to extract features from a policy document that affect its ambiguity and classify the documents based on the level of ambiguity present. We validate this approach using human annotators. We show that a large number of documents in a popular privacy policy corpus (OPP-115) are ambiguous. This affects the ability to automatically monitor privacy policies. We show that for policies that are more ambiguous according to our proposed measure, NLP-based text segment classifiers are less accurate. | en_US |
| dc.description.sponsorship | This research was partially supported by a DoD supplement to the NSF award 1747724, Phase I IUCRC UMBC: Center for Accelerated Real time Analytics (CARTA). We would like to thank all the survey respondents for their participation and contribution to our study. | en_US |
| dc.description.uri | https://ieeexplore.ieee.org/document/9644852 | en_US |
| dc.format.extent | 10 pages | en_US |
| dc.genre | conference papers and proceedings | en_US |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m2qt7r-unmb | |
| dc.identifier.citation | Kotal, Anantaa; Joshi, Anupam; Joshi, Karuna; The Effect of Text Ambiguity on creating Policy Knowledge Graphs; IEEE International Conference on Big Data and Cloud Computing (BDCloud 2021); | en_US |
| dc.identifier.uri | http://hdl.handle.net/11603/22670 | |
| dc.identifier.uri | https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00201 | |
| dc.language.iso | en_US | en_US |
| dc.publisher | IEEE | en_US |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
| dc.relation.ispartof | UMBC Student Collection | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.relation.ispartof | UMBC Information Systems Department | |
| dc.rights | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works | |
| dc.subject | UMBC Ebiquity Research Group | |
| dc.title | The Effect of Text Ambiguity on creating Policy Knowledge Graphs | en_US |
| dc.type | Text | en_US |
| dcterms.creator | https://orcid.org/0000-0002-8641-3193 | en_US |
| dcterms.creator | https://orcid.org/0000-0002-6354-1686 | en_US |
