A SEMANTICALLY RICH FRAMEWORK TO ENABLE REAL-TIME KNOWLEDGE EXTRACTION AND CLASSIFICATION FROM SHORT LENGTH SEMI-STRUCTURED DOCUMENTS

Date

2021-01-01

Department

Information Systems

Program

Information Systems

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

Regulatory bodies hold authority over a domain that they monitor and administer. To ensure the smooth and secure operation of that domain, these authorities formulate policies and rules with which the organizations and individuals operating in it must comply. Knowledge about an authority's policies and rules is typically maintained as a large volume of unstructured text in books, laws and regulations, academic and scientific reports, and similar sources. Most of these documents are not machine-processable, so it is hard to find relevant information in them quickly. Extracting and categorizing knowledge from the text of these numerous authority documents requires significant manual effort and time, and organizations often spend significant resources on complying with the authority's controls. Organizations that adhere to the authority's policies often refer to short sections of the authority's documents in the documents they create for internal consumption or for their clients. However, these short sections in the referring documents do not include the full context of the corresponding section in the authority document. Thus, a person relying on the referring document must manually consult the authority's document to determine the complete context. Because neither document is machine-processable, it is difficult to determine the context of the referring section in real time. We propose a semantically rich framework to extract and classify the context of a short text in real time, to help users who regularly update their referential documents based on the authority documents. An open challenge that we will address is automated text classification and context identification for short text documents. Additionally, we will populate knowledge graphs with the knowledge extracted from the authority and the referencing documents.
We use techniques from the Semantic Web, Natural Language Processing, Machine Learning, and Deep Learning to build this framework. Our objectives include representing the knowledge in cloud compliance or legal texts in order to create and populate a knowledge graph based on data protection regulations.