Joint Models to Refine Knowledge Graphs

dc.contributor.advisor: Finin, Tim
dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Padia, Ankur Sukhalal
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:49Z
dc.date.available: 2021-09-01T13:55:49Z
dc.date.issued: 2019-01-01
dc.description.abstract: A knowledge graph can be viewed as a structural representation of beliefs, with nodes and edges in which the nodes represent real-world entities or events and the edges are relations believed to hold between pairs of entities. Multiple levels of processing are involved in extracting such knowledge graphs from natural language text, starting with reading and understanding the text, then constructing a graph of the entities found and the relations between them, and finally inferring missing relations that are very likely to be true. However, current knowledge graph extraction systems are not perfect and make mistakes at each level of processing. These mistakes greatly reduce the utility of the extracted knowledge graph and motivate us to develop models to refine a knowledge graph at each level. In this dissertation, we demonstrate several joint models to refine a knowledge graph under different settings, ranging from validating inferred facts to extracting and justifying beliefs from text. We first consider verifying an existing knowledge graph without any additional or supporting provenance information. We develop unsupervised models using knowledge-enriched tensor factorization to determine the validity of the inferred facts by learning entity and relation embeddings. Compared to previous approaches, our model depends on neither an external schema nor a corpus to guide the learning of the embeddings. Rather, it constrains the embeddings using graph structures computed with data-driven approaches. We introduce four models, two quadratic and two linear in the number of entities, and study the effect of incorporating graph structures. We also provide a convergence proof for one of our models, demonstrating that the linear model with more than two variables converges. Compared with other baselines, we found that the models with prior information achieve better performance and generalization, especially when the graph is very sparse.
Secondly, we consider verifying an existing knowledge graph, but we assume that we may make use of text-based provenance. In the previous problem, we assumed the underlying knowledge graph does not contain errors. However, it is rare to obtain such good-quality extractions from text. Hence, we explore how consistently a machine reads beliefs from given provenance sentences when constructing a knowledge graph. We describe an approach to jointly determine whether an existing knowledge graph belief was read consistently and to suggest a potential fix when it was not. Unlike previous approaches, ours does not depend on opaque web search engines, does not make use of a schema, and does not assume an ensemble of IE systems. By conducting experiments on different IE and human-generated datasets, we found that most of the errors made by information extraction systems are due to choosing an incorrect relation given the provenance information, and that a simple model can perform comparably to more complex or expressive models. As the errors made by IE systems are mostly lexical or syntactic in nature, word order (or composability) can be ignored for the task. We finally consider how to verify beliefs represented in natural language. This deviates from the assumptions of our previous contributions---namely, here we are not working with tuples but rather with the text used to generate them. Most current information extraction systems do not question the quality of the input sentences and simply extract facts from them. In the case of misinformed articles, incorrect facts could be learned that conflict with existing knowledge graph facts. To fill this gap, we propose a novel model to determine the validity of an input sentence and to provide interpretable evidence justifications that explain the classifier's prediction.
Compared to previous work, which focuses on specific datasets and uses dataset-specific heuristics, we study the effectiveness of frame-based semantics in narrowing the search space of evidence sentences to provide better explanations, the effect of using discrete inference for the claim-validity task, and the benefit of jointly learning the claim classification and explanation tasks. We found that joint modeling performs better than single-task modeling. Moreover, better evidence sentences are retrieved when semantic frames from FrameNet are considered, yielding a significant performance gain, nearly doubling performance, in both the retrieval and classification tasks.
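The abstract does not specify the dissertation's factorization models, but a minimal sketch of the general technique it names, learning entity and relation embeddings via tensor factorization and scoring a fact's plausibility with them, can be given with a generic RESCAL-style bilinear scorer. All names and dimensions below are illustrative assumptions, not the author's actual models:

```python
import numpy as np

# Hypothetical sketch of bilinear tensor-factorization scoring for a
# knowledge graph: each entity gets a vector, each relation a matrix,
# and a triple (head, relation, tail) is scored as e_h^T @ W_r @ e_t.
# In practice these embeddings would be learned from the graph; here
# they are random, just to show the scoring machinery.

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 4

entity_emb = rng.normal(size=(n_entities, dim))          # one vector per entity
relation_emb = rng.normal(size=(n_relations, dim, dim))  # one matrix per relation

def score(head: int, relation: int, tail: int) -> float:
    """Plausibility score of the triple (head, relation, tail)."""
    return float(entity_emb[head] @ relation_emb[relation] @ entity_emb[tail])

# Validate or rank candidate facts: for a fixed head and relation,
# higher-scoring tails are considered more plausible.
scores = [score(0, 1, t) for t in range(n_entities)]
best_tail = int(np.argmax(scores))
```

A learned version would fit `entity_emb` and `relation_emb` by minimizing reconstruction loss over observed triples, optionally with structural constraints of the kind the abstract describes.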
dc.format: application/pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m23mvx-dwjb
dc.identifier.other: 12097
dc.identifier.uri: http://hdl.handle.net/11603/22903
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Padia_umbc_0434D_12097.pdf
dc.subject: embedding
dc.subject: information extraction
dc.subject: joint model
dc.subject: knowledge graph
dc.subject: refine
dc.subject: verification
dc.title: Joint Models to Refine Knowledge Graphs
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending the author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Padia_umbc_0434D_12097.pdf
Size: 3.74 MB
Format: Adobe Portable Document Format