Joint Models to Refine Knowledge Graphs

dc.contributor.advisor: Finin, Tim
dc.contributor.advisor: Ferraro, Francis
dc.contributor.author: Padia, Ankur Sukhalal
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:49Z
dc.date.available: 2021-09-01T13:55:49Z
dc.date.issued: 2019-01-01
dc.description.abstract: A knowledge graph can be viewed as a structural representation of beliefs, with nodes and edges in which the nodes represent real-world entities or events and the edges are relations believed to hold between pairs of entities. Multiple levels of processing are involved in extracting such knowledge graphs from natural language text, starting with reading and understanding the text, then constructing a graph of the entities found and the relations between them, and finally inferring missing relations that are very likely to be true. However, current knowledge graph extraction systems are not perfect and make mistakes at each level of processing. These mistakes greatly reduce the utility of the extracted knowledge graph and motivate us to develop models to refine a knowledge graph at each level. In this dissertation, we demonstrate several joint models to refine a knowledge graph under different settings, ranging from validating inferred facts to extracting and justifying beliefs from text. We first consider verifying an existing knowledge graph without any additional or supporting provenance information. We develop unsupervised models using knowledge-enriched tensor factorization to determine the validity of the inferred facts by learning entity and relation embeddings. Compared to previous approaches, our model depends on neither an external schema nor a corpus to guide the learning of the embeddings. Rather, it constrains the embeddings using graph structures computed with data-driven approaches. We introduce four models, two quadratic and two linear in the number of entities, and study the effect of incorporating graph structures. We also provide a convergence proof for one of our models, demonstrating that the linear model with more than two variables converges. Compared with other baselines, we found that the models with prior information achieve better performance and generalization, especially when the graph is very sparse.
Secondly, we consider verifying an existing knowledge graph, but we assume that we may make use of text-based provenance. In the previous problem, we assumed the underlying knowledge graph does not contain errors. However, it is rare to obtain such good-quality extractions from text. Hence, we explore how consistently a machine reads beliefs from given provenance sentences when constructing a knowledge graph. We describe an approach to jointly determine whether an existing knowledge graph belief was read consistently and to suggest a potential fix when it was not. Unlike previous approaches, ours does not depend on opaque web search engines, does not make use of a schema, and does not assume an ensemble of IE systems. By conducting experiments on different IE and human-generated datasets, we found that most of the errors made by information extraction systems are due to choosing an incorrect relation given the provenance information, and that a simple model can perform comparably to more complex or expressive models. As the errors made by IE systems are mostly lexical or syntactic in nature, word order (or composability) can be ignored for the task. We finally consider how to verify beliefs represented in natural language. This deviates from the assumptions of our previous contributions---namely, here we are not working with tuples but rather with the text used to generate them. Most current information extraction systems do not question the quality of the input sentences and simply extract facts from them. In the case of misinformed articles, incorrect facts could be learned that conflict with existing knowledge graph facts. To fill this gap, we propose a novel model to determine the validity of an input sentence and to provide interpretable evidence justifications that explain the classifier's prediction.
Compared to previous work, which focuses on specific datasets and uses dataset-specific heuristics, we study the effectiveness of frame-based semantics in narrowing the search space of evidence sentences to provide better explanations, the effect of using discrete inference for the claim-validity task, and the benefit of jointly learning the claim classification and explanation tasks. We found that joint modeling performs better than single-task modeling. Moreover, better evidence sentences are retrieved when semantic frames from FrameNet are considered, yielding a significant performance gain, nearly doubling performance, in both the retrieval and classification tasks.
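The abstract does not specify the dissertation's factorization models, but a minimal sketch of the general technique it names, learning entity and relation embeddings via tensor factorization and scoring a fact's plausibility with them, can be given with a generic RESCAL-style bilinear scorer. All names and dimensions below are illustrative assumptions, not the author's actual models:

```python
import numpy as np

# Hypothetical sketch of bilinear tensor-factorization scoring for a
# knowledge graph: each entity gets a vector, each relation a matrix,
# and a triple (head, relation, tail) is scored as e_h^T @ W_r @ e_t.
# In practice these embeddings would be learned from the graph; here
# they are random, just to show the scoring machinery.

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 5, 2, 4

entity_emb = rng.normal(size=(n_entities, dim))          # one vector per entity
relation_emb = rng.normal(size=(n_relations, dim, dim))  # one matrix per relation

def score(head: int, relation: int, tail: int) -> float:
    """Plausibility score of the triple (head, relation, tail)."""
    return float(entity_emb[head] @ relation_emb[relation] @ entity_emb[tail])

# Validate or rank candidate facts: for a fixed head and relation,
# higher-scoring tails are considered more plausible.
scores = [score(0, 1, t) for t in range(n_entities)]
best_tail = int(np.argmax(scores))
```

A learned version would fit `entity_emb` and `relation_emb` by minimizing reconstruction loss over observed triples, optionally with structural constraints of the kind the abstract describes.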
dc.format: application/pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/m23mvx-dwjb
dc.identifier.other: 12097
dc.identifier.uri: http://hdl.handle.net/11603/22903
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Padia_umbc_0434D_12097.pdf
dc.subject: embedding
dc.subject: information extraction
dc.subject: joint model
dc.subject: knowledge graph
dc.subject: refine
dc.subject: verification
dc.title: Joint Models to Refine Knowledge Graphs
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending the author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Padia_umbc_0434D_12097.pdf
Size: 3.74 MB
Format: Adobe Portable Document Format