Neurosymbolic Narrative Construction from Open Source Intelligence

Author/Creator

Author/Creator ORCID

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

Breaking real-world events are typically reported via Open Source Intelligence (OSINT) sources such as news websites, blogs, and discussion forums. Search engines such as Google, Yahoo, and Bing aggregate this volume of information by relevance using a variety of ranking algorithms. End users manually parse search engine results pages and organize event incidents to form coherent explanations about associated themes and temporal relationships surrounding a queried topic. The process of piecing event incidents into a chronological and contextually coherent sequence is known as narrative construction. Narrative construction across open events is generally a convoluted process, as events are reported and instantiated by potentially many users at a time, introducing noise such as misinformation, redundant or outdated data, and conflicting reporting. Conventional search engines lack construction abilities that address these challenges. The advent of generative Large Language Models (LLMs) such as ChatGPT and Gemini has revolutionized the synthesis of dynamic, disparate information during semantic search, diminishing the requirement for end users to manually parse results about topics of interest. This form of chatbot-based multi-document collation and summarization closely resembles the narrative construction process. Though this method provides potential benefits in condensing complex content and increasing overall event situational awareness, it still has several limitations: (i) difficulty, and sometimes impossibility, of integrating updated content dynamically, (ii) generation of non-factual, irrelevant, or incomplete information, (iii) high variability and subjectivity in responses, and (iv) lack of provenance, attribution, and trust for knowledge sources used to generate responses. This thesis presents a set of approaches in neurosymbolic representation and construction of disparately sourced event narratives.
The primary contribution of this work is a novel Retrieval-Augmented Generation (RAG) framework, FABULA, which implements a Web Ontology Language (OWL) schema as an intermediary plot representation to guide a decoder LLM to generate delineated event incidents as a constructed narrative about a queried topic. FABULA is composed of: (i) an Information Extraction (IE) approach for parsing unstructured event incidents, (ii) integration of event incidents into a novel semantic web ontology, the Event Narrative Ontology (ENO), and (iii) schema-derived prompting during autoregressive narrative construction. Qualitative and quantitative approaches are used to evaluate the implementation of a plot schema as a search function. Experiments are performed across generic public news events and cybersecurity domain use cases.