Neurosymbolic Narrative Construction from Open Source Intelligence

dc.contributor.advisor: Joshi, Anupam
dc.contributor.author: Ranade, Priyanka
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2024-09-06T14:28:05Z
dc.date.available: 2024-09-06T14:28:05Z
dc.date.issued: 2024/01/01
dc.description.abstract: Breaking real-world events are typically reported via Open Source Intelligence (OSINT) sources such as news websites, blogs, and discussion forums. Search engines such as Google, Yahoo, and Bing aggregate this volume of information and rank it by relevance using a variety of algorithms. End users manually parse search engine results pages and organize event incidents into coherent explanations of the themes and temporal relationships surrounding a queried topic. The process of piecing event incidents into a chronological and contextually coherent sequence is known as narrative construction. Narrative construction across open events is generally a convoluted process: events are reported and instantiated by potentially many users at once, which introduces noise such as misinformation, redundant or outdated data, and conflicting reports. Conventional search engines lack narrative construction capabilities that address these challenges. The advent of generative Large Language Models (LLMs) such as ChatGPT and Gemini has transformed the synthesis of dynamic, disparate information during semantic search, reducing the need for end users to manually parse results about topics of interest. This form of chatbot-based multi-document collation and summarization closely resembles the narrative construction process. Although this approach can condense complex content and increase overall situational awareness of events, it still has several limitations: (i) difficulty, and sometimes impossibility, of integrating updated content dynamically, (ii) generation of non-factual, irrelevant, or incomplete information, (iii) high variability and subjectivity in responses, and (iv) lack of provenance, attribution, and trust for the knowledge sources used to generate responses. This thesis presents a set of approaches to the neurosymbolic representation and construction of disparately sourced event narratives.
The primary contribution of this work is FABULA, a novel Retrieval Augmented Generation (RAG) framework that uses a Web Ontology Language (OWL) schema as an intermediate plot representation to guide a decoder LLM in generating delineated event incidents as a constructed narrative about a queried topic. FABULA consists of (i) an Information Extraction (IE) approach for parsing unstructured event incidents, (ii) integration of event incidents into a novel Semantic Web ontology, the Event Narrative Ontology (ENO), and (iii) schema-derived prompting during autoregressive narrative construction. Qualitative and quantitative approaches are used to evaluate the implementation of a plot schema as a search function. Experiments are performed on general public news events and cybersecurity use cases.
dc.format: application/pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2de0r-gq78
dc.identifier.other: 12924
dc.identifier.uri: http://hdl.handle.net/11603/36077
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.
dc.source: Original File Name: Ranade_umbc_0434D_12924.pdf
dc.subject: Information Retrieval
dc.subject: Knowledge Representation
dc.subject: Large Language Models
dc.subject: Narratives
dc.subject: Neurosymbolic
dc.subject: Semantic Web
dc.title: Neurosymbolic Narrative Construction from Open Source Intelligence
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Files

Original bundle

Name: Ranade_umbc_0434D_12924.pdf
Size: 3.73 MB
Format: Adobe Portable Document Format

License bundle

Name: Ranade-Priyanka_Lim.pdf
Size: 242.9 KB
Format: Adobe Portable Document Format