Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization

dc.contributor.author: Barron, Ryan
dc.contributor.author: Eren, Maksim
dc.contributor.author: Serafimova, Olga M.
dc.contributor.author: Matuszek, Cynthia
dc.contributor.author: Alexandrov, Boian S.
dc.date.accessioned: 2025-04-01T14:55:21Z
dc.date.available: 2025-04-01T14:55:21Z
dc.date.issued: 2025-02-27
dc.description: The 20th International Conference on Artificial Intelligence and Law (ICAIL 2025)
dc.description.abstract: Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), is a transformative technology applicable to specialized domains such as legal systems, research, recommender systems, cybersecurity, and global security, including proliferation research. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. The legal domain comprises extensive, interrelated, semi-structured knowledge systems with complex relations, spanning constitutions, statutes, regulations, and case law. Extracting insights and navigating the intricate networks of legal documents and their relations is crucial for effective legal research. Here, we introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and to minimize hallucinations. In the legal system, these technologies empower AI agents to identify and analyze complex connections among cases, statutes, and legal precedents, uncovering hidden relationships and predicting legal trends, challenging tasks that are essential for ensuring justice and improving operational efficiency. Our system employs web scraping techniques to systematically collect legal texts, such as statutes, constitutional provisions, and case law, from publicly accessible platforms like Justia. It bridges the gap between traditional keyword-based searches and contextual understanding by leveraging advanced semantic representations, hierarchical relationships, and latent topic discovery. This framework supports legal document clustering, summarization, and cross-referencing, enabling scalable, interpretable, and accurate retrieval over semi-structured data while advancing computational law and AI.
dc.description.sponsorship: This research was funded by the U.S. Department of Energy National Nuclear Security Administration's Office of Defense Nuclear Nonproliferation Research and Development (DNN R&D), supported by the U.S. DOE NNSA under Contract No. 89233218CNA000001, as well as by the LANL Institutional Computing Program.
dc.description.uri: http://arxiv.org/abs/2502.20364
dc.format.extent: 10 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m2phyd-f2cx
dc.identifier.uri: https://doi.org/10.48550/arXiv.2502.20364
dc.identifier.uri: http://hdl.handle.net/11603/37887
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.subject: law
dc.subject: llm
dc.subject: topic labeling
dc.subject: chain of thought
dc.subject: Computer Science
dc.subject: UMBC Interactive Robotics and Language Lab
dc.subject: Artificial Intelligence
dc.subject: legal knowledge
dc.subject: prompt tuning
dc.subject: nmf
dc.subject: information retrieval
dc.title: Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
dc.type: Text
dcterms.creator: https://orcid.org/0009-0005-5045-9527
dcterms.creator: https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: Bridging_Legal.pdf
Size: 6.58 MB
Format: Adobe Portable Document Format