Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
| dc.contributor.author | Barron, Ryan | |
| dc.contributor.author | Eren, Maksim | |
| dc.contributor.author | Serafimova, Olga M. | |
| dc.contributor.author | Matuszek, Cynthia | |
| dc.contributor.author | Alexandrov, Boian S. | |
| dc.date.accessioned | 2025-04-01T14:55:21Z | |
| dc.date.available | 2025-04-01T14:55:21Z | |
| dc.date.issued | 2025-02-27 | |
| dc.description | The 20th International Conference on Artificial Intelligence and Law (ICAIL 2025) | |
| dc.description.abstract | Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), represents a transformative technology applicable to specialized domains such as legal systems, research, recommender systems, cybersecurity, and global security, including proliferation research. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. The legal domain here comprises complex data characterized by extensive, interrelated, and semi-structured knowledge systems with complex relations. It comprises constitutions, statutes, regulations, and case law. Extracting insights and navigating the intricate networks of legal documents and their relations is crucial for effective legal research. Here, we introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and minimize hallucinations. In the legal system, these technologies empower AI agents to identify and analyze complex connections among cases, statutes, and legal precedents, uncovering hidden relationships and predicting legal trends, challenging tasks that are essential for ensuring justice and improving operational efficiency. Our system employs web scraping techniques to systematically collect legal texts, such as statutes, constitutional provisions, and case law, from publicly accessible platforms like Justia. It bridges the gap between traditional keyword-based searches and contextual understanding by leveraging advanced semantic representations, hierarchical relationships, and latent topic discovery. This framework supports legal document clustering, summarization, and cross-referencing, for scalable, interpretable, and accurate retrieval for semi-structured data while advancing computational law and AI. | |
| dc.description.sponsorship | This research was funded by the U.S. Department of Energy National Nuclear Security Administration's Office of Defense Nuclear Nonproliferation Research and Development (DNN R&D), supported by the U.S. DOE NNSA under Contract No. 89233218CNA000001, as well as by the LANL Institutional Computing Program. | |
| dc.description.uri | http://arxiv.org/abs/2502.20364 | |
| dc.format.extent | 10 pages | |
| dc.genre | conference papers and proceedings | |
| dc.genre | preprints | |
| dc.identifier | doi:10.13016/m2phyd-f2cx | |
| dc.identifier.uri | https://doi.org/10.48550/arXiv.2502.20364 | |
| dc.identifier.uri | http://hdl.handle.net/11603/37887 | |
| dc.language.iso | en_US | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
| dc.relation.ispartof | UMBC Student Collection | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.rights | This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law. | |
| dc.rights | Public Domain | |
| dc.rights.uri | https://creativecommons.org/publicdomain/mark/1.0/ | |
| dc.subject | law | |
| dc.subject | llm | |
| dc.subject | topic labeling | |
| dc.subject | chain of thought | |
| dc.subject | Computer Science | |
| dc.subject | UMBC Interactive Robotics and Language Lab | |
| dc.subject | Artificial Intelligence | |
| dc.subject | legal knowledge | |
| dc.subject | prompt tuning | |
| dc.subject | nmf | |
| dc.subject | information retrieval | |
| dc.title | Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0009-0005-5045-9527 | |
| dcterms.creator | https://orcid.org/0000-0003-1383-8120 | |
