HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

dc.contributor.author: Bhattarai, Manish
dc.contributor.author: Barron, Ryan
dc.contributor.author: Eren, Maksim
dc.contributor.author: Vu, Minh
dc.contributor.author: Grantcharov, Vesselin
dc.contributor.author: Boureima, Ismael
dc.contributor.author: Stanev, Valentin
dc.contributor.author: Matuszek, Cynthia
dc.contributor.author: Valtchinov, Vladimir
dc.contributor.author: Rasmussen, Kim
dc.contributor.author: Alexandrov, Boian
dc.date.accessioned: 2025-01-22T21:24:55Z
dc.date.available: 2025-01-22T21:24:55Z
dc.date.issued: 2024-12-05
dc.description.abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
dc.description.uri: http://arxiv.org/abs/2412.04661
dc.format.extent: 12 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2dhtr-mnnl
dc.identifier.uri: https://doi.org/10.48550/arXiv.2412.04661
dc.identifier.uri: http://hdl.handle.net/11603/37426
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.subject: Computer Science - Artificial Intelligence
dc.subject: Computer Science - Information Retrieval
dc.subject: UMBC Interactive Robotics and Language Lab
dc.title: HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
dc.type: Text
dcterms.creator: https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 2412.04661v1.pdf
Size: 3.34 MB
Format: Adobe Portable Document Format