HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

Bhattarai, Manish; Barron, Ryan; Eren, Maksim; Vu, Minh; Grantcharov, Vesselin; Boureima, Ismael; Stanev, Valentin; Matuszek, Cynthia; Valtchinov, Vladimir; Rasmussen, Kim; Alexandrov, Boian

dc.contributor.author	Bhattarai, Manish
dc.contributor.author	Barron, Ryan
dc.contributor.author	Eren, Maksim
dc.contributor.author	Vu, Minh
dc.contributor.author	Grantcharov, Vesselin
dc.contributor.author	Boureima, Ismael
dc.contributor.author	Stanev, Valentin
dc.contributor.author	Matuszek, Cynthia
dc.contributor.author	Valtchinov, Vladimir
dc.contributor.author	Rasmussen, Kim
dc.contributor.author	Alexandrov, Boian
dc.date.accessioned	2025-01-22T21:24:55Z
dc.date.available	2025-01-22T21:24:55Z
dc.date.issued	2024-12-05
dc.description.abstract	Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
dc.description.uri	http://arxiv.org/abs/2412.04661
dc.format.extent	12 pages
dc.genre	journal articles
dc.genre	preprints
dc.identifier	doi:10.13016/m2dhtr-mnnl
dc.identifier.uri	https://doi.org/10.48550/arXiv.2412.04661
dc.identifier.uri	http://hdl.handle.net/11603/37426
dc.language.iso	en_US
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights	Public Domain
dc.rights.uri	https://creativecommons.org/publicdomain/mark/1.0/
dc.subject	Computer Science - Artificial Intelligence
dc.subject	Computer Science - Information Retrieval
dc.subject	UMBC Interactive Robotics and Language Lab
dc.title	HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
dc.type	Text
dcterms.creator	https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 2412.04661v1.pdf
Size: 3.34 MB
Format: Adobe Portable Document Format
