Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models

dc.contributor.author: Khan, Md Azim
dc.contributor.author: Gangopadhyay, Aryya
dc.contributor.author: Wang, Jianwu
dc.contributor.author: Erbacher, Robert F.
dc.date.accessioned: 2025-04-23T20:31:54Z
dc.date.available: 2025-04-23T20:31:54Z
dc.date.issued: 2025-03-08
dc.description.abstract: Situational awareness applications rely heavily on real-time processing of visual and textual data to provide actionable insights. Vision-language models (VLMs) have become essential tools for interpreting complex environments because they connect visual inputs with natural-language descriptions. However, these models often face computational challenges, especially when they must run efficiently in real-world environments. This research presents a novel VLM framework that leverages frequency-domain transformations and low-rank adaptation (LoRA) to enhance feature extraction, scalability, and efficiency. Unlike traditional VLMs, which rely solely on spatial-domain representations, our approach incorporates Discrete Fourier Transform (DFT)-based low-rank features while retaining pretrained spatial weights, enabling robust performance in noisy or low-visibility scenarios. We evaluated the proposed model on caption generation and Visual Question Answering (VQA) tasks using benchmark datasets with varying levels of Gaussian noise. Quantitative results demonstrate that our model achieves evaluation metrics comparable to state-of-the-art VLMs such as CLIP ViT-L/14 and SigLIP. Qualitative analysis further reveals that our model produces more detailed and contextually relevant responses, particularly for real-world images captured by a RealSense camera mounted on an Unmanned Ground Vehicle (UGV).
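The abstract's core idea, keeping pretrained spatial weights frozen while learning only a low-rank (LoRA-style) update computed on DFT features of the input, can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, the zero-initialization of the LoRA factor, and the use of DFT magnitudes as the frequency representation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for a single projection layer
d_in, d_out, rank = 64, 32, 4

# Frozen pretrained spatial weights (never updated during adaptation)
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: only these small matrices would be trained.
# B is zero-initialized so the adapted layer starts identical to the frozen one.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def frequency_features(x):
    """Map a spatial feature vector to real-valued DFT magnitude features."""
    return np.abs(np.fft.fft(x))

def forward(x, alpha=1.0):
    """Frozen spatial path plus a low-rank update applied to frequency features."""
    spatial = W @ x
    lora = (alpha / rank) * (B @ (A @ frequency_features(x)))
    return spatial + lora

x = rng.standard_normal(d_in)
y = forward(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen pretrained output exactly; training then moves only the `rank * (d_in + d_out)` LoRA parameters, which is what makes the adaptation lightweight.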
dc.description.sponsorship: This work is supported by U.S. Army Grant No. W911NF2120076
dc.description.uri: https://arxiv.org/abs/2503.06003
dc.format.extent: 8 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2ppw2-prf3
dc.identifier.uri: https://doi.org/10.48550/arXiv.2503.06003
dc.identifier.uri: http://hdl.handle.net/11603/38085
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Joint Center for Earth Systems Technology (JCET)
dc.relation.ispartof: UMBC Information Systems Department
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Center for Accelerated Real Time Analytics
dc.relation.ispartof: UMBC Center for Real-time Distributed Sensing and Autonomy
dc.relation.ispartof: UMBC GESTAR II
dc.relation.ispartof: UMBC College of Engineering and Information Technology Dean's Office
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.subject: UMBC Accelerated Cognitive Cybersecurity Laboratory
dc.subject: UMBC Center for Cybersecurity
dc.subject: UMBC Big Data Analytics Lab
dc.title: Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-7553-7932
dcterms.creator: https://orcid.org/0000-0002-9933-1170

Files

Original bundle

Name: 2503.06003v1.pdf
Size: 2.71 MB
Format: Adobe Portable Document Format