Frequency-Aware Mixture of Experts Model for Robust Multimodal Perception

dc.contributor.advisor: Gangopadhyay, Aryya
dc.contributor.author: Khan, Azim
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2025-09-24T14:07:10Z
dc.date.issued: 2025-01-01
dc.description.abstract: Robust multimodal perception is essential for understanding real-world scenes, particularly under degraded, noisy, or low-visibility conditions. This dissertation introduces a Frequency-Aware Mixture-of-Experts model that combines structural features from the frequency domain with semantic and spatial representations across RGB, infrared (IR), text, and audio modalities. The work advances through a progression of perception tasks, beginning with single-modality perception, extending to vision-language modeling, and culminating in a four-modality adaptive model. We begin by addressing domain-specific perception using single-modality visual learning, which highlights the limitations of relying on a single source of information in complex environments. This motivates the integration of frequency-domain reasoning into multimodal architectures. In the next stage, we enhance vision-language modeling by introducing frequency-based low-rank features into pretrained visual encoders. These features provide noise-resilient representations while maintaining compatibility with language models, leading to improved performance in caption generation and visual question answering (VQA), particularly under visual degradation. Finally, we propose a hybrid Frequency-Aware Mixture-of-Experts (FreqMoE) model that dynamically fuses RGB and IR image features, guided by synchronized text and audio signals. The model combines a frequency-domain gating mechanism, which computes reliability scores from log-magnitude spectral features, with a feature-wise modulation module that adapts visual features based on fused semantic embeddings. To support this four-modality setup, we extend three public RGB-IR datasets (M3FD, RoadScene, and MSRS) by adding aligned textual and audio annotations. This results in a synchronized four-modality setup that includes RGB images, IR data, captions, and audio, without requiring new data collection. Experimental results demonstrate that our method outperforms state-of-the-art baselines in both detection and fusion quality metrics. Ablation studies further validate the contributions of frequency-aware gating and semantic conditioning. Our approach offers an interpretable and adaptive solution for robust cross-modal perception under real-world constraints.
dc.format: application/pdf
dc.genre: dissertation
dc.identifier: doi:10.13016/m2zyry-oslb
dc.identifier.other: 13091
dc.identifier.uri: http://hdl.handle.net/11603/40268
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Khan_umbc_0434D_13091.pdf
dc.subject: Discrete Fourier Transform
dc.subject: Feature Modulation
dc.subject: Mixture of Experts
dc.subject: Multimodal AI
dc.subject: Singular Value Decomposition
dc.subject: Vision Language Model
dc.title: Frequency-Aware Mixture of Experts Model for Robust Multimodal Perception
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.

Files

Name: Khan-Azim_Open.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format