A Survey on Efficient Vision-Language Models

Shinde, Gaurav; Ravi, Anuradha; Dey, Emon; Sakib, Shadman; Rampure, Milind; Roy, Nirmalya

dc.contributor.author	Shinde, Gaurav
dc.contributor.author	Ravi, Anuradha
dc.contributor.author	Dey, Emon
dc.contributor.author	Sakib, Shadman
dc.contributor.author	Rampure, Milind
dc.contributor.author	Roy, Nirmalya
dc.date.accessioned	2025-06-05T14:02:46Z
dc.date.available	2025-06-05T14:02:46Z
dc.date.issued	2025-04-13
dc.description.abstract	Vision-language models (VLMs) integrate visual and textual information, enabling a wide range of applications such as image captioning and visual question answering, making them crucial for modern AI systems. However, their high computational demands pose challenges for real-time applications. This has led to a growing focus on developing efficient vision-language models. In this survey, we review key techniques for optimizing VLMs on edge and resource-constrained devices. We also explore compact VLM architectures and frameworks, and provide detailed insights into the performance-memory trade-offs of efficient VLMs. Furthermore, we establish a GitHub repository at https://github.com/MPSCUMBC/Efficient-Vision-Language-Models-A-Survey to compile all surveyed papers, which we will actively update. Our objective is to foster deeper research in this area.
dc.description.sponsorship	This work has been partially supported by NSF CAREER Award 1750936, NSF REU Site Grant 2050999, NSF CNS EAGER Grant 2233879, and ONR Grant N000142312119.
dc.description.uri	http://arxiv.org/abs/2504.09724
dc.format.extent	35 pages
dc.genre	journal articles
dc.genre	preprints
dc.identifier	doi:10.13016/m2vhh3-8yac
dc.identifier.uri	https://doi.org/10.48550/arXiv.2504.09724
dc.identifier.uri	http://hdl.handle.net/11603/38594
dc.language.iso	en_US
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Information Systems Department
dc.relation.ispartof	UMBC Center for Real-time Distributed Sensing and Autonomy
dc.relation.ispartof	UMBC Student Collection
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	UMBC Mobile, Pervasive and Sensor Computing Lab (MPSC Lab)
dc.subject	Computer Science - Computer Vision and Pattern Recognition
dc.title	A Survey on Efficient Vision-Language Models
dc.type	Text
dcterms.creator	https://orcid.org/0000-0002-1290-0378
dcterms.creator	https://orcid.org/0009-0007-5268-4562

Files

Original bundle

Name: 2504.09724v1.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format

Collections

UMBC Faculty Collection
UMBC Center for Real-time Distributed Sensing and Autonomy
UMBC Information Systems Department
UMBC Student Collection