A Survey on Efficient Vision-Language Models
dc.contributor.author | Shinde, Gaurav | |
dc.contributor.author | Ravi, Anuradha | |
dc.contributor.author | Dey, Emon | |
dc.contributor.author | Sakib, Shadman | |
dc.contributor.author | Rampure, Milind | |
dc.contributor.author | Roy, Nirmalya | |
dc.date.accessioned | 2025-06-05T14:02:46Z | |
dc.date.available | 2025-06-05T14:02:46Z | |
dc.date.issued | 2025-04-13 | |
dc.description.abstract | Vision-language models (VLMs) integrate visual and textual information, enabling a wide range of applications such as image captioning and visual question answering, making them crucial for modern AI systems. However, their high computational demands pose challenges for real-time applications. This has led to a growing focus on developing efficient vision-language models. In this survey, we review key techniques for optimizing VLMs on edge and resource-constrained devices. We also explore compact VLM architectures and frameworks, and provide detailed insights into the performance-memory trade-offs of efficient VLMs. Furthermore, we establish a GitHub repository at https://github.com/MPSCUMBC/Efficient-Vision-Language-Models-A-Survey to compile all surveyed papers, which we will actively update. Our objective is to foster deeper research in this area. | |
dc.description.sponsorship | This work has been partially supported by NSF CAREER Award 1750936, NSF REU Site Grant 2050999, NSF CNS EAGER Grant 2233879, and ONR Grant N000142312119. | |
dc.description.uri | http://arxiv.org/abs/2504.09724 | |
dc.format.extent | 35 pages | |
dc.genre | journal articles | |
dc.genre | preprints | |
dc.identifier | doi:10.13016/m2vhh3-8yac | |
dc.identifier.uri | https://doi.org/10.48550/arXiv.2504.09724 | |
dc.identifier.uri | http://hdl.handle.net/11603/38594 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Information Systems Department | |
dc.relation.ispartof | UMBC Center for Real-time Distributed Sensing and Autonomy | |
dc.relation.ispartof | UMBC Student Collection | |
dc.rights | Attribution 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | UMBC Mobile, Pervasive and Sensor Computing Lab (MPSC Lab) | |
dc.subject | Computer Science - Computer Vision and Pattern Recognition | |
dc.title | A Survey on Efficient Vision-Language Models | |
dc.type | Text | |
dcterms.creator | https://orcid.org/0000-0002-1290-0378 | |
dcterms.creator | https://orcid.org/0009-0007-5268-4562 | |