Towards Comprehensive Benchmarking of Medical Vision Language Models

dc.contributor.author: Khatri, Dimple
dc.contributor.author: Gupta, Sanjan TP
dc.date.accessioned: 2026-01-22T16:19:02Z
dc.date.issued: 2025-12-01
dc.description.abstract: Medical imaging workflows integrate radiology images with their corresponding free-text reports. Large language models (LLMs) and large vision–language models (LVLMs) achieve strong results but face deployment barriers in hospitals due to computational demands, privacy risks and infrastructure requirements. Small language models (SLMs) and small vision–language models (SVLMs), typically under 10 billion parameters, offer a more efficient and auditable alternative for on-premise, privacy-preserving radiology applications. Recent models, including CheXzero, MedCLIP, XrayGPT, LLaVA-Med, MedFILIP and MedBridge, show that smaller multimodal models can support classification, retrieval and report generation. Complementary baselines from lightweight SLMs such as DistilBERT, TinyBERT, BioClinicalBERT and T5-Small highlight opportunities for radiology report understanding. Building on these efforts, we propose a reproducible evaluation framework anchored on the Indiana University Chest X-ray dataset (IU-CXR), with potential extensions to CT, MRI and ophthalmology datasets. Our framework combines task metrics such as ROUGE, F1-score and AUROC with efficiency measures, including VRAM usage, latency and model size, and with trust dimensions such as factuality, bias and robustness. We also conduct ablation studies on model architecture, tokenizers and parameter-efficient fine-tuning (e.g., QLoRA), analyzing the trade-offs between accuracy, efficiency and stability. This work establishes reproducible baselines and guidance for deploying radiology AI while advancing open-source research (available at https://github.com/dimplek0424/MedVLMBenchPhase1).
dc.description.uri: https://academic.oup.com/bib/article/26/Supplement_1/i44/8378055
dc.format.extent: 2 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2s6gm-0kh6
dc.identifier.citation: Khatri, Dimple, and Sanjan TP Gupta. “Towards Comprehensive Benchmarking of Medical Vision Language Models.” Briefings in Bioinformatics 26, no. Supplement_1 (2025): i44-45. https://doi.org/10.1093/bib/bbaf631.077.
dc.identifier.uri: https://doi.org/10.1093/bib/bbaf631.077
dc.identifier.uri: http://hdl.handle.net/11603/41534
dc.language.iso: en
dc.publisher: Oxford Academic
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Student Collection
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Towards Comprehensive Benchmarking of Medical Vision Language Models
dc.type: Text
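
The abstract names ROUGE, F1-score and AUROC as the framework's task metrics. As an illustration only, the minimal Python sketch below shows one common way to compute binary F1, AUROC (using the pairwise-ranking, Mann–Whitney formulation) and a ROUGE-L F-measure. It is not the authors' implementation: the function names are hypothetical, and the simplified lowercase whitespace tokenization is an assumption that will not exactly reproduce scores from reference packages such as rouge-score.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R) for labels in {0, 1} (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: F1 is 0, which also avoids dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rouge_l_f(reference: str, candidate: str) -> float:
    """ROUGE-L F-measure (harmonic mean) from the LCS of whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    # dp[i][j] holds the LCS length of ref[:i] and cand[:j].
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == c else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, with y_true = [1, 0, 1, 1, 0] and y_pred = [1, 0, 0, 1, 1], precision and recall are both 2/3, so F1 is 2/3. The pairwise AUROC loop is quadratic in the number of samples; a rank-based computation scales better on large test sets.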

Files

Original bundle

Name: bbaf631.035.pdf
Size: 267.92 KB
Format: Adobe Portable Document Format