Towards Comprehensive Benchmarking of Medical Vision Language Models

Khatri, Dimple, and Sanjan TP Gupta. “Towards Comprehensive Benchmarking of Medical Vision Language Models.” Briefings in Bioinformatics 26, no. Supplement_1 (2025): i44-45. https://doi.org/10.1093/bib/bbaf631.077.

Rights

Attribution 4.0 International

Abstract

Medical imaging workflows integrate radiology images with their corresponding free-text reports. Large language models (LLMs) and large vision–language models (LVLMs) achieve strong results but face deployment barriers in hospitals due to computational demands, privacy risks and infrastructure needs. Small language models (SLMs) and small vision–language models (SVLMs), typically under 10 billion parameters, provide a more efficient and auditable alternative for on-premise, privacy-preserving applications in radiology. Recent advancements, including CheXzero, MedCLIP, XrayGPT, LLaVA-Med, MedFILIP and MedBridge, show that smaller multimodal models support classification, retrieval and report generation. Complementary baselines from lightweight SLMs such as DistilBERT, TinyBERT, BioClinicalBERT and T5-Small highlight opportunities for radiology report understanding.Building on these efforts, we propose a reproducible evaluation framework anchored on IU-CXR (for Indiana University Chest X-ray dataset), with potential extensions to CT, MRI and ophthalmology datasets. Our framework integrates task metrics such as ROUGE, F1-score and AUROC, together with efficiency measures including VRAM usage, latency, and model size; alongside trust dimensions like factuality, bias, and robustness. We also conduct ablation studies on model architecture, tokenizers and parameter-efficient fine-tuning (e.g. qLoRA), while analyzing trade-offs between accuracy, efficiency, and stability. This work establishes reproducible baselines and guidance for deploying radiology AI, while also advancing open-source research (available at https://github.com/dimplek0424/MedVLMBenchPhase1).

Towards Comprehensive Benchmarking of Medical Vision Language Models

Files

Links to Files

Permanent Link

Collections

Author/Creator

Author/Creator ORCID

Date

Type of Work

Department

Program

Citation of Original Publication

Rights

Subjects

Abstract