Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing

dc.contributor.author: Ashqar, Huthaifa
dc.contributor.author: Jaber, Ahmed
dc.contributor.author: Alhadidi, Taqwa I.
dc.contributor.author: Elhenawy, Mohammed
dc.date.accessioned: 2025-10-16T15:27:09Z
dc.date.issued: 2025-06-03
dc.description.abstract: This study aims to comprehensively review and empirically evaluate the application of multimodal large language models (MLLMs) and Large Vision Models (VLMs) in object detection for transportation systems. In the first fold, we provide a background about the potential benefits of MLLMs in transportation applications and conduct a comprehensive review of current MLLM technologies in previous studies. We highlight their effectiveness and limitations in object detection within various transportation scenarios. The second fold involves providing an overview of the taxonomy of end-to-end object detection in transportation applications and future directions. Building on this, we proposed empirical analysis for testing MLLMs on three real-world transportation problems that include object detection tasks, namely, road safety attribute extraction, safety-critical event detection, and visual reasoning of thermal images. Our findings provide a detailed assessment of MLLM performance, uncovering both strengths and areas for improvement. Finally, we discuss practical limitations and challenges of MLLMs in enhancing object detection in transportation, thereby offering a roadmap for future research and development in this critical area.
dc.description.uri: https://www.mdpi.com/2079-3197/13/6/133
dc.format.extent: 24 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2vp1u-qsdu
dc.identifier.citation: Ashqar, Huthaifa I., Ahmed Jaber, Taqwa I. Alhadidi, and Mohammed Elhenawy. “Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing.” Computation 13, no. 6 (2025): 133. https://doi.org/10.3390/computation13060133.
dc.identifier.uri: https://doi.org/10.3390/computation13060133
dc.identifier.uri: http://hdl.handle.net/11603/40439
dc.language.iso: en
dc.publisher: MDPI
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Data Science
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: intelligent transportation systems (ITS)
dc.subject: multimodal large language models (MLLMs)
dc.subject: end-to-end object detection
dc.subject: autonomous driving
dc.subject: large vision models (VLMs)
dc.title: Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-6835-8338

Files

Original bundle

Name: computation1300133.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format