Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models

dc.contributor.author: Jaradat, Shadi
dc.contributor.author: Elhenawy, Mohammed
dc.contributor.author: Nayak, Richi
dc.contributor.author: Paz, Alexander
dc.contributor.author: Ashqar, Huthaifa I.
dc.contributor.author: Glaser, Sebastien
dc.date.accessioned: 2025-10-16T15:27:11Z
dc.date.issued: 2025-04-07
dc.description.abstract: In traffic safety analysis, previous research has often focused on tabular data or textual crash narratives in isolation, neglecting the potential benefits of a hybrid multimodal approach. This study introduces the Multimodal Data Fusion (MDF) framework, which fuses tabular data with textual narratives by leveraging advanced Large Language Models (LLMs), such as GPT-2, GPT-3.5, and GPT-4.5, using zero-shot (ZS), few-shot (FS), and fine-tuning (FT) learning strategies. We employed few-shot learning with GPT-4.5 to generate new labels for traffic crash analysis, such as driver fault, driver actions, and crash factors, alongside the existing label for severity. Our methodology was tested on crash data from the Missouri State Highway Patrol, demonstrating significant improvements in model performance. GPT-2 (fine-tuned) was used as the baseline model, against which more advanced models were evaluated. GPT-4.5 few-shot learning achieved 98.9% accuracy for crash severity prediction and 98.1% accuracy for driver fault classification. In crash factor extraction, GPT-4.5 few-shot achieved the highest Jaccard score (82.9%), surpassing GPT-3.5 and fine-tuned GPT-2 models. Similarly, in driver actions extraction, GPT-4.5 few-shot attained a Jaccard score of 73.1%, while fine-tuned GPT-2 closely followed with 72.2%, demonstrating that task-specific fine-tuning can achieve performance close to state-of-the-art models when adapted to domain-specific data. These findings highlight the superior performance of GPT-4.5 few-shot learning, particularly in classification and information extraction tasks, while also underscoring the effectiveness of fine-tuning on domain-specific datasets to bridge performance gaps with more advanced models. The MDF framework’s success demonstrates its potential for broader applications beyond traffic crash analysis, particularly in domains where labeled data are scarce and predictive modeling is essential.
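The abstract reports accuracy for the classification tasks and Jaccard scores for the multi-label extraction tasks (crash factors, driver actions), and describes fusing tabular fields with textual narratives into a single LLM input. The following is a minimal, hypothetical Python sketch of those two ideas, not the authors' implementation; the field names, example labels, and helper functions are illustrative assumptions.

def fuse_record(tabular: dict, narrative: str) -> str:
    """Serialize tabular crash attributes and the crash narrative into one
    text block that could be sent to an LLM in a zero- or few-shot prompt."""
    fields = "\n".join(f"{key}: {value}" for key, value in tabular.items())
    return f"{fields}\nNarrative: {narrative}"

def jaccard(predicted: set, gold: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between the predicted and
    reference label sets for one crash record."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & gold) / len(predicted | gold)

def mean_jaccard(pred_sets, gold_sets) -> float:
    """Average the per-record Jaccard scores over a dataset."""
    scores = [jaccard(p, g) for p, g in zip(pred_sets, gold_sets)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical tabular fields; the actual Missouri State Highway Patrol
    # schema is not given in this record.
    record = {"Weather": "Rain", "Road Surface": "Wet", "Light": "Dark"}
    print(fuse_record(record, "Vehicle 1 hydroplaned and struck the guardrail."))

    # Toy multi-label crash-factor evaluation: (2/3 + 1.0) / 2 ≈ 0.833.
    preds = [{"speeding", "wet road"}, {"distraction"}]
    golds = [{"speeding", "wet road", "fatigue"}, {"distraction"}]
    print(f"Mean Jaccard: {mean_jaccard(preds, golds):.3f}")

Under this convention, the 82.9% crash-factor Jaccard score quoted above would correspond to a mean per-record set overlap of 0.829, though the paper itself should be consulted for the exact averaging scheme used.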
dc.description.sponsorship: This research was supported by Queensland University of Technology (QUT).
dc.description.uri: https://www.mdpi.com/2673-2688/6/4/72
dc.format.extent: 35 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2gnqx-muau
dc.identifier.citation: Jaradat, Shadi, Mohammed Elhenawy, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, and Sebastien Glaser. “Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models.” AI 6, no. 4 (2025): 72. https://doi.org/10.3390/ai6040072.
dc.identifier.uri: https://doi.org/10.3390/ai6040072
dc.identifier.uri: http://hdl.handle.net/11603/40447
dc.language.iso: en
dc.publisher: MDPI
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Data Science
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Large Language Model (LLM)
dc.subject: zero-shot learning
dc.subject: Generative Pre-Trained Transformer (GPT)
dc.subject: few-shot learning
dc.subject: Multimodal Data Fusion
dc.subject: traffic crash analysis
dc.title: Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-6835-8338

Files

Original bundle

Name: ai0600072.pdf
Size: 7.19 MB
Format: Adobe Portable Document Format