Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models
Citation of Original Publication
Jaradat, Shadi, Mohammed Elhenawy, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, and Sebastien Glaser. “Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models.” AI 6, no. 4 (2025): 72. https://doi.org/10.3390/ai6040072.
Rights
Creative Commons Attribution 4.0 International (CC BY 4.0)
Abstract
In traffic safety analysis, previous research has often examined tabular data or textual crash narratives in isolation, overlooking the potential benefits of a hybrid multimodal approach. This study introduces the Multimodal Data Fusion (MDF) framework, which fuses tabular data with textual narratives by leveraging Large Language Models (LLMs) such as GPT-2, GPT-3.5, and GPT-4.5 under zero-shot (ZS), few-shot (FS), and fine-tuning (FT) learning strategies. We employed few-shot learning with GPT-4.5 to generate new labels for traffic crash analysis, such as driver fault, driver actions, and crash factors, alongside the existing severity label. The methodology was tested on crash data from the Missouri State Highway Patrol, with fine-tuned GPT-2 serving as the baseline against which the more advanced models were evaluated. GPT-4.5 few-shot learning achieved 98.9% accuracy for crash severity prediction and 98.1% accuracy for driver fault classification. In crash factor extraction, GPT-4.5 few-shot attained the highest Jaccard score (82.9%), surpassing GPT-3.5 and fine-tuned GPT-2; in driver action extraction, it reached a Jaccard score of 73.1%, with fine-tuned GPT-2 close behind at 72.2%, showing that task-specific fine-tuning on domain data can approach the performance of state-of-the-art models. These findings highlight the strength of GPT-4.5 few-shot learning in classification and information extraction tasks, while underscoring the value of fine-tuning on domain-specific datasets to narrow the performance gap with more advanced models. The MDF framework's success suggests broader applicability beyond traffic crash analysis, particularly in domains where labeled data are scarce and predictive modeling is essential.
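To make the fusion step concrete, the following is a minimal sketch, assuming hypothetical field names, label formats, and example records that are not taken from the paper, of how a tabular crash record and its narrative might be serialized into a single few-shot prompt, together with the Jaccard score used to evaluate the extraction tasks. The LLM API call itself is omitted; only prompt construction and scoring are shown.

# Minimal sketch: fusing a tabular crash record with its narrative into a
# few-shot prompt, plus the Jaccard score used for the extraction tasks.
# All field names, labels, and records here are hypothetical illustrations.

def serialize_record(record: dict) -> str:
    """Flatten a tabular crash record into a readable text segment."""
    return "; ".join(f"{key}: {value}" for key, value in record.items())

def build_prompt(examples: list, record: dict, narrative: str) -> str:
    """Fuse tabular features and the narrative into a few-shot prompt."""
    parts = ["Classify crash severity and driver fault, and list driver actions and crash factors."]
    for ex in examples:  # few-shot demonstrations with known labels
        parts.append(
            f"Tabular: {serialize_record(ex['record'])}\n"
            f"Narrative: {ex['narrative']}\n"
            f"Labels: {ex['labels']}"
        )
    parts.append(
        f"Tabular: {serialize_record(record)}\n"
        f"Narrative: {narrative}\n"
        "Labels:"
    )
    return "\n\n".join(parts)

def jaccard(predicted: set, reference: set) -> float:
    """Jaccard similarity between predicted and reference label sets."""
    if not predicted and not reference:
        return 1.0
    return len(predicted & reference) / len(predicted | reference)

# Hypothetical usage: one demonstration example, then a query record.
examples = [{
    "record": {"speed_limit_mph": 55, "weather": "rain", "vehicles": 2},
    "narrative": "Driver 1 hydroplaned and crossed the centerline.",
    "labels": "severity=injury; fault=driver 1; factors=wet road, speeding",
}]
prompt = build_prompt(
    examples,
    {"speed_limit_mph": 35, "weather": "clear", "vehicles": 1},
    "Vehicle left the roadway and struck a tree.",
)
# `prompt` would be sent to the LLM; the returned label string is parsed
# into a set and compared against the reference annotation:
print(jaccard({"ran off road", "distraction"}, {"ran off road", "fatigue"}))  # ~0.33

Serializing the table as "key: value" text is one simple fusion choice; the paper's exact prompt wording and label schema may differ.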
