Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data

dc.contributor.author: Jaradat, Shadi
dc.contributor.author: Nayak, Richi
dc.contributor.author: Paz, Alexander
dc.contributor.author: Ashqar, Huthaifa
dc.contributor.author: Elhenawy, Mohammad
dc.date.accessioned: 2024-10-28T14:31:01Z
dc.date.available: 2024-10-28T14:31:01Z
dc.date.issued: 2024-09-01
dc.description.abstract: Road traffic crashes (RTCs) are a global public health issue, with traditional analysis methods often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for language modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike traditional single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the six classification tasks, surpassing the baseline GPT-4o mini model’s 64% and XGBoost’s 83.5%. In information retrieval tasks, our fine-tuned GPT-2 model achieved a BLEU-4 score of 0.22, a ROUGE-I score of 0.78, and a WER of 0.30, significantly outperforming the baseline GPT-4o mini model’s BLEU-4 score of 0.0674, ROUGE-I score of 0.2992, and WER of 2.0715. These results demonstrate the efficacy of our fine-tuned GPT-2 model in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance traffic safety.
dc.description.sponsorship: This research was supported by a PhD scholarship from Queensland University of Technology (QUT). The Article Processing Charge (APC) was funded by the Centre of Data Science at QUT.
dc.description.uri: https://www.mdpi.com/2624-6511/7/5/95
dc.format.extent: 44 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2or58-mao7
dc.identifier.citation: Jaradat, Shadi, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, and Mohammad Elhenawy. “Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data.” Smart Cities 7, no. 5 (October 2024): 2422–65. https://doi.org/10.3390/smartcities7050095.
dc.identifier.uri: https://doi.org/10.3390/smartcities7050095
dc.identifier.uri: http://hdl.handle.net/11603/36791
dc.language.iso: en_US
dc.publisher: MDPI
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Data Science
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: GPT
dc.subject: large language models
dc.subject: multitask learning
dc.subject: road traffic crashes
dc.subject: social media data analysis
dc.title: Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-6835-8338

Files

Original bundle

Name: smartcities0700095v4.pdf
Size: 4.27 MB
Format: Adobe Portable Document Format