Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data
| dc.contributor.author | Jaradat, Shadi | |
| dc.contributor.author | Nayak, Richi | |
| dc.contributor.author | Paz, Alexander | |
| dc.contributor.author | Ashqar, Huthaifa | |
| dc.contributor.author | Elhenawy, Mohammad | |
| dc.date.accessioned | 2024-10-28T14:31:01Z | |
| dc.date.available | 2024-10-28T14:31:01Z | |
| dc.date.issued | 2024-09-01 | |
| dc.description.abstract | Road traffic crashes (RTCs) are a global public health issue, with traditional analysis methods often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework utilizing large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features categorized into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for language modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike traditional single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the six classification tasks, surpassing the baseline GPT-4o mini model’s 64% and XGBoost’s 83.5%. In information retrieval tasks, our fine-tuned GPT-2 model achieved a BLEU-4 score of 0.22, a ROUGE-I score of 0.78, and a WER of 0.30, significantly outperforming the baseline GPT-4o mini model’s BLEU-4 score of 0.0674, ROUGE-I score of 0.2992, and WER of 2.0715. These results demonstrate the efficacy of our fine-tuned GPT-2 model in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance traffic safety. | |
| dc.description.sponsorship | This research was supported by a PhD scholarship from Queensland University of Technology (QUT). The Article Processing Charge (APC) was funded by the Centre of Data Science at QUT. | |
| dc.description.uri | https://www.mdpi.com/2624-6511/7/5/95 | |
| dc.format.extent | 44 pages | |
| dc.genre | journal articles | |
| dc.identifier | doi:10.13016/m2or58-mao7 | |
| dc.identifier.citation | Jaradat, Shadi, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, and Mohammad Elhenawy. “Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data.” Smart Cities 7, no. 5 (October 2024): 2422–65. https://doi.org/10.3390/smartcities7050095. | |
| dc.identifier.uri | https://doi.org/10.3390/smartcities7050095 | |
| dc.identifier.uri | http://hdl.handle.net/11603/36791 | |
| dc.language.iso | en_US | |
| dc.publisher | MDPI | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Faculty Collection | |
| dc.relation.ispartof | UMBC Data Science | |
| dc.rights | Attribution 4.0 International | |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | GPT | |
| dc.subject | large language models | |
| dc.subject | multitask learning | |
| dc.subject | road traffic crashes | |
| dc.subject | social media data analysis | |
| dc.title | Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data | |
| dc.type | Text | |
| dcterms.creator | https://orcid.org/0000-0002-6835-8338 |
