Data-Driven Insights into Vaccine Hesitancy: A Machine Learning Analysis of Health, Lifestyle, and Socioeconomic Factors
Citation of Original Publication
Kharabsheh, Mohammad, Ali Alsarhan, Nadera Aljawabrah, et al. “Data-Driven Insights into Vaccine Hesitancy: A Machine Learning Analysis of Health, Lifestyle, and Socioeconomic Factors.” International Journal of Advances in Soft Computing and Its Applications 17, no. 2 (2025): 19–38. https://doi.org/10.15849/IJASCA.250730.02.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
The persistent challenge of COVID-19 vaccine hesitancy hinders global efforts to achieve herd immunity. As a public health barrier, vaccine hesitancy diminishes the impact of immunization programs and prolongs population-level vulnerability. This study investigates the factors behind vaccine acceptance by developing a predictive model based on sociodemographic, health, and lifestyle variables, including smoking status. Data were collected through a structured survey of 500 participants representing diverse demographic backgrounds. The survey included questions on demographics, smoking status, prior COVID-19 infection, health conditions, and attitudes toward vaccine safety. We employed Google Cloud’s Vertex AI AutoML to train and evaluate multiple machine learning classification algorithms. Among these, Random Forest and Support Vector Machines (SVM) achieved the highest predictive performance. The final model demonstrated strong classification accuracy (93%) and a high AUC score (0.96), underscoring its robustness. Feature importance analysis revealed that individuals concerned about long-term vaccine safety were 2.5 times more likely to be vaccine-hesitant. The perception of low personal risk from COVID-19 was also a major contributing factor. By contrast, lifestyle variables such as smoking status had a comparatively weaker association with hesitancy. This study contributes to the growing application of machine learning in public health by presenting a scalable, interpretable framework for identifying populations at higher risk of vaccine hesitancy. These findings provide actionable insights for health authorities, emphasizing the need for communication strategies that directly address safety concerns and risk misperception. Tailored outreach should prioritize individuals with lower educational attainment, among whom hesitancy was notably more prevalent. These contributions offer a foundation for more effective vaccine campaigns and broader pandemic response efforts.
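For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how classification accuracy, AUC, and a "times more likely" ratio can be computed. It is not the authors' pipeline: the labels, scores, and counts are made-up toy values, the function names are hypothetical, and the abstract does not say whether the reported 2.5 figure is a risk ratio or an odds ratio, so the risk-ratio reading here is an assumption.

```python
# Illustrative only: toy values, NOT data or code from the study.

def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of examples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive example is scored
    above a random negative example (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the outcome rate in the exposed group to the rate in the
    unexposed group (assumed interpretation of 'times more likely')."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


# Hypothetical labels (1 = hesitant) and model scores for eight respondents.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("accuracy:", accuracy(y_true, y_score))
print("AUC:", roc_auc(y_true, y_score))

# Hypothetical counts: 50 of 100 respondents with safety concerns are
# hesitant, versus 20 of 100 respondents without such concerns.
print("risk ratio:", risk_ratio(50, 100, 20, 100))
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.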
