A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models
Date
2024-09-23
Citation of Original Publication
Ahmed, Naeem, Rashid Amin, Hamza Aldabbas, Muhammad Saeed, Muhammad Bilal, and Houbing Song. “A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models.” ACM Trans. Asian Low-Resour. Lang. Inf. Process., September 23, 2024. https://doi.org/10.1145/3696789.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
Sentiment analysis is the process of analyzing people's opinions, remarks, and comments to extract valuable insights from them. It can be used for purposes such as market analysis, campaign monitoring, and decision-making. In recent years, there has been much research on sentiment classification, particularly for English. However, the existing approaches developed for English cannot be applied to Urdu. The substantial rise in communication traffic, including audio, text, video, and pictures, has shifted the Internet of Things (IoT) from scalar IoT to the Multimedia Internet of Things (MIoT). So far, the integration of MIoT and natural language processing (NLP) systems has received little attention, but it has emerged as a novel research paradigm for smart applications. This article proposes deep learning techniques for sentence-level Urdu sentiment analysis (Urdu SA) for MIoT. Our approach consists of several phases: data gathering, text preprocessing, model training, testing, and evaluation. A data set of 25,000 Urdu reviews is used to train the proposed models. This data set is built by scraping various Urdu blogs and social media platforms, and part of the IMDB data set is added after translating it into Urdu. The data are annotated by native Urdu speakers, and preprocessing techniques such as tokenization and stemming are applied. Two deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, are trained on the preprocessed Urdu reviews to classify their sentiment. Both models are tested with various combinations of hyperparameters, and each model's accuracy and F1 score are evaluated. The results show that the LSTM model outperforms the CNN model, achieving 96% accuracy and a 91% F1 score.
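The abstract reports accuracy and F1 score but does not state the label set or how F1 is averaged across classes. As a minimal sketch, assuming string sentiment labels such as "pos" and "neg" (hypothetical; the paper's actual label names and averaging scheme are not given here), the two metrics can be computed as follows. The function names are illustrative only; this is not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the labels present in y_true.

    Macro averaging is an assumption here; the abstract does not say
    which averaging the paper uses.
    """
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy example with made-up labels (not data from the paper).
gold = ["pos", "pos", "pos", "neg", "neg", "pos", "neg", "pos"]
pred = ["pos", "pos", "neg", "neg", "pos", "pos", "neg", "pos"]
print(accuracy(gold, pred))                         # 0.75
print(round(f1_for_class(gold, pred, "pos"), 3))    # 0.8
print(round(macro_f1(gold, pred), 3))               # 0.733
```

Because accuracy counts all correct predictions while F1 balances precision and recall per class, the two numbers generally differ, as they do in the reported results. A binary F1 on the positive class alone, or a support-weighted average, would give different values from the macro average used in this sketch.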