A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models

Ahmed, Naeem, Rashid Amin, Hamza Aldabbas, Muhammad Saeed, Muhammad Bilal, and Houbing Song. “A Novel Approach for Sentiment Analysis of a Low Resource Language Using Deep Learning Models.” ACM Trans. Asian Low-Resour. Lang. Inf. Process., September 23, 2024. https://doi.org/10.1145/3696789.

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Subjects

UMBC Security and Optimization for Networked Globe Laboratory (SONG Lab)

Abstract

Sentiment analysis is the process of analyzing people's opinions, remarks, and comments to extract valuable insights from them. It can be used for purposes such as market analysis, campaign monitoring, and decision-making. In recent years, there has been much research on sentiment classification, particularly for English. However, existing approaches developed for English cannot be applied to Urdu. The substantial rise in communication traffic, including audio, text, video, and images, has shifted the Internet of Things (IoT) from scalar data toward the Multimedia Internet of Things (MIoT). So far, the integration of MIoT with natural language processing (NLP) systems has received little attention, although it has emerged as a novel research paradigm for smart applications. This article proposes deep learning techniques for sentence-level Urdu sentiment analysis (Urdu SA) for MIoT. Our approach consists of several phases: data gathering, text preprocessing, model training, testing, and evaluation. A data set of 25,000 Urdu reviews is used to train the proposed models. The data set is built by scraping various Urdu blogs and social media platforms and is supplemented with part of the IMDB data set after translation into Urdu. Native Urdu speakers annotate the data, and preprocessing techniques such as tokenization and stemming are applied. Two deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, are trained on the preprocessed Urdu reviews to classify their sentiment. Both models are tested with various combinations of hyperparameters, and each model's accuracy and F1 score are evaluated. The results show that the LSTM model outperforms the CNN model, achieving 96% accuracy and a 91% F1 score.
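
The evaluation phase reports two metrics, accuracy and F1 score. The following is a minimal sketch of how these can be computed, not the authors' evaluation code. It assumes binary sentiment labels encoded as 1 (positive) and 0 (negative) and F1 computed for the positive class; the abstract states neither the label scheme nor the averaging method. The function names and toy labels are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the given positive class (harmonic mean of precision and recall).

    Assumption: single positive class; macro or weighted averaging is not shown.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined); report 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions (1 = positive, 0 = negative).
    gold = [1, 1, 0, 0, 1, 0]
    pred = [1, 0, 0, 1, 1, 0]
    print(f"accuracy = {accuracy(gold, pred):.3f}")  # accuracy = 0.667
    print(f"F1       = {f1_score(gold, pred):.3f}")  # F1       = 0.667
```

On these toy labels the two metrics coincide, but on class-imbalanced data they can diverge, which is one reason to report both. In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` provide the same computations with configurable averaging.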
