Detecting Offensive Social Media Text in Nepali Language

dc.contributor.advisor: Joshi, Anupam
dc.contributor.author: Timilsina, Sandesh
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:57Z
dc.date.available: 2021-09-01T13:55:57Z
dc.date.issued: 2020-01-20
dc.description.abstract: In recent years, there has been an enormous increase in user-generated content on the internet. Because sentiments and opinions are freely shared on social media platforms, readers are at increased risk of being exposed to potentially offensive content. This content, while initially limited to English, now appears in a large number of languages. Cyberbullying and online harassment have also become major problems. While sentiment analysis has always been an active research topic in the NLP community, it is even more important given the expansion of social media. Most existing research in sentiment analysis has focused on English and other high-resource languages. While there is emerging work in some low-resource languages, very limited work has been done on sentiment analysis in the Nepali language. The preliminary work in this field in Nepali focused on document-level classification of news stories, books, and movie reviews. However, to the best of our knowledge, no work has been done to analyze sentiment using social media data. The previous work is also limited to binary classification of instances as positive or negative, and no earlier work has focused on aspect-level sentiment analysis. In this thesis, we present our work on offensive language detection in Nepali social media data. We focused on targeted aspect-based offensive language identification in YouTube comments written in Nepali and code-switched language. Unlike previous work in this field in the Nepali language, this work trained deep learning models such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and BERT for the tasks. Moreover, we also created a dataset, which is being released with this work.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2sdjw-07vs
dc.identifier.other: 12162
dc.identifier.uri: http://hdl.handle.net/11603/22928
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Timilsina_umbc_0434M_12162.pdf
dc.subject: Aspect-based Sentiment Analysis
dc.subject: Detecting Offensive Language in Social Media
dc.subject: Offensive Language Identification
dc.subject: Offensive Language Identification in Nepali
dc.subject: Social Media Analysis in Nepali
dc.subject: Social Media Text Analysis
dc.title: Detecting Offensive Social Media Text in Nepali Language
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Timilsina_umbc_0434M_12162.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format

Name: Dataset.zip
Size: 931.63 KB
Format: Unknown data format