A study on textual contents in online communities and social media using text mining approaches

Date

2018-05-29

Department

Towson University. Department of Computer and Information Sciences

Abstract

With the advent of Web 2.0, users have become more interactive, and the volume of user-generated content (UGC) on the web has increased drastically. Among the various Web 2.0 applications, we focus on textual content in social media and online question answering communities.

Twitter has become one of the fastest growing social media sites and serves as a channel for electronic word-of-mouth (eWOM), in which users share opinions and information about brands that affect customers’ buying decisions. However, lexical ambiguity is an obstacle to analyzing social media data for online reputation management, and the enormous number of tweets makes it impossible for a human to disambiguate them manually. Therefore, we propose an automated company name discrimination method using topic signatures. In our experiment, we found that news articles can be used to extract topic signatures, and that these topic signatures improved the company name discrimination results compared to the baseline.

Community Question Answering (CQA) sites are knowledge sharing platforms that allow users to post questions and to answer questions asked by other users. There is a time lag between questions and answers: askers need to wait for answers, and some questions are never answered. To address this problem, we propose a weighted question retrieval method that uses the relationship between question titles and descriptions. In our experiment, we found that exploiting the question descriptions improved the ranks of the relevant questions while reducing their recall.

Software information sites such as Stack Overflow, Super User, and Ask Ubuntu are specialized CQA sites that accept software-related questions and provide tagging systems. Tagging systems help to organize, search, and explore questions for future use. However, tag explosion and tag synonyms are common problems in tagging systems, because tags are created and added by non-expert users. To mitigate these problems, we propose a tag recommendation method using highest topic filtering. In our experiment, we observed that our tag recommendation method considerably improved rank-related results and that the recommended tags can improve the quality of the questions.
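
As a rough illustration of the weighted question retrieval idea mentioned in the abstract, the sketch below scores archived questions by combining the similarity of a query to each question's title and to its description. It is not the method from the thesis: the bag-of-words cosine similarity, the linear combination, the `title_weight` value of 0.7, and the sample questions are all assumptions chosen only to make the idea concrete.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank_questions(query, archive, title_weight=0.7):
    """Rank archived questions by a weighted mix of title and description
    similarity. The weight is a hypothetical value, not taken from the thesis."""
    scored = []
    for question in archive:
        score = (title_weight * cosine(query, question["title"])
                 + (1 - title_weight) * cosine(query, question["description"]))
        scored.append((score, question["title"]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical archive entries, invented for illustration only.
archive = [
    {"title": "How to install Python on Ubuntu",
     "description": "I need Python 3 on my Ubuntu laptop. Which apt package should I use?"},
    {"title": "Resize a disk partition",
     "description": "How can I shrink a partition without losing data?"},
]

ranked = rank_questions("install python ubuntu", archive)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

Varying `title_weight` shifts how much the description contributes to the ranking, which is the kind of trade-off between rank and recall that the abstract reports.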