Search keyword selection for crawling tweets based on the keyword extraction

Date

2016-03-01

Department

Towson University. Department of Computer and Information Sciences

Abstract

As the use of Social Network Services (SNSs) continues to grow, so does the demand to derive meaningful information from them. Extracting such information first requires collecting data from the SNSs. Keyword-based search through an Application Programming Interface (API) is widely used for this data collection. However, depending on the selected topic, it can also collect a large amount of extraneous data. For example, if a topic term such as “Coach” (the fashion company) is used as the search keyword, data unrelated to the topic are collected as well, because the term “Coach” is a homonym. This problem makes data analysis more difficult and wastes data storage space. It also wastes limited collection resources, such as search queries. For topics whose topic term is a homonym, additional search keywords are needed to collect data more accurately, and these terms should be extracted from real data. In this thesis, we propose a method that uses tweets to extract search keywords that are effective for collecting data related to a topic.
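
The homonym problem described in the abstract can be illustrated with a minimal Python sketch. This is not the method proposed in the thesis, which the abstract does not detail. It only assumes a simple document-frequency heuristic: count which terms co-occur with the topic term in a few tweets already known to be on topic, then use the most frequent ones as additional search keywords. The sample tweets, the stopword list, and the function names `candidate_keywords` and `is_on_topic` are all hypothetical and invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "and", "at", "my", "new", "on", "the"}

# Invented sample tweets, assumed to be hand-labelled as on topic
# for "Coach" (the fashion company). Not real data.
RELEVANT_TWEETS = [
    "Love my new Coach handbag",
    "Coach handbag sale at the outlet",
    "Bought a Coach handbag and a wallet at the outlet",
    "Coach wallet on sale",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_keywords(relevant_tweets, topic_term, top_n=3):
    """Rank terms by the number of on-topic tweets they appear in."""
    counts = Counter()
    for tweet in relevant_tweets:
        tokens = set(tokenize(tweet))  # count each term once per tweet
        counts.update(
            t for t in tokens if t not in STOPWORDS and t != topic_term
        )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def is_on_topic(tweet, topic_term, keywords):
    """Keep a tweet only if it has the topic term plus an extracted keyword."""
    tokens = set(tokenize(tweet))
    return topic_term in tokens and any(k in tokens for k in keywords)


keywords = candidate_keywords(RELEVANT_TWEETS, "coach")
print(keywords)
print(is_on_topic("Coach outlet has a sale today", "coach", keywords))
print(is_on_topic("The football coach praised the team", "coach", keywords))
```

In this sketch the extracted terms act as a second condition alongside the ambiguous topic term, so a tweet about a sports coach is rejected even though it matches "coach". A practical collector would more likely pass such term combinations to the search API as queries rather than filter after collection, which is where the saving in search queries and storage mentioned in the abstract would come from.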