Blog Track Open Task: Spam Blog Classification

dc.contributor.author: Kolari, Pranam
dc.contributor.author: Java, Akshay
dc.contributor.author: Finin, Tim
dc.contributor.author: Mayfield, James
dc.contributor.author: Joshi, Anupam
dc.contributor.author: Martineau, Justin
dc.date.accessioned: 2018-11-29T19:12:26Z
dc.date.available: 2018-11-29T19:12:26Z
dc.date.issued: 2006-11-14
dc.description: TREC 2006 Blog Track Notebook
dc.description.abstract: Spam blogs, or splogs, are blogs created for the sole purpose of hosting ads, promoting affiliate sites, and getting new content indexed, using auto-generated or plagiarized content from other sources. Spammers equipped with readily available splog creation software inundate the blogosphere, both at ping servers and at systems that index and analyze blogs. Our own studies estimate the splog share to be around 75% at ping servers and 20% at popular blog search engines. In this open submission, we therefore propose Spam Blog Classification as a new task in the Blog Track. Splogs are a specific instance of the more general class of spam web pages. While offline graph-based mechanisms such as TrustRank are effective and sufficient for the Web, the blogosphere demands new techniques. The quality of a blog search engine is judged not just by its reach, but also by its ability to index recent (non-spam) posts. This requires fast online splog detection and filtering before new content is indexed, followed by offline techniques that exploit link graph anomalies. The nature of this problem makes splog detection challenging. This open task submission underscores the seriousness of the splog problem in the TREC 2006 collection, details how it impacts the primary task of Opinion Identification, and proposes multiple assessment and evaluation approaches for a Spam Blog Classification task in Blog Track 2007.
dc.description.sponsorship: Partial support was provided by an IBM Fellowship and by NSF awards NSF-ITR-IIS00326460 and NSF-ITR-IDM-0219649
dc.description.uri: https://ebiquity.umbc.edu/paper/html/id/318/Blog-Track-Open-Task-Spam-Blog-Classification
dc.format.extent: 7 pages
dc.genre: conference papers and proceedings preprints
dc.identifier: doi:10.13016/M23B5WC51
dc.identifier.uri: http://hdl.handle.net/11603/12132
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: Public Domain Mark 1.0
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law
dc.rights.uri: http://creativecommons.org/publicdomain/mark/1.0/
dc.subject: Blog
dc.subject: Task
dc.subject: Spam
dc.subject: Classification
dc.subject: Social Media
dc.subject: UMBC Ebiquity Research Group
dc.title: Blog Track Open Task: Spam Blog Classification
dc.type: Text

Files

Original bundle
Name: 296.pdf
Size: 948.02 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission