Experiments on using Yahoo! categories to describe documents

Yannis Labrou and Tim Finin, Experiments on using Yahoo! categories to describe documents, IJCAI99 Workshop on Intelligent Information Integration, 1999, https://dblp.org/db/conf/ijcai/ijcai99iii

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Subjects

yahoo
documents
telltale
UMBC Ebiquity Research Group

Abstract

We suggest that the name of a Yahoo! category (or the names of a collection of categories), or of a category from any other WWW indexer, can be used to describe the content of a document. Such categories offer a standardized and universal way of referring to or describing the nature of real-world objects, activities, documents and so on, and may be used (we suggest) to semantically characterize the content of documents. WWW indices like Yahoo! provide a huge hierarchy of categories (topics) that touch every aspect of human endeavor. Such topics can be used as descriptors in much the same way that librarians use, for example, the Library of Congress cataloging system to annotate and categorize books. In the course of investigating this idea, we address the problem of automatically categorizing webpages in the Yahoo! directory. We use Telltale as our classifier; Telltale uses n-grams to compute the similarity between documents. We experiment with various types of descriptions for the Yahoo! categories and for the webpages to be categorized. Our findings suggest that the best results occur when using the very brief descriptions of the Yahoo! categorized entries; these brief descriptions, which are part of the Yahoo! index itself, accompany most entries. We discuss further research and ways to improve the performance of our method.
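The abstract says only that Telltale compares documents through their n-grams. The following is a minimal sketch of that general idea applied to categorization, not Telltale's actual implementation. It assumes character 5-grams, raw n-gram counts, and cosine similarity, and it assigns a page to the category whose short description is most similar. Telltale's own weighting scheme is not reproduced here. The function names `char_ngrams`, `cosine` and `categorize`, as well as the category names and descriptions in the usage example, are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams of lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        # Assumption: a text shorter than n is treated as a single "gram".
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(page_text: str, category_descriptions: dict, n: int = 5):
    """Return the best-matching category and the similarity score for every category.

    category_descriptions maps a category name to a brief textual description,
    playing the role of the short Yahoo! entry descriptions (hypothetical data).
    """
    page_vec = char_ngrams(page_text, n)
    scores = {
        cat: cosine(page_vec, char_ngrams(desc, n))
        for cat, desc in category_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Usage with made-up categories and descriptions:
categories = {
    "Recreation/Sports/Soccer": "soccer clubs, leagues, world cup news and football match results",
    "Computers/Programming/Python": "python programming language tutorials, libraries and source code",
}
best, scores = categorize("A tutorial on writing python code with standard libraries.", categories)
print(best, scores)
```

Character n-grams are used here because they tolerate inflection and minor spelling variation without stemming, which is one reason n-gram methods suit short, noisy descriptions. A real system would likely also weight n-grams (for example by down-weighting ones common to all categories) rather than using raw counts.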
