Characterizing the Semantic Web on the Web

Date

2006-11-05

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Semantic Web languages are being used to represent, encode and exchange semantic data in many contexts beyond the Web -- in databases, multiagent systems, mobile computing, and ad hoc networking environments. The core paradigm, however, remains what we call the Web aspect of the Semantic Web -- its use by independent and distributed agents who publish and consume data on the World Wide Web. To better understand this central use case, we have harvested and analyzed a collection of Semantic Web documents from an estimated ten million available on the Web. Using a corpus of more than 1.7 million documents comprising over 300 million RDF triples, we describe a number of global metrics, properties and usage patterns. Most of the metrics, such as the size of Semantic Web documents and the use frequency of Semantic Web terms, were found to follow a power law distribution.