Cluster-Based Join for Geographically Distributed Big RDF Data

dc.contributor.author: Yang, Fan
dc.contributor.author: Crainiceanu, Adina
dc.contributor.author: Chen, Zhiyuan
dc.contributor.author: Needham, Don
dc.date.accessioned: 2019-10-11T14:42:42Z
dc.date.available: 2019-10-11T14:42:42Z
dc.date.issued: 2019-08-29
dc.description: 2019 IEEE International Congress on Big Data (BigDataCongress)
dc.description.abstract: Federated RDF systems allow users to retrieve data from multiple independent sources without needing to have all the data in the same triple store. The performance of these systems can be poor for large and geographically distributed RDF data where network transfer costs are high. This paper introduces CBTP, a novel join algorithm that takes advantage of network topology to decrease the cost of processing SPARQL queries in a geographically distributed environment. Federation members are grouped in clusters, based on the network communication cost between the members, and the bulk of the join processing is pushed to the clusters. We use an overlap list to efficiently compute join results from triples in different clusters. We implement our algorithms in OpenRDF Sesame federated framework and use Apache Rya triple store instances as federation members. Experimental evaluation results show the advantages of our approach over existing techniques.
dc.format.extent: 9 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m2m7s5-kl5n
dc.identifier.citation: F. Yang, A. Crainiceanu, Z. Chen and D. Needham, "Cluster-Based Join for Geographically Distributed Big RDF Data," 2019 IEEE International Congress on Big Data (BigDataCongress), Milan, Italy, 2019, pp. 170-178. doi: 10.1109/BigDataCongress.2019.00037; URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8818183&isnumber=8818170
dc.identifier.uri: https://doi.org/10.1109/BigDataCongress.2019.00037
dc.identifier.uri: http://hdl.handle.net/11603/15856
dc.language.iso: en
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Media and Communication Studies Department
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: Public Domain Mark 1.0
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights.uri: http://creativecommons.org/publicdomain/mark/1.0/
dc.subject: SPARQL
dc.subject: Cluster
dc.subject: Join
dc.subject: Federated Queries
dc.title: Cluster-Based Join for Geographically Distributed Big RDF Data
dc.type: Text

Files

Original bundle

Name: Cluster-Based Join for Geographically Distributed Big RDF Data.pdf
Size: 179.06 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission