Coupling prefix caching and collective downloads for remote dataset access
| dc.contributor.author | Ma, Xiaosong | |
| dc.contributor.author | Freeh, Vincent W. | |
| dc.contributor.author | Yang, Tao | |
| dc.contributor.author | Vazhkudai, Sudharshan S. | |
| dc.contributor.author | Simon, Tyler A. | |
| dc.contributor.author | Scott, Stephen L. | |
| dc.date.accessioned | 2025-06-05T14:03:47Z | |
| dc.date.available | 2025-06-05T14:03:47Z | |
| dc.date.issued | 2006-06-28 | |
| dc.description | ICS '06: Proceedings of the 20th annual international conference on Supercomputing | |
| dc.description.abstract | Scientific datasets are typically archived at mass storage systems or data centers close to supercomputers/instruments. End-users of these datasets, however, usually perform parts of their workflows at their local computers. In such cases, client-side caching can offer significant gains by reducing the cost of wide-area data movement. Scientific data caches, however, traditionally cache entire datasets, which may not be necessary. In this paper, we propose a novel combination of prefix caching and collective download. Prefix caching allows the bootstrapping of dataset downloads by caching only a prefix of the dataset, while collective download facilitates efficient parallel patching of the missing suffix from an external data source. To estimate the optimal prefix size, we further present an analytical model that considers both the initial download overhead and the downloading speed. We implemented our proposed approach in the FreeLoader distributed cache prototype. Experimental results (using multiple scientific data repositories and data transfer tools, as well as a real-world scientific dataset access trace) demonstrate that prefix caching and collective download can be implemented efficiently, our model can select an appropriate prefix size, and the cache hit rate can be improved significantly without hurting the local access rate of cached datasets. | |
| dc.description.sponsorship | This work is supported in part by a DOE ECPI Award DEFG0205ER25685, an NSF CAREER Award CNS0546301, an IBM UPP award, a DOE contract with UT-Battelle, LLC DEAC0500OR2275, and Xiaosong Ma's joint appointment between NCSU and ORNL. The authors thank John Cobb and Greg Pike for access to TeraGrid resources. | |
| dc.description.uri | https://dl.acm.org/doi/10.1145/1183401.1183435 | |
| dc.format.extent | 10 pages | |
| dc.genre | conference papers and proceedings | |
| dc.identifier | doi:10.13016/m23kpn-w4ay | |
| dc.identifier.citation | Ma, Xiaosong, Vincent W. Freeh, Tao Yang, Sudharshan S. Vazhkudai, Tyler A. Simon, and Stephen L. Scott. “Coupling Prefix Caching and Collective Downloads for Remote Dataset Access.” In Proceedings of the 20th Annual International Conference on Supercomputing, 229–38. Paper Presented at ICS ’06: Association for Computing Machinery, Cairns, Queensland, Australia, June 28-July 1, 2006. https://doi.org/10.1145/1183401.1183435. | |
| dc.identifier.uri | https://doi.org/10.1145/1183401.1183435 | |
| dc.identifier.uri | http://hdl.handle.net/11603/38755 | |
| dc.language.iso | en_US | |
| dc.publisher | ACM | |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
| dc.rights | This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law. | |
| dc.rights | Public Domain | |
| dc.rights.uri | https://creativecommons.org/publicdomain/mark/1.0/ | |
| dc.title | Coupling prefix caching and collective downloads for remote dataset access | |
| dc.type | Text |
