Constructing collaborative desktop storage caches for large scientific datasets

dc.contributor.author: Vazhkudai, Sudharshan S.
dc.contributor.author: Ma, Xiaosong
dc.contributor.author: Freeh, Vincent W.
dc.contributor.author: Strickland, Jonathan W.
dc.contributor.author: Tammineedi, Nandan
dc.contributor.author: Simon, Tyler A.
dc.contributor.author: Scott, Stephen L.
dc.date.accessioned: 2025-06-05T14:03:45Z
dc.date.available: 2025-06-05T14:03:45Z
dc.date.issued: 2006-08-01
dc.description.abstract: High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. This has led to the proliferation of high-end mass storage systems, storage area clusters, and data centers. These storage facilities offer a large range of choices in terms of capacity and access rate, as well as strong data availability and consistency support. However, for most end-users, the “last mile” in their analysis pipeline often requires data processing and visualization at local computers, typically local desktop workstations. End-user workstations, despite having more processing power than ever before, are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused. We propose the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space for hosting large, immutable datasets and exploiting data access locality. This article presents the FreeLoader architecture, component design, and performance results based on our proof-of-concept prototype. Its architecture comprises contributing benefactor nodes, steered by a management layer, providing services such as data integrity, high performance, load balancing, and impact control. Our experiments show that FreeLoader is an appealing low-cost solution for storing massive datasets, delivering higher data access rates than traditional storage facilities, namely, local or remote shared file systems, storage systems, and Internet data repositories. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation's network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled. Further, we show that security features such as data encryption and integrity checks can be easily added as filters for interested clients. Finally, we demonstrate how legacy applications can use the FreeLoader API to store and retrieve datasets.
dc.description.sponsorship: This work was supported in part by an IBM UPP award, the US Department of Energy under contract no. DE-AC05-00OR2275 with UT-Battelle, LLC, and Xiaosong Ma's joint appointment between NCSU and ORNL.
dc.description.uri: https://dl.acm.org/doi/10.1145/1168910.1168911
dc.format.extent: 34 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2hwti-cbm2
dc.identifier.citation: Vazhkudai, Sudharshan S., Xiaosong Ma, Vincent W. Freeh, Jonathan W. Strickland, Nandan Tammineedi, Tyler Simon, and Stephen L. Scott. “Constructing Collaborative Desktop Storage Caches for Large Scientific Datasets.” ACM Trans. Storage 2, no. 3 (August 1, 2006): 221–54. https://doi.org/10.1145/1168910.1168911.
dc.identifier.uri: https://doi.org/10.1145/1168910.1168911
dc.identifier.uri: http://hdl.handle.net/11603/38751
dc.language.iso: en_US
dc.publisher: ACM
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.title: Constructing collaborative desktop storage caches for large scientific datasets
dc.type: Text

Files

Original bundle

Name: 11689101168911.pdf
Size: 924.37 KB
Format: Adobe Portable Document Format