The Lightweight Virtual File System

Author/Creator

Author/Creator ORCID

Date

2017-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

A data center today is responsible for safely managing big data volumes and balancing the complex needs of data producers and consumers. This balance often involves reconciling the easy access and rapid retrieval desired by consumers with the long-term availability, reliability, and expandability needed by data producers. The long-term, continuous support of data storage adds another layer of complexity for the file system. As storage architectures and big data volumes evolve, existing file systems focus primarily on performance, while less attention is paid to the long-term servicing needs of their clients described above.

I have developed the Lightweight Virtual File System (LVFS) to address these problems through the unique conceptual approach of separating the most common tasks involved in a file system, namely storing data, locating data, and organizing data. Standard file systems are developed as single monolithic systems that perform all three tasks. LVFS instead provides an architecture that enables the dynamic combination of different algorithms for each of those tasks. Using this approach, LVFS is capable of constructing a storage system that allows for ready availability, reliability, expandability, and long-term support while simultaneously assuring the performance of a stable system customizable to meet the needs of data consumers.

After LVFS was successfully developed and tested for merging decades-old storage architectures with new and incompatible ones, such as the HGST Active Archive System, NASA Goddard Space Flight Center's Terrestrial Information Systems Laboratory adopted LVFS for their production environment to create a single, integrated storage system without any software modifications. UMBC's Center for Hybrid Multicore Productivity Research deployed an instance on the IBM iDataPlex "BlueWave" cluster to utilize Seagate's Active Drive systems as a storage and on-disk compute platform. With LVFS, we show that MapReduce computation can be performed directly on the drive with performance comparable to Hadoop running on BlueWave. We also show a significant reduction in data leaving the active drive during computation, thereby significantly increasing throughput.
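
The separation of storing, locating, and organizing data described in the abstract can be illustrated with a minimal sketch. This is not the LVFS implementation or its API: every name below (`Storage`, `Locator`, `Organizer`, `ComposedFS`, and the concrete classes) is hypothetical and chosen only to show how one algorithm per task might be combined at construction time.

```python
import hashlib
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical 'storing data' task: keeps bytes under a key."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class Locator(Protocol):
    """Hypothetical 'locating data' task: picks a backend for a key."""
    def locate(self, key: str, backends: List[str]) -> str: ...


class Organizer(Protocol):
    """Hypothetical 'organizing data' task: maps a user path to a key."""
    def key_for(self, path: str) -> str: ...


class MemoryStorage:
    """Toy backend standing in for any real storage system."""
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class HashLocator:
    """Chooses a backend deterministically from a hash of the key."""
    def locate(self, key: str, backends: List[str]) -> str:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return backends[int.from_bytes(digest[:8], "big") % len(backends)]


class FlatOrganizer:
    """Flattens a slash-separated path into a single key."""
    def key_for(self, path: str) -> str:
        return path.strip("/").replace("/", "_")


class ComposedFS:
    """Combines one algorithm per task; each can be swapped independently."""
    def __init__(self, organizer: Organizer, locator: Locator,
                 backends: Dict[str, Storage]) -> None:
        self.organizer = organizer
        self.locator = locator
        self.backends = backends

    def _backend_for(self, key: str) -> Storage:
        # Sort names so the choice does not depend on dict insertion order.
        return self.backends[self.locator.locate(key, sorted(self.backends))]

    def write(self, path: str, data: bytes) -> None:
        key = self.organizer.key_for(path)
        self._backend_for(key).put(key, data)

    def read(self, path: str) -> bytes:
        key = self.organizer.key_for(path)
        return self._backend_for(key).get(key)


if __name__ == "__main__":
    fs = ComposedFS(FlatOrganizer(), HashLocator(),
                    {"legacy": MemoryStorage(), "archive": MemoryStorage()})
    fs.write("/data/granule.hdf", b"payload")
    print(fs.read("/data/granule.hdf"))
```

In this sketch, replacing `HashLocator` or `FlatOrganizer` with another class of the same shape changes placement or naming without touching the storage backends, which is the kind of independent substitution the abstract attributes to the architecture.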