Protecting big data storage with trusted computing: designing a trusted infrastructure and software solution

Date

2014-06-18

Department

Towson University. Department of Computer and Information Sciences

Rights

Copyright protected, all rights reserved.
There are no restrictions on access to this document. An internet release form signed by the author to display this document online is on file with Towson University Special Collections and Archives.

Abstract

Apache Hadoop has the potential to offer powerful and cost-effective solutions for big data analytics; however, as Hadoop is increasingly used to analyze sensitive data, the data stored within a Hadoop Distributed File System (HDFS) cluster becomes an attractive target for exfiltration, corruption, unauthorized access, and modification. Pairing Hadoop distributed file storage with hardware-based Trusted Computing mechanisms, built on Trusted Computing Group (TCG) standards, has the potential to reduce the risk of data compromise. By implementing open, standards-based Trusted Computing technology at the infrastructure and application levels, this work presents a novel and robust security posture for HDFS. The motivation for research on this topic is discussed, a threat model and an evaluation of a targeted Advanced Persistent Threat against HDFS are presented, and a set of common security concerns within HDFS is addressed through infrastructure and software that provide integrity validation and data-at-rest encryption. To accomplish these goals, technology from the TCG, such as the widely available Trusted Platform Module (TPM), is used. In addition, design considerations for building an encryption framework for Hadoop in a trustworthy manner are discussed, along with the performance and security results of experiments in creating an encryption scheme for Hadoop that uses hardware key protections and Intel AES-NI (Advanced Encryption Standard New Instructions) for encryption acceleration. This work also includes an evaluation of the recently implemented crypto framework for Hadoop and an independent test of the claim that AES-NI mitigates the performance overhead of encryption.