Heterogeneous and Scalable Sketch-based Framework for Big Data Acceleration on Low Power Embedded Cores

Date

2017-01-01

Department

Computer Science and Electrical Engineering

Program

Engineering, Computer

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

The ever-growing Internet of Things (IoT) demands big data processing and cognitive computing on mobile and battery-operated devices. However, big data processing on low-power embedded cores is challenging due to their limited communication bandwidth and on-chip storage. Additionally, IoT and cloud-based computing demand a low-overhead security kernel to avoid data breaches. In this PhD research, we propose LESS (Light-weight Encryption using Scalable Sketching techniques) for data reduction and encryption. LESS is a heterogeneous framework that consists of three important kernels: (1) a sketching module for data reduction, (2) an accelerator for efficient sketch recovery using a scalable and parallel reconstruction architecture, and (3) a host processor to perform post-processing. One of the critical challenges in big data processing on embedded hardware platforms is to reconstruct the sketched data in real time under stringent constraints on error bounds and hardware resources. In this dissertation, we explore the Orthogonal Matching Pursuit (OMP) algorithm for sketch data recovery. OMP is a greedy algorithm with high computational complexity that has emerged as an important tool for signal recovery, dictionary learning, and sparse data classification. We propose a parallel and reconfigurable architecture for the OMP algorithm and implement it on FPGA, ASIC, and embedded platforms, including the Nvidia TK1 and the PENC many-core platform developed by the EEHPC lab. All processing platforms are evaluated for execution time, energy efficiency, chip area, and throughput at different frequencies and different levels of parallelism. Based on this analysis, we propose a thresholding technique for OMP (tOMP) and Gradient Descent OMP (GDOMP) to reduce the hardware complexity of the OMP algorithm.
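To make the sketch-and-recover pipeline concrete, the following is a minimal pure-Python sketch of textbook OMP, assuming a random Gaussian sensing matrix and a known sparsity level K. The names `sketch`, `omp`, and `solve_least_squares` are hypothetical; this is not the dissertation's hardware architecture, nor its tOMP or GDOMP variants. Each iteration performs the three steps that dominate OMP's cost: correlate every column with the residual, re-fit all selected columns by least squares, and update the residual.

```python
import math
import random

def sketch(x, m, seed=0):
    """Compress an n-length signal x into m measurements y = Phi x.
    Assumption: Phi has i.i.d. Gaussian entries N(0, 1/m)."""
    rng = random.Random(seed)
    n = len(x)
    phi = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
    y = [sum(phi[i][j] * x[j] for j in range(n)) for i in range(m)]
    return phi, y

def solve_least_squares(cols, y):
    """Solve min ||A c - y|| (A's columns are `cols`) through the normal
    equations (A^T A) c = A^T y, using Gaussian elimination with pivoting."""
    k = len(cols)
    g = [[sum(a * b for a, b in zip(cols[p], cols[q])) for q in range(k)] for p in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[p], y)) for p in range(k)]
    for p in range(k):  # forward elimination
        piv = max(range(p, k), key=lambda r: abs(g[r][p]))
        g[p], g[piv] = g[piv], g[p]
        rhs[p], rhs[piv] = rhs[piv], rhs[p]
        for r in range(p + 1, k):
            f = g[r][p] / g[p][p]
            for c in range(p, k):
                g[r][c] -= f * g[p][c]
            rhs[r] -= f * rhs[p]
    coef = [0.0] * k
    for p in range(k - 1, -1, -1):  # back substitution
        s = rhs[p] - sum(g[p][c] * coef[c] for c in range(p + 1, k))
        coef[p] = s / g[p][p]
    return coef

def omp(phi, y, sparsity, tol=1e-9):
    """Greedy recovery of a `sparsity`-sparse x from y = Phi x."""
    m, n = len(phi), len(phi[0])
    columns = [[phi[i][j] for i in range(m)] for j in range(n)]
    residual, support, coef = list(y), [], []
    for _ in range(sparsity):
        # 1. Matching: choose the unused column most correlated with the residual.
        best = max((j for j in range(n) if j not in support),
                   key=lambda j: abs(sum(a * r for a, r in zip(columns[j], residual))))
        support.append(best)
        # 2. Projection: least-squares fit of y on every selected column.
        coef = solve_least_squares([columns[j] for j in support], y)
        # 3. Residual update: r = y - Phi_S c.
        residual = [y[i] - sum(c * columns[j][i] for c, j in zip(coef, support))
                    for i in range(m)]
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break
    x_hat = [0.0] * n
    for c, j in zip(coef, support):
        x_hat[j] = c
    return x_hat
```

For example, `phi, y = sketch(x, m)` followed by `omp(phi, y, K)` compresses and then reconstructs a K-sparse `x`. The normal-equations solve is chosen here for brevity. Practical OMP implementations, particularly in hardware, typically update a QR or Cholesky factorization incrementally instead, since the least-squares step grows with every iteration.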
To demonstrate the reconstruction efficiency of the proposed OMP modifications, we compare their signal-to-reconstruction error ratio (SRER), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) with those of previously proposed matching pursuit algorithms. We implemented three reconfigurable and parallel hardware architectures for the OMP, tOMP, and GDOMP algorithms in 65nm CMOS technology operating at a 1V supply voltage. Post-place-and-route analysis of area, power, and latency shows that tOMP requires 33% less reconstruction time, whereas GDOMP consumes 44% less chip area, when compared to the OMP ASIC implementation. Compared to previously published work, the best architecture achieves a 2.1x improvement in area-delay product (ADP) and consumes 40% less energy.

Finally, we integrated the heterogeneous LESS framework with the Hadoop MapReduce platform and evaluated it on three applications: multi-channel EEG seizure detection, face detection, and object identification. In the LESS framework, sketching and host processing are performed on an ARM CPU, and three platforms (ASIC, PENC many-core, and ARM CPU) are evaluated as the accelerator. For the seizure detection application, LESS with PENC as the accelerator achieves up to 72% reduction in data transfers with approximately 2.1% and 2.9% degradation in sensitivity and specificity, respectively. For the face detection and object identification datasets, LESS with PENC as the accelerator achieves up to 48% reduction in data transfers with only 0.11% execution overhead and a negligible energy overhead of 0.001% when tested on 2.6GB of streaming input data. The LESS framework achieves 2.25x higher throughput per watt and requires 2x less transfer time compared to the MapReduce platform for the face detection and object identification applications.
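The reconstruction-quality and hardware-efficiency figures above rest on standard definitions. The following is a minimal Python sketch of SRER, PSNR, and ADP using the common textbook formulas; the function names are hypothetical, SSIM is omitted because of its windowed formulation, and the peak value of 255 assumes 8-bit images. It is not the dissertation's evaluation code, and the example inputs below are made up for illustration.

```python
import math

def srer_db(x, x_hat):
    """Signal-to-reconstruction error ratio in dB:
    10*log10(||x||^2 / ||x - x_hat||^2); infinite for perfect recovery."""
    signal = sum(v * v for v in x)
    error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return math.inf if error == 0 else 10 * math.log10(signal / error)

def psnr_db(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE).
    Assumption: peak = 255, the maximum value of an 8-bit pixel."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def area_delay_product(area, delay):
    """ADP: chip area times latency (units are up to the caller, e.g. mm^2 * us)."""
    return area * delay

def adp_improvement(baseline_area, baseline_delay, area, delay):
    """Factor by which a new design's ADP is smaller than a baseline's."""
    return area_delay_product(baseline_area, baseline_delay) / area_delay_product(area, delay)
```

With hypothetical numbers, a design that halves the area at unchanged latency gives `adp_improvement(2.0, 10.0, 1.0, 10.0) == 2.0`. Because ADP multiplies area by delay, an area saving like GDOMP's and a latency saving like tOMP's both count toward it.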