Minimizing Classification Energy of Binarized Neural Network Inference for Wearable Devices
Links to Files: https://arxiv.org/abs/1903.11381
Type of Work: 7 pages; journal articles preprints
Citation of Original Publication: Morteza Hosseini et al., Minimizing Classification Energy of Binarized Neural Network Inference for Wearable Devices, Signal Processing, 2019, https://arxiv.org/abs/1903.11381
Rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
In this paper, we propose low-power hardware for the efficient deployment of binarized neural networks (BNNs) trained on physiological datasets. BNNs constrain weights and feature maps to 1 bit, can pack as many 1-bit weights into a memory entry as its width allows, and can execute multiple multiply-accumulate (MAC) operations with one fused bit-wise XNOR and population-count instruction over aligned packed entries. Our proposed hardware scales with the number of processing engines (PEs) and the memory width, both of which are adjustable to find the most energy-efficient configuration for a given application. We implement two real case studies, Physical Activity Monitoring and Stress Detection, on our platform, and for each case study we seek the optimal PE and memory configuration. Our implementation results indicate that a well-chosen memory width and number of PEs can improve energy consumption by up to 4x on an Artix-7 FPGA and by up to 2.5x on a 65nm CMOS ASIC implementation. We also show that, in general, wider memories make for more efficient BNN processing hardware. To further reduce energy, we introduce a Pool-Skipping technique that can skip at least 25% of the operations that are accompanied by a Max-Pool layer in BNNs, leading to a total operation reduction of 22% in the Stress Detection case study. Compared to related works using the same case studies on the same target platforms and with the same classification accuracy, our hardware is 4.5x more energy efficient for Stress Detection on FPGA and 250x more energy efficient for Physical Activity Monitoring on ASIC.
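The fused XNOR and population-count operation mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes values of +1 are encoded as bit 1 and values of -1 as bit 0, and the helper names (`pack_bits`, `xnor_popcount_dot`, `reference_dot`) and the 8-bit word width are hypothetical choices made only for this illustration.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into one integer (bit i is 1 iff values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, w_word, width):
    """Dot product of two packed +1/-1 vectors using one XNOR and one popcount.

    XNOR marks the positions where the two operands agree (product +1);
    the remaining positions disagree (product -1). With `matches` agreeing
    positions out of `width`, the sum is matches - (width - matches).
    """
    mask = (1 << width) - 1                 # keep only the `width` low bits
    agree = ~(a_word ^ w_word) & mask       # bit-wise XNOR
    matches = bin(agree).count("1")         # population count
    return 2 * matches - width


def reference_dot(a, w):
    """Plain multiply-accumulate over unpacked +1/-1 values, for comparison."""
    return sum(x * y for x, y in zip(a, w))


# Assumed example data: one 8-entry "memory word" of activations and weights.
activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

packed_result = xnor_popcount_dot(
    pack_bits(activations), pack_bits(weights), len(activations)
)
print(packed_result, reference_dot(activations, weights))
```

In this sketch a single XNOR and popcount over one packed word stands in for eight separate MAC operations, which is why a wider memory entry lets more MACs be covered per access.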