A Fast Method to Fine-Tune Neural Networks for the Least Energy Consumption on FPGAs

Author/Creator ORCID

Date

2021

Department

Program

Citation of Original Publication

Hosseini, Morteza et al.; A Fast Method to Fine-Tune Neural Networks for the Least Energy Consumption on FPGAs; HAET workshop of ICLR 2021; https://eehpc.csee.umbc.edu/publications/pdf/2021/A_fast_method.pdf

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Subjects

Abstract

Because of their simple hardware requirements, low bitwidth neural networks (NNs) have gained significant attention over the recent years, and have been extensively employed in electronic devices that seek efficiency and performance. Research has shown that scaled-up low bitwidth NNs can have accuracy levels on par with their full-precision counterparts. As a result, there seems to be a tradeoff between quantization (q) and scaling (s) of NNs to maintain the accuracy. In this paper, we propose QS-NAS which is a systematic approach to explore the best quantization and scaling factors for a NN architecture that satisfies a targeted accuracy level and results in the least energy consumption per inference when deployed to a hardware–FPGA in this work. Compared to the literature using the same VGG-like NN with different q and s over the same datasets, our selected optimal NNs deployed to a low-cost tiny Xilinx FPGA from the ZedBoard resulted in accuracy levels higher or on par with those of the related work, while giving the least power dissipation and the highest inference/Joule.