Energy-Efficient Implementation of Neural Networks Compressed with Cyclic Sparsely Connected Architectures

Author/Creator ORCID

Date

2021-01-01

Department

Computer Science and Electrical Engineering

Program

Engineering, Computer

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

Deep Convolutional Neural Networks (DCNNs) have evolved to the point where they can surpass human-level accuracy in many applications such as computer vision. However, modern DCNNs suffer from large model size and high computational cost, which makes their deployment on resource-bound embedded devices challenging. Recent work on compact DCNN design as well as pruning methods is effective, yet has drawbacks. For instance, more than half the size of all MobileNet models lies in their last two layers, mainly because depthwise separable convolution (CONV) layers are not applicable to their bulky final fully-connected (FC) layers. Also, in pruning methods, compression is gained at the expense of irregularity in the DCNN architecture, which necessitates additional indexing memory to address non-zero weights, thereby increasing memory footprint and energy consumption.

In this thesis, we propose cyclic sparsely connected (CSC) architectures that are composed of a few cascaded sparse layers. Contrary to depthwise separable layers, CSC layers can be used as an overlay for both FC and CONV layers and reduce their memory/computation complexity from O(N^2) down to O(N log N), where N is the number of nodes/channels in a given DCNN layer. Also, contrary to pruning methods, CSC layers are structurally sparse and require no indexing. We show that both standard CONV and depthwise CONV layers are special cases of CSC layers, that their mathematical function, along with that of FC layers, can be unified into a single formulation, and that their hardware implementation can be carried out with a single arithmetic logic component. We examine the efficacy of CSC architectures for compression of the LeNet, ResNet, AlexNet, and MobileNet DCNNs with precision ranging from 2 to 32 bits. More specifically, we build upon the compact 8-bit quantized 0.5 MobileNet V1 and show that by compressing its last two layers with CSC architectures, the model is compressed by 1.5X to a size of only 873 KB with little accuracy loss.

We design configurable hardware that implements all types of DCNN layers, including FC, CONV, depthwise, CSC-FC, and CSC-CONV, indistinguishably within a unified pipeline. We configure the design for 8-bit DCNNs and carry out synthesis and place-and-route once for a small FPGA (Field Programmable Gate Array) and once for fabrication as an ASIC (Application Specific Integrated Circuit). On a Xilinx 7A200T FPGA, the hardware is configured with 192 multipliers for fully on-chip processing of the compressed MobileNet; compared to related work, it achieves the highest Inference/J while utilizing the smallest FPGA. For the ASIC, the design is fabricated in TSMC 65nm CMOS technology on a silicon area of 7 mm^2, with 16 multipliers and 32 KB of on-chip memory. At a clock rate of 100 MHz, the chip can perform up to 3.6 tera 8-bit operations per joule (TOPJ), an order of magnitude higher than the NVIDIA TX2, a general-purpose embedded GPU, and Eyeriss, an ASIC for non-sparsified DCNNs.
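
To make the structural-sparsity idea concrete, the following is a minimal PyTorch sketch of a CSC-style fully-connected block built as a cascade of fixed, cyclically patterned sparse stages. The radix-2 cyclic (butterfly-like) fan-in pattern, the class name CSCLinear, and the per-stage ReLU are illustrative assumptions rather than the exact construction from the thesis; the point it demonstrates is that the connectivity is regular and fixed, so no indexing memory is needed, and the nonzero-weight count scales as O(N log N) instead of O(N^2).

```python
# Illustrative sketch of a CSC-style FC block: a cascade of structurally
# sparse stages whose total nonzero-weight count grows as O(N*F*log_F N).
# The cyclic radix-F fan-in pattern below is an assumption for illustration,
# not necessarily the exact CSC connectivity defined in the thesis.
import torch
import torch.nn as nn

class CSCLinear(nn.Module):
    def __init__(self, n: int, fan_in: int = 2):
        super().__init__()
        # Number of cascaded stages needed so every output can reach every input.
        stages, span = 0, 1
        while span < n:
            span *= fan_in
            stages += 1
        self.num_stages = max(1, stages)
        # Dense weight tensors per stage, multiplied by fixed binary masks so the
        # sparsity is structural: the pattern is known a priori, no index storage.
        self.weights = nn.ParameterList(
            nn.Parameter(torch.randn(n, n) * 0.01) for _ in range(self.num_stages)
        )
        masks = []
        for s in range(self.num_stages):
            stride = fan_in ** s
            m = torch.zeros(n, n)
            for i in range(n):
                for k in range(fan_in):
                    m[i, (i + k * stride) % n] = 1.0  # cyclic fan-in of size F
            masks.append(m)
        self.register_buffer("masks", torch.stack(masks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for w, m in zip(self.weights, self.masks):
            x = torch.relu(x @ (w * m).t())
        return x

if __name__ == "__main__":
    layer = CSCLinear(n=256, fan_in=2)
    nonzero = int(layer.masks.sum().item())
    print(f"nonzero weights: {nonzero} vs dense: {256 * 256}")  # 4096 vs 65536
    print(layer(torch.randn(8, 256)).shape)                     # torch.Size([8, 256])
```

Because the masks are fixed buffers rather than learned or pruned patterns, a hardware implementation can generate the fan-in addresses on the fly from the cyclic rule, which is the property the abstract contrasts with index-based pruning.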