Compression of deep neural networks

Author/Creator ORCID

Date

2021-01-01

Department

Computer Science and Electrical Engineering

Program

Engineering, Electrical

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

Compression of deep neural networks aims at finding a sparse network that performs as well as a dense network but with significantly fewer parameters. This compression is called pruning. As a result of pruning, energy consumption is reduced, hardware requirements are relaxed, and responses to queries become faster. The pruning problem yields a constrained, stochastic, nonconvex, and non-differentiable optimization problem of very large size. All these barriers can be bypassed by solving an approximate problem. To do so, we present the "Amenable Sparse Network Investigator" (ASNI) algorithm, which utilizes a novel pruning strategy based on a sigmoid function that induces the sparsity level globally over the course of a single round of training. The ASNI algorithm accomplishes both tasks, whereas current state-of-the-art strategies can accomplish only one of them. This algorithm has two subalgorithms: 1) ASNI-I and 2) ASNI-II. The first subalgorithm learns an accurate sparse off-the-shelf network in only a single round of training. ASNI-II learns a sparse network and a quantized, compressed initialization from which the sparse network is trainable.
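
As a rough illustration of the kind of sigmoid-shaped global sparsity schedule described in the abstract (not the authors' exact ASNI procedure), the sketch below ramps a target sparsity level with a sigmoid over training progress and applies global magnitude pruning across all layers. The parameter names final_sparsity, steepness, and midpoint are illustrative assumptions.

    # Minimal sketch, assuming a sigmoid schedule for the global sparsity level
    # combined with global magnitude pruning; parameter names are hypothetical.
    import numpy as np

    def sigmoid_sparsity(epoch, total_epochs, final_sparsity=0.9,
                         steepness=10.0, midpoint=0.5):
        """Target fraction of zeroed parameters at a given epoch."""
        t = epoch / total_epochs                      # training progress in [0, 1]
        return final_sparsity / (1.0 + np.exp(-steepness * (t - midpoint)))

    def global_magnitude_mask(weights, sparsity):
        """Zero out the smallest-magnitude weights across all layers at once."""
        flat = np.concatenate([w.ravel() for w in weights])
        k = int(sparsity * flat.size)
        if k == 0:
            return [np.ones_like(w) for w in weights]
        threshold = np.partition(np.abs(flat), k - 1)[k - 1]
        return [(np.abs(w) > threshold).astype(w.dtype) for w in weights]

    # Toy example: prune a two-layer parameter set over one round of 10 "epochs".
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((4, 4)), rng.standard_normal((4, 2))]
    for epoch in range(10):
        s = sigmoid_sparsity(epoch, total_epochs=10)
        masks = global_magnitude_mask(weights, s)
        weights = [w * m for w, m in zip(weights, masks)]
        # ... one epoch of training on the masked (sparse) network would go here

Because the sigmoid rises slowly at first and saturates near the end, most parameters are removed in the middle of the round, leaving the final epochs to fine-tune an already sparse network.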