Deploying Deep Neural Networks in Embedded Real-Time Systems

dc.contributor.advisor: Mohsenin, Tinoosh
dc.contributor.author: Page, Adam
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Engineering, Computer
dc.date.accessioned: 2019-10-11T13:42:52Z
dc.date.available: 2019-10-11T13:42:52Z
dc.date.issued: 2016-01-01
dc.description.abstract: Deep neural networks have been shown to outperform prior state-of-the-art solutions that rely heavily on hand-engineered features coupled with simple classification techniques. In addition to achieving several orders of magnitude improvement, they offer a number of additional benefits, such as the ability to perform end-to-end learning through both hierarchical feature abstraction and inference. Furthermore, their success continues to be demonstrated in a growing number of fields for a wide range of applications, including computer vision, speech recognition, and model forecasting. As this area of machine learning matures, a major remaining challenge is the ability to efficiently deploy such deep networks in embedded, resource-bound settings that have strict power and area budgets. While GPUs have been shown to improve throughput and energy efficiency over traditional computing paradigms, they still impose a significant power burden in such low-power embedded settings. In order to further reduce power while still achieving the desired throughput and accuracy, classification-efficient networks are required in addition to optimal deployment onto embedded hardware. In this work, we target both of these objectives. For the first objective, we analyze simple, biologically-inspired reduction strategies that are applied both before and after training. The central theme of these techniques is the introduction of sparsification to help dissolve the dense connectivity that is often found at different levels in neural networks. The sparsification techniques include feature compression partition, structured filter pruning, and dynamic feature pruning. Additionally, we explore filter factorization and filter quantization approximation techniques to further reduce the complexity of convolutional layers.
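Structured filter pruning, one of the sparsification techniques named above, can be illustrated with a minimal sketch. The criterion and parameter names below are illustrative assumptions, not the dissertation's exact method: here each output filter of a convolutional layer is ranked by its L1 norm, and the weakest filters are zeroed out so that whole filters (and their output feature maps) can be removed from the computation.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Structured filter pruning sketch (hypothetical magnitude criterion).

    weights:    conv-layer weights, shape (num_filters, channels, k, k)
    keep_ratio: fraction of filters to keep -- illustrative parameter
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    # Rank each filter by the L1 norm of its coefficients.
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    num_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    kept = np.argsort(norms)[-num_keep:]  # indices of the strongest filters
    pruned = np.zeros_like(weights)
    pruned[kept] = weights[kept]          # weaker filters are zeroed entirely
    return pruned, kept

# Example: 8 random 3x3 filters over 3 input channels, keeping half of them.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.5)
```

Because entire filters are removed rather than individual weights, the surviving network stays dense and regular, which is what makes this form of sparsity exploitable by a hardware accelerator.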
In the second contribution, we propose utilizing scalable, FPGA-based accelerators that enable deploying networks in such resource-bound settings, both by exploiting efficient forms of parallelism inherent in convolutional layers and by applying the proposed sparsification and approximation techniques. In particular, we developed SPARCNet: a hardware accelerator for efficient deployment of SPARse Convolutional NETworks. Utilizing the reduction techniques, we demonstrate the ability to reduce computation and memory by up to 60% and 93%, respectively, with less than 1% impact on accuracy when evaluated on several public datasets, including the 1000-class ImageNet dataset. The SPARCNet accelerator has been evaluated in real-time on a number of popular networks, including VGGNet, AlexNet, and SqueezeNet, trained on the CIFAR-10 and ImageNet datasets. When deployed on a Zynq-based FPGA platform, the reduction techniques enabled up to a 6x improvement in energy efficiency relative to the baseline network. Relative to the platform's integrated dual-core ARM A9 CPU, the SPARCNet accelerator improved throughput by up to 22x while decreasing energy consumption by 13x. The SPARCNet accelerator was further evaluated against a number of other platforms, including the NVIDIA Jetson TK1 containing an embedded K1 GPU. When evaluated on AlexNet, the SPARCNet accelerator running on the Zedboard platform with a Zynq-7000 FPGA achieves an efficiency of 8.07 GOP/J while consuming under 3 Watts, versus the Jetson TK1, which obtained an efficiency of 4.58 GOP/J with a total system power of 12 Watts.
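The efficiency figures above can be unit-checked with a small sketch: GOP/J equals throughput in GOP/s divided by power in watts, since 1 W = 1 J/s. The implied throughputs below are back-derived from the abstract's reported efficiency and power numbers, so they are approximations for illustration, not measured values.

```python
def energy_efficiency(gops, watts):
    """Energy efficiency in GOP/J: (GOP/s) / W == GOP/J, since 1 W = 1 J/s."""
    return gops / watts

# Back-derived (approximate) throughputs from the reported GOP/J and power:
sparcnet_gops = 8.07 * 3.0    # <= ~24.2 GOP/s within the <3 W budget
jetson_gops   = 4.58 * 12.0   # ~55 GOP/s at 12 W total system power

# SPARCNet delivers less raw throughput but more work per joule.
assert energy_efficiency(sparcnet_gops, 3.0) > energy_efficiency(jetson_gops, 12.0)
```

This is the trade-off the abstract highlights: the embedded GPU wins on absolute throughput, while the FPGA accelerator wins on energy per operation, which is the metric that matters under a strict power budget.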
dc.genre: dissertations
dc.identifier: doi:10.13016/m2cbtr-a08y
dc.identifier.other: 11566
dc.identifier.uri: http://hdl.handle.net/11603/15486
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Page_umbc_0434D_11566.pdf
dc.subject: Accelerators
dc.subject: Deep Learning
dc.subject: FPGA
dc.subject: GPU
dc.subject: Machine Learning
dc.title: Deploying Deep Neural Networks in Embedded Real-Time Systems
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.

Files

Original bundle
Name: Page_umbc_0434D_11566.pdf
Size: 7.88 MB
Format: Adobe Portable Document Format
License bundle
Name: Page_Open.pdf
Size: 46.33 KB
Format: Adobe Portable Document Format