Deploying Deep Neural Networks in Embedded Real-Time Systems

dc.contributor.advisor: Mohsenin, Tinoosh
dc.contributor.author: Page, Adam
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Engineering, Computer
dc.date.accessioned: 2019-10-11T13:42:52Z
dc.date.available: 2019-10-11T13:42:52Z
dc.date.issued: 2016-01-01
dc.description.abstract: Deep neural networks have been shown to outperform prior state-of-the-art solutions that rely heavily on hand-engineered features coupled with simple classification techniques. In addition to achieving several orders of magnitude improvement, they offer a number of additional benefits, such as the ability to perform end-to-end learning through both hierarchical feature abstraction and inference. Furthermore, their success continues to be demonstrated in a growing number of fields for a wide range of applications, including computer vision, speech recognition, and model forecasting. As this area of machine learning matures, a major remaining challenge is the ability to efficiently deploy such deep networks in embedded, resource-bound settings that have strict power and area budgets. While GPUs have been shown to improve throughput and energy efficiency over traditional computing paradigms, they still impose a significant power burden in such low-power embedded settings. In order to further reduce power while still achieving the desired throughput and accuracy, classification-efficient networks are required in addition to optimal deployment onto embedded hardware. In this work, we target both of these objectives. For the first objective, we analyze simple, biologically-inspired reduction strategies that are applied both before and after training. The central theme of these techniques is the introduction of sparsification to help dissolve the dense connectivity that is often found at different levels in neural networks. The sparsification techniques include feature compression partition, structured filter pruning, and dynamic feature pruning. Additionally, we explore filter factorization and filter quantization approximation techniques to further reduce the complexity of convolutional layers.
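Structured filter pruning, one of the sparsification techniques named above, can be illustrated with a minimal sketch. The criterion and parameter names below are illustrative assumptions, not the dissertation's exact method: here each output filter of a convolutional layer is ranked by its L1 norm, and the weakest filters are zeroed out so that whole filters (and their output feature maps) can be removed from the computation.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Structured filter pruning sketch (hypothetical magnitude criterion).

    weights:    conv-layer weights, shape (num_filters, channels, k, k)
    keep_ratio: fraction of filters to keep -- illustrative parameter
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    # Rank each filter by the L1 norm of its coefficients.
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    num_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    kept = np.argsort(norms)[-num_keep:]  # indices of the strongest filters
    pruned = np.zeros_like(weights)
    pruned[kept] = weights[kept]          # weaker filters are zeroed entirely
    return pruned, kept

# Example: 8 random 3x3 filters over 3 input channels, keeping half of them.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.5)
```

Because entire filters are removed rather than individual weights, the surviving network stays dense and regular, which is what makes this form of sparsity exploitable by a hardware accelerator.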
In the second contribution, we propose utilizing scalable, FPGA-based accelerators that enable deploying networks in such resource-bound settings, both by exploiting efficient forms of parallelism inherent in convolutional layers and by applying the proposed sparsification and approximation techniques. In particular, we developed SPARCNet: a hardware accelerator for efficient deployment of SPARse Convolutional NETworks. Utilizing the reduction techniques, we demonstrate the ability to reduce computation and memory by up to 60% and 93%, respectively, with less than 1% impact on accuracy when evaluated on several public datasets, including the 1000-class ImageNet dataset. The SPARCNet accelerator has been evaluated in real-time on a number of popular networks, including VGGNet, AlexNet, and SqueezeNet, trained on the CIFAR-10 and ImageNet datasets. When deployed on a Zynq-based FPGA platform, the reduction techniques enabled up to a 6x improvement in energy efficiency relative to the baseline network. Relative to the platform's integrated dual-core ARM A9 CPU, the SPARCNet accelerator improved throughput by up to 22x while decreasing energy consumption by 13x. The SPARCNet accelerator was further evaluated against a number of other platforms, including the NVIDIA Jetson TK1 containing an embedded K1 GPU. When evaluated on AlexNet, the SPARCNet accelerator running on the Zedboard platform with a Zynq-7000 FPGA achieves an efficiency of 8.07 GOP/J while consuming under 3 Watts, versus the Jetson TK1, which obtained an efficiency of 4.58 GOP/J with a total system power of 12 Watts.
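The efficiency figures above can be unit-checked with a small sketch: GOP/J equals throughput in GOP/s divided by power in watts, since 1 W = 1 J/s. The implied throughputs below are back-derived from the abstract's reported efficiency and power numbers, so they are approximations for illustration, not measured values.

```python
def energy_efficiency(gops, watts):
    """Energy efficiency in GOP/J: (GOP/s) / W == GOP/J, since 1 W = 1 J/s."""
    return gops / watts

# Back-derived (approximate) throughputs from the reported GOP/J and power:
sparcnet_gops = 8.07 * 3.0    # <= ~24.2 GOP/s within the <3 W budget
jetson_gops   = 4.58 * 12.0   # ~55 GOP/s at 12 W total system power

# SPARCNet delivers less raw throughput but more work per joule.
assert energy_efficiency(sparcnet_gops, 3.0) > energy_efficiency(jetson_gops, 12.0)
```

This is the trade-off the abstract highlights: the embedded GPU wins on absolute throughput, while the FPGA accelerator wins on energy per operation, which is the metric that matters under a strict power budget.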
dc.genre: dissertations
dc.identifier: doi:10.13016/m2cbtr-a08y
dc.identifier.other: 11566
dc.identifier.uri: http://hdl.handle.net/11603/15486
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
dc.source: Original File Name: Page_umbc_0434D_11566.pdf
dc.subject: Accelerators
dc.subject: Deep Learning
dc.subject: FPGA
dc.subject: GPU
dc.subject: Machine Learning
dc.title: Deploying Deep Neural Networks in Embedded Real-Time Systems
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.

Files

Original bundle
Name: Page_umbc_0434D_11566.pdf
Size: 7.88 MB
Format: Adobe Portable Document Format
License bundle
Name: Page_Open.pdf
Size: 46.33 KB
Format: Adobe Portable Document Format