HAC-M-DNN: Hardware Aware Compression of Sustainable Multimodal Deep Neural Networks for Efficient Real-time Edge Deployment

Author/Creator

Author/Creator ORCID

Date

2024/01/01

Department

Computer Science and Electrical Engineering

Program

Engineering, Computer

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Abstract

The rapid advancement of sophisticated artificial intelligence (AI) algorithms has significantly increased energy consumption and carbon dioxide emissions, raising concerns about climate change. This issue has highlighted the need for environmentally sustainable AI technologies, particularly as they become more prevalent across various sectors. Addressing these challenges necessitates the development of energy-efficient embedded systems capable of handling diverse data types, even in resource-limited environments, thus ensuring both technological progress and environmental responsibility. Deep learning has demonstrated immense success across multiple domains, guiding research toward the challenges posed by larger and more complex multimodal data. Multimodal deep neural networks (M-DNNs) aim to develop models that process and relate data from various modalities. A central challenge in M-DNNs is achieving energy-efficient, sustainability-aware modality fusion, which involves combining data from different modalities to perform classification or regression tasks. The diverse nature of multimodal data complicates efficient fusion, and deploying M-DNNs on resource-constrained edge hardware adds further challenges related to model size, performance (latency, throughput, accuracy), and power consumption. M-DNNs often suffer from large model sizes and high computational demands, making deployment on low-power, small-size edge devices difficult. As M-DNN computations and model sizes continue to grow, efforts to reduce computation while maintaining accuracy have been explored. However, hardware-agnostic model compression can degrade model accuracy and performance. 
This dissertation proposes a framework, HAC-M-DNN (Hardware Aware Compression of Sustainable Multimodal Deep Neural Networks for Efficient Real-time Edge Deployment), to enhance energy efficiency in M-DNN training and to introduce hardware awareness into model compression techniques for real-time deployment on resource-constrained edge hardware. The main contributions of this proposal are threefold. First, it introduces a methodology for training large multimodal neural networks with a focus on energy efficiency; this approach integrates data from various modalities (images, audio, text) using different fusion techniques to optimize model performance while minimizing energy consumption and carbon footprint. Second, it improves generalization, interpretability, and overall performance through hardware-aware model compression methods, such as Hessian-aware mixed-precision quantization, cyclic sparsification, and memory-aware knowledge distillation, to compress M-DNN models. Finally, it evaluates the HAC-M-DNN framework on several multimodal datasets and deploys the resulting compact models on a range of heterogeneous, resource-constrained evaluation boards. Results demonstrate that models can be compressed up to 1400X while retaining nearly 98% accuracy, illustrating the effectiveness of HAC-M-DNN in training and compressing large multimodal models.
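To give a concrete flavor of one of the compression techniques named above, the following is a minimal, hypothetical sketch of cyclic magnitude-based sparsification: sparsity is ramped up over several prune cycles rather than applied in one shot (in practice a fine-tuning step would run between cycles). The function names `prune_by_magnitude` and `cyclic_sparsify` are illustrative, not from the dissertation, and a NumPy array stands in for a real layer's weight tensor; the actual HAC-M-DNN algorithm may differ.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def cyclic_sparsify(weights, target_sparsity, cycles=4):
    """Reach the target sparsity gradually over several prune cycles.
    In a real pipeline, the model would be fine-tuned after each cycle
    to recover accuracy before pruning further."""
    pruned = weights.copy()
    for c in range(1, cycles + 1):
        step_sparsity = target_sparsity * c / cycles
        pruned = prune_by_magnitude(pruned, step_sparsity)
    return pruned
```

Gradual schedules like this are commonly reported to retain more accuracy than one-shot pruning at the same final sparsity, since each cycle removes only a modest slice of the remaining weights.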