Browsing by Subject "FPGA"
Now showing 1 - 6 of 6
Item A Scalable and Low Power Deep Convolutional Neural Network for Multimodal Data Classification In Embedded Real-Time Systems (2017-01-01)
Jafari, Ali; Mohsenin, Tinoosh; Computer Science and Electrical Engineering; Engineering, Computer
Multimodal time series signals are generated by different sensors such as accelerometers, magnetometers, gyroscopes, and heart rate monitors, where each sensor usually has a different number of input channels and a different sampling rate. Separate signal processing techniques, such as feature extraction and classification, are typically employed to process the data generated by each sensor modality, which 1) can lead to a long design time, 2) requires expert knowledge in designing the features, and 3) is unscalable when adding new sensors. Moreover, with recent advances in the Internet of Things (IoT) and wearable devices, a major challenge is the ability to efficiently deploy multimodal signal processing techniques in embedded, resource-bound settings that have strict power and area budgets. This dissertation targets these challenges. In the first contribution, we propose "SensorNet," a scalable deep convolutional neural network designed to classify multimodal time series signals. The raw time series signals generated by different sensor modalities with different sampling rates are first fused into images; then, a Deep Convolutional Neural Network (DCNN) is used to automatically learn shared features in the images and perform the classification. SensorNet: (1) is scalable, as it can process different types of time series data with a variety of input channels and sampling rates. (2) does not need separate signal processing techniques for the data generated by each sensor modality. (3) does not require expert knowledge for extracting features from each sensor's data. (4) makes it easy and fast to adapt to new sensor modalities with different sampling rates.
(5) achieves very high detection accuracy for different case studies. (6) has a very efficient architecture that makes it suitable for IoT and wearable devices. In the second contribution, we propose a custom low-power hardware architecture for the efficient deployment of SensorNet on resource-limited embedded devices, which can perform the entire SensorNet signal processing in real time with minimal energy consumption. The proposed architecture is fully reconfigurable for different applications with various requirements. Finally, we propose a stand-alone dual-mode Tongue Drive System (sdTDS) that employs SensorNet to perform all required multimodal signal processing in real time. sdTDS is a wireless wearable headset that individuals with severe disabilities can use to control devices in their environment, such as a computer, smartphone, or wheelchair, using voluntary tongue and head movements. SensorNet performance is evaluated on three case studies, Physical Activity Monitoring, sdTDS, and Stress Detection, and it achieves average detection accuracies of 98%, 96.2%, and 94%, respectively. Furthermore, we implement SensorNet using our custom hardware architecture on a Xilinx Artix-7 FPGA, which consumes 17 mJ, 9 mJ, and 3.5 mJ of energy for the Physical Activity Monitoring, sdTDS, and Stress Detection case studies, respectively. To further reduce power consumption, SensorNet is implemented as an ASIC at the post-layout level in 65-nm CMOS technology, which consumes approximately 7x less power than the FPGA implementation. Additionally, SensorNet is implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU), an embedded commercial off-the-shelf platform. Compared to the TX2 single-core CPU and GPU implementations, FPGA-based SensorNet obtains 8x and 12x improvements in power consumption and 71x and 3x improvements in energy consumption.
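The signal-to-image fusion step above can be sketched as follows. The exact fusion SensorNet uses is not detailed here, so this is one illustrative reading, assuming linear resampling of every channel to a common length and stacking channels as image rows; `fuse_to_image` and the sensor shapes are hypothetical.

```python
import numpy as np

def fuse_to_image(signals, target_len=128):
    """Fuse multimodal time series (differing channel counts and sampling
    rates) into one 2-D 'image': resample every channel to a common length
    and stack channels as rows. `signals` is a list of (channels, samples)
    arrays, one per sensor modality."""
    rows = []
    for sig in signals:
        for ch in sig:                              # each 1-D channel
            # linear interpolation onto a shared, normalized time grid
            src = np.linspace(0.0, 1.0, num=len(ch))
            dst = np.linspace(0.0, 1.0, num=target_len)
            rows.append(np.interp(dst, src, ch))
    return np.stack(rows)                           # (total_channels, target_len)

# e.g. a 3-axis accelerometer at 100 Hz and a 1-channel heart-rate sensor at 9 Hz
acc = np.random.randn(3, 100)
hr = np.random.randn(1, 9)
img = fuse_to_image([acc, hr])                      # one image, ready for a 2-D CNN
```

The key property is that adding a new modality only appends rows to the image, so the downstream CNN and its shared features need not be redesigned per sensor.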
Furthermore, SensorNet achieves 200x, 63x, and 27x lower energy consumption than previous related work. SensorNet is a generic deep neural network that can accommodate a wide range of applications with minimal effort.

Item Deploying Deep Neural Networks in Embedded Real-Time Systems (2016-01-01)
Page, Adam; Mohsenin, Tinoosh; Computer Science and Electrical Engineering; Engineering, Computer
Deep neural networks have been shown to outperform prior state-of-the-art solutions that rely heavily on hand-engineered features coupled with simple classification techniques. In addition to achieving several orders of magnitude improvement, they offer a number of additional benefits, such as the ability to perform end-to-end learning by carrying out both hierarchical feature abstraction and inference. Furthermore, their success continues to be demonstrated in a growing number of fields for a wide range of applications, including computer vision, speech recognition, and model forecasting. As this area of machine learning matures, a major remaining challenge is the ability to efficiently deploy such deep networks in embedded, resource-bound settings that have strict power and area budgets. While GPUs have been shown to improve throughput and energy efficiency over traditional computing paradigms, they still impose a significant power burden in such low-power embedded settings. To further reduce power while still achieving the desired throughput and accuracy, classification-efficient networks are required, along with optimal deployment onto embedded hardware. In this work, we target both of these objectives. For the first, we analyze simple, biologically inspired reduction strategies that are applied both before and after training. The central theme of these techniques is the introduction of sparsification to dissolve away the dense connectivity often found at different levels in neural networks.
The sparsification techniques include feature compression partitioning, structured filter pruning, and dynamic feature pruning. Additionally, we explore filter factorization and filter quantization approximation techniques to further reduce the complexity of convolutional layers. In the second contribution, we propose utilizing scalable, FPGA-based accelerators that enable deploying networks in such resource-bound settings, both by exploiting efficient forms of parallelism inherent in convolutional layers and by applying the proposed sparsification and approximation techniques. In particular, we developed SPARCNet, a hardware accelerator for efficient deployment of SPARse Convolutional NETworks. Utilizing the reduction techniques, we demonstrate the ability to reduce computation and memory by up to 60% and 93%, respectively, with less than 1% impact on accuracy when evaluated on several public datasets, including the 1000-class ImageNet dataset. The SPARCNet accelerator has been evaluated in real time on a number of popular networks, including VGGNet, AlexNet, and SqueezeNet, trained on the CIFAR-10 and ImageNet datasets. When SPARCNet is deployed on a Zynq-based FPGA platform, the reduction techniques enable up to a 6x improvement in energy efficiency relative to the baseline network. Relative to the platform's integrated dual-core ARM A9 CPU, the SPARCNet accelerator improves throughput by up to 22x while decreasing energy consumption by 13x. The SPARCNet accelerator was further evaluated against a number of other platforms, including the NVIDIA Jetson TK1 with its embedded K1 GPU.
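The structured filter pruning named above can be sketched with a magnitude criterion. This is an illustrative sketch assuming L1-norm ranking of output filters (a common choice), not necessarily the ranking used in this work; `prune_filters` and the layer shape are hypothetical.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Structured filter pruning sketch: rank each output filter of a conv
    layer (out_ch, in_ch, k, k) by its L1 norm and keep only the strongest
    fraction. Returns the pruned tensor and the surviving filter indices."""
    scores = np.abs(weights).sum(axis=(1, 2, 3))       # one L1 score per filter
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])  # strongest, original order
    return weights[keep], keep

w = np.random.randn(64, 3, 3, 3)                       # a hypothetical conv layer
w_pruned, kept = prune_filters(w, keep_ratio=0.25)     # 64 filters -> 16 filters
```

Because whole filters are removed (rather than scattered weights), the pruned layer stays a dense tensor, which is what makes this form of sparsification friendly to a hardware accelerator.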
When evaluated on AlexNet, the SPARCNet accelerator running on the ZedBoard platform with a Zynq-7000 FPGA achieves an efficiency of 8.07 GOP/J while staying under 3 Watts, versus the Jetson TK1, which obtains an efficiency of 4.58 GOP/J at a total system power of 12 Watts.

Item Heterogeneous Scheduling of Deep Neural Networks (2021-01-01)
Shea, Colin William; Mohsenin, Tinoosh; Computer Science and Electrical Engineering; Engineering, Computer
Deep neural networks (DNNs) have become the readiest answer to a range of application challenges, including image recognition, stock analysis, natural language processing, and biomedical applications, all while outperforming prior leading solutions that relied heavily on hand-engineered techniques. However, deploying these neural networks often demands solutions that are computationally and memory intensive. These requirements make it challenging to deploy DNNs in embedded, real-time, low-power applications, where classic architectures, CPUs and GPUs, still impose significant power burdens. Systems-on-chip with FPGAs can improve performance and allow more fine-grained control of resources than CPUs or GPUs, but finding the optimal balance between hardware and software to improve DNN efficiency is challenging. Few proposed solutions in the current research literature address optimized hardware and software deployments of DNNs. To address the compute-resource restrictions and low-power needs of deploying these networks, we describe and implement a domain-specific metric model for optimizing task deployment on differing platforms, hardware and software. Next, we discuss our DNN hardware accelerator, SCALENet: a SCalable Low-power AccELerator for real-time DNNs that includes multithreaded software workers.
Contained within the framework is a heterogeneous-aware scheduler that uses DNN-specific metric models, software-optimized kernels, and the SCALENet accelerator to allocate each task to a resource by solving a numerical cost over a series of domain objectives. To demonstrate the applicability of our contributions, we deploy nine modern deep network architectures, each containing a different number of parameters, within the context of two neural network applications: image processing and biomedical seizure detection. Utilizing the metric modeling techniques integrated into the heterogeneous-aware scheduler, we show the ability to meet computational requirements, adapt to multiple architectures, and lower power by providing an optimized task-to-resource allocation. Our heterogeneous-aware scheduler decreases total power consumption by 10%, does not affect the accuracy of the networks, and meets real-time deadlines. We demonstrate the ability to achieve parity with or exceed the energy efficiency of NVIDIA GPUs when evaluated against the Jetson TK1 with its embedded GPU SoC, with a 4x power savings in a power envelope of 2.0 Watts. When evaluated on the CIFAR-10 dataset with a batch size of 1 against the NVIDIA Jetson TX1 and TX2, SCALENet achieves throughput improvements of 2.2x and 1.3x over the TX1 and TX2, respectively, while improving energy efficiency by 3.7x and 1.9x. Compared to existing FPGA-based accelerators, SCALENet's accelerator and heterogeneous-aware scheduler achieve a 1.3x improvement in energy efficiency.

Item Minimizing Classification Energy of Binarized Neural Network Inference for Wearable Devices (2019-03-25)
Hosseini, Morteza; Paneliya, Hirenkumar; Kallakuri, Uttej; Khatwani, Mohit; Mohsenin, Tinoosh
In this paper, we propose low-power hardware for the efficient deployment of binarized neural networks (BNNs) that have been trained on physiological datasets.
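For background on the binarized arithmetic BNN hardware exploits: with weights and activations constrained to {-1, +1} and encoded as bits (1 for +1, 0 for -1), a dot product over n packed bits reduces to 2·popcount(XNOR(w, x)) − n. A minimal pure-Python sketch; `pack` and `bnn_dot` are illustrative names, not this paper's interface.

```python
def pack(values):
    """Pack a list of +/-1 values into an integer bit-mask (1 -> +1, 0 -> -1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def bnn_dot(packed_w, packed_x, nbits):
    """Binarized dot product: popcount of the XNOR counts matching bit
    positions, so the +/-1 sum equals matches minus mismatches."""
    matches = bin(~(packed_w ^ packed_x) & ((1 << nbits) - 1)).count("1")
    return 2 * matches - nbits

w = [+1, -1, +1, +1, -1, -1, +1, -1]
x = [+1, +1, -1, +1, -1, +1, -1, -1]
assert bnn_dot(pack(w), pack(x), 8) == sum(a * b for a, b in zip(w, x))
```

This is why memory width matters in such designs: one XNOR-plus-popcount over a wide memory entry replaces as many MACs as the entry holds bits.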
BNNs constrain weights and feature maps to 1 bit, can pack as many 1-bit weights as the width of a memory entry provides, and can execute multiple multiply-accumulate (MAC) operations with one fused bitwise-XNOR and population-count instruction over aligned packed entries. Our proposed hardware is scalable in the number of processing engines (PEs) and the memory width, both of which are adjustable to reach the most energy-efficient configuration for a given application. We implement two real case studies, Physical Activity Monitoring and Stress Detection, on our platform, and for each case study we seek the optimal PE and memory configuration on the target platform. Our implementation results indicate that a good choice of memory width and number of PEs can improve energy consumption by up to 4x on an Artix-7 FPGA and up to 2.5x on a 65nm CMOS ASIC implementation. We also show that, in general, wider memories make BNN processing hardware more efficient. To further reduce energy, we introduce a Pool-Skipping technique that can skip at least 25% of the operations accompanied by a Max-Pool layer in BNNs, leading to a total operation reduction of 22% in the Stress Detection case study. Compared to related works that use the same case studies, the same target platforms, and the same classification accuracy, our hardware is 4.5x more energy efficient for Stress Detection on the FPGA and 250x more energy efficient for Physical Activity Monitoring on the ASIC.

Item ON THE RESILIENCY OF PHYSICALLY UNCLONABLE FUNCTIONS AGAINST POWER ANALYSIS ATTACKS (2022-01-01)
Kroeger, Trevor Anthony Paul; Karimi, Naghmeh; Computer Science and Electrical Engineering; Engineering, Computer
Integrated Circuits (ICs) have made their way into many critical systems that serve the transportation, medical, and military industries; areas that are targeted for maximum disruption of operations and daily life.
Whether the motive is monetary or political, it is clear that ICs require protection from malicious intent. To aid in the protection of devices, various techniques have been developed to identify, authenticate, and track them. One of the principal security primitives used in these techniques is the Physical Unclonable Function (PUF). PUFs produce unique signatures based on the uncontrollable physical variations that occur during the fabrication of ICs. A PUF's output (response) is produced when the PUF is given an input (challenge). Together, these inputs and outputs are known as Challenge Response Pairs (CRPs). A PUF's CRPs are used for authenticating devices or for IC metering, aiding in the prevention of over-production and IC cloning. The arbiter-PUF is one of the most popular PUFs, broadly adopted by industry because of its large number of CRPs. Owing to their usefulness in securing ICs, PUFs are also the focus of attacks. They are vulnerable to modeling attacks, in which the adversary tries to model the PUF's behavior to predict its response to unseen challenges. Modeling attacks take two forms: CRP-based and power-based. When attacking a PUF through its power side-channel, the PUF response is predicted from the PUF's power consumption. This research focuses on power-based modeling attacks perpetrated against the arbiter-PUF family and presents PUFs that are resilient against power-based attacks via the proposed countermeasures. First, investigations are performed on the resiliency of state-of-the-art PUFs recently proposed in the literature to counter modeling attacks, such as analog variants and challenge-obfuscation-based PUFs. These PUFs are shown to be successfully compromised through their power side-channel.
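For context, the CRP-based flavor of modeling attack on an arbiter-PUF can be simulated with the standard additive delay model: the response is the sign of a linear function of parity-transformed challenge bits, so a linear fit over observed CRPs recovers a usable model. This is a self-contained simulation sketch, not the power-based attack flow studied in this work; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stages = 32

def parity_features(challenges):
    """Map 0/1 challenges to the +/-1 parity features of the additive
    delay model: phi_i = prod_{j >= i} (1 - 2*c_j), plus a bias term."""
    signs = 1 - 2 * challenges                        # 0/1 -> +1/-1
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1] # suffix products
    return np.hstack([phi, np.ones((len(challenges), 1))])

w_true = rng.normal(size=n_stages + 1)                # the PUF's secret stage delays
C = rng.integers(0, 2, size=(4000, n_stages))         # observed challenges
X = parity_features(C)
y = np.sign(X @ w_true)                               # observed responses

# Attack: least-squares fit of the linear delay model from the CRPs.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean(np.sign(X @ w_hat) == y)                # prediction accuracy
```

The linearity after the parity transform is exactly why the plain arbiter-PUF is easy to model, motivating the obfuscated variants and countermeasures discussed here.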
These investigations are taken one step further by performing Cross-PUF attacks, in which the power traces of one PUF are used to model another PUF fabricated from the same GDSII file. This research shows, for the first time, that such attacks are highly successful, exposing a previously unexplored vulnerability of PUFs. Power-based modeling attacks are further investigated by characterizing the effects of temperature and aging on both Self-PUF and Cross-PUF attacks. The exploration of power modeling is extended to multi-bit-response parallel PUFs to show their vulnerability to power-based attacks; these investigations show that the response can still be discerned from the power consumption of the device. Because of the phenomenon being exploited, this research shows that power-based modeling attacks work not only on the arbiter-PUF but also on its derivatives. To further improve the understanding of the various power-based modeling attacks, this research uses the Signal-to-Noise Ratio (SNR) to characterize and assess the vulnerability of the target PUFs to modeling attacks. Finally, to enhance the resiliency of the targeted PUFs against power-based modeling attacks, a number of circuit-level countermeasures, based on reducing the SNR and/or confusing the model, are proposed. These countermeasures appear to be highly successful in protecting the PUF against power-based modeling attacks.
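The SNR characterization mentioned above can be illustrated on synthetic traces: treat the variance of the per-response-class mean traces as signal and the average within-class variance as noise. The leakage position and noise level below are made up for illustration.

```python
import numpy as np

def trace_snr(traces, labels):
    """Per-sample SNR of power traces: variance of the class-mean traces
    (signal) divided by the average within-class variance (noise)."""
    classes = np.unique(labels)
    means = np.stack([traces[labels == c].mean(axis=0) for c in classes])
    noise = np.stack([traces[labels == c].var(axis=0) for c in classes]).mean(axis=0)
    return means.var(axis=0) / noise

# Simulated measurement: only sample 40 leaks the response bit.
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=500)
traces = 0.5 * rng.normal(size=(500, 100))
traces[:, 40] += bits
snr = trace_snr(traces, bits)       # peaks at the leaking sample
```

A high per-sample SNR peak marks where the response leaks into the power trace; SNR-reducing countermeasures aim to flatten exactly this profile.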
The results were extracted first using HSPICE simulations, and then the experiments (the attacks and countermeasures) were performed on FPGA fabric to verify the findings in silicon.

Item Power Supply Analysis for Device Verification (2020-01-20)
Shey, James; Patel, Chintan; Computer Science and Electrical Engineering; Engineering, Electrical
As integrated circuit (IC) and electronic device supply chains span the globe, the ability of an untrusted entity to alter an IC or device while it is in the supply chain increases, as does the ability to intercept and alter a product or introduce an altered version of it. This work seeks to verify devices through power supply analysis in three focus areas: Solid-State Drive (SSD) operation verification, IC identification, and Field Programmable Gate Array (FPGA) bitstream verification. The first focus area uses power supply analysis on an SSD to ensure that the device is operating properly. This work started by identifying trim operation signatures and then used machine learning to reach near 100 percent accuracy at identifying the operation. Follow-on applications include read and write classification, operating system identification, and malware detection on SSDs. The second focus area uses current analysis to identify alterations in an IC even when the altered IC is functionally equivalent. This work is done through simulation and captures process variation, noise, and aging at multiple temperatures, all with an F1 score greater than 0.90 given that at least 2.5 percent of the circuit has been altered. Additionally, various machine learning techniques are applied to increase performance and lower the detection floor to less than 2 percent of the circuit, and in most cases 1 percent.
The application for this focus area is the ability to identify ICs, ensuring that the desired IC is used in a final product and making piracy more difficult. The final focus area uses current analysis to verify the bitstream running on an FPGA. Given a set of known bitstreams, the presented method achieves a minimum F1 score of 0.8832 at determining whether a bitstream is the desired version. This is accomplished by using both support vector machines and neural networks to detect changes in the FPGA fabric. Altered circuits ranged from 0.087 to 20.05 percent change in the underlying fabric while maintaining a functionally equivalent implementation. This ensures that the correct bitstream is running on the FPGA. The three focus areas independently help to verify a device, but they can be combined to verify a device more completely. Monitoring operations at the device level as well as the chip level, whether the chip is an Application-Specific Integrated Circuit (ASIC) or an FPGA, can make pirating cost-prohibitive.
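As an illustration of this verification idea: the work above uses support vector machines and neural networks, but the pipeline (traces in, F1-scored verdict out) can be sketched with a nearest-centroid stand-in on simulated power traces; all data and names here are illustrative.

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), with class 1 = 'altered bitstream'."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return 2 * tp / (2 * tp + fp + fn)

# Simulated power traces: an altered fabric slightly shifts a few samples.
rng = np.random.default_rng(2)
labels = rng.integers(0, 2, size=400)           # 0 = genuine, 1 = altered
traces = rng.normal(size=(400, 100))
traces[labels == 1, :10] += 1.0                 # hypothetical alteration leakage

# Nearest-centroid stand-in for the SVM / neural-network classifiers.
centroids = {c: traces[labels == c].mean(axis=0) for c in (0, 1)}
dists = np.stack([np.linalg.norm(traces - centroids[c], axis=1) for c in (0, 1)])
preds = dists.argmin(axis=0)
f1 = f1_score_binary(labels, preds)
```

F1 is the right summary here because a verifier cares symmetrically about missed alterations (false negatives) and false alarms on genuine bitstreams (false positives).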