UMBC Center for Real-time Distributed Sensing and Autonomy

Permanent URI for this collection: http://hdl.handle.net/11603/23124

The vision of this center is to advance AI-based autonomy to deliver safe, effective, and resilient new capabilities across a variety of complex mission types, including search-and-rescue; persistent surveillance; managing, adapting, and optimizing smart, connected robots and machinery; and augmenting humans in complex analytical and decision-making tasks. Although these systems continue to improve, realizing their full potential still requires numerous advances in capability, command and control, interoperability, resiliency, and trustworthiness.

Focus Areas: Networking, Sensing, and IoT for the Battlefield; Adaptive Machine/Deep Learning; Individual and Collective Health Assessment; Adaptive Cybersecurity; Cross-Domain Machine Learning with Few Labels; AI/ML on the Edge; Predictive Maintenance


Recent Submissions

Now showing 1 - 20 of 81
  • Item
    Semantic Integration and Knowledge Discovery for Environmental Research
    (IGI Global, 2007-01-01) Chen, Zhiyuan; Gangopadhyay, Aryya; Karabatis, George; McGuire, Michael P.; Welty, Claire
    Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantic-based techniques to assist...
  • Item
    A Utility-Aware and Holistic Approach for Privacy Preserving Distributed Mining with Worst Case Privacy Guarantee
    Banerjee, Madhushri; Chen, Zhiyuan; Gangopadhyay, Aryya
    Organizations often want to predict some attribute values collaboratively. However, they are often unwilling or not allowed to directly share their private data, so there is a great need for distributed privacy-preserving techniques. There exists a rich body of work based on Secure Multi-Party Computation (SMC) techniques. However, most such techniques are tied to a specific mining algorithm, and users have to run a different protocol for each mining algorithm. A holistic approach was proposed in which all parties first use an SMC protocol to generate a synthetic data set and then share this data for different mining algorithms. However, this approach has two major drawbacks: 1) it provides no worst-case privacy guarantee, and 2) the parties involved in the mining process often know what attribute to predict, but the holistic approach does not take this into account. In this paper, we propose a method that addresses these shortcomings. Experimental results demonstrate the benefits of the proposed solution.
  • Item
    A Survey on Efficient Vision-Language Models
    (2025-04-13) Shinde, Gaurav; Ravi, Anuradha; Dey, Emon; Sakib, Shadman; Rampure, Milind; Roy, Nirmalya
    Vision-language models (VLMs) integrate visual and textual information, enabling a wide range of applications such as image captioning and visual question answering, making them crucial for modern AI systems. However, their high computational demands pose challenges for real-time applications. This has led to a growing focus on developing efficient vision-language models. In this survey, we review key techniques for optimizing VLMs on edge and resource-constrained devices. We also explore compact VLM architectures and frameworks, and provide detailed insights into the performance-memory trade-offs of efficient VLMs. Furthermore, we establish a GitHub repository at https://github.com/MPSCUMBC/Efficient-Vision-Language-Models-A-Survey to compile all surveyed papers, which we will actively update. Our objective is to foster deeper research in this area.
  • Item
    Navigation Rules for Exploring Large Multidimensional Data Cubes
    (IGI Global, 2006-10-01) Kumar, Navin; Gangopadhyay, Aryya; Karabatis, George; Bapna, Sanjay; Chen, Zhiyuan
    Navigating through multidimensional data cubes is a nontrivial task. Although On-Line Analytical Processing (OLAP) provides the capability to view multidimensional data through rollup, drill-down, and slicing-dicing, it offers minimal guidance to end users in the actual knowledge discovery process. In this article, we address this knowledge discovery problem by identifying novel and useful patterns concealed in multidimensional data that are used for effective exploration of data cubes. We present an algorithm for the DIscovery of Sk-NAvigation Rules (DISNAR), which discovers the hidden interesting patterns in the form of Sk-navigation rules using a test of skewness on the pairs of the current and its candidate drill-down lattice nodes. The rules then are used to enhance navigational capabilities, as illustrated by our rule-driven system. Extensive experimental analysis shows that the DISNAR algorithm discovers the interesting patterns with a high recall and precision with small execution time and low space overhead.
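The abstract above does not spell out the exact form of DISNAR's skewness test, but the core idea (flagging drill-down nodes whose measure values are markedly skewed, and hence interesting to explore) can be sketched with a plain sample-skewness statistic. The threshold and toy cell values below are illustrative assumptions, not taken from the paper:

```python
def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 of a list of measures."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5 if m2 else 0.0

# toy measure values of the cells under two candidate drill-down nodes
flat   = [10, 10, 10, 10, 10, 10]   # uniform: nothing interesting here
skewed = [1, 1, 1, 1, 2, 30]        # one cell dominates: worth drilling into

assert sample_skewness(flat) == 0.0
assert sample_skewness(skewed) > 1.0   # exceeds an assumed interestingness cutoff
```

A rule-driven navigator in this spirit would rank candidate drill-downs by such a statistic and suggest the most skewed ones first.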
  • Item
    A Secure Face Recognition System for Mobile-devices without The Need of Decryption
    Mukherjee, Shibnath; Chen, Zhiyuan; Gangopadhyay, Aryya; Russell, Stephen
    Face recognition technology has received much attention due to its application in defense and crime prevention. In such applications, there is great need to incorporate face recognition technologies onto mobile devices to allow on-the-spot field usage. However, there are four major problems that need to be solved: the limited storage and processing power of mobile devices, connection instability, security and privacy concerns, and limited network bandwidth. Existing methods do not solve all of these problems. This paper addresses all of the above problems holistically by proposing a novel approach. The core of the approach is a DCT-based compression method with a compression ratio high enough that the compressed image database can be easily stored on a mobile device. Further, face recognition algorithms can be run directly on the compressed database without decompression, which enables on-the-spot field usage. The overhead of network transfer is also greatly reduced due to compression. The security and privacy issue is addressed by pruning most DCT coefficients of images and by a random permutation protocol. As a result, the reconstructed images are not visually recognizable even if the permutation is known. Additional security can also be provided by encrypting the coefficients for network transfer. The system has been implemented on a commercially available general-purpose PDA phone, and experimental results demonstrate the potential of the proposed solution.
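The compress-prune-permute idea described above can be sketched in one dimension. This is a minimal illustration under assumed parameters (a naive 1-D DCT, a fixed `keep` count, and a seed-keyed permutation), not the paper's actual 2-D image pipeline or its security protocol:

```python
import math
import random

def dct2(signal):
    """Type-II DCT of a 1-D signal (naive O(n^2) reference implementation)."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(signal))
            for k in range(n)]

def prune_and_permute(coeffs, keep, seed):
    """Keep only the first `keep` low-frequency coefficients, then apply a
    keyed random permutation so the pruned vector is not visually invertible
    without the key."""
    kept = coeffs[:keep]
    order = list(range(keep))
    random.Random(seed).shuffle(order)
    return [kept[i] for i in order], order

def unpermute(permuted, order):
    """Invert the keyed permutation (requires the seed-derived order)."""
    restored = [0.0] * len(order)
    for pos, i in enumerate(order):
        restored[i] = permuted[pos]
    return restored

row = [float(v) for v in range(16)]        # stand-in for one image row
coeffs = dct2(row)
protected, order = prune_and_permute(coeffs, keep=4, seed=42)
assert unpermute(protected, order) == coeffs[:4]   # lossless w.r.t. kept coefficients
```

Matching on the pruned, permuted coefficients (e.g., by distance) can proceed without ever reconstructing a recognizable image, which is the property the abstract highlights.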
  • Item
    Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
    (2025-03-08) Khan, Md Azim; Gangopadhyay, Aryya; Wang, Jianwu; Erbacher, Robert F.
    Situational awareness applications rely heavily on real-time processing of visual and textual data to provide actionable insights. Vision language models (VLMs) have become essential tools for interpreting complex environments by connecting visual inputs with natural language descriptions. However, these models often face computational challenges, especially when required to perform efficiently in real environments. This research presents a novel vision language model (VLM) framework that leverages frequency domain transformations and low-rank adaptation (LoRA) to enhance feature extraction, scalability, and efficiency. Unlike traditional VLMs, which rely solely on spatial-domain representations, our approach incorporates Discrete Fourier Transform (DFT) based low-rank features while retaining pretrained spatial weights, enabling robust performance in noisy or low visibility scenarios. We evaluated the proposed model on caption generation and Visual Question Answering (VQA) tasks using benchmark datasets with varying levels of Gaussian noise. Quantitative results demonstrate that our model achieves evaluation metrics comparable to state-of-the-art VLMs, such as CLIP ViT-L/14 and SigLIP. Qualitative analysis further reveals that our model provides more detailed and contextually relevant responses, particularly for real-world images captured by a RealSense camera mounted on an Unmanned Ground Vehicle (UGV).
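The low-rank adaptation (LoRA) component mentioned above can be sketched generically: a frozen weight matrix W is augmented with a trainable rank-r product A·B, so only r·(d_in + d_out) parameters are updated instead of d_in·d_out. The matrices and sizes below are illustrative assumptions; the paper's DFT-based features are not reproduced here:

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiplication."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, A, B, scale=1.0):
    """y = x W + scale * x A B: frozen weight W plus a rank-r update A @ B.
    Only A (d_in x r) and B (r x d_out) would be trained."""
    base = matmul(x, W)                    # frozen pretrained path
    delta = matmul(matmul(x, A), B)        # low-rank adapted path
    return [[p + scale * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(base, delta)]

# tiny rank-1 example: identity W, update that adds feature 0 into output 1
x = [[1.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0], [0.0]]
B = [[0.0, 1.0]]
assert lora_forward(x, W, A, B) == [[1.0, 3.0]]
```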
  • Item
    Impact of increased anthropogenic Amazon wildfires on Antarctic Sea ice melt via albedo reduction
    (Cambridge University Press, 2025-03-10) Chakraborty, Sudip; Devnath, Maloy Kumar; Jabeli, Atefeh; Kulkarni, Chhaya; Boteju, Gehan; Wang, Jianwu; Janeja, Vandana
    This study shows the impact of black carbon (BC) aerosol atmospheric rivers (AARs) on Antarctic Sea ice retreat. We detect that a higher number of BC AARs arrived in the Antarctic region due to increased anthropogenic wildfire activity in the Amazon in 2019 compared to 2018. Our analyses suggest that the BC AARs led to a reduction in sea ice albedo, an increase in the amount of sunlight absorbed at the surface, and a significant reduction of sea ice over the Weddell, Ross Sea (Ross), and Indian Ocean (IO) regions in 2019. The Weddell region experienced the largest sea ice retreat (~33,000 km²) during the presence of BC AARs, compared to ~13,000 km² during non-BC days. We used a suite of data science techniques, including random forest, elastic net regression, matrix profile, canonical correlation, and causal discovery analyses, to discover the effects and validate them. Random forest, elastic net regression, and causal discovery analyses show that the shortwave upward radiative flux (the reflected sunlight), temperature, and longwave upward energy from the Earth are the most important features that affect sea ice extent. Canonical correlation analysis confirms that aerosol optical depth is negatively correlated with albedo, positively correlated with shortwave energy absorbed at the surface, and negatively correlated with sea ice extent. The relationship is stronger in 2019 than in 2018. This study also employs the matrix profile and the convolution operation of a Convolutional Neural Network (CNN) to detect anomalous events in sea ice loss. These methods show that a higher number of anomalous melting events was detected over the Weddell and Ross regions.
  • Item
    Correlation to Causation: A Causal Deep Learning Framework for Arctic Sea Ice Prediction
    (2025-03-03) Hossain, Emam; Ferdous, Muhammad Hasan; Wang, Jianwu; Subramanian, Aneesh; Gani, Md Osman
    Traditional machine learning and deep learning techniques rely on correlation-based learning, often failing to distinguish spurious associations from true causal relationships, which limits robustness, interpretability, and generalizability. To address these challenges, we propose a causality-driven deep learning framework that integrates Multivariate Granger Causality (MVGC) and PCMCI+ causal discovery algorithms with a hybrid deep learning architecture. Using 43 years (1979-2021) of daily and monthly Arctic Sea Ice Extent (SIE) and ocean-atmospheric datasets, our approach identifies causally significant factors, prioritizes features with direct influence, reduces feature overhead, and improves computational efficiency. Experiments demonstrate that integrating causal features enhances the deep learning model's predictive accuracy and interpretability across multiple lead times. Beyond SIE prediction, the proposed framework offers a scalable solution for dynamic, high-dimensional systems, advancing both theoretical understanding and practical applications in predictive modeling.
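The Granger-causality idea underlying the MVGC step above can be sketched in a bivariate, lag-1 toy version: x Granger-causes y if adding x's past to a regression on y's own past significantly reduces the residual error. This pure-Python sketch (the series and coefficients are invented) only illustrates the F-statistic; the paper's multivariate setup and PCMCI+ are considerably more involved:

```python
import math

def _solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c and m[c][c]:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def _rss(y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    x = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(x, y))

def granger_f(y, x, lag=1):
    """Lag-1 Granger F-statistic: does x's past improve the prediction of y?"""
    yt, ylag, xlag = y[lag:], y[:-lag], x[:-lag]
    rss_r = _rss(yt, [ylag])           # restricted: y's own past only
    rss_u = _rss(yt, [ylag, xlag])     # unrestricted: add x's past
    n, q, k = len(yt), 1, 3
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

# toy series: y is driven almost entirely by x's previous value
x = [math.sin(0.3 * i) for i in range(60)]
y = [0.0]
for i in range(1, 60):
    y.append(0.9 * x[i - 1] + 0.01 * math.cos(1.7 * i))
assert granger_f(y, x) > 10.0   # large F: x's past clearly helps predict y
```

A causality-driven pipeline would keep only features whose F-statistics survive a significance test, which is the feature-reduction step the abstract credits for the efficiency gains.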
  • Item
    CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
    (2025-03-19) Ahmed, Masud; Hasan, Zahid; Haque, Syed Arefinul; Faridee, Abu Zaher Md; Purushotham, Sanjay; You, Suya; Roy, Nirmalya
    Traditional transformer-based semantic segmentation relies on quantized embeddings. However, our analysis reveals that autoencoder accuracy on segmentation masks using quantized embeddings (e.g., VQ-VAE) is 8% lower than with continuous-valued embeddings (e.g., KL-VAE). Motivated by this, we propose a continuous-valued embedding framework for semantic segmentation. By reformulating semantic mask generation as a continuous image-to-embedding diffusion process, our approach eliminates the need for discrete latent representations while preserving fine-grained spatial and semantic details. Our key contribution is a diffusion-guided autoregressive transformer that learns a continuous semantic embedding space by modeling long-range dependencies in image features. Our framework contains a unified architecture combining a VAE encoder for continuous feature extraction, a diffusion-guided transformer for conditioned embedding generation, and a VAE decoder for semantic mask reconstruction. This setting facilitates zero-shot domain adaptation, enabled by the continuity of the embedding space. Experiments across diverse datasets (e.g., Cityscapes and domain-shifted variants) demonstrate state-of-the-art robustness to distribution shifts, including adverse weather (e.g., fog, snow) and viewpoint variations. Our model also exhibits strong noise resilience, achieving robust performance (≈95% AP compared to baseline) under Gaussian noise, moderate motion blur, and moderate brightness/contrast variations, while experiencing only a moderate impact (≈90% AP compared to baseline) from 50% salt-and-pepper noise and from saturation and hue shifts. Code available: this https URL
  • Item
    VIVAR: learning view-invariant embedding for video action recognition
    (SPIE, 2025-03-10) Hasan, Zahid; Ahmed, Masud; Faridee, Abu Zaher Md; Purushotham, Sanjay; Lee, Hyungtae; Kwon, Heesung; Roy, Nirmalya
    Deep learning has achieved state-of-the-art video action recognition (VAR) performance by comprehending action-related features from raw video. However, these models often learn to jointly encode auxiliary view information (viewpoints and sensor properties) with primary action features, leading to performance degradation under novel views and to security concerns from revealing sensor types and locations. Here, we systematically study these shortcomings of VAR models and develop a novel approach, VIVAR, to learn view-invariant spatiotemporal action features with view information removed. In particular, we leverage contrastive learning to separate actions and jointly optimize an adversarial loss that aligns view distributions to remove auxiliary view information in the deep embedding space, using unlabeled synchronous multiview (MV) video to learn a view-invariant VAR system. We evaluate VIVAR using our in-house large-scale time-synchronous MV video dataset containing 10 actions with three angular viewpoints and sensors in diverse environments. VIVAR successfully captures view-invariant action features, improves inter- and intra-action cluster quality, and consistently outperforms SoTA models with 8% higher accuracy. We additionally perform extensive studies with our datasets, model architectures, multiple contrastive learning objectives, and view distribution alignments to provide insights into VIVAR. We open-source our code and dataset to facilitate further research in view-invariant systems.
  • Item
    DACC-Comm: DNN-Powered Adaptive Compression and Flow Control for Robust Communication in Network-Constrained Environment
    (IEEE, 2025-01) Dey, Emon; Ravi, Anuradha; Lewis, Jared; Kumar, Vinay Krishna; Freeman, Jade; Gregory, Timothy; Suri, Niranjan; Busart, Carl; Roy, Nirmalya
    Robust communication is vital for multi-agent robotic systems involving heterogeneous agents like Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) operating in dynamic and contested environments. These agents often communicate to collaboratively execute critical tasks for perception awareness and face several communication challenges: (a) the disparity in velocity between agents results in rapidly changing distances, in turn affecting physical channel parameters such as the Received Signal Strength Indicator (RSSI), the data rate (applicable for certain networks), and, most importantly, reliable data transfer; (b) because these devices work in outdoor and network-deprived environments, they tend to use proprietary network technologies at low frequencies to communicate over long ranges, which tremendously reduces the available bandwidth. This poses a challenge when sending large amounts of data for time-critical applications. To mitigate these challenges, we propose DACC-Comm, an adaptive flow control and compression sensing framework that dynamically adjusts the receiver window size and selectively samples image pixels based on network parameters such as latency, data rate, and RSSI, and on physiological factors such as the variation in movement speed between devices. DACC-Comm employs a state-of-the-art DNN (TabNet) to optimize the payload and reduce retransmissions in the network, in turn maintaining low latency. The multi-head transformer-based prediction model takes the network parameters and physiological factors as input and outputs (a) an optimal receiver window size for TCP, determining how many bytes can be sent without the sender waiting for an acknowledgment (ACK) from the receiver, and (b) a compression ratio to sample a subset of pixels from an image. We propose a novel sampling strategy to select the image pixels, which are then encoded using a feature extractor.
    To optimize the amount of data sent across the network, the extracted features are further quantized to INT8 via post-training quantization. We evaluate DACC-Comm on an experimental testbed comprising Jackal and ROSMaster2 UGV devices that communicate image features using a proprietary radio (Doodle) at 915 MHz. We demonstrate that DACC-Comm improves the retransmission rate by ≈17% and reduces the overall latency by ≈12%. The novel compression sensing strategy reduces the overall payload by ≈56%.
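The INT8 post-training quantization step mentioned above can be illustrated with a minimal symmetric-scale sketch. The scale choice and rounding here are simplifying assumptions (real toolchains use calibrated, often per-channel schemes), but the payload and error trade-off is the same:

```python
def quantize_int8(features):
    """Symmetric post-training quantization of a float feature vector to INT8.
    Returns the int8 values and the scale needed to dequantize."""
    scale = max(abs(v) for v in features) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in features]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float features from the INT8 payload."""
    return [v * scale for v in q]

feats = [0.12, -1.5, 3.3, -0.07, 2.25]
q, scale = quantize_int8(feats)
recon = dequantize(q, scale)

# payload shrinks 4x vs. float32; per-value error is bounded by scale / 2
assert q[2] == 127                     # the largest-magnitude value maps to full range
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(feats, recon))
```

Only the int8 list and one float scale need to cross the radio link, which is where the payload savings come from.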
  • Item
    Accurate and Interpretable Radar Quantitative Precipitation Estimation with Symbolic Regression
    (IEEE, 2025-01-16) Zhang, Olivia; Grissom, Brianna; Pulido, Julian; Munoz-Ordaz, Kenia; He, Jonathan; Cham, Mostafa; Jing, Haotong; Qian, Weikang; Wen, Yixin; Wang, Jianwu
    Accurate quantitative precipitation estimation (QPE) is essential for managing water resources, monitoring flash floods, creating hydrological models, and more. Traditional methods of obtaining precipitation data from rain gauges and radars have limitations such as sparse coverage and inaccurate estimates for different precipitation types and intensities. Symbolic regression, a machine learning method that generates mathematical equations fitting the data, presents a unique approach to estimating precipitation that is both accurate and interpretable. Using WSR-88D dual-polarimetric radar data from Oklahoma and Florida over three dates, we tested symbolic regression models involving genetic programming and deep learning, symbolic regression on separate clusters of the data, and the incorporation of knowledge-based loss terms into the loss function. We found that symbolic regression is both accurate in estimating rainfall and interpretable through learned equations. The accuracy and simplicity of the learned equations can be slightly improved by clustering the data based on select radar variables and by adjusting the loss function with knowledge-based loss terms. This research provides insights into improving QPE accuracy through interpretable symbolic regression methods.
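The paper's learned equations are not reproduced above, but the flavor of interpretable, equation-based QPE can be sketched by fitting the classic power-law form R = a·Z^b in log space. The data below are synthetic (generated from an exact Z = 200·R^1.6 relation); the paper's genetic-programming search explores a far larger space of candidate expressions:

```python
import math

def fit_power_law(z, r):
    """Fit R = a * Z**b by least squares in log space:
    log R = log a + b * log Z."""
    lx = [math.log(v) for v in z]
    ly = [math.log(v) for v in r]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = math.exp(my - b * mx)
    return a, b

# synthetic reflectivity / rain-rate pairs following Z = 200 * R**1.6 exactly
rates = [0.5, 1.0, 2.0, 5.0, 10.0, 25.0]
refl = [200.0 * r ** 1.6 for r in rates]

a, b = fit_power_law(refl, rates)
# inverting Z = 200 R^1.6 gives R = (1/200)^(1/1.6) * Z^(1/1.6), so b = 0.625
assert abs(b - 1 / 1.6) < 1e-9
assert abs(a * refl[0] ** b - rates[0]) < 1e-9
```

The appeal the abstract points to is exactly this: the fitted object is a short, physically inspectable equation rather than an opaque network.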
  • Item
    Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment
    (2024-10-23) Ghosh, Indrajeet; Chugh, Garvit; Faridee, Abu Zaher Md; Roy, Nirmalya
    Recent advancements in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions, but adoption remains limited due to the lack of expert annotations and domain discrepancies from user variations. Limited annotations hinder the model's ability to generalize to out-of-distribution samples. While data augmentation can improve generalizability, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning conditional distributions with labeled target samples, but vanilla pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture comprising three functions: (i) a consistency regularizer between augmented samples to improve model classification generalizability, (ii) a temporal ensemble for robust pseudo-label generation, and (iii) conditional distribution alignment to improve domain generalizability. The temporal ensemble works by aggregating predictions from past epochs to smooth out noisy pseudo-label predictions, which are then used in the conditional distribution alignment module to minimize kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces to learn a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same labels; this results in (a) strong generalization with limited source domain samples and (b) consistent pseudo-label generation in target samples. The novel integration of these three modules in μDAR yields a ≈4-12% average macro-F1 score improvement over six state-of-the-art UDA methods on four benchmark wHAR datasets.
  • Item
    SERN: Simulation-Enhanced Realistic Navigation for Multi-Agent Robotic Systems in Contested Environments
    (2024-10-22) Hossain, Jumman; Dey, Emon; Chugh, Snehalraj; Ahmed, Masud; Anwar, Mohammad Saeid; Faridee, Abu Zaher Md; Hoppes, Jason; Trout, Theron; Basak, Anjon; Chowdhury, Rafidh; Mistry, Rishabh; Kim, Hyun; Freeman, Jade; Suri, Niranjan; Raglin, Adrienne; Busart, Carl; Gregory, Timothy; Ravi, Anuradha; Roy, Nirmalya
    The increasing deployment of autonomous systems in complex environments necessitates efficient communication and task completion among multiple agents. This paper presents SERN (Simulation-Enhanced Realistic Navigation), a novel framework integrating virtual and physical environments for real-time collaborative decision-making in multi-robot systems. SERN addresses key challenges in asset deployment and coordination through a bi-directional communication framework using the AuroraXR ROS Bridge. Our approach advances the SOTA through accurate real-world representation in virtual environments using the Unity high-fidelity simulator; synchronization of physical and virtual robot movements; efficient ROS data distribution between remote locations; and integration of SOTA semantic segmentation for enhanced environmental perception. Our evaluations show a 15% to 24% improvement in latency and up to a 15% increase in processing efficiency compared to traditional ROS setups. Real-world and virtual simulation experiments with multiple robots demonstrate synchronization accuracy, achieving less than 5 cm positional error and under 2-degree rotational error. These results highlight SERN's potential to enhance situational awareness and multi-agent coordination in diverse, contested environments.
  • Item
    Tutorial on Causal Inference with Spatiotemporal Data
    (ACM, 2024-11-04) Ali, Sahara; Wang, Jianwu
    Spatiotemporal data, which captures how variables evolve across space and time, is ubiquitous in fields such as environmental science, epidemiology, and urban planning. However, identifying causal relationships in these datasets is challenging due to the presence of spatial dependencies, temporal autocorrelation, and confounding factors. This tutorial provides a comprehensive introduction to spatiotemporal causal inference, offering both theoretical foundations and practical guidance for researchers and practitioners. We explore key concepts such as causal inference frameworks, the impact of confounding in spatiotemporal settings, and the challenges posed by spatial and temporal dependencies. The paper covers synthetic spatiotemporal benchmark data generation, widely used spatiotemporal causal inference techniques, including regression-based, propensity score-based, and deep learning-based methods, and demonstrates their application using synthetic datasets. Through step-by-step examples, readers will gain a clear understanding of how to address common challenges and apply causal inference techniques to spatiotemporal data. This tutorial serves as a valuable resource for those looking to improve the rigor and reliability of their causal analyses in spatiotemporal contexts.
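Among the propensity-score-based methods the tutorial covers, inverse probability weighting (IPW) is one of the simplest to state. The toy sketch below assumes the propensities are known (in practice they are estimated, e.g., by logistic regression on the covariates); the data are invented so that a naive comparison is confounded but IPW recovers the true effect:

```python
def ipw_ate(outcomes, treated, propensity):
    """Inverse-probability-weighted estimate of the average treatment effect.
    propensity[i] = P(treated | covariates of unit i)."""
    n = len(outcomes)
    treated_mean = sum(t * y / p
                       for y, t, p in zip(outcomes, treated, propensity)) / n
    control_mean = sum((1 - t) * y / (1 - p)
                       for y, t, p in zip(outcomes, treated, propensity)) / n
    return treated_mean - control_mean

# two confounded strata; the true treatment effect is +2 in both
outcomes   = [5.0, 3.0, 3.0, 3.0, 8.0, 8.0, 8.0, 6.0]
treated    = [1,   0,   0,   0,   1,   1,   1,   0]
propensity = [0.25] * 4 + [0.75] * 4   # high-outcome stratum is treated more often

naive = (sum(y for y, t in zip(outcomes, treated) if t) / sum(treated)
         - sum(y for y, t in zip(outcomes, treated) if not t) / treated.count(0))
assert abs(naive - 3.5) < 1e-9                               # confounding inflates it
assert abs(ipw_ate(outcomes, treated, propensity) - 2.0) < 1e-9   # IPW recovers +2
```

Spatial and temporal dependence complicate both the propensity model and the weighting step, which is precisely the gap the tutorial addresses.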
  • Item
    Deep Learning-Based Joint Channel Equalization and Symbol Detection for Air-Water Optoacoustic Communications
    (IEEE, 2024-10-14) Mahmud, Muntasir; Younis, Mohamed; Ahmed, Masud; Choa, Fow-Sen
    The optoacoustic effect is triggered by directing an optical signal in the air (using a laser) at the surface of water, leading to the generation of a corresponding acoustic signal inside the water. Careful modulation of the laser signal would enable an innovative method for direct communication in the air-water cross-medium scenarios experienced in many civil and military applications. To achieve a high data rate, a multilevel amplitude modulation scheme can be used to generate different acoustic signals to transmit multiple symbols. However, accurately demodulating these acoustic signals can be challenging due to multipath propagation within the harsh underwater environment, which induces inter-symbol interference. This paper proposes a deep learning-based demodulation technique that uses a U-Net for signal equalization and a Residual Neural Network for symbol detection. In addition, fine-tuning at the receiver side is also considered to increase demodulation robustness. The proposed deep learning model has been trained on our laboratory-constructed dataset containing eight levels of optoacoustic signals captured from three different underwater positions. The model is validated using two datasets containing severe interference due to multipath-generated echoes and reverberations. The results show that our demodulation model achieves 96.6% and 91.7% accuracy on the two datasets, respectively, significantly surpassing the 72.9% and 65.3% accuracy achieved by the conventional peak detection-based technique.
  • Item
    Let Students Take the Wheel: Introducing Post-Quantum Cryptography with Active Learning
    (2024-10-17) Jamshidi, Ainaz; Kaur, Khushdeep; Gangopadhyay, Aryya; Zhang, Lei
    Quantum computing presents a double-edged sword: while it has the potential to revolutionize fields such as artificial intelligence, optimization, healthcare, and so on, it simultaneously poses a threat to current cryptographic systems, such as public-key encryption. To address this threat, post-quantum cryptography (PQC) has been identified as the solution to secure existing software systems, promoting a national initiative to prepare the next generation with the necessary knowledge and skills. However, PQC is an emerging interdisciplinary topic, presenting significant challenges for educators and learners. This research proposes a novel active learning approach and assesses the best practices for teaching PQC to undergraduate and graduate students in the discipline of information systems. Our contributions are two-fold. First, we compare two instructional methods: 1) traditional faculty-led lectures and 2) student-led seminars, both integrated with active learning techniques such as hands-on coding exercises and Kahoot games. The effectiveness of these methods is evaluated through student assessments and surveys. Second, we have published our lecture video, slides, and findings so that other researchers and educators can reuse the courseware and materials to develop their own PQC learning modules. We employ statistical analysis (e.g., t-test and chi-square test) to compare the learning outcomes and students' feedback between the two learning methods in each course. Our findings suggest that student-led seminars significantly enhance learning outcomes, particularly for graduate students, where a notable improvement in comprehension and engagement is observed. Moving forward, we aim to scale these modules to diverse educational contexts and explore additional active learning and experiential learning strategies for teaching complex concepts of quantum information science.
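The t-test comparison mentioned above can be sketched with Welch's unequal-variance statistic; the score lists below are invented for illustration (the study's actual assessment data are not reproduced here):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples
    with possibly unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# hypothetical post-test scores: student-led seminar vs. faculty-led lecture
seminar = [88, 92, 85, 90, 95, 87, 91]
lecture = [78, 82, 80, 75, 85, 79, 81]

t, df = welch_t(seminar, lecture)
assert t > 2.0   # a large positive t favors the seminar group
```

The resulting (t, df) pair is what would be compared against a t distribution to obtain the p-values reported in studies like this one.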
  • Item
    Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features
    (2024-09-09) Khanjani, Zahra; Ale, Tolulope; Wang, Jianwu; Davis, Lavon; Mallinson, Christine; Janeja, Vandana
    Several types of spoofed audio, such as mimicry, replay attacks, and deepfakes, have created societal challenges to information integrity. Recently, researchers have worked with sociolinguistics experts to label spoofed audio samples with Expert Defined Linguistic Features (EDLFs) that can be discerned by the human ear: pitch, pause, word-initial and word-final release bursts of consonant stops, audible intake or outtake of breath, and overall audio quality. It has been established that several deepfake detection algorithms improve when the traditional, common features of audio data are augmented with these EDLFs. In this paper, using a hybrid dataset comprising multiple types of spoofed audio augmented with sociolinguistic annotations, we investigate causal discovery and inference between the discernible linguistic features and the label in the audio clips, comparing the findings of the causal models with the expert ground-truth validation labeling process. Our findings suggest that the causal models indicate the utility of incorporating linguistic features to help discern spoofed audio, as well as the overall need and opportunity to incorporate human knowledge into models and techniques for strengthening AI models. Causal discovery and inference can serve as a foundation for training humans to discern spoofed audio, as well as for automating EDLF labeling to improve the performance of common AI-based spoofed audio detectors.
  • Item
    Atmospheric Gravity Wave Detection Using Transfer Learning Techniques
    (IEEE, 2022-12) González, Jorge López; Chapman, Theodore; Chen, Kathryn; Nguyen, Hannah; Chambers, Logan; Mostafa, Seraj Al Mahmud; Wang, Jianwu; Purushotham, Sanjay; Wang, Chenxi; Yue, Jia
    Atmospheric gravity waves are produced when gravity attempts to restore disturbances through stable layers in the atmosphere. They have a visible effect on many atmospheric phenomena such as global circulation and air turbulence. Despite their importance, however, little research has been conducted on how to detect gravity waves using machine learning algorithms. We faced two major challenges in our research: our raw data had a lot of noise and the labeled dataset was extremely small. In this study, we explored various methods of preprocessing and transfer learning in order to address those challenges. We pre-trained an autoencoder on unlabeled data before training it to classify labeled data. We also created a custom CNN by combining certain pre-trained layers from the InceptionV3 Model trained on ImageNet with custom layers and a custom learning rate scheduler. Experiments show that our best model outperformed the best performing baseline model by 6.36% in terms of test accuracy.