UMBC Information Systems Department
Permanent URI for this collection: http://hdl.handle.net/11603/51
Recent Submissions
Attack Detection and Optimal Deployment for Underwater Constrained Wireless Sensor Networks via Hybrid Trust Evidence (IEEE, 2025)
Jiang, Bin; Zhou, Ronghao; Luo, Fei; Cui, Xuerong; Wang, Huihui Helen; Song, Houbing
Underwater wireless sensor networks have been widely used in the acquisition and processing of oceanic information. The marine environment is complex and changeable, and obstacles, the main manifestation of this complexity, affect communication between underwater nodes. In addition, wireless sensor networks with obstacles are often more vulnerable to various attacks, which makes them more fragile. To address these issues, we first propose an underwater wireless sensor deployment strategy that targets obstacle avoidance (GEHO). We then use the TabTransformer algorithm to build a trust model and detect attacks from a trust dataset, which enhances the robustness of the entire wireless sensor network. In the final stage, we collect the patterns of malicious attacks on nodes from the detection results, which enables timely responses and reduces the losses that underwater acoustic sensor networks suffer from malicious attacks. The simulation results show that the trust model can effectively detect malicious nodes and attack types in the network, and achieves higher detection accuracy than existing trust models.

Support Vector Machine for Predicting Student Dropout Under Different Normalization Methods (IEEE, 2025-01-16)
Boteju, Gehan; Tang, Leon; Brown, Michael Scott
Student dropout in universities brings significant challenges that impact both individual futures and institutional effectiveness. Early prediction of potential dropouts is crucial for timely intervention, but it is complex because the problem is influenced by diverse socioeconomic factors.
This paper utilizes Support Vector Machines (SVMs) to predict student dropout, with an emphasis on exploring the efficacy of various data normalization methods for optimizing prediction accuracy. Using a dataset from the UC Irvine repository, this study compares 9 different normalization techniques, such as Min Max Scaler, Standard Scaler, and Power Transformer, to determine their impact on the predictive performance of SVMs. Results demonstrate substantial variations in model accuracy depending on the normalization method used, showing the importance of carefully selecting data preprocessing techniques. The best normalization method was the One Hot Scaler, which produced an average F1 score of 0.779. This work not only enhances the ability to identify at-risk students earlier but also improves the understanding of how data normalization influences predictive modeling in educational settings.

Qualitative Research Methods in Software Engineering: Past, Present, and Future (IEEE, 2025)
Seaman, Carolyn; Hoda, Rashina; Feldt, Robert
The paper entitled “Qualitative Methods in Empirical Studies of Software Engineering” by Carolyn Seaman was published in TSE in 1999. It has been chosen as one of the most influential papers from the third decade of TSE’s 50-year history. In this retrospective, the authors discuss the evolution of the use of qualitative methods in software engineering research and the impact these methods have had on research and practice, and offer reflections on what is coming and what deserves attention.

Supporting Campus Activism through Creating DIY-AT in a Social Justice Aligned Makerspace (ACM, 2025-01-31)
Higgins, Erin; Oliver, Zaria; Hamidi, Foad
Utilizing digital fabrication methods (e.g., 3D printing) has exciting implications for the design and production of customized assistive technology (AT). However, utilizing these tools currently requires a high level of technical expertise as well as investments of time and money.
Furthermore, facilitating collaboration between end users and makers requires effective and inclusive approaches with shared language and support for asynchronous, dispersed communication of design requirements. While these Do-It-Yourself (DIY) approaches have been shown to support end-user agency and further technology democratization, research has yet to explore how they can further align with social justice values and practices. We explored these possibilities by facilitating DIY-AT design with students with disabilities, activist staff members, and community members within a university makerspace. By explicitly encouraging participants to consider social justice issues important to them as they engaged in DIY-AT design, we studied the considerations and supports needed for facilitating flexible co-design activities and broader conversations about accessibility barriers at the university. Adopting a transdisciplinary approach, we offer lessons learned about the potential of co-designing DIY-ATs as a way to investigate questions of social justice, inclusion, and access in academic contexts. We show how the resulting DIY-ATs can be leveraged by students and staff as tangible artifacts to encourage more funding and support from university administration for accessibility initiatives.

Greenland Ice Sheet Wide Supraglacial Lake Evolution and Dynamics: Insights From the 2018 and 2019 Melt Seasons (AGU, 2025-02-21)
Dunmire, Devon; Subramanian, Aneesh C.; Hossain, Emam; Gani, Md Osman; Banwell, Alison F.; Younas, Hammad; Myers, Brendan
Supraglacial lakes on the Greenland Ice Sheet (GrIS) can impact both the ice sheet surface mass balance and ice dynamics. Thus, understanding the evolution and dynamics of supraglacial lakes is important for improving parameterizations in ice sheet models and enabling better projections of future GrIS changes.
In this study, we use the growing inventory of optical and microwave satellite imagery to automatically determine the fate of Greenland-wide supraglacial lakes during 2018 and 2019 (low and high melt seasons, respectively). We develop a novel time series classification method to categorize lakes into four classes: (a) refreezing, (b) rapidly draining, (c) slowly draining, and (d) buried. Our findings reveal significant interannual variability between the two melt seasons, with a notable increase in the proportion of draining lakes, and a particular dominance of slowly draining lakes, in 2019. We also find that as mean lake depth increases, so does the percentage of lakes that drain, indicating that lake depth may influence hydrofracture potential. We further observe rapidly draining lakes above the previously hypothesized upper-elevation hydrofracture limit (1,600 m), and find that non-draining lakes are generally deeper during the lower-melt 2018 season. Our automatic classification approach and the resulting 2-year, ice-sheet-wide data set provide new insights into GrIS supraglacial lake dynamics and evolution, offering a valuable resource for future research.

A LSTM with Dual-stage Attention Method to Predict Amine Emissions for Carbon Dioxide Capture and Storage (IEEE, 2025-01-16)
Rapelli, Sai Rajesh; Chen, Zhiyuan; Lu, Wei
To mitigate climate change impacts, carbon capture technologies have been implemented at significant CO2 emission points, such as industrial sites and electric power generation facilities. Solvent-based carbon capture solutions are pivotal in reducing atmospheric CO2 levels and enhancing air quality by capturing harmful pollutants. Amine-based solvents, favored for their efficiency in post-combustion CO2 capture, are susceptible to thermal and oxidative degradation, leading to complex emissions profiles that demand comprehensive management strategies.
We develop a machine learning model designed to predict future amine emissions in real time, thereby assisting in the formulation of the mitigation strategies required for operating capture plants. We conducted an experiment using data from test campaigns run at the Technology Centre Mongstad (TCM). We employed a Long Short-Term Memory (LSTM) autoencoder model with dual-stage attention mechanisms to predict amine emissions from historical data. The results were promising: we achieved a mean absolute percentage error ranging from 5.8% to 6.8% for the real-time prediction of amine emissions. These results are better than those of existing approaches using simpler machine learning models, as well as the standard LSTM autoencoder model.

SEALM: Semantically Enriched Attributes with Language Models for Linkage Recommendation (2025-02-02)
Traeger, Leonard; Behrend, Andreas; Karabatis, George
Matching attributes from different repositories is an important step in schema integration, which consolidates heterogeneous data silos. To recommend linkages between relevant attributes, a contextually rich representation of each attribute is essential, particularly when more than two database schemas are to be integrated. This paper introduces the SEALM approach, which generates a data catalog of semantically rich attribute descriptions using Generative Language Models, based on a new technique that employs six variations of available metadata information. Instead of using raw attribute metadata, we generate SEALM descriptions, which are used to recommend linkages with an unsupervised matching pipeline that involves a novel multi-source Blocking algorithm. Experiments on multiple schemas yield a 5% to 20% recall improvement in recommending linkages with SEALM-based attribute descriptions generated by the smallest model, Llama3.1:8B, compared to existing techniques.
With SEALM, we only need to process a small fraction of the attributes to be integrated rather than exhaustively inspecting all combinations of potential linkages.

Long-Tailed Federated Learning in Internet of Medical Things Based on Ensemble Distillation and Imbalanced Calibration (IEEE, 2025-01-31)
Jiang, Bin; Shang, Yuchen; Yue, Guanghui; Wang, Huihui Helen; Song, Houbing
The Internet of Medical Things (IoMT) has a promising future, as its devices can monitor vital signs, offer treatment guidance, and perform real-time diagnostics using AI and wireless communication technologies. However, due to the difficulty of collecting patient data on a large scale and the associated privacy risks, traditional centralized machine learning methods are often difficult to apply to IoMT devices. Federated learning, as a privacy-preserving technology, aims to build high-quality deep learning models across distributed clients while protecting data privacy. However, popular federated learning methods perform poorly on non-IID data, especially under long-tailed class distributions. Additionally, because of privacy constraints on distributed clients, these methods cannot apply traditional deep learning techniques for handling long-tailed data, even though IoMT data are often characterized by long-tailed, heterogeneous distributions. To address these challenges, this paper proposes Privacy-preserving Computing Client Scoring and Knowledge Distillation (FedLT+SKD). The method uses privacy-preserving computation to provide prior knowledge of the global class distribution while ensuring data privacy. Based on this prior knowledge, it employs a points-based sampling strategy to identify clients that perform well on long-tailed data, and those clients upload their local models to the server. On the server side, the robustness of the global model is enhanced by ensemble distillation and imbalance correction.
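The abstract does not specify how the server-side distillation and imbalance correction are computed. Purely as an illustration, the sketch below averages client logits into an ensemble teacher and then applies post-hoc logit adjustment, a generic long-tail calibration that subtracts τ·log(prior) from each class logit. This technique is swapped in here and is not claimed to be FedLT+SKD's method; the function names, the `tau` parameter, and the toy class counts are assumptions, with the counts standing in for the privacy-preserving global prior the abstract mentions.

```python
import math
from typing import List, Sequence


def ensemble_logits(client_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class logits produced by several client models for one input.

    The averaged vector acts as a simple ensemble teacher for distillation.
    """
    n_clients = len(client_logits)
    return [sum(column) / n_clients for column in zip(*client_logits)]


def logit_adjust(logits: Sequence[float], class_counts: Sequence[int], tau: float = 1.0) -> List[float]:
    """Post-hoc logit adjustment: z_c - tau * log(prior_c).

    Rare classes have small priors, so their logits are boosted the most.
    class_counts is assumed to be a global class histogram shared without raw data.
    """
    total = sum(class_counts)
    return [z - tau * math.log(count / total) for z, count in zip(logits, class_counts)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Two hypothetical clients scoring one sample over a head class (0) and a tail class (1).
teacher = ensemble_logits([[2.0, 1.5], [2.2, 1.7]])
calibrated = logit_adjust(teacher, class_counts=[900, 100])
print(argmax(teacher), argmax(calibrated))  # 0 1
```

With the skewed 900/100 histogram, calibration flips the prediction from the head class to the tail class; in practice τ would be tuned on validation data.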
We verify the effectiveness of this method on the medical datasets ISIC, ChestX-ray14, and MRI, as well as on the conventional benchmarks CIFAR-10-LT and CIFAR-100-LT; the experimental results show that the method outperforms popular federated and long-tail learning methods.

Lightweight and Robust Key Agreement for Securing IIoT-Driven Flexible Manufacturing Systems (IEEE, 2025-01-28)
Hammad, Muhammad; Badshah, Akhtar; Almeer, Mohammed A.; Waqas, Muhammad; Song, Houbing; Chen, Sheng; Han, Zhu
The ever-evolving Internet of Things (IoT) has ushered in a new era of intelligent manufacturing across multiple industries. However, the security and privacy of real-time data transmitted over the public channel of the industrial IoT (IIoT) remain formidable challenges. Existing lightweight protocols often omit one or more critical security features, such as anonymity and untraceability, and are susceptible to threats like desynchronization attacks. Additionally, they struggle to balance robust security with performance efficiency. To bridge these gaps, we introduce a new lightweight key agreement scheme that guarantees secure access to the IIoT-enabled flexible manufacturing system (FMS). The strength of our scheme lies in its use of the authenticated encryption with associated data (AEAD) primitive AEGIS, together with hash functions and physical unclonable functions, to secure the IIoT ecosystem. Our scheme also offers flexibility: it supports adding new machines, updating passwords, and revocation in cases of theft or loss. A comprehensive security analysis demonstrates the efficacy of the proposed scheme in thwarting various attacks. The formal analysis, based on the Real-Or-Random (RoR) model, ensures session key indistinguishability, while the informal analysis highlights its resilience against known attacks.
The comparative assessment demonstrates that the proposed scheme consistently outperforms the benchmark schemes across multiple dimensions, including security and functionality features, computational and communication overheads, and runtime efficiency. Specifically, the proposed scheme achieves peak performance improvements of 77.55%, 44.73%, and 69.6% in computational overhead, runtime overhead, and communication overhead, respectively, underscoring its substantial performance advantages.

Neurosymbolic AI for Travel Demand Prediction: Integrating Decision Tree Rules into Neural Networks (2025-02-02)
Acharya, Kamal; Lad, Mehul; Sun, Liang; Song, Houbing
Travel demand prediction is crucial for optimizing transportation planning, resource allocation, and infrastructure development, ensuring efficient mobility and economic sustainability. This study introduces a Neurosymbolic Artificial Intelligence (Neurosymbolic AI) framework that integrates decision tree (DT)-based symbolic rules with neural networks (NNs) to predict travel demand, leveraging the interpretability of symbolic reasoning and the predictive power of neural learning. The framework uses data from diverse sources, including geospatial, economic, and mobility datasets, to build a comprehensive feature set. DTs are employed to extract interpretable if-then rules that capture key patterns; these rules are then incorporated as additional features into an NN to enhance its predictive capabilities. Experimental results show that the combined dataset, enriched with symbolic rules, consistently outperforms standalone datasets across multiple evaluation metrics, including Mean Absolute Error (MAE), R², and Common Part of Commuters (CPC). Rules selected at finer variance thresholds (e.g., 0.0001) are more effective at capturing nuanced relationships, reducing prediction errors, and aligning with observed commuter patterns.
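As a hedged sketch of the two ideas named in this abstract: decision-tree rules, encoded here in a hypothetical format as lists of `(feature_index, op, threshold)` conditions, become 0/1 indicator features appended to each input row, and the reported metrics are computed on predicted versus observed flows. CPC uses the standard Common Part of Commuters definition, 2·Σmin(T, T̂) / (ΣT + ΣT̂); the rule format, helper names, and toy values are assumptions, not the study's pipeline.

```python
from typing import List, Sequence, Tuple

# A rule is a conjunction of (feature_index, operator, threshold) conditions,
# i.e. one root-to-leaf path of a fitted decision tree (format assumed here).
Condition = Tuple[int, str, float]
Rule = List[Condition]


def rule_features(x: Sequence[float], rules: Sequence[Rule]) -> List[float]:
    """Append one 0/1 indicator per rule to the raw feature vector x."""
    def holds(cond: Condition) -> bool:
        idx, op, thr = cond
        if op == "<=":
            return x[idx] <= thr
        if op == ">":
            return x[idx] > thr
        raise ValueError(f"unsupported operator: {op}")

    flags = [1.0 if all(holds(c) for c in rule) else 0.0 for rule in rules]
    return list(x) + flags


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def cpc(flows_true: Sequence[float], flows_pred: Sequence[float]) -> float:
    """Common Part of Commuters: 1.0 means the predicted flows match exactly."""
    common = sum(min(a, b) for a, b in zip(flows_true, flows_pred))
    return 2.0 * common / (sum(flows_true) + sum(flows_pred))


rules = [[(0, "<=", 6.0), (1, ">", 1.0)], [(0, ">", 6.0)]]
print(rule_features([5.0, 2.0], rules))  # [5.0, 2.0, 1.0, 0.0]

observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(mae(observed, predicted), round(r_squared(observed, predicted), 3), cpc(observed, predicted))
# 2.5 0.948 0.95
```

Because each rule is a conjunction of threshold tests, the appended flags stay human-readable, which is the interpretability argument the abstract makes for feeding rules to the network.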
By merging symbolic and neural learning paradigms, this Neurosymbolic approach achieves both interpretability and accuracy.

Accurate and Interpretable Radar Quantitative Precipitation Estimation with Symbolic Regression (IEEE, 2025-01-16)
Zhang, Olivia; Grissom, Brianna; Pulido, Julian; Munoz-Ordaz, Kenia; He, Jonathan; Cham, Mostafa; Jing, Haotong; Qian, Weikang; Wen, Yixin; Wang, Jianwu
Accurate quantitative precipitation estimation (QPE) is essential for managing water resources, monitoring flash floods, creating hydrological models, and more. Traditional methods of obtaining precipitation data from rain gauges and radars have limitations such as sparse coverage and inaccurate estimates for different precipitation types and intensities. Symbolic regression, a machine learning method that generates mathematical equations fitting the data, presents a unique approach to estimating precipitation that is both accurate and interpretable. Using WSR-88D dual-polarimetric radar data from Oklahoma and Florida over three dates, we tested symbolic regression models involving genetic programming and deep learning, symbolic regression on separate clusters of the data, and the incorporation of knowledge-based loss terms into the loss function. We found that symbolic regression is both accurate in estimating rainfall and interpretable through learned equations. The accuracy and simplicity of the learned equations can be slightly improved by clustering the data based on select radar variables and by adjusting the loss function with knowledge-based loss terms. This research provides insights into improving QPE accuracy through interpretable symbolic regression methods.

Can Generative AI be Egalitarian? (IEEE, 2024-10)
Feldman, Philip; Foulds, James; Pan, Shimei
The recent explosion of “foundation” generative AI models has been built upon the extensive extraction of value from online sources, often without corresponding reciprocation.
This pattern mirrors and intensifies the extractive practices of surveillance capitalism [46], while the potential for enormous profit has challenged technology organizations’ commitments to responsible AI practices, raising significant ethical and societal concerns. However, a promising alternative is emerging: the development of models that rely on content willingly and collaboratively provided by users. This article explores this “egalitarian” approach to generative AI, taking inspiration from the successful model of Wikipedia. We explore the potential implications of this approach for the design, development, and constraints of future foundation models. We argue that such an approach is not only ethically sound but may also lead to models that are more responsive to user needs, more diverse in their training data, and ultimately more aligned with societal values. We also examine potential challenges and limitations of this approach, including issues of scalability, quality control, and potential biases inherent in volunteer-contributed content.

Fair Inference for Discrete Latent Variable Models: An Intersectional Approach (ACM, 2024-09-04)
Islam, Rashidul; Pan, Shimei; Foulds, James
It is now widely acknowledged that machine learning models trained on data without due care often exhibit discriminatory behavior. Traditional fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings. This paper, however, takes a different approach by investigating fairness in unsupervised learning using graphical models with discrete latent variables. We develop a fair stochastic variational inference method for discrete latent variables.
Our approach uses a fairness penalty on the variational distribution that reflects the principles of intersectionality, a comprehensive perspective on fairness from the fields of law, social sciences, and humanities. Intersectional fairness brings the challenge of data sparsity in minibatches, which we address via a stochastic approximation approach. We first show the utility of our method in improving equity and fairness for clustering using naïve Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a specialized graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.

ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital Twins (2025-01-15)
Hakim, Safayat Bin; Adil, Muhammad; Velasquez, Alvaro; Song, Houbing
In this paper, we propose an Adaptive Neuro-Symbolic Learning Framework for digital twin technology called “ANSR-DT.” Our approach combines pattern recognition algorithms with reinforcement learning and symbolic reasoning to enable real-time learning and adaptive intelligence. This integration enhances the understanding of the environment and promotes continuous learning, leading to better and more effective decision-making in real time for applications that require human-machine collaboration. We evaluated the ANSR-DT framework for its ability to learn and adapt to dynamic patterns, observing significant improvements in decision accuracy, reliability, and interpretability when compared to existing state-of-the-art methods. However, challenges still exist in extracting and integrating symbolic rules in complex environments, which limits the full potential of our framework in heterogeneous settings.
Our ongoing research aims to address this issue by ensuring the seamless integration of neural models at large. In addition, our open-source implementation promotes reproducibility and encourages future research to build on our foundational work.

Understanding the Challenges of Maker Entrepreneurship (ACM, 2025-01-23)
Friedman, Natalie; Bremers, Alexandra; Nyanyo, Adelaide; Clark, Ian; Kotturi, Yasmine; Dabbish, Laura; Ju, Wendy; Martelaro, Nikolas
The maker movement embodies a resurgence in DIY creation, merging physical craftsmanship and arts with digital technology support. However, technological skills and creativity alone are insufficient for an economically and psychologically sustainable practice. By illuminating and smoothing the path from “maker” to “maker entrepreneur,” we can help broaden the viability of making as a livelihood. Our research centers on makers who design, produce, and sell physical goods. In this work, we explore the transition to entrepreneurship for these makers and how technology can facilitate this transition online and offline. We present results from interviews with 20 USA-based maker entrepreneurs (e.g., lamps, stickers), six creative service entrepreneurs (e.g., photographers, fabrication), and seven support personnel (e.g., art curator, incubator director). Our findings reveal that many maker entrepreneurs 1) are makers first and entrepreneurs second; 2) struggle with business logistics and learn business skills as they go; and 3) are motivated by non-monetary values. We discuss training and technology-based design implications and opportunities for addressing challenges in developing economically sustainable businesses around making.

RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots (2024-06-12)
Feldman, Philip; Foulds, James; Pan, Shimei
Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence.
However, their tendency to hallucinate (to generate plausible but false information) poses a significant challenge. This issue is critical, as seen in recent court cases where the use of ChatGPT led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.

GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models (2024-12-16)
Zhang, Tao; Zeng, Ziqian; Xiao, Yuxiang; Zhuang, Huiping; Chen, Cen; Foulds, James; Pan, Shimei
Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigating gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicly available. The commonly used, publicly available alignment dataset HH-RLHF still exhibits gender bias to some extent, and there is a lack of publicly available alignment datasets specifically designed to address gender bias. Hence, we developed a new dataset named GenderAlign, aimed at mitigating a comprehensive set of gender biases in LLMs. This dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response.
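The abstract does not say which alignment algorithm consumes the chosen/rejected pairs. One widely used option for preference data in this format is Direct Preference Optimization (DPO); the sketch below, which is an assumption rather than the paper's training recipe, scores a single pair from summed sequence log-probabilities under the policy being tuned and a frozen reference model. The `beta` value and the log-probability numbers are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    margin = beta * [(policy - reference) on the chosen reply
                     - (policy - reference) on the rejected reply]
    loss   = -log(sigmoid(margin)), computed as softplus(-margin).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)


# Policy identical to the reference: the loss is log 2.
print(round(dpo_loss(-12.0, -12.0, -12.0, -12.0), 4))  # 0.6931
# Policy has shifted probability toward the chosen (less biased) reply: the loss drops.
print(round(dpo_loss(-10.0, -20.0, -15.0, -15.0), 4))  # 0.3133
```

Minimizing this loss over many pairs pushes the model to prefer the "chosen" replies relative to the reference, while `beta` limits how far it drifts from that reference.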
Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality. Furthermore, we categorized the gender biases in the "rejected" responses of GenderAlign into 4 principal categories. The experimental results show the effectiveness of GenderAlign in reducing gender bias in LLMs.

DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text (ACM, 2024-11-04)
Hasan, Fatema; Li, Yulong; Foulds, James; Pan, Shimei; Bhattacharjee, Bishwaranjan
Traditional large language models (LLMs) leverage extensive text corpora but lack access to acoustic and paralinguistic cues present in speech. There is growing interest in enhancing text-based models with audio information. However, current models often require an aligned audio-text dataset, which is frequently much smaller than typical language model training corpora. Moreover, these models often require both text and audio streams during inference/testing. In this study, we introduce a novel two-stage knowledge distillation (KD) approach that enables language models to (a) incorporate rich acoustic and paralinguistic information from speech, (b) utilize text corpora comparable in size to typical language model training data, and (c) support text-only analysis without requiring an audio stream during inference/testing. Specifically, we employ a pre-trained speech embedding teacher model (OpenAI Whisper) to train a Teacher Assistant (TA) model on an aligned audio-text dataset in the first stage. In the second stage, the TA’s knowledge is transferred to a student language model trained on a conventional text dataset. Thus, our two-stage KD method leverages both the acoustic and paralinguistic cues in the aligned audio-text data and the nuanced linguistic knowledge in a large text-only dataset.
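A minimal sketch of the two-stage flow described above, assuming an embedding-matching objective (cosine distance) at both stages; the abstract does not state the exact loss. The toy encoders standing in for Whisper, the teacher assistant, and the student are hypothetical placeholders, and audio is represented by an id string, so this only illustrates which model supervises which, on which data; a real setup would minimize these losses by gradient descent.

```python
import math
from typing import Callable, List, Sequence, Tuple

Embedding = List[float]
Encoder = Callable[[str], Embedding]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(a, b): 0 for vectors pointing the same way, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def stage1_loss(aligned: Sequence[Tuple[str, str]], speech_teacher: Encoder, assistant: Encoder) -> float:
    """Stage 1: on aligned (audio, transcript) pairs, the text-only teacher assistant
    is pulled toward the frozen speech teacher's embedding of the matching audio."""
    losses = [cosine_distance(assistant(text), speech_teacher(audio)) for audio, text in aligned]
    return sum(losses) / len(losses)


def stage2_loss(texts: Sequence[str], assistant: Encoder, student: Encoder) -> float:
    """Stage 2: on a large text-only corpus, the student imitates the (now frozen)
    assistant, so no audio is needed here or at inference time."""
    losses = [cosine_distance(student(t), assistant(t)) for t in texts]
    return sum(losses) / len(losses)


# Hypothetical toy encoders; real ones would be trained networks.
def toy_speech_teacher(audio_id: str) -> Embedding:
    return [1.0, 0.0] if audio_id.endswith("happy") else [0.0, 1.0]


def toy_assistant(text: str) -> Embedding:
    return [1.0, 0.0] if "great" in text else [0.0, 1.0]


def toy_student(text: str) -> Embedding:
    return [1.0, 1.0]  # an untrained student that ignores its input


aligned_pairs = [("clip_happy", "this is great"), ("clip_sad", "oh no")]
print(stage1_loss(aligned_pairs, toy_speech_teacher, toy_assistant))  # 0.0
print(round(stage2_loss(["this is great", "oh no"], toy_assistant, toy_student), 4))  # 0.2929
```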
Based on our evaluation, this DoubleDistillation system consistently outperforms traditional LLMs on 15 informal text understanding tasks.

EmoXpt: Analyzing Emotional Variances in Human Comments and LLM-Generated Responses (IEEE, 2025-01-11)
Pyreddy, Shireesh Reddy; Zaman, Tarannum Shaila
The widespread adoption of generative AI has generated diverse opinions, with individuals expressing both support and criticism of its applications. This study investigates the emotional dynamics surrounding generative AI by analyzing human tweets referencing terms such as ChatGPT, OpenAI, Copilot, and LLMs. To further understand the emotional intelligence of ChatGPT, we examine its responses to selected tweets, highlighting differences in sentiment between human comments and LLM-generated responses. We introduce EmoXpt, a sentiment analysis framework designed to assess both human perspectives on generative AI and the sentiment embedded in ChatGPT's responses. Unlike prior studies that focus exclusively on human sentiment, EmoXpt uniquely evaluates the emotional expression of ChatGPT. Experimental results demonstrate that LLM-generated responses are notably more efficient, cohesive, and consistently positive than human responses.

Topology-Driven Attribute Recovery for Attribute Missing Graph Learning in Social Internet of Things (2025-01-17)
Li, Mengran; Chen, Junzhou; Yu, Chenyun; Jiang, Guanying; Zhang, Ronghui; Shen, Yanming; Song, Houbing
With the advancement of information technology, the Social Internet of Things (SIoT) has fostered the integration of physical devices and social networks, deepening the study of complex interaction patterns. Text Attribute Graphs (TAGs) capture both topological structures and semantic attributes, enhancing the analysis of complex interactions within the SIoT.
However, existing graph learning methods are typically designed for complete attributed graphs, and the common issue of missing attributes in Attribute Missing Graphs (AMGs) increases the difficulty of analysis tasks. To address this, we propose the Topology-Driven Attribute Recovery (TDAR) framework, which leverages topological data for AMG learning. TDAR introduces an improved pre-filling method for initial attribute recovery using native graph topology. Additionally, it dynamically adjusts propagation weights and incorporates homogeneity strategies within the embedding space to suit AMGs' unique topological structures, effectively reducing noise during information propagation. Extensive experiments on public datasets demonstrate that TDAR significantly outperforms state-of-the-art methods in attribute reconstruction and downstream tasks, offering a robust solution to the challenges posed by AMGs. The code is available at https://github.com/limengran98/TDAR.
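The abstract does not detail TDAR's pre-filling step. As an illustration of topology-based attribute recovery in general, the sketch below uses plain feature propagation: missing attribute vectors start at zero, are repeatedly replaced by the mean of their neighbours' vectors, and observed vectors are kept fixed every round. This generic technique is swapped in as an assumption; TDAR's actual pre-filling, its dynamic propagation weights, and its homogeneity strategies are not reproduced, and the function name and toy graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def propagate_missing_attributes(adj: Graph,
                                 features: Dict[str, List[float]],
                                 observed: Set[str],
                                 iterations: int = 100) -> Dict[str, List[float]]:
    """Fill missing node attributes by iterated neighbour averaging.

    adj: undirected adjacency lists; features: attribute vectors for the observed
    nodes; observed: non-empty set of nodes whose attributes are known and kept fixed.
    """
    dim = len(features[next(iter(observed))])
    x = {n: list(features[n]) if n in observed else [0.0] * dim for n in adj}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adj.items():
            if node in observed or not neighbours:
                updated[node] = x[node]  # known (or isolated) nodes stay unchanged
            else:
                updated[node] = [sum(x[m][d] for m in neighbours) / len(neighbours)
                                 for d in range(dim)]
        x = updated
    return x


# Path graph a - b - c - d with attributes known only at the two ends.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
filled = propagate_missing_attributes(path, {"a": [0.0], "d": [3.0]}, observed={"a", "d"})
print({n: round(v[0], 3) for n, v in filled.items()})  # {'a': 0.0, 'b': 1.0, 'c': 2.0, 'd': 3.0}
```

On the path graph the unknown nodes converge to values interpolated between the observed ends, which is the smoothness assumption that topology-driven pre-filling relies on.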