UMBC Student Collection
Permanent URI for this collection: http://hdl.handle.net/11603/33
Recent Submissions
Item: Can Generative AI be Egalitarian? (IEEE, 2024-10)
Feldman, Philip; Foulds, James; Pan, Shimei
The recent explosion of “foundation” generative AI models has been built upon the extensive extraction of value from online sources, often without corresponding reciprocation. This pattern mirrors and intensifies the extractive practices of surveillance capitalism [46], while the potential for enormous profit has challenged technology organizations’ commitments to responsible AI practices, raising significant ethical and societal concerns. However, a promising alternative is emerging: the development of models that rely on content willingly and collaboratively provided by users. This article explores this “egalitarian” approach to generative AI, taking inspiration from the successful model of Wikipedia. We explore the potential implications of this approach for the design, development, and constraints of future foundation models. We argue that such an approach is not only ethically sound but may also lead to models that are more responsive to user needs, more diverse in their training data, and ultimately more aligned with societal values. Furthermore, we explore potential challenges and limitations of this approach, including issues of scalability, quality control, and potential biases inherent in volunteer-contributed content.

Item: Fair Inference for Discrete Latent Variable Models: An Intersectional Approach (ACM, 2024-09-04)
Islam, Rashidul; Pan, Shimei; Foulds, James
It is now widely acknowledged that machine learning models, trained on data without due care, often exhibit discriminatory behavior. Traditional fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings.
This paper, however, takes a different approach by investigating fairness in unsupervised learning using graphical models with discrete latent variables. We develop a fair stochastic variational inference method for discrete latent variables. Our approach uses a fairness penalty on the variational distribution that reflects the principles of intersectionality, a comprehensive perspective on fairness from the fields of law, social sciences, and humanities. Intersectional fairness brings the challenge of data sparsity in minibatches, which we address via a stochastic approximation approach. We first show the utility of our method in improving equity and fairness for clustering using naïve Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a specialized graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.

Item: RNA-Puzzles Round V: blind predictions of 23 RNA structures (Springer Nature, 2024-12-02)
Bu, Fan; Adam, Yagoub; Adamiak, Ryszard W.; Antczak, Maciej; de Aquino, Belisa Rebeca H.; Badepally, Nagendar Goud; Batey, Robert T.; Baulin, Eugene F.; Boinski, Pawel; Boniecki, Michal J.; Bujnicki, Janusz M.; Carpenter, Kristy A.; Chacon, Jose; Chen, Shi-Jie; Chiu, Wah; Cordero, Pablo; Das, Naba Krishna; Das, Rhiju; Dawson, Wayne K.; DiMaio, Frank; Ding, Feng; Dock-Bregeon, Anne-Catherine; Dokholyan, Nikolay V.; Dror, Ron O.; Dunin-Horkawicz, Stanisław; Eismann, Stephan; Ennifar, Eric; Esmaeeli, Reza; Farsani, Masoud Amiri; Ferré-D’Amaré, Adrian R.; Geniesse, Caleb; Ghanim, George E.; Guzman, Horacio V.; Hood, Iris V.; Huang, Lin; Jain, Dharm Skandh; Jaryani, Farhang; Jin, Lei; Joshi, Astha; Karelina, Masha; Kieft, Jeffrey S.; Kladwang, Wipapat; Kmiecik, Sebastian; Koirala, Deepak; Kollmann, Markus; Kretsch, Rachael C.; Kurciński, Mateusz; Li, Jun; Li, Shuang; Magnus,
Marcin; Masquida, Benoît; Moafinejad, S. Naeim; Mondal, Arup; Mukherjee, Sunandan; Nguyen, Thi Hoang Duong; Nikolaev, Grigory; Nithin, Chandran; Nye, Grace; Pandaranadar Jeyeram, Iswarya P. N.; Perez, Alberto; Pham, Phillip; Piccirilli, Joseph A.; Pilla, Smita Priyadarshini; Pluta, Radosław; Poblete, Simón; Ponce-Salvatierra, Almudena; Popenda, Mariusz; Popenda, Lukasz; Pucci, Fabrizio; Rangan, Ramya; Ray, Angana; Ren, Aiming; Sarzynska, Joanna; Sha, Congzhou Mike; Stefaniak, Filip; Su, Zhaoming; Suddala, Krishna C.; Szachniuk, Marta; Townshend, Raphael; Trachman, Robert J.; Wang, Jian; Wang, Wenkai; Watkins, Andrew; Wirecki, Tomasz K.; Xiao, Yi; Xiong, Peng; Xiong, Yiduo; Yang, Jianyi; Yesselman, Joseph David; Zhang, Jinwei; Zhang, Yi; Zhang, Zhenzhen; Zhou, Yuanzhe; Zok, Tomasz; Zhang, Dong; Zhang, Sicheng; Żyła, Adriana; Westhof, Eric; Miao, Zhichao
RNA-Puzzles is a collective endeavor dedicated to the advancement and improvement of RNA three-dimensional structure prediction. With agreement from structural biologists, RNA structures are predicted by modeling groups before publication of the experimental structures. We report a large-scale set of predictions by 18 groups for 23 RNA-Puzzles: 4 RNA elements, 2 Aptamers, 4 Viral elements, 5 Ribozymes and 8 Riboswitches. We describe automatic assessment protocols for comparisons between prediction and experiment. Our analyses reveal some critical steps to be overcome to achieve good accuracy in modeling RNA structures: identification of helix-forming pairs and of non-Watson–Crick modules, correct coaxial stacking between helices and avoidance of entanglements.
Three of the top four modeling groups in this round also ranked among the top four in the CASP15 contest.

Item: Single Image Super Resolution Using AI Generated Images (2025-01-18)
Singh, Amanjot; Khan, Faisal Rasheed; Singh, Mrinalini
Image super-resolution has become increasingly important in various applications because of the demand for producing high-resolution output images from low-resolution input images. Earlier, image-enhancement techniques such as deblurring were used to obtain a quality image. With advances in Generative Adversarial Networks (GANs), the generation of high-quality images from low-quality images has become outstanding. Models such as SRGAN and ESRGAN [12] are competitive and produce good-looking results. However, the architecture of SRGAN, a state-of-the-art model, is complex, and ESRGAN is built on SRGAN; the image quality of the SRGAN results nevertheless looks good. We try to build a super-resolution model with a less complex architecture that is faster than SRGAN, without compromising the results despite the reduced complexity. We built our base model on SRGAN by reducing the complexity of the architecture. In our final model, we added another discriminator layer that enhances the sub-parts of the images to improve image quality. Our aim is to build an efficient model whose architecture is less complex than SRGAN [14] and that gives results as competitive as SRGAN. Our results show significant improvements in image quality for the final model compared to our base model.
The code link for our project is here: https://github.com/faisalkhansk3283/Computer_Vision_Extended_SRGAN

Item: ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital Twins (2025-01-15)
Hakim, Safayat Bin; Adil, Muhammad; Velasquez, Alvaro; Song, Houbing
In this paper, we propose an Adaptive Neuro-Symbolic Learning Framework for digital twin technology called "ANSR-DT." Our approach combines pattern recognition algorithms with reinforcement learning and symbolic reasoning to enable real-time learning and adaptive intelligence. This integration enhances the understanding of the environment and promotes continuous learning, leading to better and more effective decision-making in real-time for applications that require human-machine collaboration. We evaluated the ANSR-DT framework for its ability to learn and adapt to dynamic patterns, observing significant improvements in decision accuracy, reliability, and interpretability when compared to existing state-of-the-art methods. However, challenges still exist in extracting and integrating symbolic rules in complex environments, which limits the full potential of our framework in heterogeneous settings. Moreover, our ongoing research aims to address this issue in the future by ensuring seamless integration of neural models at large. In addition, our open-source implementation promotes reproducibility and encourages future research to build on our foundational work.

Item: Assessing the K₂BO₃ family of materials as multiferroics (APS, 2024-11-26)
Casale, Anthony; Bennett, Joseph
We evaluate the potential of an overlooked family of materials to support both the magnetization and polarization required to be classified as multiferroics. This family of materials has a stoichiometry of A₂BX₃ and was uncovered in the Inorganic Crystal Structure Database (ICSD) while searching for structural platforms that could support low energy polarization switching.
The examples here have the general chemical formula of K₂BO₃, where B is a magnetically active cation located within edge-sharing square pyramids that form a 1D chain. Density functional theory with Hubbard U corrections (DFT + U) is used to determine the potential energy landscape of K₂BO₃, which includes investigating multiple magnetic and polarization orderings. We analyze the ground state and electronic structures and report on how the choice of Hubbard U will affect both, which is important when predicting functional properties of low-dimensional and potentially exfoliable systems. This family contains a ferromagnetic insulator, K₂VO₃, as well as antiferromagnetic (K₂NbO₃) and nonmagnetic (K₂MoO₃) insulators with antipolar ground state symmetries and accessible polar metastable states, which we predict to be antiferroelectric. This preliminary assessment of the K₂BO₃ members of the A₂BX₃ family reveals a new class of materials that, with further optimization via compositional tuning, could be multiferroic.

Item: Digital skills use profiles among older workers in the United States: a person-centered approach (Taylor & Francis, 2024-12-22)
Yamashita, Takashi; Narine, Donnette; Ojomo, Adeola; Chidebe, Runcie C. W.; Cummins, Phyllis A.; Kramer, Jenna W.; Karam, Rita; Smith, Thomas J.
Considering the digitalisation of the workplace and increasingly crucial digital skill proficiency in the technology-rich labour market, the objectives of the present study are to develop digital skill use profiles and to identify specific individual characteristics that are linked with digital skill use patterns among older workers in the United States. However, relatively little is known about older workers’ digital skill use patterns and skill use opportunity structures. Data of the U.S. older workers (age 50 years and older; n = 1,670) were obtained from the 2012/2014/2017 International Assessment of the Adult Competencies (PIAAC).
Latent class analysis, a form of person-centred approach that identifies subgroups based on distinctive digital skill use patterns, showed that there were two underlying subgroups of older workers, including more frequent and less frequent digital skill users. More frequent users practiced a greater variety of digital skills both at work and outside of work than their counterparts. Also, logistic regression analysis showed that higher digital skill proficiency and full-time employment (vs. part-time) were associated with belonging to the more frequent digital skill use subgroup. The digital skill use profiles of U.S. older workers, subgroup characteristics, and implications for adult education and labour policies are evaluated.

Item: RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots (2024-06-12)
Feldman, Philip; Foulds, James; Pan, Shimei
Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate (generate plausible but false information) poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications.
We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.

Item: Investigation of Solid Fuel Ramjets Using Analytical Theory and Computational Fluid Dynamics (2025-01-03)
Khokhar, Gohar T.; McBeth, Joshua; Hanquist, Kyle M.; Oveissi, Parham; Goel, Ankit
This paper investigates the modeling and analysis of a solid-fuel ramjet (SFRJ) using analytical theory and computational fluid dynamics (CFD). The primary objectives of this study are to first apply analytical theories of an SFRJ in combination with combustion physics from NASA CEA software to establish a foundation for analytically modeling the thrust of an SFRJ with a specified bypass ratio. Secondly, this study aims to model the thrust output of an SFRJ using a simplified backward-facing step computational model consisting of a single inlet and outlet through computational analysis. In the computational model, combustion is modeled as wall heat addition, and thermal choking effects leading to engine unstart conditions are predicted. While there are more complex SFRJ modeling approaches, consideration of computational cost is an important aspect of this work, since these models will be coupled with a control system. This research serves as a foundational step in a broader investigation aimed at coupling SFRJ thrust dynamics with control systems for regulating thrust under uncertain operating conditions.

Item: Development and Initial Testing of XR-Based Fence Diagrams for Polar Science (IEEE, 2023-07)
Tack, Naomi; Holschuh, Nicholas; Sharma, Sharad; Williams, Rebecca M.; Engel, Don
Earth’s ice sheets are the largest contributor to sea level rise. For this reason, understanding the flow and topology of ice sheets is crucial for the development of accurate models and predictions. In order to aid in the generation of such models, ice penetrating radar is used to collect images of the ice sheet through both airborne and ground-based platforms.
Glaciologists then take these images and visualize them in 3D fence diagrams on a flat 2D screen. We aim to consider the benefits that an XR visualization of these diagrams may provide to enable better data comprehension, annotation, and collaborative work. In this paper, we discuss our initial development and evaluation of such an XR system.

Item: Visualizing the Greenland Ice Sheet in VR using Immersive Fence Diagrams (ACM, 2023-09-10)
Tack, Naomi; Williams, Rebecca M.; Holschuh, Nicholas; Sharma, Sharad; Engel, Don
The melting of the ice sheets covering Greenland and Antarctica is a primary driver of sea level rise. Predicting the rate of ice loss depends on modeling the ice dynamics. Ice penetrating radar provides the ability to capture images through the ice sheet, down to the bedrock. Historical environmental and climate perturbations cause small changes to the dielectric constant of ice, which are visually manifested as layers of varying brightness in the radar imagery. To understand how the flow of ice has progressed between neighboring image slices, glaciologists use Fence Diagrams to visualize several cross-sections at once. Here, we describe the immersive virtual reality (VR) fence diagrams we have developed. The goal of our system is to enable glaciologists to make sense of these data and thereby predict future ice loss.

Item: Learning-Based Thrust Regulation of Solid-Fuel Ramjet in Flight Conditions (2025-01-03)
Oveissi, Parham; Dorsey, Alex; McBeth, Joshua; Hanquist, Kyle M.; Goel, Ankit
This paper investigates the performance of a learning-based control system for regulating the thrust generated by a solid fuel ramjet engine in realistic flight scenarios. An integrated simulation framework is developed that combines a longitudinal missile dynamics model, a missile autopilot, a quasi-static engine dynamics model, and a learning controller for thrust regulation. The missile autopilot is based on the classical three-loop topology.
The learning controller is an adaptive PID controller whose gains are recursively optimized using the retrospective cost adaptive control algorithm. First, harmonic acceleration commands are used to simulate variable flight conditions that affect the thrust generated by the engine model. Next, an interception scenario is simulated by integrating a guidance law in the loop. Numerical results indicate that the learning controller can regulate the generated thrust despite wide variations in operating conditions.

Item: Adaptive Numerical Differentiation for Extremum Seeking with Sensor Noise (2025-01-08)
Verma, Shashank; Salazar, Juan Augusto Paredes; Delgado, Jhon Manuel Portella; Goel, Ankit; Bernstein, Dennis S.
Extremum-seeking control (ESC) is widely used to optimize performance when the system dynamics are uncertain. However, sensitivity to sensor noise is an important issue in ESC implementation due to the use of high-pass filters or gradient estimators. To reduce the sensitivity of ESC to noise, this paper investigates the use of adaptive input and state estimation (AISE) for numerical differentiation. In particular, this paper develops extremum-seeking control with adaptive input and state estimation (ESC/AISE), where the high-pass filter of ESC is replaced by AISE to improve performance under sensor noise. The effectiveness of ESC/AISE is illustrated via numerical examples.

Item: Adaptive Combustion Regulation in High-Fidelity Computational Model of Solid Fuel Ramjet (2025-01-03)
Oveissi, Parham; Dorsey, Alex; Khokhar, Gohar T.; Hanquist, Kyle M.; Goel, Ankit
Controlling the combustion process under hypersonic conditions remains a significant challenge. This paper uses a data-driven, learning-based control technique to regulate the combustion process within a solid fuel ramjet, aiming to regulate the generated thrust under uncertain operating conditions.
A high-fidelity computational model combining compressible flow theory with equilibrium chemistry is developed to simulate combustion dynamics. This model evaluates the stability of the combustion dynamics and defines the engine’s operational envelope. An online learning controller based on retrospective cost optimization is integrated with the computational model to regulate the thrust. Numerical simulations indicate that the learning control system can regulate the thrust generated by an SFRJ without requiring any modeling information.

Item: DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text (ACM, 2024-11-04)
Hasan, Fatema; Li, Yulong; Foulds, James; Pan, Shimei; Bhattacharjee, Bishwaranjan
Traditional large language models (LLMs) leverage extensive text corpora but lack access to acoustic and para-linguistic cues present in speech. There is a growing interest in enhancing text-based models with audio information. However, current models often require an aligned audio-text dataset which is frequently much smaller than typical language model training corpora. Moreover, these models often require both text and audio streams during inference/testing. In this study, we introduce a novel two-stage knowledge distillation (KD) approach that enables language models to (a) incorporate rich acoustic and paralinguistic information from speech, (b) utilize text corpora comparable in size to typical language model training data, and (c) support text-only analysis without requiring an audio stream during inference/testing. Specifically, we employ a pre-trained speech embedding teacher model (OpenAI Whisper) to train a Teacher Assistant (TA) model on an aligned audio-text dataset in the first stage. In the second stage, the TA’s knowledge is transferred to a student language model trained on a conventional text dataset.
Thus, our two-stage KD method leverages both the acoustic and paralinguistic cues in the aligned audio-text data and the nuanced linguistic knowledge in a large text-only dataset. Based on our evaluation, this DoubleDistillation system consistently outperforms traditional LLMs in 15 informal text understanding tasks.

Item: Signal Processing of Images for Convective Boundary Layer Height Estimation from Radar (SPICER) and multi-instrument verification (IEEE, 2025-01-13)
Porta, Delia Tatiana Della; Demoz, Belay
The study of the planetary boundary layer (PBL) is one of the main topics of the atmospheric community. The current study presents a new algorithm for PBL height determination using a publicly available but unexplored data source, the Weather Service Radar (WSR-88D). The diurnal evolution of the PBL is also known as Convective Boundary Layer (CBL), key in the study of convection and precipitation. This paper presents the Signal Processing of Images for Convective Boundary Layer Height Estimation (SPICER) algorithm that can automatically detect the CBL Height (CBLH) for all of the 159 radar locations across the United States during clear days. The present work is the first step to applying SPICER to a network of Next Generation Radars (NEXRAD) with continuous countrywide coverage. With the possible combination with the Automated Surface Observing System network (ASOS), a source of ceilometer profile data, a validated dataset of CBLH estimates can be expected soon. The algorithm treats averaged differential reflectivity vs range as an image and applies filtering plus Canny edge detection to estimate the CBLH. In addition, another algorithm is presented to automate the detection of the mixing layer height (MLH), a proxy for CBLH from Raman Lidar and a 915 MHz wind profiler.
A comparison of CBLH estimates vs widely used methods in meteorology (Radiosondes, Raman Lidar, ceilometer, 915 MHz wind profiler, and Doppler Lidar-based derived Value-Added Product (VAP)) is performed to validate the NEXRAD detected CBLH using SPICER. The SPICER algorithm shows over 0.9 correlation with radiosonde measurements.

Item: Mapping the Edges of Mass Spectral Prediction: Evaluation of Machine Learning EIMS Prediction for Xeno Amino Acids (2025-01-14)
Brown, Sean M.; Allgair, Evan; Kryštůfek, Robin
Mass spectrometry is one of the most effective analytical methods for unknown compound identification. By comparing observed m/z spectra with a database of experimentally determined spectra, this process identifies compound(s) in any given sample. Unknown sample identification is thus limited to whatever has been experimentally determined. To address the reliance on experimentally determined signatures, multiple state-of-the-art MS spectra prediction algorithms have been developed within the past half decade. Here we evaluate the accuracy of the NEIMS spectral prediction algorithm. We focus our analyses on monosubstituted α-amino acids given their significance as important targets for astrobiology, synthetic biology, and diverse biomedical applications. Our general intent is to inform those using generated spectra for detection of unknown biomolecules. We find predicted spectra are inaccurate for amino acids beyond the algorithm's training data. Interestingly, these inaccuracies are not explained by physicochemical differences or the derivatization state of the amino acids measured.
We thus highlight the need both to improve current machine learning based approaches and to further optimize ab initio spectral prediction algorithms so as to expand databases for structures beyond what is currently experimentally possible, even including theoretical molecules.

Item: MedGrad E-CLIP: Enhancing Trust and Transparency in AI-Driven Skin Lesion Diagnosis (IEEE, 2025-01-12)
Kamal, Sadia; Oates, Tim
As deep learning models gain traction in medical data, ensuring transparent and trustworthy decision-making is essential. In skin cancer diagnosis, while advancements in lesion detection and classification have improved accuracy, the black-box nature of these methods poses challenges in understanding their decision processes, leading to trust issues among physicians. This study leverages the CLIP (Contrastive Language-Image Pretraining) model, trained on different skin lesion datasets, to capture meaningful relationships between visual features and diagnostic criteria terms. To further enhance transparency, we propose a method called MedGrad E-CLIP, which builds on gradient-based E-CLIP by incorporating a weighted entropy mechanism designed for complex medical imaging like skin lesions. This approach highlights critical image regions linked to specific diagnostic descriptions. The developed integrated pipeline not only classifies skin lesions by matching corresponding descriptions but also adds an essential layer of explainability developed especially for medical data.
By visually explaining how different features in an image relate to diagnostic criteria, this approach demonstrates the potential of advanced vision-language models in medical image analysis, ultimately improving transparency, robustness, and trust in AI-driven diagnostic systems.

Item: Comparison of Several Neural Network-Enhanced Sub-Grid Scale Stress Models for Meso-Scale Hurricane Boundary Layer Flow Simulation (AIAA, 2025-01-03)
Hasan, MD Badrul; Yu, Meilin; Oates, Tim
The complicated energy cascade and backscatter dynamics present a challenge when studying turbulent flows in storms at the meso-scale. When performing standard large-eddy simulations (LES), sub-grid scale (SGS) stress models usually fail to consider energy backscatter. These models assume that kinetic energy only moves continuously from larger to smaller scales. However, coherent energy backscatter structures exist when analyzing hurricane boundary layer flows at the meso-scale. Our recent research has shown that machine-learning SGS models trained with high-resolution data can effectively forecast forward and backward energy transfers in meso-scale hurricane-like vortex flows. Therein, physical and geometrical invariances were introduced to better represent flow physics. This further improved the predictability and generalizability of machine-learning-enhanced SGS models. In this study, we compare the performance of several machine-learning-enhanced SGS models, especially those based on neural networks (NNs), with varying physical and geometrical invariance embedding levels for SGS stress modeling in an a priori sense, which sets the cornerstone for ongoing a posteriori tests of NN models.

Item: Examining Engagement with Disinformation Accounts on Instagram Using Web Archives (2025-01-10)
Prince, Leah; Weigle, Michele C.
Disinformation poses a serious threat to society.
Researchers have shown that many people spread disinformation, especially in relation to the COVID-19 pandemic, on social media platforms like Instagram. However, performing disinformation analysis is difficult if those users have been banned. Previous research has shown that web archives are useful for performing disinformation analysis on pages that are no longer live, but this approach is limited by web page capture availability. To better understand how disinformation is spreading on Instagram, we examine how disinformation actors are utilizing hashtags and account mentions to boost engagement by extracting engagement metrics from archived Instagram account pages. We use the data gathered from these archived webpages, or mementos, to perform network analysis showing how engagement connects users. We then perform clustering based on hashtag frequency to examine how people searching for reputable content are being exposed to disinformation by identifying groupings that contain both health authority accounts and anti-vax accounts. Our findings indicate that roughly one-fifth of intra-group hashtags are also distributed inter-group. Limited memento availability remains an obstacle to comparing reputable accounts to disinformation accounts, but a higher percentage of Instagram account page mementos from the Archive.today web archive are able to be scraped than those from the Internet Archive’s Wayback Machine.