UMBC Computer Science and Electrical Engineering Department

Permanent URI for this collection: http://hdl.handle.net/11603/50

The Computer Science and Electrical Engineering Department aims to maintain a program of excellence in teaching, research, and service for all of its programs. At the undergraduate level, we will provide students with a firm foundation of both the theory and practice of Computer Science and Computer Engineering. Our curricula also give students the social, ethical, and liberal education needed to make significant contributions to society. Students receiving a bachelor’s degree are ready to enter the workforce as productive computer scientists or computer engineers, or to continue their education at the graduate or professional level.

At the graduate level, we are committed to developing the research and professional capabilities of students in Computer Science, Computer Engineering, Electrical Engineering and Cybersecurity. Our programs provide a deeper mastery of the basics of these fields, as well as opportunities to collaborate on leading-edge research with our faculty. Our faculty are engaged in both practical and theoretical research, often in partnership with government agencies, private industry and non-governmental organizations. The aim of this research is to advance knowledge within our disciplines and also to contribute to solving problems faced by our society.

Recent Submissions

Now showing 1 - 20 of 2137
  • Item
    A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19
    (2024-11-11) Khandelwal, Vedant; Gaur, Manas; Kursuncu, Ugur; Shalin, Valerie; Sheth, Amit
    Monitoring public sentiment via social media is potentially helpful during health crises such as the COVID-19 pandemic. However, traditional frequency-based, data-driven neural network approaches can miss newly relevant content because language evolves in a dynamically changing environment. Human-curated symbolic knowledge sources, such as lexicons for standard language and slang terms, can help elevate social media signals expressed in evolving language. We introduce a neurosymbolic method that integrates neural networks with symbolic knowledge sources, enhancing the detection and interpretation of mental health-related tweets relevant to COVID-19. Our method was evaluated on a corpus of large datasets (approximately 12 billion tweets, 2.5 million subreddit posts, and 700k news articles) and multiple knowledge graphs. The method dynamically adapts to evolving language, outperforming purely data-driven models with an F1 score exceeding 92%. It also adapted to new data faster and with lower computational demands than fine-tuning pre-trained large language models (LLMs). This study demonstrates the benefit of neurosymbolic methods for interpreting text in a dynamic environment for tasks such as health surveillance.
  • Item
    In Context Learning and Reasoning for Symbolic Regression with Large Language Models
    (2024-10-22) Sharlin, Samiha; Josephson, Tyler R.
    Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance in tasks for which they were not explicitly trained. Here, we explore the potential of LLMs to perform symbolic regression -- a machine-learning method for finding simple and accurate equations from datasets. We prompt GPT-4 to suggest expressions from data, which are then optimized and evaluated using external Python tools. These results are fed back to GPT-4, which proposes improved expressions while optimizing for complexity and loss. Using chain-of-thought prompting, we instruct GPT-4 to analyze the data, prior expressions, and the scientific context (expressed in natural language) for each problem before generating new expressions. We evaluated the workflow on the rediscovery of five well-known scientific equations from experimental data, and on an additional dataset without a known equation. GPT-4 successfully rediscovered all five equations, and in general, performed better when prompted to use a scratchpad and consider scientific context. We also demonstrate how strategic prompting improves the model's performance and how the natural language interface simplifies integrating theory with data. Although this approach does not outperform established SR programs when target equations are more complex, LLMs can nonetheless iterate toward improved solutions while following instructions and incorporating scientific context in natural language.
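    As an illustration of the external "optimize and evaluate" step described above (our sketch, not the authors' released code), the snippet below fits the constants of a hypothetical LLM-proposed expression with SciPy and computes the loss that would be reported back to the model; the expression, data, and variable names are assumptions.

      # Hypothetical sketch: an LLM-proposed expression has its constants fitted
      # to the data before the resulting loss is fed back to GPT-4.
      import numpy as np
      from scipy.optimize import curve_fit

      def candidate(x, a, b):
          # Example expression an LLM might propose: y = a * x / (b + x)
          return a * x / (b + x)

      x = np.linspace(1, 10, 50)
      y = 3.0 * x / (2.0 + x) + np.random.normal(0, 0.01, x.size)  # synthetic data

      params, _ = curve_fit(candidate, x, y, p0=[1.0, 1.0])
      mse = np.mean((candidate(x, *params) - y) ** 2)
      print(params, mse)  # optimized constants and loss for the next prompt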
  • Item
    The Dual Role of Student and Creator: Exploring the TikTok Experience
    (ACM, 2024-11-13) Bulley, Bharadwaj Kuruba; Tirumala, Shravika; Mahamkali, Bhavani Shankar; Sakib, Md Nazmus; Ahmed, Saquib; Dey, Sanorita
    TikTok is one of the most common content-creating social media platforms for youth in the USA. In recent years, its engaging content has significantly influenced people, shaping trends, behaviors, and communication styles among its predominantly young user base. This study evaluates TikTok's impact on college and university students as they invest a lot of time creating content and engaging on TikTok besides their studies. While existing research highlights TikTok's educational benefits and adverse societal and psychological effects, our mixed-method approach provides a focused analysis of student content creators. Survey data quantifies usage patterns and their correlation with academic and mental health indicators, while interviews offer qualitative insights into personal experiences. Findings reveal that TikTok affects students' time management, mental health, academic performance, and self-perception. Although TikTok facilitates creativity and social connections, it also induces stress and distraction. This study aims to fill research gaps and propose new directions, offering practical recommendations for balancing TikTok's benefits and drawbacks for student content creators.
  • Item
    When to Commute During the COVID-19 Pandemic and Beyond: Analysis of Traffic Crashes in Washington, D.C
    (2024-11-08) Choi, Joanne; Clark, Sam; Jaiswal, Ranjan; Kirk, Peter; Jayaraman, Sachin; Ashqar, Huthaifa
    Many workers in cities across the world who have been teleworking because of the COVID-19 pandemic are expected to return to their commutes. As this process is believed to be gradual and telecommuting is likely to remain an option for many workers, hybrid models and flexible schedules might become the norm in the future. These variable work schedules allow employees to commute outside of traditional rush hours. Moreover, many studies have shown that commuters might be skeptical of using trains, buses, and carpools and could turn to personal vehicles to get to work, which might increase congestion and crashes on the roads. This study attempts to provide information on the safest times to commute in the Washington, DC area by analyzing historical traffic crash data from before the COVID-19 pandemic. It also aims to advance our understanding of traffic crashes and related factors such as weather in the Washington, DC area. We created a model to predict crashes by time of day, using a negative binomial regression after rejecting a Poisson regression, and additionally explored the validity of a Random Forest regression. Our main consideration for an eventual application of this study is to reduce crashes in Washington, DC by providing people with better options on when to commute and when to telework, if available. The study also provides policymakers and researchers with real-world insights to decrease the number of traffic crashes and help achieve the goals of the Vision Zero Initiative adopted by the district.
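    A minimal sketch of the modeling step named above (not the authors' code): fitting a negative binomial model to hourly crash counts with statsmodels, using made-up data and an assumed rush-hour indicator.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      df = pd.DataFrame({"hour": np.tile(np.arange(24), 30)})          # 30 days of hourly bins
      df["rush"] = df["hour"].isin([7, 8, 9, 16, 17, 18]).astype(int)  # assumed rush-hour flag
      df["crashes"] = rng.poisson(1 + 3 * df["rush"])                  # synthetic counts

      # A negative binomial GLM accommodates the overdispersion that led the
      # authors to reject a plain Poisson regression.
      model = smf.glm("crashes ~ rush + hour", data=df,
                      family=sm.families.NegativeBinomial()).fit()
      print(model.summary())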
  • Item
    Post-Roe Public Discourse: A Temporal Analysis of Discussion on US Abortion Law Changes
    (ACM, 2024-11-13) Venkata, Harisahan Nookala; Palakurthi, Varshitha; Devalam, Sree Sai Bindu; Sakib, Md Nazmus; Ahmed, Saquib; Dey, Sanorita
    The "Post-Roe" refers to the period following the June 2022 Supreme Court decision to overrule Roe v. Wade, the 1973 abortion right law. Since this overturn of this law, substantial public discourse was noticed across the US and this controversial issue has trended on all news media agencies and social media platforms. Several studies have analyzed public opinion and the impact of this legal change on healthcare, economic challenges, and society. However, little work has been done to identify the shift in discussion over time. Our research analyzes YouTube and Reddit comments to perceive insights into the evolving spectrum of public opinion on abortion legislation by utilizing NLP techniques and opinion mining. By systematically categorizing comments to extract themes and employing a temporal analysis approach, we identify shifts in public sentiment across different phases of time, such as immediate reactions, peak debate, and long-term responses. Our preliminary findings show that different themes prevail at different phases and primary concerns shift over time. This study might help policymakers, activists, and social commentators understand these shifts to effectively address the evolving concerns of the public and take measures accordingly.
  • Item
    A Framework for Empirical Fourier Decomposition based Gesture Classification for Stroke Rehabilitation
    (IEEE, 2024-11-11) Chen, Ke; Wang, Honggang; Catlin, Andrew; Satyanarayana, Ashwin; Vinjamuri, Ramana; Kadiyala, Sai Praveen
    The demand for surface electromyography (sEMG) based exoskeletons is rapidly increasing due to their non-invasive nature and ease of use. With the increasing use of Internet-of-Things (IoT) devices in daily life, exoskeleton-based rehabilitation has gained greater acceptance. As a result, there is a need for highly accurate and generalizable gesture classification mechanisms based on sEMG data. In this work, we present a framework that pre-processes raw sEMG signals with an Empirical Fourier Decomposition (EFD) based approach followed by dimension reduction, which improved the performance of hand gesture classification. EFD's efficacy in handling the mode-mixing problem in non-stationary signals resulted in fewer decomposed components. In the next step, a thorough analysis of the decomposed components as well as an inter-channel analysis is performed to identify the key components and channels that contribute to the improved gesture classification accuracy. As a third step, we conducted ablation studies on time-domain features to observe the variations in accuracy across different models. Finally, we present a case study comparing gesture classification based on automated feature extraction with methods based on manual feature extraction. Experimental results show that the manual feature based gesture classification method thoroughly outperformed automated feature extraction based methods, emphasizing the need for rigorous fine-tuning of automated models.
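    To make the manual pipeline concrete, the sketch below computes a few standard time-domain sEMG features of the kind commonly used in such ablation studies; it is our illustration under an assumed window size, and the EFD pre-processing itself is not reproduced.

      import numpy as np

      def time_domain_features(window: np.ndarray) -> np.ndarray:
          mav = np.mean(np.abs(window))                 # mean absolute value
          rms = np.sqrt(np.mean(window ** 2))           # root mean square
          wl = np.sum(np.abs(np.diff(window)))          # waveform length
          zc = np.sum(np.diff(np.sign(window)) != 0)    # zero crossings
          return np.array([mav, rms, wl, zc])

      window = np.random.randn(200)  # one 200-sample window from one sEMG channel
      print(time_domain_features(window))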
  • Item
    Morphology and Luminescence Properties of Transition Metal Doped Zinc Selenide Crystals
    (Springer Nature, 2024-11-11) Bowman, Eric; Scheurer, Leslie; Arnold, Bradley; Su, Ching Hua; Choa, Fow-Sen; Cullum, Brian; Singh, Narsingh
    Zinc selenide is an excellent matrix material for doping with rare-earth and transition metals to achieve mid-infrared luminescence for developing high-power lasers. The luminescence, morphology, and refractive index are significantly affected by the doping and by defects generated by the size and valency of the dopants, their concentration, the growth process, and convection during growth. The aim of this study is to investigate the effect of point and line defects generated by low doping of iron and chromium on the emission and morphology of zinc selenide. Luminescence and morphological properties of large iron- and chromium-doped zinc selenide single crystals were studied to evaluate the effect of extremely low residual impurities and defects associated with the doping process. The emission properties following both short-wavelength (i.e., ultraviolet; 350–370 nm) excitation and longer-wavelength (i.e., near-infrared; 850–870 nm) excitation were characterized. Luminescence emission bands were identified in both doped crystals. In addition to the primary emission bands, satellite peaks and intra-center transitions were also observed. Due to local population defects associated with the residual impurities (ppm to ppb) in the Fe-ZnSe and Cr-ZnSe crystals, peak emission wavelengths were observed to shift. The emission bands were found to decrease in intensity due to recombination of residual impurity co-dopants and complex defects generated during growth and fabrication. Cryogenic temperature analyses revealed a very clean emission band due to the freezing out of some of the point and line defects. An emission band observed at 980 nm for both crystals at room temperature as well as cryogenic temperatures indicates a vibronic peak in ZnSe. Scanning electron microscopy (SEM) images of the local morphology support the conclusion that small crystallites are also present in the doped crystals.
  • Item
    Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
    (Association for Computational Linguistics, 2024-11) Joshi, Abhinav; Saha, Shaswati; Shukla, Divyaksh; Vema, Sriram; Jhamtani, Harsh; Gaur, Manas; Modi, Ashutosh
    Large Language Models (LLMs) have proven to be a great success in a wide range of applications, from regular NLP-based use cases to AI agents. LLMs have been trained on a vast corpus of texts from various sources; despite the best efforts during the data pre-processing stage of training, they may pick up some undesirable information such as personally identifiable information (PII). Consequently, research in the area of Machine Unlearning (MUL) has recently become active; the main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering from performance loss on regular tasks. In this work, we examine the robustness of existing MUL techniques in their ability to enable leakage-proof forgetting in LLMs. In particular, we examine the effect of data transformation on forgetting, i.e., can an unlearned LLM recall forgotten information if there is a change in the format of the input? Our findings on the TOFU dataset highlight the necessity of using diverse data formats to quantify unlearning in LLMs more reliably.
  • Item
    MUMOSA, Interactive Dashboard for MUlti-MOdal Situation Awareness
    (Association for Computational Linguistics, 2024-11) Lukin, Stephanie M.; Bowser, Shawn; Suchocki, Reece; Summers-Stay, Douglas; Ferraro, Francis; Matuszek, Cynthia; Voss, Clare
    Information extraction has led the way for event detection from text for many years. Recent advances in neural models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), have enabled the integration of multiple modalities, providing richer sources of information about events. Concurrently, the development of schema graphs and 3D reconstruction methods has enhanced our ability to visualize and annotate complex events. Building on these innovations, we introduce the MUMOSA (MUlti-MOdal Situation Awareness) interactive dashboard that brings these diverse resources together. MUMOSA aims to provide a comprehensive platform for event situational awareness, offering users a powerful tool for understanding and analyzing complex scenarios across modalities.
  • Item
    Is Function Similarity Over-Engineered? Building a Benchmark
    (2024-10-30) Saul, Rebecca; Liu, Chang; Fleischmann, Noah; Zak, Richard; Micinski, Kristopher; Raff, Edward; Holt, James
    Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen functions can reduce the time it takes to understand a new file. However, given the complexity of assembly, and the NP-hard nature of determining function equivalence, this task is extremely difficult. Common approaches often use sophisticated disassembly and decompilation tools, graph analysis, and other expensive pre-processing steps to perform function similarity searches over some corpus. In this work, we identify a number of discrepancies between the current research environment and the underlying application need. To remedy this, we build a new benchmark, REFuSE-Bench, for binary function similarity detection consisting of high-quality datasets and tests that better reflect real-world use cases. In doing so, we address issues like data duplication and accurate labeling, experiment with real malware, and perform the first serious evaluation of ML binary function similarity models on Windows data. Our benchmark reveals that a new, simple baseline, one which looks at only the raw bytes of a function and requires no disassembly or other pre-processing, is able to achieve state-of-the-art performance in multiple settings. Our findings challenge the conventional assumption that complex models with highly-engineered features are being used to their full potential, and demonstrate that simpler approaches can provide significant value.
  • Item
    Neural Normalized Compression Distance and the Disconnect Between Compression and Classification
    (2024-10-20) Hurwitz, John; Nicholas, Charles; Raff, Edward
    It is generally well understood that predictive classification and compression are intrinsically related concepts in information theory. Indeed, many deep learning methods are explained as learning a kind of compression, and that better compression leads to better performance. We interrogate this hypothesis via the Normalized Compression Distance (NCD), which explicitly relies on compression as the means of measuring similarity between sequences and thus enables nearest-neighbor classification. By turning popular large language models (LLMs) into lossless compressors, we develop a Neural NCD and compare LLMs to classic general-purpose algorithms like gzip. In doing so, we find that classification accuracy is not predictable by compression rate alone, among other empirical aberrations not predicted by current understanding. Our results imply that our intuition on what it means for a neural network to "compress" and what is needed for effective classification are not yet well understood.
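    For reference, the classical Normalized Compression Distance with gzip as the compressor C is sketched below; the paper's Neural NCD replaces gzip with an LLM-based lossless compressor, which is not reproduced here.

      import gzip

      def C(data: bytes) -> int:
          # Length of the gzip-compressed sequence, a practical stand-in for
          # the (uncomputable) Kolmogorov complexity.
          return len(gzip.compress(data))

      def ncd(x: bytes, y: bytes) -> float:
          cx, cy, cxy = C(x), C(y), C(x + y)
          return (cxy - min(cx, cy)) / max(cx, cy)

      s1 = b"the quick brown fox jumps over the lazy dog"
      s2 = b"the quick brown fox leaps over the lazy dog"
      print(ncd(s1, s2))  # smaller values indicate more similar sequences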
  • Item
    Tutorial on Causal Inference with Spatiotemporal Data
    (ACM, 2024-11-04) Ali, Sahara; Wang, Jianwu
    Spatiotemporal data, which captures how variables evolve across space and time, is ubiquitous in fields such as environmental science, epidemiology, and urban planning. However, identifying causal relationships in these datasets is challenging due to the presence of spatial dependencies, temporal autocorrelation, and confounding factors. This tutorial provides a comprehensive introduction to spatiotemporal causal inference, offering both theoretical foundations and practical guidance for researchers and practitioners. We explore key concepts such as causal inference frameworks, the impact of confounding in spatiotemporal settings, and the challenges posed by spatial and temporal dependencies. The paper covers synthetic spatiotemporal benchmark data generation, widely used spatiotemporal causal inference techniques, including regression-based, propensity score-based, and deep learning-based methods, and demonstrates their application using synthetic datasets. Through step-by-step examples, readers will gain a clear understanding of how to address common challenges and apply causal inference techniques to spatiotemporal data. This tutorial serves as a valuable resource for those looking to improve the rigor and reliability of their causal analyses in spatiotemporal contexts.
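    As a toy illustration of one technique family the tutorial covers, the sketch below estimates an average treatment effect with inverse propensity weighting on synthetic data; variable names and data are made up, and spatial and temporal dependence are ignored for brevity.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n = 1000
      confounder = rng.normal(size=n)                     # e.g., a regional covariate
      p_treat = 1 / (1 + np.exp(-confounder))
      treatment = (rng.random(n) < p_treat).astype(int)
      outcome = 2.0 * treatment + confounder + rng.normal(size=n)

      # Estimate propensity scores, then weight each unit by the inverse of the
      # probability of the treatment it actually received.
      clf = LogisticRegression().fit(confounder.reshape(-1, 1), treatment)
      ps = clf.predict_proba(confounder.reshape(-1, 1))[:, 1]
      w = treatment / ps + (1 - treatment) / (1 - ps)

      ate = (np.average(outcome[treatment == 1], weights=w[treatment == 1])
             - np.average(outcome[treatment == 0], weights=w[treatment == 0]))
      print(ate)  # should land near the true effect of 2.0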
  • Item
    TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
    (2024-11-04) Patel, Maitreya; Kusumba, Abhiram; Cheng, Sheng; Kim, Changhoon; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou
    Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between text and visual modalities to learn representations. This makes the nature of the training data a significant factor in the efficacy of CLIP for downstream tasks. However, the lack of compositional diversity in contemporary image-text datasets limits the compositional reasoning ability of CLIP. We show that generating "hard" negative captions via in-context learning and synthesizing corresponding negative images with text-to-image generators offers a solution. We introduce a novel contrastive pre-training strategy that leverages these hard negative captions and images in an alternating fashion to train CLIP. We demonstrate that our method, named TripletCLIP, when applied to existing datasets such as CC3M and CC12M, enhances the compositional capabilities of CLIP, resulting in an absolute improvement of over 9% on the SugarCrepe benchmark on an equal computational budget, as well as improvements in zero-shot image classification and image retrieval. Our code, models, and data are available at: https://tripletclip.github.io
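    For intuition only, the snippet below sketches one direction of a CLIP-style contrastive loss in which hard negative captions are appended to the text batch; the actual TripletCLIP objective, which also alternates with synthetic negative images, may differ, and all names here are assumptions.

      import torch
      import torch.nn.functional as F

      def clip_loss_with_hard_negatives(img_emb, txt_emb, neg_txt_emb, temperature=0.07):
          # img_emb, txt_emb: (B, D) matched pairs; neg_txt_emb: (B, D) hard negative captions
          img_emb = F.normalize(img_emb, dim=-1)
          all_txt = F.normalize(torch.cat([txt_emb, neg_txt_emb]), dim=-1)
          logits = img_emb @ all_txt.t() / temperature   # (B, 2B) similarity scores
          targets = torch.arange(img_emb.size(0))        # positives sit in the first block
          return F.cross_entropy(logits, targets)

      B, D = 8, 512
      loss = clip_loss_with_hard_negatives(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))
      print(loss.item())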
  • Item
    Identifying Economic Factors Affecting Unemployment Rates in the United States
    (2024-11-04) Green, Alrick; Nasim, Ayesha; Radadia, Jaydeep; Kallam, Devi Manaswi; Kalyanam, Viswas; Owenga, Samfred; Ashqar, Huthaifa
    In this study, we seek to understand how macroeconomic factors such as GDP, inflation, Unemployment Insurance, and the S&P 500 index, as well as microeconomic factors such as health, race, and educational attainment, impacted the unemployment rate over about 20 years in the United States. Our research question is to identify which factor(s) contributed the most to the unemployment rate surge using linear regression. Results from our studies showed that GDP (negative), inflation (positive), Unemployment Insurance (contrary to popular opinion; negative), and the S&P 500 index (negative) were all significant factors, with inflation being the most important one. As for health factors, our model produced correlation scores relating occurrences of Cardiovascular Disease, Neurological Disease, and Interpersonal Violence to unemployment. Race as a factor showed huge discrepancies in the unemployment rate between Black Americans and their counterparts, while Asians had the lowest unemployment rate throughout the years. As for educational attainment, results showed that higher educational attainment significantly reduced one's chance of unemployment, and people with higher degrees had the lowest unemployment rates. Results of this study will be beneficial for policymakers and researchers in understanding the unemployment rate during the pandemic.
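    A purely illustrative sketch of the regression described above (not the authors' data or code): ordinary least squares of the unemployment rate on a few macroeconomic factors, with made-up yearly values.

      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.DataFrame({
          "unemployment": [3.9, 4.4, 14.8, 6.7, 5.4, 3.6],   # made-up rates (%)
          "gdp_growth":   [2.5, 2.3, -8.9, 4.5, 5.9, 2.1],
          "inflation":    [1.9, 2.3, 1.2, 1.4, 4.7, 8.0],
          "sp500_return": [0.19, 0.29, 0.16, 0.27, 0.19, -0.19],
      })

      model = smf.ols("unemployment ~ gdp_growth + inflation + sp500_return", data=df).fit()
      print(model.params)  # coefficient signs indicate each factor's direction of association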
  • Item
    Knowledge Graphs for Responsible AI
    (ACM, 2024-10-21) Vakaj, Edlira; Mihindukulasooriya, Nandana; Gaur, Manas; Khan, Arijit
    Responsible AI is built upon a set of principles that prioritize fairness, transparency, accountability, and inclusivity in AI development and deployment. As AI systems become increasingly sophisticated, including with the explosion of generative AI, there is a growing need to address ethical considerations and the potential societal impacts of their use. Knowledge graphs (KGs), as structured representations of information, can enhance generative AI performance by providing context, explaining outputs, and reducing biases, thereby offering a powerful framework to address the challenges of responsible AI. By leveraging semantic relationships and contextual understanding, KGs facilitate transparent decision-making, enabling stakeholders to trace and interpret the reasoning behind AI-driven outcomes. Moreover, they provide a means to capture and manage diverse knowledge sources, supporting the development of fair and unbiased AI models. The workshop aims to investigate the role of knowledge graphs in promoting responsible AI principles and creating a cooperative space for researchers, practitioners, and policymakers to exchange insights and enhance their comprehension of KGs' impact on achieving responsible AI solutions. It seeks to facilitate collaboration and idea-sharing to advance the understanding of how KGs can contribute to responsible AI.
  • Item
    FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation
    (2024-10-30) Gu, Yuechun; He, Jiajie; Chen, Keke
    Training data privacy has been a top concern in AI modeling. While methods like differentially private learning allow data contributors to quantify acceptable privacy loss, model utility is often significantly damaged. In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments. In controlled data access, authorized model builders work in a restricted environment to access sensitive data, which can fully preserve data utility with a reduced risk of data leakage. However, unlike differential privacy, there is no quantitative measure that lets individual data contributors assess their privacy risk before participating in a machine learning task. We developed the demo prototype FT-PrivacyScore to show that it is possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task. The demo source code will be available at https://github.com/RhincodonE/demo_privacy_scoring.
  • Item
    Calibrating Practical Privacy Risks for Differentially Private Machine Learning
    (2024-10-30) Gu, Yuechun; Chen, Keke
    Differential privacy quantifies privacy through the privacy budget ϵ, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has highlighted that with the same theoretical ϵ setting, the likelihood-ratio-based membership inference (LiRA) attack success rate (ASR) may vary according to specific datasets and models, which might be a better indicator for evaluating real-world privacy risks. Inspired by this practical privacy measure, we study approaches that can lower the attack success rate to allow for more flexible privacy budget settings in model training. We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility. We use the SHAP and LIME model explainers to evaluate feature sensitivities and develop feature-masking strategies. Our findings demonstrate that the LiRA ASR measured on a model M can properly indicate the inherent privacy risk of a dataset for modeling, and it is possible to modify datasets to enable the use of larger theoretical ϵ settings while achieving equivalent practical privacy protection. We have conducted extensive experiments to show the inherent link between ASR and a dataset's privacy risk. By carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection and relaxed ϵ settings. The implementation details are shared at https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/.
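    A hypothetical illustration of the feature-masking idea follows: rank features by importance and suppress the most influential ones before training. The paper uses SHAP and LIME explainers; this sketch substitutes scikit-learn's permutation importance, and the masking rule and data are assumptions.

      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.inspection import permutation_importance

      X, y = load_breast_cancer(return_X_y=True)
      model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

      imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
      top_k = np.argsort(imp.importances_mean)[-3:]    # three most influential features

      X_masked = X.copy()
      X_masked[:, top_k] = X[:, top_k].mean(axis=0)    # suppress them with column means
      # A model retrained on X_masked would then be re-evaluated against the LiRA
      # membership inference attack to check whether the measured ASR drops.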
  • Item
    FAT-RABBIT: Fault-Aware Training towards Robustness Against Bit-flip Based Attacks in Deep Neural Networks
    (2024-11-06) Pourmehrani, Hossein; Bahrami, Javad; Nooralinejad, Parsa; Pirsiavash, Hamed; Karimi, Naghmeh
    Machine learning, and in particular deep learning, is used in a broad range of crucial applications. Implementing such models in custom hardware can be highly beneficial thanks to its low power and computation latency compared to GPUs. However, an error in their output can lead to disastrous outcomes. An adversary may force misclassification in the model's outcome by inducing a number of bit-flips in targeted locations, thus degrading accuracy. To fill this gap, this paper presents FAT-RABBIT, a cost-effective mechanism designed to mitigate such threats by training the model so that few weights are highly impactful on the outcome, thereby reducing the model's sensitivity to fault injection attacks. Moreover, to increase robustness against bit-wise large perturbations, we propose an optimization scheme called M-SAM. We then augment FAT-RABBIT with the M-SAM optimizer to further bolster model accuracy against bit-flipping fault attacks. Notably, these approaches incur no additional hardware overhead. Our experimental results demonstrate the robustness of FAT-RABBIT and its augmented version, called Augmented FAT-RABBIT, against such attacks.
  • Item
    Are the flows of complex-valued Laplacians and their pseudoinverses related?
    (2024-11-14) Saxena, Aditi; Tripathy, Twinkle; Anguluri, Rajasekhar
    Laplacian flows model the rate of change of each node's state as being proportional to the difference between its value and that of its neighbors. Typically, these flows capture diffusion or synchronization dynamics and are well studied. Expanding on these classical flows, we introduce a pseudoinverse Laplacian flow system, substituting the Laplacian with its pseudoinverse within complex-valued networks. Interestingly, for undirected graphs and unsigned weight-balanced digraphs, the Laplacian and pseudoinverse Laplacian flows exhibit an interdependence in terms of consensus. To show this relation, we first present the conditions for achieving consensus in the pseudoinverse Laplacian flow system using the property of real eventual exponential positivity. Thereafter, we show that the pseudoinverse Laplacian flow system converges to consensus if and only if the Laplacian flow system achieves consensus in the above-mentioned networks. For digraphs, however, these conditions are only sufficient. Further, we illustrate the efficacy of the proposed approach through examples, focusing primarily on power networks.
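    In standard notation (ours, not necessarily the authors'), for a graph with weights w_ij, Laplacian L, and pseudoinverse L⁺, the two flows compared above can be written as:

      \dot{x}_i(t) = -\sum_{j \in \mathcal{N}(i)} w_{ij}\,\bigl(x_i(t) - x_j(t)\bigr)
      \quad\Longleftrightarrow\quad \dot{x}(t) = -L\,x(t),
      \qquad \text{pseudoinverse flow: } \dot{x}(t) = -L^{+}\,x(t).

    Consensus means the node states converge to a common value; per the abstract, for undirected graphs and unsigned weight-balanced digraphs one flow achieves consensus exactly when the other does, while for general digraphs the stated conditions are only sufficient.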