UMBC Computer Science and Electrical Engineering Department

Permanent URI for this collection: http://hdl.handle.net/11603/50

The Computer Science and Electrical Engineering Department aims to maintain a program of excellence in teaching, research, and service for all of its programs. At the undergraduate level, we will provide students with a firm foundation of both the theory and practice of Computer Science and Computer Engineering. Our curricula also give students the social, ethical, and liberal education needed to make significant contributions to society. Students receiving a bachelor’s degree are ready to enter the work force as productive computer scientists or computer engineers, or to continue their education at the graduate or professional level.

At the graduate level, we are committed to developing the research and professional capabilities of students in Computer Science, Computer Engineering, Electrical Engineering and Cybersecurity. Our programs provide a deeper mastery of the basics of these fields, as well as opportunities to collaborate on leading-edge research with our faculty. Our faculty are engaged in both practical and theoretical research, often in partnership with government agencies, private industry and non-governmental organizations. The aim of this research is to advance knowledge within our disciplines and also to contribute to solving problems faced by our society.

Recent Submissions

Now showing 1 - 20 of 2153
  • Item
    Memorization Over Reasoning? Exposing and Mitigating Verbatim Memorization in Large Language Models' Character Understanding Evaluation
    (2024-12-30) Jiang, Yuxuan; Ferraro, Francis
    Recently, Large Language Models (LLMs) have shown impressive performance in character understanding tasks, such as analyzing the roles, personalities, and relationships of fictional characters. However, the extensive pre-training corpora used by LLMs raise concerns that they may rely on memorizing popular fictional works rather than genuinely understanding and reasoning about them. In this work, we argue that 'gist memory' (capturing essential meaning) should be the primary mechanism for character understanding tasks, as opposed to 'verbatim memory' (exact matching of a string). We introduce a simple yet effective method to mitigate mechanized memorization in character understanding evaluations while preserving the essential implicit cues needed for comprehension and reasoning. Our approach reduces memorization-driven performance on popular fictional works from 96% accuracy to 72% and results in up to an 18% drop in accuracy across various character understanding tasks. These findings underscore the issue of data contamination in existing benchmarks, which often measure memorization rather than true character understanding.
  • Item
    MedReg-KG: KnowledgeGraph for Streamlining Medical Device Regulatory Compliance
    (2024-12-15) Chattoraj, Subhankar; Joshi, Karuna
    Healthcare providers are deploying a large number of AI-driven medical devices to help monitor and medicate patients. For patients with chronic ailments, like diabetes or gastric diseases, usage of these devices becomes part of their daily lifestyle. These medical devices often capture personally identifiable information (PII) and hence are strictly regulated by the Food and Drug Administration (FDA) to ensure the safety and efficacy of the medical device. Medical device regulations are currently available as large textual documents, collectively known as Title 21 of the Code of Federal Regulations (CFR), that cross-reference other documents and therefore require substantial human effort and cost to parse and comprehend. We have developed a semantically rich framework, MedReg-KG, to extract the knowledge from the rules and policies for medical devices and translate it into a machine-processable format that can be reasoned over. By applying Deontic Logic over the policies, we are able to identify the permissions and prohibitions in the regulation policies. This framework was developed using AI/knowledge extraction techniques and Semantic Web technologies like OWL/RDF and SPARQL. This paper presents our ontology/knowledge graph and the Deontic rules integrated into the design. We include the results of our validation against the dataset of Gastroenterology Urology devices and demonstrate the efficiency gained by using our system.
  • Item
    An Efficient Computational Algorithm for Modeling Slow Soliton Interactions in Microresonators
    (Optica, 2024) Akter, Sanzida; Shandilya, Pradyoth; Courtright, Logan; D’Aguanno, Giuseppe; Leshem, Amir; Gat, Omri; Menyuk, Curtis
    Standard simulations of microresonator waveforms are limited by the photon lifetime. We describe a computational method that enables simulations on a laboratory timescale and apply this approach to study two-soliton interactions.
  • Item
    Increasing Visual Literacy With Collaborative Foraging, Annotation, Curation, and Critique
    (ACM, 2024-12-05) Williams, Rebecca M.; Syed, Afrin Unnisa; Kurumaddali, Krishna Vamsi
    Students today are facing information overload, contamination, and bloat from dubious sources: AI-generated content, masqueraded influencer opinions, context-less listicles, and consumer manipulation - frequently heralded by graphs and charts to bolster the argument. Because this information firehose presents as technical visual communications, the overload is both cognitive and perceptual, potentially causing more insidious misperceptions than text alone. In addition to consuming such media, students in computing fields work with data to produce graphs and charts themselves, including assignments, academic research, and personal projects/blog posts/tweets. Depending on visual literacy (VL) and prior data analysis instruction, many students inadvertently code misleading, unethical, or biased visualizations, potentially contributing to the dark corpus already festering online. Prior research on misconceptions in visualization pedagogy suggests students benefit from repeated opportunities to forage, curate and critique examples, discussing and debating with peers and instructors. Inspired by these findings, we incorporated a visual curation + annotation platform into a Data Visualization Computer Science course, enabling students to participate in processes of searching for and curating found examples of misleading visualizations, collaborative annotation + critique of examples, and structured self-evaluation of misleading elements in their own work. We assess our interventions with pre-/post-course Visualization Literacy Assessment Tests, qualitative evaluation of student reflections, taxonomic evaluation of formative student-produced visualizations, and post-course exit surveys. Post-course, students' VL increased significantly, and the number and severity of misleading visualizations they created decreased. Students also reflected that they gained increased confidence in spotting visual disinformation online, and in avoiding its creation in software.
  • Item
    HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
    (2024-12-05) Bhattarai, Manish; Barron, Ryan; Eren, Maksim; Vu, Minh; Grantcharov, Vesselin; Boureima, Ismael; Stanev, Valentin; Matuszek, Cynthia; Valtchinov, Vladimir; Rasmussen, Kim; Alexandrov, Boian
    Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
  • Item
    WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
    (2024-12-17) Huang, Runsheng "Anson"; Martin, Lara J.; Callison-Burch, Chris
    WHAT-IF -- Writing a Hero's Alternate Timeline through Interactive Fiction -- is a system that uses zero-shot meta-prompting to create branching narratives from a prewritten story. Played as an interactive fiction (IF) game, WHAT-IF lets the player choose between decisions that the large language model (LLM) GPT-4 generates as possible branches in the story. Starting with an existing linear plot as input, a branch is created at each key decision taken by the main character. By meta-prompting the LLM to consider the major plot points from the story, the system produces coherent and well-structured alternate storylines. WHAT-IF stores the branching plot tree in a graph which helps it to both keep track of the story for prompting and maintain the structure for the final IF system. A video demo of our system can be found here: https://youtu.be/8vBqjqtupcc.
  • Item
    Robust and Lightweight Challenge Obfuscation Mechanism for Anti-modeling Protection of Arbiter-PUFs
    (Springer Nature, 2024-12-06) Ebrahimabadi, Mohammad; Younis, Mohamed; Sanjana Mehjabin, Suhee; Tekeoglu, Ali; Sookoor, Tamim I.; Karimi, Naghmeh
    Physically unclonable functions (PUFs) are lightweight hardware security primitives that leverage the imperfection of the manufacturing process of integrated circuits to generate unique signatures (responses) when queried by various bit-strings (challenges). These signatures can be used not only to authenticate interconnected devices but also to generate cryptographic keys for preserving the integrity and confidentiality of data. Among the different designs, the arbiter-PUF and its variants have received the most attention due to the large cardinality of their challenge-response set. To prevent the PUF circuits from being modeled using machine learning techniques, challenge obfuscation is often pursued. In particular, the simplicity of bit scrambling makes it an attractive means to achieve such a goal without diminishing the low-complexity advantage of PUFs. This paper first shows, by developing a detailed attack scenario, that the conventional, fixed pattern-based bit scrambling scheme is vulnerable. Then, we propose a novel lightweight dynamic challenge scrambling (DCS) mechanism that predictably varies the bit-swapping pattern per packet and per node. Such variability severely degrades the PUF modeling accuracy. The results extracted from the FPGA implementation of DCS confirm its effectiveness in thwarting PUF modeling attacks.
  • Item
    Multi-modal Pre-silicon Evaluation of Hardware Masking Styles
    (Springer, 2024-12-16) Anik, Md Toufiq Hasan; Reefat, Hasin Ishraq; Cheng, Wei; Danger, Jean-Luc; Guilley, Sylvain; Karimi, Naghmeh
    Protecting sensitive logic functions in ASICs requires side-channel countermeasures. Many gate-level masking styles have been published, each with pros and cons. Some styles, such as RSM, GLUT, and ISW, are compact but can feature 1st-order leakage. Other styles, such as TI, DOM, and HPC, are secure at the 1st order but incur significant performance overheads. Another requirement is that security must be ensured even when the device has aged. Pre-silicon security evaluation is now a normatively approved method to characterize the expected resiliency against attacks ahead of time. However, there is still fragmentation in terms of leakage models, Points of Interest (PoI) selection, attack order, and distinguishers. Accordingly, in this paper we focus on these factors as they affect the success of side-channel analysis attacks and assess the resiliency of the state-of-the-art masking styles in various corners. Moreover, we investigate device aging as another factor and analyze its influence on the success of side-channel attacks targeting the state-of-the-art masking schemes. This pragmatic evaluation enables risk estimation in a complex PPA (Power, Performance, and Area) and security plane while also taking aging impacts into account. For instance, we explore the trade-off between low-cost secure styles attackable at 1st order vs. high-cost protection attackable only at 2nd order.
  • Item
    Multiple Chronic Condition Patterns among Full-Benefit Maryland Medicaid Enrollees
    (2024-06-29) Han, Fei; Gill, Christine; Blake, Elizabeth; Stockwell, Ian
  • Item
    Mohamed Younis Honored For Contributions To Modern Communication Technologies
    (UMBC News, 2024-12-13) Meyers, Catherine; Demond, Marlayna
    Mohamed Younis, professor and chair of the Department of Computer Science and Electrical Engineering, has been honored by the Institute of Electrical and Electronics Engineers (IEEE) Communications Society for his significant and lasting contributions to the advancement of modern communication technologies. The award was announced December 9 at the society's Global Communications Conference in Cape...
  • Item
    Mode Coresets for Efficient, Interpretable Tensor Decompositions: An Application to Feature Selection in fMRI Analysis
    (IEEE, 2024-12-13) Gabrielson, Ben; Yang, Hanlu; Vu, Trung; Calhoun, Vince; Adali, Tulay
    Generalizations of matrix decompositions to multidimensional arrays, called tensor decompositions, are simple yet powerful methods for analyzing datasets in the form of tensors. These decompositions model a data tensor as a sum of rank-1 tensors, whose factors are useful in a myriad of applications. Given the massive sizes of modern datasets, an important challenge is how well computational complexity scales with the data, balanced with how well decompositions approximate the data. Many efficient methods exploit a small subset of the tensor's elements, representing most of the tensor's variation via a basis over the subset. These methods' efficiencies are often due to their randomized natures; however, deterministic methods can provide better approximations, and can perform feature selection, highlighting a meaningful subset that well represents the entire tensor. In this paper, we introduce an efficient subset-based form of the Tucker decomposition, by selecting coresets from the tensor modes such that the resulting core tensor can well approximate the full tensor. Furthermore, our method enables a novel feature selection scheme unlike other methods for tensor data. We introduce methods for random and deterministic coresets, minimizing error via a measure of discrepancy between the coreset and full tensor. We perform the decompositions on simulated data, and on real-world fMRI data to demonstrate our method's feature selection ability. We demonstrate that compared with other similar decomposition methods, our methods can typically better approximate the tensor with comparably low computational complexities.
  • Item
    An Investigation of the Relationship Between Crime Rate and Police Compensation
    (2024-11-21) Amarsingh, Jhancy; Appakondreddigari, Likhith Kumar Reddy; Nunna, Ashish; Tummala, Charishma Choudary; Winship, John; Zhou, Alex; Ashqar, Huthaifa
    The goal of this paper is to assess whether there is any correlation between police salaries and crime rates. Using public data sources that contain Baltimore Crime Rates and Baltimore Police Department (BPD) salary information from 2011 to 2021, our research uses a variety of techniques to capture and measure any correlation between the two. Based on that correlation, the paper then uses established social theories to make recommendations on how this data can potentially be used by State Leadership. Our initial results show a negative correlation between salary/compensation levels and crime rates.
  • Item
    A Method for Multimodal IVA Fusion Within a MISA Unified Model Reveals Markers of Age, Sex, Cognition, and Schizophrenia in Large Neuroimaging Studies
    (Wiley, 2024-11-19) Silva, Rogers F.; Damaraju, Eswar; Li, Xinhui; Kochunov, Peter; Ford, Judith M.; Mathalon, Daniel H.; Turner, Jessica A.; van Erp, Theo G. M.; Adali, Tulay; Calhoun, Vince D.
    With the increasing availability of large-scale multimodal neuroimaging datasets, it is necessary to develop data fusion methods which can extract cross-modal features. A general framework, multidataset independent subspace analysis (MISA), has been developed to encompass multiple blind source separation approaches and identify linked cross-modal sources in multiple datasets. In this work, we utilized the multimodal independent vector analysis (MMIVA) model in MISA to directly identify meaningful linked features across three neuroimaging modalities—structural magnetic resonance imaging (MRI), resting state functional MRI and diffusion MRI—in two large independent datasets, one comprising control subjects and the other including patients with schizophrenia. Results show several linked subject profiles (sources) that capture age-associated decline, schizophrenia-related biomarkers, sex effects, and cognitive performance. For sources associated with age, both shared and modality-specific brain-age deltas were evaluated for association with non-imaging variables. In addition, each set of linked sources reveals a corresponding set of cross-modal spatial patterns that can be studied jointly. We demonstrate that the MMIVA fusion model can identify linked sources across multiple modalities, and that at least one set of linked, age-related sources replicates across two independent and separately analyzed datasets. The same set also presented age-adjusted group differences, with schizophrenia patients showing lower multimodal source levels. Linked sets associated with sex and cognition are also reported for the UK Biobank dataset.
  • Item
    What is the Point? Evaluating the Structure, Color, and Semantic Traits of Computer Vision Point Clouds of Vegetation
    (MDPI, 2017-04-09) Dandois, Jonathan P.; Baker, Matthew; Olano, Marc; Parker, Geoffrey G.; Ellis, Erle C.
    Remote sensing of the structural and spectral traits of vegetation is being transformed by structure from motion (SFM) algorithms that combine overlapping images to produce three-dimensional (3D) red-green-blue (RGB) point clouds. However, much remains unknown about how these point clouds are used to observe vegetation, limiting the understanding of the results and future applications. Here, we examine the content and quality of SFM point cloud 3D-RGB fusion observations. An SFM algorithm using the Scale Invariant Feature Transform (SIFT) feature detector was applied to create the 3D-RGB point clouds of a single tree and forest patches. The fusion quality was evaluated using targets placed within the tree and was compared to fusion measurements from terrestrial LIDAR (TLS). K-means clustering and manual classification were used to evaluate the semantic content of SIFT features. When targets were fully visible in the images, SFM assigned color in the correct place with a high accuracy (93%). The accuracy was lower when targets were shadowed or obscured (29%). Clustering and classification revealed that the SIFT features highlighted areas that were brighter or darker than their surroundings, showing little correspondence with canopy objects like leaves or branches, though the features showed some relationship to landscape context (e.g., canopy, pavement). Therefore, the results suggest that feature detectors play a critical role in determining how vegetation is sampled by SFM. Future research should consider developing feature detectors that are optimized for vegetation mapping, including extracting elements like leaves and flowers. Features should be considered the fundamental unit of SFM mapping, like the pixel in optical imaging and the laser pulse of LIDAR. Under optimal conditions, SFM fusion accuracy exceeded that of TLS, and the two systems produced similar representations of the overall tree shape. SFM is the lower-cost solution for obtaining accurate 3D-RGB fusion measurements of the outer surfaces of vegetation, the critical zone of interaction between vegetation, light, and the atmosphere from leaf to canopy scales.
  • Item
    LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
    (2015-07-23) Ganesan, Ashwinkumar; Brantley, Kiante; Pan, Shimei; Chen, Jian
    We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics that are generated. Also, users may find it difficult to search correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users who have minimal knowledge of LDA or Topic Modeling methods. To evaluate our design, we ran a pilot study which used the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.
  • Item
    A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19
    (2024-11-11) Khandelwal, Vedant; Gaur, Manas; Kursuncu, Ugur; Shalin, Valerie; Sheth, Amit
    Monitoring public sentiment via social media is potentially helpful during health crises such as the COVID-19 pandemic. However, traditional frequency-based, data-driven neural network-based approaches can miss newly relevant content due to the evolving nature of language in a dynamically evolving environment. Human-curated symbolic knowledge sources, such as lexicons for standard language and slang terms, can potentially elevate social media signals in evolving language. We introduce a neurosymbolic method that integrates neural networks with symbolic knowledge sources, enhancing the detection and interpretation of mental health-related tweets relevant to COVID-19. Our method was evaluated using a corpus of large datasets (approximately 12 billion tweets, 2.5 million subreddit data, and 700k news articles) and multiple knowledge graphs. This method dynamically adapts to evolving language, outperforming purely data-driven models with an F1 score exceeding 92%. This approach also showed faster adaptation to new data and lower computational demands than fine-tuning pre-trained large language models (LLMs). This study demonstrates the benefit of neurosymbolic methods in interpreting text in a dynamic environment for tasks such as health surveillance.
  • Item
    In Context Learning and Reasoning for Symbolic Regression with Large Language Models
    (2024-10-22) Sharlin, Samiha; Josephson, Tyler R.
    Large Language Models (LLMs) are transformer-based machine learning models that have shown remarkable performance in tasks for which they were not explicitly trained. Here, we explore the potential of LLMs to perform symbolic regression -- a machine-learning method for finding simple and accurate equations from datasets. We prompt GPT-4 to suggest expressions from data, which are then optimized and evaluated using external Python tools. These results are fed back to GPT-4, which proposes improved expressions while optimizing for complexity and loss. Using chain-of-thought prompting, we instruct GPT-4 to analyze the data, prior expressions, and the scientific context (expressed in natural language) for each problem before generating new expressions. We evaluated the workflow in rediscovery of five well-known scientific equations from experimental data, and on an additional dataset without a known equation. GPT-4 successfully rediscovered all five equations, and in general, performed better when prompted to use a scratchpad and consider scientific context. We also demonstrate how strategic prompting improves the model's performance and how the natural language interface simplifies integrating theory with data. Although this approach does not outperform established SR programs where target equations are more complex, LLMs can nonetheless iterate toward improved solutions while following instructions and incorporating scientific context in natural language.
  • Item
    The Dual Role of Student and Creator: Exploring the TikTok Experience
    (ACM, 2024-11-13) Bulley, Bharadwaj Kuruba; Tirumala, Shravika; Mahamkali, Bhavani Shankar; Sakib, Md Nazmus; Ahmed, Saquib; Dey, Sanorita
    TikTok is one of the most common content-creating social media platforms for youth in the USA. In recent years, its engaging content has significantly influenced people, shaping trends, behaviors, and communication styles among its predominantly young user base. This study evaluates TikTok's impact on college and university students, who invest considerable time creating content and engaging on TikTok alongside their studies. While existing research highlights TikTok's educational benefits and adverse societal and psychological effects, our mixed-method approach provides a focused analysis of student content creators. Survey data quantifies usage patterns and their correlation with academic and mental health indicators, while interviews offer qualitative insights into personal experiences. Findings reveal that TikTok affects students' time management, mental health, academic performance, and self-perception. Although TikTok facilitates creativity and social connections, it also induces stress and distraction. This study aims to fill research gaps and propose new directions, offering practical recommendations for balancing TikTok's benefits and drawbacks for student content creators.
  • Item
    When to Commute During the COVID-19 Pandemic and Beyond: Analysis of Traffic Crashes in Washington, D.C
    (2024-11-08) Choi, Joanne; Clark, Sam; Jaiswal, Ranjan; Kirk, Peter; Jayaraman, Sachin; Ashqar, Huthaifa
    Many workers in cities across the world, who have been teleworking because of the COVID-19 pandemic, are expected to return to commuting. As this process is believed to be gradual and telecommuting is likely to remain an option for many workers, hybrid models and flexible schedules might become the norm in the future. Such variable work schedules allow employees to commute outside of traditional rush hours. Moreover, many studies have shown that commuters might be skeptical of using trains, buses, and carpools and could turn to personal vehicles to get to work, which might increase congestion and crashes on the roads. This study attempts to provide information on the safest time to commute in the Washington, DC area by analyzing historical traffic crash data from before the COVID-19 pandemic. It also aims to advance our understanding of traffic crashes and related factors, such as weather, in the Washington, DC area. We created a model to predict crashes by time of day using a negative binomial regression after rejecting a Poisson regression, and additionally explored the validity of a Random Forest regression. Our main consideration for an eventual application of this study is to reduce crashes in Washington, DC by using this tool to give people better options on when to commute and when to telework, if available. The study also provides policymakers and researchers with real-world insights that could help decrease the number of traffic crashes and achieve the goals of the Vision Zero Initiative adopted by the District.
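The arbiter-PUF item above (Ebrahimabadi et al.) contrasts a fixed bit-scrambling pattern with dynamic challenge scrambling (DCS), which varies the bit-swapping pattern per packet and per node. The abstract does not give the DCS construction, so the following is only a minimal sketch of the general idea, assuming the pattern is a permutation that both ends can recompute from a shared secret, a node ID, and a packet counter. `FIXED_PERM`, `dynamic_permutation`, and the string-seeded PRNG are hypothetical choices for illustration, not the paper's design, and Python's `random` module is not a cryptographic generator.

```python
import random
from typing import List, Sequence


def apply_permutation(challenge: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Move bit i of the challenge to position perm[i]."""
    out = [0] * len(challenge)
    for i, bit in enumerate(challenge):
        out[perm[i]] = bit
    return out


def unscramble(scrambled: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Invert apply_permutation: original bit i was stored at position perm[i]."""
    return [scrambled[perm[i]] for i in range(len(perm))]


# Fixed scheme: one hypothetical 8-bit pattern reused for every query, so an
# attacker who observes enough challenge-response pairs can learn it.
FIXED_PERM = [3, 0, 2, 1, 7, 5, 4, 6]


def fixed_scramble(challenge: Sequence[int]) -> List[int]:
    return apply_permutation(challenge, FIXED_PERM)


def dynamic_permutation(node_id: int, packet_counter: int, n_bits: int, secret: int) -> List[int]:
    """Hypothetical per-node, per-packet pattern: a shuffle seeded from shared values.

    Seeding random.Random with a str is deterministic across runs, so a verifier
    holding the same secret can recompute the permutation. Not cryptographically secure.
    """
    rng = random.Random(f"{secret}:{node_id}:{packet_counter}")
    perm = list(range(n_bits))
    rng.shuffle(perm)
    return perm


def dynamic_scramble(challenge: Sequence[int], node_id: int, packet_counter: int, secret: int) -> List[int]:
    perm = dynamic_permutation(node_id, packet_counter, len(challenge), secret)
    return apply_permutation(challenge, perm)
```

For example, `dynamic_scramble([1, 0, 1, 1, 0, 0, 1, 0], node_id=7, packet_counter=3, secret=1234)` keeps the same bits as the input but, unlike `fixed_scramble`, generally places them differently for each packet counter, which is the kind of variability the abstract credits with degrading modeling accuracy.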
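The masking-styles item above (Anik et al.) distinguishes styles that leak at 1st order from styles attackable only at 2nd order. As a hedged illustration of what "order" refers to, here is a minimal sketch of generic two-share Boolean masking of a byte, assuming a uniformly random mask. It is a textbook construction, not any of the specific styles (RSM, GLUT, ISW, TI, DOM, HPC) evaluated in the paper, and the function names are hypothetical.

```python
from collections import Counter
from typing import Tuple


def mask(x: int, m: int) -> Tuple[int, int]:
    """Split byte x into two shares (x XOR m, m) whose XOR equals x."""
    return x ^ m, m


def unmask(shares: Tuple[int, int]) -> int:
    """Recombine the two shares; combining both is what a 2nd-order attack exploits."""
    return shares[0] ^ shares[1]


def share_distribution(x: int, index: int) -> Counter:
    """Histogram of one share's value over all 256 possible masks, for a fixed secret x."""
    return Counter(mask(x, m)[index] for m in range(256))
```

Here `share_distribution(0x00, 0) == share_distribution(0xA5, 0)` holds and every value appears exactly once, so in this idealized model a single share reveals nothing about `x` (no 1st-order leakage), while combining both shares recovers `x` exactly. Real gate-level implementations can depart from this ideal, for example through glitches, which is one way compact styles can still leak at 1st order.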
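The crime-rate and police-compensation item above (Amarsingh et al.) reports a negative correlation but does not name the correlation measure. Assuming Pearson's r, a minimal stdlib sketch for two yearly series looks like this; `pearson_r` is a hypothetical helper, and any numbers passed to it in practice would come from the Baltimore data, not from this example.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and standard deviations, all unnormalized (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)
```

For example, `pearson_r([1, 2, 3, 4], [8, 6, 4, 2])` is -1 (up to floating-point rounding) because the second series falls linearly as the first rises; a negative value on real salary and crime series would indicate the direction the abstract reports.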
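The symbolic-regression item above (Sharlin and Josephson) describes a loop in which GPT-4 proposes expressions that external Python tools then optimize and evaluate for loss and complexity. The LLM call is not shown here; this is only a hedged sketch of the evaluation half, assuming candidates of the form c·f(x) with a single fitted constant, and a hypothetical linear complexity penalty `alpha`. The candidate list and complexity values are illustrative, not the paper's.

```python
from typing import Callable, List, Sequence, Tuple

# (label, basis function f, hypothetical complexity score)
Candidate = Tuple[str, Callable[[float], float], int]


def fit_scale(f: Callable[[float], float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Closed-form least-squares fit of c in y ~ c * f(x)."""
    num = sum(f(x) * y for x, y in zip(xs, ys))
    den = sum(f(x) ** 2 for x in xs)
    return num / den


def score(cand: Candidate, xs: Sequence[float], ys: Sequence[float], alpha: float) -> Tuple[float, float, float]:
    """Return (mse + alpha * complexity, fitted c, mse) for one candidate."""
    _, f, complexity = cand
    c = fit_scale(f, xs, ys)
    mse = sum((c * f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + alpha * complexity, c, mse


def rank(cands: List[Candidate], xs: Sequence[float], ys: Sequence[float], alpha: float = 0.01):
    """Sort candidates by penalized loss; each row is (total, label, c, mse)."""
    rows = []
    for cand in cands:
        total, c, mse = score(cand, xs, ys, alpha)
        rows.append((total, cand[0], c, mse))
    return sorted(rows)


CANDIDATES: List[Candidate] = [
    ("c*x", lambda x: x, 1),
    ("c*x**2", lambda x: x ** 2, 2),
    ("c*x**3", lambda x: x ** 3, 3),
]
```

On the toy points `xs = [1, 2, 3, 4]`, `ys = [2, 8, 18, 32]`, `rank(CANDIDATES, xs, ys)[0]` selects `"c*x**2"` with c = 2.0 and zero error; in the workflow the abstract describes, rankings like this would be fed back to the LLM to prompt improved proposals.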
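The traffic-crash item above (Choi et al.) reports choosing a negative binomial regression after rejecting a Poisson regression, without stating the test used. A common first check is overdispersion: a Poisson model forces the variance of the counts to equal their mean, while the negative binomial in its NB2 form allows Var = μ + αμ². The sketch below computes the variance-to-mean ratio and a method-of-moments estimate of α for one group of counts. It ignores covariates such as time of day and weather, the helper names are hypothetical, and it is not the authors' procedure.

```python
import statistics
from typing import Sequence


def dispersion_ratio(counts: Sequence[int]) -> float:
    """Sample variance divided by mean; close to 1 if the counts are Poisson."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; ratio undefined")
    return statistics.variance(counts) / mean


def nb_alpha_moments(counts: Sequence[int]) -> float:
    """Method-of-moments alpha for NB2, where Var = mu + alpha * mu**2.

    Returns 0.0 when there is no excess variance (a Poisson model is adequate).
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; alpha undefined")
    var = statistics.variance(counts)
    return max(0.0, (var - mean) / mean ** 2)
```

For hypothetical hourly crash counts such as `[0, 10, 0, 10]`, the ratio is about 6.7, well above 1, which is the usual sign that a Poisson model understates the variance; a formal decision would rely on a regression-based or likelihood-ratio test, which the abstract does not name.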