UMBC Information Systems Department

Recent Submissions

  • Item
    SMART: a Secure Remote Sensing Solution for Smart Cities' Urban Areas
    (IEEE, 2024-02-12) Rathee, Geetanjali; Kerrache, Chaker Abdelaziz; Calafate, Carlos T.; Bilal, Muhammad; Song, Houbing
    Nowadays, smart cities are becoming an emerging area of research for upgrading and modifying our existing society by adopting the latest and the most trending technologies in the market. Though the number of IoT based applications is constantly increasing, with new products being launched every 6 months, many organizations are afraid of an early adoption of such products because of their security issues. In particular, the transmission and storage of online information raise many cybersecurity issues when trying to ensure a secure communication mechanism. The aim of this paper is thus to present an efficient and effective communicating mechanism for smart cities using two decision-making models based on the SMART and Subjective approaches. The SMART approach is used to make an intelligent and ideal decision when communicating in the network. In addition, the continuous surveillance of the communicating entities can be done by computing their trust values through a subjective mechanism. The devices having a higher trust value are thus considered as more trustworthy devices. The proposed mechanism is simulated and verified for various security metrics, being compared to the state-of-the-art approaches. In addition, the proposed mechanism outperforms existing approaches, showing a 97% improvement in terms of accuracy, utility value, delay and threat metrics.
  • Item
    Narrative to Trajectory (N2T+): Extracting Routes of Life or Death from Human Trafficking Text Corpora
    (2023-08-06) Karabatis, Saydeh N.; Janeja, Vandana
    Climate change and political unrest in certain regions of the world are imposing extreme hardship on many communities and are forcing millions of vulnerable populations to abandon their homelands and seek refuge in safer lands. As international laws are not fully set to deal with the migration crisis, people are relying on networks of exploiting smugglers to escape the devastation in order to live in stability. During the smuggling journey, migrants can become victims of human trafficking if they fail to pay the smuggler and may be forced into coerced labor. Government agencies and anti-trafficking organizations try to identify the trafficking routes based on stories of survivors in order to gain knowledge and help prevent such crimes. In this paper, we propose a system called Narrative to Trajectory (N2T⁺), which extracts trajectories of trafficking routes. N2T⁺ uses Data Science and Natural Language Processing techniques to analyze trafficking narratives, automatically extract relevant location names, disambiguate possible name ambiguities, and plot the trafficking route on a map. In a comparative evaluation we show that the proposed multi-dimensional approach offers significantly higher geolocation detection than other state-of-the-art techniques.
  • Item
    Making existing software quantum safe: A case study on IBM Db2
    (Elsevier, 2023-05-23) Zhang, Lei; Miranskyy, Andriy; Rjaibi, Walid; Stager, Greg; Gray, Michael; Peck, John
    Context: The software engineering community is facing challenges from quantum computers (QCs). In the era of quantum computing, Shor’s algorithm running on QCs can break asymmetric encryption algorithms that classical computers practically cannot. Though the exact date when QCs will become “dangerous” for practical problems is unknown, the consensus is that this future is near. Thus, the software engineering community needs to start making software ready for quantum attacks and ensure quantum safety proactively. Objective: We argue that the problem of evolving existing software to quantum-safe software is very similar to the Y2K bug. Thus, we leverage some best practices from the Y2K bug and propose our roadmap, called 7E, which gives developers a structured way to prepare for quantum attacks. It is intended to help developers start planning for the creation of new software and the evolution of cryptography in existing software. Method: In this paper, we use a case study to validate the viability of 7E. Our software under study is the IBM Db2 database system. We upgrade the current cryptographic schemes to post-quantum cryptographic ones (using Kyber and Dilithium schemes) and report our findings and lessons learned. Results: We show that the 7E roadmap effectively plans the evolution of existing software security features towards quantum safety, but it does require minor revisions. We incorporate our experience with IBM Db2 into the revised 7E roadmap. Conclusion: The U.S. Department of Commerce’s National Institute of Standards and Technology is finalizing the post-quantum cryptographic standard. The software engineering community needs to start getting prepared for the quantum advantage era. We hope that our experiential study with IBM Db2 and the 7E roadmap will help the community prepare existing software for quantum attacks in a structured manner.
  • Item
    Automated data validation: An industrial experience report
    (Elsevier, 2022-12-16) Zhang, Lei; Howard, Sean; Montpool, Tom; Moore, Jessica; Mahajan, Krittika; Miranskyy, Andriy
    There has been a massive explosion of data generated by customers and retained by companies in the last decade. However, there is a significant mismatch between the increasing volume of data and the lack of automation methods and tools. The lack of best practices in data science programming may lead to software quality degradation, release schedule slippage, and budget overruns. To mitigate these concerns, we would like to bring software engineering best practices into data science. Specifically, we focus on automated data validation in the data preparation phase of the software development life cycle. This paper studies a real-world industrial case and applies software engineering best practices to develop an automated test harness called RESTORE. We release RESTORE as an open-source R package. Our experience report, done on the geodemographic data, shows that RESTORE enables efficient and effective detection of errors injected during the data preparation phase. RESTORE also significantly reduced the cost of testing. We hope that the community benefits from the open-source project and the practical advice based on our experience.
  • Item
    A Reference Architecture for Observability and Compliance of Cloud Native Applications
    (2023-02-22) Pourmajidi, William; Zhang, Lei; Steinbacher, John; Erwin, Tony; Miranskyy, Andriy
    The evolution of Cloud computing led to a novel breed of applications known as Cloud-Native Applications (CNAs). However, observing and monitoring these applications can be challenging, especially if a CNA is bound by compliance requirements. To address this challenge, we explore the characteristics of CNAs and how they affect CNAs' observability and compliance. We then construct a reference architecture for observability and compliance pipelines for CNAs. Furthermore, we sketch instances of this reference architecture for single- and multi-cloud deployments. The proposed architecture embeds observability and compliance into the CNA architecture and adopts a "battery-included" mindset. This architecture can be applied to small and large CNA deployments in regulated and non-regulated industries. It allows Cloud practitioners to focus on what is critical, namely building their products, without being burdened by observability and compliance requirements. This work may also interest academics as it provides a building block for generic CNA architectures.
  • Item
    Automatic Diagnosis of Quantum Software Bug-Fix Motifs
    (2023) Kher, Krishn V.; Chandra, M. Bharat; Joshi, Ishan; Zhang, Lei; Rao, M. V. Panduranga
    Bug-fix pattern detection has been investigated in the past in the context of classical software. However, while quantum software is developing rapidly, the literature is still lacking automated methods and tools to identify, analyze, and detect bug-fix patterns. To the best of our knowledge, our work is the first to leverage classical techniques to detect bug-fix patterns in quantum code. In this paper, we propose an automated framework, called Q-Diff, for detecting bug-fix patterns in IBM Qiskit quantum code. In the framework, we develop a proof-of-concept tool based on Abstract Syntax Trees. To validate our method, we test Q-Diff with a variety of quantum bug-fix patterns using examples. We hope our work will attract the attention of the quantum software engineering community to improve the quality of quantum software.
  • Item
    R3ACWU : A Lightweight, Trustworthy Authentication Scheme for UAV-Assisted IoT Applications
    (IEEE, 2024-01-24) Adil, Muhammad; Abulkasim, Hussein; Farouk, Ahmed; Song, Houbing
    The technology of Unmanned Aerial Vehicles (UAVs) has sparked a revolution in numerous Internet of Things (IoT) applications, such as flood monitoring, wildfire monitoring, coastal area surveillance, intelligent transportation, and classified military operations. This technology offers several advantages when used as a flying base station to enhance the communication metrics of an employed IoT application. However, as an integrated technology (UAV-assisted IoT applications), it suffers from many challenges, and security is one of the foremost concerns. Considering that, in this paper, we propose a hybrid lightweight key exchange authentication model for UAV-assisted IoT applications to resolve the device-to-device (D2D) authentication and data privacy issues in these networks. The proposed model employs five different security parameters named registration, authentication, authorization, accounting, and cache wash and update (R3ACWU) in coordination with a hash function. The network architecture consists of UAVs, IoT devices, and micro base stations, followed by base stations, authentication servers, and service providers (SP). In this framework, we introduce a concept known as ‘dead time’, a specific time period after which each device’s cache memory is cleared and updated. This practice not only enhances the security of the devices in use but also reduces computational and memory overhead by eliminating the records of devices that haven’t participated in the communication process within the specified time frame. The results of our lightweight R3ACWU authentication scheme show notable improvement over existing authentication schemes in terms of the compared parameters.
  • Item
    International Workshop on Digital Twins for Smart Health
    (ACM, 2023-04-30) Deng, Jun; Ding, Ying; Achenie, Luke; Liu, Jinwei; Pan, Shimei; Purushotham, Sanjay
  • Item
    Trapping LLM Hallucinations Using Tagged Context Prompts
    (2023-06-09) Feldman, Philip; Foulds, James; Pan, Shimei
    Recent advances in large language models (LLMs), such as ChatGPT, have led to highly sophisticated conversation agents. However, these models suffer from "hallucinations," where the model generates false or fabricated information. Addressing this challenge is crucial, particularly with AI-driven platforms being adopted across various sectors. In this paper, we propose a novel method to recognize and flag instances when LLMs perform outside their domain knowledge, ensuring that users receive accurate information. We find that the use of context combined with embedded tags can successfully combat hallucinations within generative language models. To do this, we baseline hallucination frequency in no-context prompt-response pairs using generated URLs as easily-tested indicators of fabricated data. We observed a significant reduction in overall hallucination when context was supplied along with question prompts for tested generative engines. Lastly, we evaluated how placing tags within contexts impacted model responses and were able to eliminate hallucinations in responses with 98.88% effectiveness.
  • Item
    The Keyword Explorer Suite: A Toolkit for Understanding Online Populations
    (ACM, 2023-03-27) Feldman, Philip; Pan, Shimei; Foulds, James
    We have developed a set of Python applications that use large language models to identify and analyze data from social media platforms relevant to a population of interest. Our pipeline begins with using OpenAI’s GPT-3 to generate potential keywords for identifying relevant text content from the target population. The keywords are then validated, and the content downloaded and analyzed using GPT-3 embedding and manifold reduction. Corpora are then created to fine-tune GPT-2 models to explore latent information via prompt-based queries. These tools allow researchers and practitioners to gain valuable insights into population subgroups online.
  • Item
    The Role of Interactive Visualization in Explaining (Large) NLP Models: from Data to Inference
    (2023-01-11) Brath, Richard; Keim, Daniel; Knittel, Johannes; Pan, Shimei; Sommerauer, Pia; Strobelt, Hendrik
    With a constant increase of learned parameters, modern neural language models become increasingly powerful. Yet, explaining these complex models' behavior remains a largely unsolved problem. In this paper, we discuss the role interactive visualization can play in explaining NLP models (XNLP). We motivate the use of visualization in relation to target users and common NLP pipelines. We also present several use cases to provide concrete examples on XNLP with visualization. Finally, we point out an extensive list of research opportunities in this field.
  • Item
    Do Humans Prefer Debiased AI Algorithms? A Case Study in Career Recommendation
    (ACM, 2022-03-22) Wang, Clarice; Wang, Kathryn; Bian, Andrew Y.; Islam, Rashidul; Keya, Kamrun Naher; Foulds, James; Pan, Shimei
    Currently, there is a surge of interest in fair Artificial Intelligence (AI) and Machine Learning (ML) research which aims to mitigate discriminatory bias in AI algorithms, e.g. along lines of gender, age, and race. While most research in this domain focuses on developing fair AI algorithms, in this work, we examine the challenges which arise when humans and fair AI interact. Our results show that due to an apparent conflict between human preferences and fairness, a fair AI algorithm on its own may be insufficient to achieve its intended results in the real world. Using college major recommendation as a case study, we build a fair AI recommender by employing gender debiasing machine learning techniques. Our offline evaluation showed that the debiased recommender makes fairer and more accurate college major recommendations. Nevertheless, an online user study of more than 200 college students revealed that participants on average prefer the original biased system over the debiased system. Specifically, we found that the perceived gender disparity associated with a college major is a determining factor for the acceptance of a recommendation. In other words, our results demonstrate we cannot fully address the gender bias issue in AI recommendations without addressing the gender bias in humans. They also highlight the urgent need to extend the current scope of fair AI research from narrowly focusing on debiasing AI algorithms to including new persuasion and bias explanation technologies in order to achieve intended societal impacts.
  • Item
    Emotional State Measurement Trial (EMOPROEXE): a Protocol for Promoting Exercise in Adult and Children with Cerebral Palsy
    (2024-01-23) Gomez-Gonzalez, Isabel M.; Castro-Garcia, Juan A.; Merino-Monge, Manuel; Sanchez-Anton, Gemma; Hamidi, Foad; Mendoza-Sagrera, Alejandro; Molina-Cantero, Alberto J.
    Background: The protocol described in this paper is part of a research project coordinated between three Spanish universities where a technology aimed at improving the quality of life of people suffering from cerebral palsy will be developed. Part of the technology developed will consist of developing an interface and a series of applications to increase motivation for daily physical activity. The basis of these developments is based on the measurement of the emotional state of the subjects. Methods: The experimental protocol is designed with two objectives, on the one hand to identify the emotional state through physiological signals, and on the other to determine whether music can be a motivating factor to promote physical activity. It has been specifically designed for subjects with cerebral palsy taking into account the special characteristics of this population. These are people with whom it is difficult to use questionnaires to have a basis to contrast with the measured physiological signals, so measurements must be taken in carefully chosen daily life situations. Discussion: As results we hope to obtain which physiological parameters are the most robust to measure the emotional state and how to design rehabilitation and physical activity promotion routines that are motivating, in addition to being able to avoid risk factors during the performance of these routines. Trial registration: NCT05621057.
  • Item
    Coping through Precise Labeling of Emotions: A Deep Learning Approach to Studying Emotional Granularity in Consumer Reviews
    (2024-01-20) Faraji-Rad, Ali; Tamaddoni, Ali; Jebeli, Atefeh
    When describing their emotions, people may demonstrate emotional expertise by differentiating between emotions when using emotional labels or use emotion labels interchangeably to indicate a general valence. The authors develop a novel deep-learning-based method to measure the granularity with which people describe their emotions via language. They investigate the role of emotional granularity in consumer decision making, specifically in relation to coping with negative consumption experiences described in online reviews. Granularity in describing negative emotions is associated with more successful coping with negative experiences. Therefore, especially when the overall experience is negative, in which case coping is most relevant, greater granularity in describing negative emotions predicts more positive ratings of the business. Furthermore, in line with the view that the ability to granularly describe negative emotions is a skill, reviewers progressively become more granular when describing their negative emotions as they write more reviews. Consequently, reviewers progressively provide more positive ratings for negative experiences as they write more reviews. Finally, a greater temporal distance between the consumption experience and the writing of the review predicts greater granularity in describing negative emotions. Consequently, when the overall experience is negative and coping is relevant, a greater temporal distance predicts more positive ratings.
  • Item
    Modeling Metacognitive and Cognitive Processes in Data Science Problem Solving (Student Abstract)
    (AAAI, 2023-09-06) Alomair, Maryam; Pan, Shimei; Chen, Lujie Karen
    Data Science (DS) is an interdisciplinary topic that is applicable to many domains. In this preliminary investigation, we use caselet, a mini-version of a case study, as a learning tool to allow students to practice data science problem solving (DSPS). Using a dataset collected from a real-world classroom, we performed correlation analysis to reveal the structure of cognition and metacognition processes. We also explored the similarity of different DS knowledge components based on students’ performance. In addition, we built a predictive model to characterize the relationship between metacognition, cognition, and learning gain.
  • Item
    “How Do We Do This at a Distance?!” A Descriptive Study of Remote Undergraduate Research Programs during COVID-19
    (ASCB, 2022-01-03) Erickson, Olivia A.; Cole, Rebecca B.; Isaacs, Jared M.; Alvarez-Clare, Silvia; Ordoñez, Patricia; et al
    The COVID-19 pandemic shut down undergraduate research programs across the United States. A group of 23 colleges, universities, and research institutes hosted remote undergraduate research programs in the life sciences during Summer 2020. Given the unprecedented offering of remote programs, we carried out a study to describe and evaluate them. Using structured templates, we documented how programs were designed and implemented, including who participated. Through focus groups and surveys, we identified programmatic strengths and shortcomings as well as recommendations for improvements from students’ perspectives. Strengths included the quality of mentorship, opportunities for learning and professional development, and a feeling of connection with a larger community. Weaknesses included limited cohort building, challenges with insufficient structure, and issues with technology. Although all programs had one or more activities related to diversity, equity, inclusion, and justice, these topics were largely absent from student reports even though programs coincided with a peak in national consciousness about racial inequities and structural racism. Our results provide evidence for designing remote Research Experiences for Undergraduates (REUs) that are experienced favorably by students. Our results also indicate that remote REUs are sufficiently positive to further investigate their affordances and constraints, including the potential to scale up offerings, with minimal concern about disenfranchising students.
  • Item
    Virtually the Same? Evaluating the Effectiveness of Remote Undergraduate Research Experiences
    (ASCB, 2023-04-14) Hess, Riley A.; Erickson, Olivia A.; Cole, Rebecca B.; Isaacs, Jared M.; Ordoñez, Patricia; et al
    In-person undergraduate research experiences (UREs) promote students’ integration into careers in life science research. In 2020, the COVID-19 pandemic prompted institutions hosting summer URE programs to offer them remotely, raising questions about whether undergraduates who participate in remote research can experience scientific integration and whether they might perceive doing research less favorably (i.e., not beneficial or too costly). To address these questions, we examined indicators of scientific integration and perceptions of the benefits and costs of doing research among students who participated in remote life science URE programs in Summer 2020. We found that students experienced gains in scientific self-efficacy pre- to post-URE, similar to results reported for in-person UREs. We also found that students experienced gains in scientific identity, graduate and career intentions, and perceptions of the benefits of doing research only if they started their remote UREs at lower levels on these variables. Collectively, students did not change in their perceptions of the costs of doing research despite the challenges of working remotely. Yet students who started with low cost perceptions increased in these perceptions. These findings indicate that remote UREs can support students’ self-efficacy development, but may otherwise be limited in their potential to promote scientific integration.
  • Item
    Relationship Between Inter-individual Variation in Circadian Rhythm and Sociality: A case Study Using Halictid Bees
    (2021-09-06) Cartagena, Sofía Meléndez; Ortiz-Alvarado, Carlos A.; Ordoñez, Patricia; Cordero-Martínez, Claudia S.; Ambrose, Alexandria F.; Lizasoain, Luis A Roman; Vega, Milexis A Santos; Velez, Andrea V Velez; Acevedo-Gonzalez, Jenny P.; Gibbs, Jason; Petanidou, Theodora; Tscheulin, Thomas; Barthell, John T.; González, Victor H.; Giray, Tugrul; Agosto-Rivera, José L.
    The bee family Halictidae is considered to be an optimal model for the study of social evolution due to its remarkable range of social behaviors. Past studies in circadian rhythms suggest that social species may express more diversity in circadian behaviors than solitary species. However, these previous studies did not make appropriate taxonomic comparisons. To further explore the link between circadian rhythms and sociality, we examine four halictid species with different degrees of sociality, three social species of Lasioglossum, one from Greece and two from Puerto Rico, and a solitary species of Systropha from Greece. Based on our previous observations, we hypothesized that species with greater degree of sociality will show greater inter-individual variation in circadian rhythms than solitary species. We observed distinct differences in their circadian behavior that parallel differences across sociality, where the most social species expressed the highest inter-individual variation. We predict that circadian rhythm differences will be informative of sociality across organisms.
  • Item
    Mitigate: An Adaptive Network Data Anonymization Tool Using Condensation-Based Differential Privacy
    (2022-03-14) Karabatis, George; Chen, Zhiyuan; Aleroud, Ahmed
    Modern network devices collect a large amount of data that can be analyzed to identify bottlenecks, anomalies, cyber-attacks, etc. Therefore, there is often a need for such collections of network data to be analyzed by an external expert or by the research community. However, these collections of data contain sensitive, proprietary information. In order for the network data to be shared, it must first be anonymized. The overall objective of this project is to develop an innovative privacy management tool to anonymize network data and achieve sufficient privacy, acceptable data utility, and efficient data analysis at the same time. No existing anonymization methods can achieve all of these at the same time. The core of this technology is a differentially private clustering algorithm that provides strong privacy protection, preserves data properties important for subsequent analysis, and allows the party receiving the anonymized data to conduct analysis directly on anonymized data without the need of decryption or any extra processing. The research carried out was to design, implement and verify a solution to this problem by completing the following tasks: 1) developing the core technology; 2) developing a context based method that automatically recommends fields that must be anonymized; 3) conducting experiments showing superior results using our approach compared to existing tools; and 4) developing an intuitive but basic user interface. The research that was conducted generated novel algorithmic techniques that utilize state-of-the-art methods such as condensation, differential privacy preservation, clustering, automated tuning based on contextual awareness, and recommendation techniques to specify columns to users for anonymization leading to optimal privacy that allows research analysis on the dataset.
    Experiments were conducted to evaluate the efficacy of these novel algorithmic techniques by performing analysis on original non-anonymized datasets, then conducting analysis on the same yet anonymized datasets and comparing the results of the analyses. Overall, the anonymized analysis results were within 1% of the original results, verifying that the generated technology not only guarantees a high level of privacy but also enables research analysis as if it were conducted on the original dataset. Potential applications of this technology include anonymization of any type of structured network datasets that contain sensitive identifiers, such as IP addresses, that can be used in multiple applications. For example, the anonymized data can be used to create an AI or machine learning model for cyber security (e.g., to detect attacks) or for performance analysis (e.g., to identify bottlenecks or predict performance). In addition, a market analysis that was conducted for potential applications of this technology identified a broader range of applications of our anonymization technology beyond the network sector that includes healthcare, banking, insurance, securities, finance (FISB), data brokering, cloud services, ad sales, and government.