UMBC Information Systems Department

Permanent URI for this collection: http://hdl.handle.net/11603/51

Recent Submissions

Now showing 1 - 20 of 1102
  • Item
    Listening for Expert Identified Linguistic Features: Assessment of Audio Deepfake Discernment among Undergraduate Students
    (2024-11-21) Bhalli, Noshaba Nasir; Naqvi, Nehal; Evered, Chloe; Mallinson, Christine; Janeja, Vandana
    This paper evaluates the impact of training undergraduate students to improve their audio deepfake discernment ability by listening for expert-defined linguistic features. Such features have been shown to improve performance of AI algorithms; here, we ascertain whether this improvement in AI algorithms also translates to improvement of the perceptual awareness and discernment ability of listeners. With humans as the weakest link in any cybersecurity solution, we propose that listener discernment is a key factor for improving trustworthiness of audio content. In this study we determine whether training that familiarizes listeners with English language variation can improve their abilities to discern audio deepfakes. We focus on undergraduate students, as this demographic group is constantly exposed to social media and the potential for deception and misinformation online. To the best of our knowledge, our work is the first study to uniquely address English audio deepfake discernment through such techniques. Our research goes beyond informational training by introducing targeted linguistic cues to listeners as a deepfake discernment mechanism, via a training module. In a pre-/post- experimental design, we evaluated the impact of the training across 264 students as a representative cross section of all students at the University of Maryland, Baltimore County, and across experimental and control sections. Findings show that the experimental group showed a statistically significant decrease in their unsurety when evaluating audio clips and an improvement in their ability to correctly identify clips they were initially unsure about. While results are promising, future research will explore more robust and comprehensive trainings for greater impact.
  • Item
    Interactive Assessment of Variances of High-Resolution Model Features in Digital Twin Simulations
    (ACM, 2024-11-22) Kulkarni, Chhaya; Privé, Nikki; Janeja, Vandana
    Prior to the deployment of expensive instruments into orbit, spatio-temporal digital twin systems modeling the whole earth are used to study the efficacy of these instruments. However, we need to make sure that the simulated instruments have realistic characteristics (to reflect the physics of the atmosphere and limits of the instrument itself) in order for the results of the digital twin to be robust and usable. If these simulations are done accurately, the instrument can be deployed, leading to more accurate weather forecasts and climate research. This demonstration system validates the simulations, specifically the realism of remotely sensed observations. The digital twin system is a low-cost way to improve instrument design used in meteorological and climatological research. The primary goal is to show how atmospheric data can improve the development and validation of new observational systems for meteorology and climate science. We have developed an interactive variability study system that uses a dynamic platform to visualize, assess, and grasp complex atmospheric dynamics. The dashboard is built using Python for backend operations and integrates tools such as the Streamlit framework for quick web application development and the Folium library for advanced geospatial visualizations. This dashboard acts as a bridge between advanced atmospheric modeling and spatio-temporal digital twin applications, showcasing the substantial benefits of integrating comprehensive model outputs into the simulation of observational systems.
  • Item
    Making Enterprise Recorded Meetings Easy to Discover and Share
    (IGI Global, 2015) Pan, Shimei; Topkara, Mercan; Boston, Jeff; Wood, Steve; Lai, Jennifer
    The prevalence of social content sharing such as video and photo sharing has greatly enhanced information discovery and social interaction over the internet. This has inspired similar efforts within enterprise to encourage collaboration and expertise sharing. Moreover, enterprise web meeting tools have increasingly become an important platform for knowledge workers to participate and collaborate remotely. Although these web meetings contain rich enterprise knowledge and are frequently recorded, they are rarely revisited and shared. To encourage enterprise knowledge sharing, especially to facilitate the discovery and sharing of enterprise meetings, we develop an end-to-end enterprise meeting service, Agora, that manages the full cycle of hosting and sharing recorded web meetings. Agora leverages the functionality of existing enterprise meeting hosting, video sharing and presentation sharing services to build a coherent meeting service. Agora was deployed as a cloud service in a global Fortune 500 company which allows its customers to test new collaborative technologies.
  • Item
    User-directed Non-Disruptive Topic Model Update for Effective Exploration of Dynamic Content
    (ACM, 2015-03-18) Yang, Yi; Pan, Shimei; Song, Yangqiu; Lu, Jie; Topkara, Mercan
    Statistical topic models have become a useful and ubiquitous text analysis tool for large corpora. One common application of statistical topic models is to support topic-centric navigation and exploration of document collections at the user interface by automatically grouping documents into coherent topics. For today's constantly expanding document collections, topic models need to be updated when new documents become available. Existing work on topic model update focuses on how to best fit the model to the data, and ignores an important aspect that is closely related to the end user experience: topic model stability. When the model is updated with new documents, the topics previously assigned to old documents may change, which may result in a disruption of end users' mental maps between documents and topics, thus undermining the usability of the applications. In this paper, we describe a user-directed non-disruptive topic model update system, nTMU, that balances the tradeoff between finding the model that fits the data and maintaining the stability of the model from end users' perspective. It employs a novel constrained LDA algorithm (cLDA) to incorporate pair-wise document constraints, which are converted from user feedback about topics, to achieve topic model stability. Evaluation results demonstrate advantages of our approach over previous methods.
  • Item
    LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
    (2015-07-23) Ganesan, Ashwinkumar; Brantley, Kiante; Pan, Shimei; Chen, Jian
    We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics that are generated. Also, users may find it difficult to search for correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users who have minimal knowledge of LDA or Topic Modeling methods. To evaluate our design, we run a pilot study which uses the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.
  • Item
    An Uncertainty-Aware Approach for Exploratory Microblog Retrieval
    (IEEE, 2015-08-12) Liu, Mengchen; Liu, Shixia; Zhu, Xizhou; Liao, Qinying; Wei, Furu; Pan, Shimei
    Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient posts, users, and hashtags. We extend an existing ranking technique to compute a multifaceted retrieval result: the mutual reinforcement rank of a graph node, the uncertainty of each rank, and the propagation of uncertainty among different graph nodes. To illustrate the three facets, we have also designed a composite visualization with three visual components: a graph visualization, an uncertainty glyph, and a flow map. The graph visualization with glyphs, the flow map, and the uncertainty analysis together enable analysts to effectively find the most uncertain results and interactively refine them. We have applied our approach to several Twitter datasets. Qualitative evaluation and two real-world case studies demonstrate the promise of our approach for retrieving high-quality microblog data.
  • Item
    Using Personal Traits For Brand Preference Prediction
    (ACL, 2015-09) Yang, Chao; Pan, Shimei; Mahmud, Jalal; Yang, Huahai; Srinivasan, Padmini
    In this paper, we present a comprehensive study of the relationship between an individual’s personal traits and his/her brand preferences. In our analysis, we included a large number of character traits such as personality, personal values and individual needs. These trait features were obtained from both a psychometric survey and automated social media analytics. We also included an extensive set of brand names from diverse product categories. From this analysis, we want to shed some light on (1) whether it is possible to use personal traits to infer an individual’s brand preferences, and (2) whether the trait features automatically inferred from social media are good proxies for the ground truth character traits in brand preference prediction.
  • Item
    Cross-Domain Error Correction in Personality Prediction
    (IOS Press, 2016) Kılıç, Işıl Doğa Yakut; Pan, Shimei
    In this paper, we analyze domain bias in automated text-based personality prediction, and propose a novel method to correct domain bias. The proposed approach is very general since it requires neither retraining a personality prediction system using examples from a new domain, nor any knowledge of the original training data used to develop the system. We conduct several experiments to evaluate the effectiveness of the method, and the findings indicate a significant improvement of prediction accuracy.
  • Item
    An Empirical Study of the Effectiveness of using Sentiment Analysis Tools for Opinion Mining
    (SciTePress, 2016) Ding, Tao; Pan, Shimei
    Sentiment analysis is increasingly used as a tool to gauge people’s opinions on the internet. For example, sentiment analysis has been widely used in assessing people’s opinions on hotels, products (e.g., books and consumer electronics), public policies, and political candidates. However, due to the complexity in automated text analysis, today’s sentiment analysis tools are far from perfect. For example, many of them are good at detecting useful mood signals but inadequate in tracking and inferring the relationships between different moods and different targets. As a result, if not used carefully, the results from sentiment analysis can be meaningless or even misleading. In this paper, we present an empirical analysis of the effectiveness of using existing sentiment analysis tools in assessing people’s opinions in five different domains. We also propose several effectiveness indicators that can be computed automatically to help avoid the potential pitfalls in misusing a sentiment analysis tool.
  • Item
    Personalized Emphasis Framing for Persuasive Message Generation
    (ACL, 2016-11) Ding, Tao; Pan, Shimei
    In this paper, we present a study on personalized emphasis framing which can be used to tailor the content of a message to enhance its appeal to different individuals. With this framework, we directly model content selection decisions based on a set of psychologically-motivated domain-independent personal traits including personality (e.g., extraversion) and basic human values (e.g., self-transcendence). We also demonstrate how the analysis results can be used in automated personalized content selection for persuasive message generation.
  • Item
    $1 Today or $2 Tomorrow? The Answer is in Your Facebook Likes
    (2017-03-24) Ding, Tao; Bickel, Warren K.; Pan, Shimei
    Delay discounting, a behavioral measure of impulsivity, is often used to quantify the human tendency to choose a smaller, sooner reward (e.g., $1 today) over a larger, later reward ($2 tomorrow). Delay discounting and its relation to human decision making is a hot topic in economics and behavior science since pitting the demands of long-term goals against short term desires is among the most difficult tasks in human decision making [Hirsh et al., 2008]. Previously, small-scale studies based on questionnaires were used to analyze an individual’s delay discounting rate (DDR) and his/her real-world behavior (e.g., substance abuse) [Kirby et al., 1999]. In this research, we employ large-scale social media analytics to study DDR and its relation to people’s social media behavior (e.g., Facebook Likes). We also build computational models to automatically infer DDR from Social Media Likes. Our investigation has revealed interesting results.
  • Item
    The Stability and Usability of Statistical Topic Models
    (ACM, 2016-07-20) Yang, Yi; Pan, Shimei; Lu, Jie; Topkara, Mercan; Song, Yangqiu
    Statistical topic models have become a useful and ubiquitous tool for analyzing large text corpora. One common application of statistical topic models is to support topic-centric navigation and exploration of document collections. Existing work on topic modeling focuses on the inference of model parameters so the resulting model fits the input data. Since the exact inference is intractable, statistical inference methods, such as Gibbs Sampling, are commonly used to solve the problem. However, most of the existing work ignores an important aspect that is closely related to the end user experience: topic model stability. When the model is either re-trained with the same input data or updated with new documents, the topic previously assigned to a document may change under the new model, which may result in a disruption of end users’ mental maps about the relations between documents and topics, thus undermining the usability of the applications. In this article, we propose a novel user-directed non-disruptive topic model update method that balances the tradeoff between finding the model that fits the data and maintaining the stability of the model from end users’ perspective. It employs a novel constrained LDA algorithm to incorporate pairwise document constraints, which are converted from user feedback about topics, to achieve topic model stability. Evaluation results demonstrate the advantages of our approach over previous methods.
  • Item
    Designing Speech, Acoustic and Multimodal Interactions
    (ACM, 2017-05-06) Munteanu, Cosmin; Irani, Pourang; Oviatt, Sharon; Aylett, Matthew; Penn, Gerald; Pan, Shimei; Sharma, Nikhil; Rudzicz, Frank; Gomez, Randy; Cowan, Ben; Nakamura, Keisuke
    Traditional interfaces are continuously being replaced by mobile, wearable, or pervasive interfaces. Yet when it comes to the input and output modalities enabling our interactions, we have yet to fully embrace some of the most natural forms of communication and information processing that humans possess: speech, language, gestures, thoughts. Very little HCI attention has been dedicated to designing and developing spoken language, acoustic-based, or multimodal interaction techniques, especially for mobile and wearable devices. In addition to the enormous, recent engineering progress in processing such modalities, there is now sufficient evidence that many real-life applications do not require 100% accuracy of processing multimodal input to be useful, particularly if such modalities complement each other. This multidisciplinary, one-day workshop will bring together interaction designers, usability researchers, and general HCI practitioners to analyze the opportunities and directions to take in designing more natural interactions especially with mobile and wearable devices, and to look at how we can leverage recent advances in speech, acoustic, and multimodal processing.
  • Item
    Multi-View Unsupervised User Feature Embedding for Social Media-based Substance Use Prediction
    (ACL, 2017-09) Ding, Tao; Bickel, Warren K.; Pan, Shimei
    In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different unsupervised feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook “likes” and “status updates” to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for illicit drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user's social media behavior (e.g., word usage) and substance use.
  • Item
    Social Media-based Substance Use Prediction
    (2017-05-31) Ding, Tao; Bickel, Warren K.; Pan, Shimei
    In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook "likes" and "status updates" to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user's social media behavior (e.g., word usage) and substance use.
  • Item
    Supervising Unsupervised Open Information Extraction Models
    (ACL, 2019-11) Roy, Arpita; Park, Youngja; Lee, Taesung; Pan, Shimei
    We propose a novel supervised open information extraction (Open IE) framework that leverages an ensemble of unsupervised Open IE systems and a small amount of labeled data to improve system performance. It uses the outputs of multiple unsupervised Open IE systems plus a diverse set of lexical and syntactic information such as word embedding, part-of-speech embedding, syntactic role embedding and dependency structure as its input features and produces a sequence of word labels indicating whether the word belongs to a relation, the arguments of the relation or irrelevant. Compared with existing supervised Open IE systems, our approach leverages the knowledge in existing unsupervised Open IE systems to overcome the problem of insufficient training data. By employing multiple unsupervised Open IE systems, our system learns to combine the strengths and avoid the weaknesses of each individual Open IE system. We have conducted experiments on multiple labeled benchmark data sets. Our evaluation results have demonstrated the superiority of the proposed method over existing supervised and unsupervised models by a significant margin.
  • Item
    Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts
    (2017-09-21) Roy, Arpita; Park, Youngja; Pan, Shimei
    Word embedding is a Natural Language Processing (NLP) technique that automatically maps words from a vocabulary to vectors of real numbers in an embedding space. It has been widely used in recent years to boost the performance of a variety of NLP tasks such as Named Entity Recognition, Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such as Word2Vec and GloVe work well when they are given a large text corpus. When the input texts are sparse as in many specialized domains (e.g., cybersecurity), these methods often fail to produce high-quality vectors. In this paper, we describe a novel method to train domain-specific word embeddings from sparse texts. In addition to domain texts, our method also leverages diverse types of domain knowledge such as domain vocabulary and semantic relations. Specifically, we first propose a general framework to encode diverse types of domain knowledge as text annotations. Then we develop a novel Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text annotations in word embedding. We have evaluated our method on two cybersecurity text corpora: a malware description corpus and a Common Vulnerability and Exposure (CVE) corpus. Our evaluation results have demonstrated the effectiveness of our method in learning domain-specific word embeddings.
  • Item
    Mitigating Demographic Bias in AI-based Resume Filtering
    (ACM, 2020-07-13) Deshpande, Ketki V.; Pan, Shimei; Foulds, James
    With increasing diversity in the labor market as well as the work force, employers receive resumes from an increasingly diverse population. However, studies and field experiments have confirmed the presence of bias in the labor market based on gender, race, and ethnicity. Many employers use automated resume screening to filter the many possible matches. Depending on how the automated screening algorithm is trained it can potentially exhibit bias towards a particular population by favoring certain socio-linguistic characteristics. The resume writing style and socio-linguistics are a potential source of bias as they correlate with protected characteristics such as ethnicity. A biased dataset is often translated into biased AI algorithms and de-biasing algorithms are being contemplated. In this work, we study the effects of socio-linguistic bias on resume to job description matching algorithms. We develop a simple technique, called fair-tf-idf, to match resumes with job descriptions in a fair way by mitigating the socio-linguistic bias.
  • Item
    Bayesian Modeling of Intersectional Fairness: The Variance of Bias
    (SIAM, 2020-01) Foulds, James; Islam, Rashidul; Keya, Kamrun Naher; Pan, Shimei
    Intersectionality is a framework that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including race, gender, sexual orientation, class, and disability. Intersectionality theory therefore implies it is important that fairness in artificial intelligence systems be protected with regard to multi-dimensional protected attributes. However, the measurement of fairness becomes statistically challenging in the multi-dimensional setting due to data sparsity, which increases rapidly in the number of dimensions, and in the values per dimension. We present a Bayesian probabilistic modeling approach for the reliable, data-efficient estimation of fairness with multidimensional protected attributes, which we apply to two existing intersectional fairness metrics. Experimental results on census data and the COMPAS criminal justice recidivism dataset demonstrate the utility of our methodology, and show that Bayesian methods are valuable for the modeling and measurement of fairness in intersectional contexts.
  • Item
    Intersectional AI: A Study of How Information Science Students Think about Ethics and Their Impact
    (ACM, 2020-10-15) McDonald, Nora; Pan, Shimei
    Recent literature has demonstrated the limited and, in some instances, waning role of ethical training in computing classes in the US. The capacity for artificial intelligence (AI) to be inequitable or harmful is well documented, yet it's an issue that continues to lack apparent urgency or effective mitigation. The question we raise in this paper is how to prepare future generations to recognize and grapple with the ethical concerns of a range of issues plaguing AI, particularly when they are combined with surveillance technologies in ways that have grave implications for social participation and restriction—from risk assessment and bail assignment in criminal justice, to public benefits distribution and access to housing and other critical resources that enable security and success within society. The US is a mecca of information and computer science (IS and CS) learning for Asian students whose experiences as minorities render them familiar with, and vulnerable to, the societal bias that feeds AI bias. Our goal was to better understand how students who are being educated to design AI systems think about these issues, and in particular, their sensitivity to intersectional considerations that heighten risk for vulnerable groups. In this paper we report on findings from qualitative interviews with 20 graduate students, 11 from an AI class and 9 from a Data Mining class. We find that students are not predisposed to think deeply about the implications of AI design for the privacy and well-being of others unless explicitly encouraged to do so. When they do, their thinking is focused through the lens of personal identity and experience, but their reflections tend to center on bias, an intrinsic feature of design, rather than on fairness, an outcome that requires them to imagine the consequences of AI. While they are, in fact, equipped to think about fairness when prompted by discussion and by design exercises that explicitly invite consideration of intersectionality and structural inequalities, many need help to do this empathy 'work.' Notably, the students who more frequently reflect on intersectional problems related to bias and fairness are also more likely to consider the connection between model attributes and bias, and the interaction with context. Our findings suggest that experience with identity-based vulnerability promotes more analytically complex thinking about AI, lending further support to the argument that identity-related ethics should be integrated into IS and CS curriculums, rather than positioned as a stand-alone course.