UMBC College of Engineering and Information Technology Dean's Office
Permanent URI for this collection: http://hdl.handle.net/11603/7919
Recent Submissions
Item What is the Point? Evaluating the Structure, Color, and Semantic Traits of Computer Vision Point Clouds of Vegetation (MDPI, 2017-04-09) Dandois, Jonathan P.; Baker, Matthew; Olano, Marc; Parker, Geoffrey G.; Ellis, Erle C.
Remote sensing of the structural and spectral traits of vegetation is being transformed by structure from motion (SFM) algorithms that combine overlapping images to produce three-dimensional (3D) red-green-blue (RGB) point clouds. However, much remains unknown about how these point clouds are used to observe vegetation, limiting the understanding of the results and future applications. Here, we examine the content and quality of SFM point cloud 3D-RGB fusion observations. An SFM algorithm using the Scale Invariant Feature Transform (SIFT) feature detector was applied to create the 3D-RGB point clouds of a single tree and forest patches. The fusion quality was evaluated using targets placed within the tree and was compared to fusion measurements from terrestrial LIDAR (TLS). K-means clustering and manual classification were used to evaluate the semantic content of SIFT features. When targets were fully visible in the images, SFM assigned color in the correct place with a high accuracy (93%). The accuracy was lower when targets were shadowed or obscured (29%). Clustering and classification revealed that the SIFT features highlighted areas that were brighter or darker than their surroundings, showing little correspondence with canopy objects like leaves or branches, though the features showed some relationship to landscape context (e.g., canopy, pavement). Therefore, the results suggest that feature detectors play a critical role in determining how vegetation is sampled by SFM. Future research should consider developing feature detectors that are optimized for vegetation mapping, including extracting elements like leaves and flowers.
Features should be considered the fundamental unit of SFM mapping, like the pixel in optical imaging and the laser pulse of LIDAR. Under optimal conditions, SFM fusion accuracy exceeded that of TLS, and the two systems produced similar representations of the overall tree shape. SFM is the lower-cost solution for obtaining accurate 3D-RGB fusion measurements of the outer surfaces of vegetation, the critical zone of interaction between vegetation, light, and the atmosphere from leaf to canopy scales.
Item Mitigating Demographic Bias in AI-based Resume Filtering (ACM, 2020-07-13) Deshpande, Ketki V.; Pan, Shimei; Foulds, James
With increasing diversity in the labor market as well as the work force, employers receive resumes from an increasingly diverse population. However, studies and field experiments have confirmed the presence of bias in the labor market based on gender, race, and ethnicity. Many employers use automated resume screening to filter the many possible matches. Depending on how the automated screening algorithm is trained it can potentially exhibit bias towards a particular population by favoring certain socio-linguistic characteristics. The resume writing style and socio-linguistics are a potential source of bias as they correlate with protected characteristics such as ethnicity. A biased dataset is often translated into biased AI algorithms and de-biasing algorithms are being contemplated. In this work, we study the effects of socio-linguistic bias on resume to job description matching algorithms.
We develop a simple technique, called fair-tf-idf, to match resumes with job descriptions in a fair way by mitigating the socio-linguistic bias.
Item Bayesian Modeling of Intersectional Fairness: The Variance of Bias (SIAM, 2020-01) Foulds, James; Islam, Rashidul; Keya, Kamrun Naher; Pan, Shimei
Intersectionality is a framework that analyzes how interlocking systems of power and oppression affect individuals along overlapping dimensions including race, gender, sexual orientation, class, and disability. Intersectionality theory therefore implies it is important that fairness in artificial intelligence systems be protected with regard to multi-dimensional protected attributes. However, the measurement of fairness becomes statistically challenging in the multi-dimensional setting due to data sparsity, which increases rapidly in the number of dimensions, and in the values per dimension. We present a Bayesian probabilistic modeling approach for the reliable, data-efficient estimation of fairness with multidimensional protected attributes, which we apply to two existing intersectional fairness metrics. Experimental results on census data and the COMPAS criminal justice recidivism dataset demonstrate the utility of our methodology, and show that Bayesian methods are valuable for the modeling and measurement of fairness in intersectional contexts.
Item Fair Representation Learning for Heterogeneous Information Networks (AAAI, 2021-05-22) Zeng, Ziqian; Islam, Rashidul; Keya, Kamrun Naher; Foulds, James; Song, Yangqiu; Pan, Shimei
Recently, much attention has been paid to the societal impact of AI, especially concerns regarding its fairness. A growing body of research has identified unfair AI systems and proposed methods to debias them, yet many challenges remain.
Representation learning methods for Heterogeneous Information Networks (HINs), fundamental building blocks used in complex network mining, have socially consequential applications such as automated career counseling, but there have been few attempts to ensure that they will not encode or amplify harmful biases, e.g. sexism in the job market. To address this gap, we propose a comprehensive set of de-biasing methods for fair HINs representation learning, including sampling-based, projection-based, and graph neural networks (GNNs)-based techniques. We systematically study the behavior of these algorithms, especially their capability in balancing the trade-off between fairness and prediction accuracy. We evaluate the performance of the proposed methods in an automated career counseling application where we mitigate gender bias in career recommendation. Based on the evaluation results on two datasets, we identify the most effective fair HINs representation learning techniques under different conditions.
Item Can We Obtain Fairness For Free? (ACM, 2021-07-30) Islam, Rashidul; Pan, Shimei; Foulds, James
There is growing awareness that AI and machine learning systems can in some cases learn to behave in unfair and discriminatory ways with harmful consequences. However, despite an enormous amount of research, techniques for ensuring AI fairness have yet to see widespread deployment in real systems. One of the main barriers is the conventional wisdom that fairness brings a cost in predictive performance metrics such as accuracy which could affect an organization's bottom-line. In this paper we take a closer look at this concern. Clearly fairness/performance trade-offs exist, but are they inevitable? In contrast to the conventional wisdom, we find that it is frequently possible, indeed straightforward, to improve on a trained model's fairness without sacrificing predictive performance.
We systematically study the behavior of fair learning algorithms on a range of benchmark datasets, showing that it is possible to improve fairness to some degree with no loss (or even an improvement) in predictive performance via a sensible hyper-parameter selection strategy. Our results reveal a pathway toward increasing the deployment of fair AI methods, with potentially substantial positive real-world impacts.
Item ALDAS: Audio-Linguistic Data Augmentation for Spoofed Audio Detection (2024-10-21) Khanjani, Zahra; Mallinson, Christine; Foulds, James; Janeja, Vandana
Spoofed audio, i.e. audio that is manipulated or AI-generated deepfake audio, is difficult to detect when only using acoustic features. Some recent innovative work involving AI-spoofed audio detection models augmented with phonetic and phonological features of spoken English, manually annotated by experts, led to improved model performance. While this augmented model produced substantial improvements over traditional acoustic features based models, a scalability challenge motivates inquiry into auto labeling of features. In this paper we propose an AI framework, Audio-Linguistic Data Augmentation for Spoofed audio detection (ALDAS), for auto labeling linguistic features. ALDAS is trained on linguistic features selected and extracted by sociolinguistics experts; these auto labeled features are used to evaluate the quality of ALDAS predictions. Findings indicate that while the detection enhancement is not as substantial as when involving the pure ground truth linguistic features, there is improvement in performance while achieving auto labeling. Labels generated by ALDAS are also validated by the sociolinguistics experts.
Item Creating Geospatial Trajectories from Human Trafficking Text Corpora (2024-05-09) Karabatis, Saydeh N.; Janeja, Vandana
Human trafficking is a crime that affects the lives of millions of people across the globe.
Traffickers exploit the victims through forced labor, involuntary sex, or organ harvesting. Migrant smuggling could also be seen as a form of human trafficking when the migrant fails to pay the smuggler and is forced into coerced activities. Several news agencies and anti-trafficking organizations have reported trafficking survivor stories that include the names of locations visited along the trafficking route. Identifying such routes can provide knowledge that is essential to preventing such heinous crimes. In this paper we propose a Narrative to Trajectory (N2T) information extraction system that analyzes reported narratives, extracts relevant information through the use of Natural Language Processing (NLP) techniques, and applies geospatial augmentation in order to automatically plot trajectories of human trafficking routes. We evaluate N2T on human trafficking text corpora and demonstrate that our approach of utilizing data preprocessing and augmenting database techniques with NLP libraries outperforms existing geolocation detection methods.
Item Striving For More Efficient And Equitable Healthcare: Ian Stockwell Wins Major NIH Grant (UMBC News, 2024-11-12) Meyers, Catherine
Item Let Students Take the Wheel: Introducing Post-Quantum Cryptography with Active Learning (2024-10-17) Jamshidi, Ainaz; Kaur, Khushdeep; Gangopadhyay, Aryya; Zhang, Lei
Quantum computing presents a double-edged sword: while it has the potential to revolutionize fields such as artificial intelligence, optimization, healthcare, and so on, it simultaneously poses a threat to current cryptographic systems, such as public-key encryption. To address this threat, post-quantum cryptography (PQC) has been identified as the solution to secure existing software systems, promoting a national initiative to prepare the next generation with the necessary knowledge and skills. However, PQC is an emerging interdisciplinary topic, presenting significant challenges for educators and learners. This research proposes a novel active learning approach and assesses the best practices for teaching PQC to undergraduate and graduate students in the discipline of information systems. Our contributions are two-fold. First, we compare two instructional methods: 1) traditional faculty-led lectures and 2) student-led seminars, both integrated with active learning techniques such as hands-on coding exercises and Kahoot games. The effectiveness of these methods is evaluated through student assessments and surveys.
Second, we have published our lecture video, slides, and findings so that other researchers and educators can reuse the courseware and materials to develop their own PQC learning modules. We employ statistical analysis (e.g., t-test and chi-square test) to compare the learning outcomes and students' feedback between the two learning methods in each course. Our findings suggest that student-led seminars significantly enhance learning outcomes, particularly for graduate students, where a notable improvement in comprehension and engagement is observed. Moving forward, we aim to scale these modules to diverse educational contexts and explore additional active learning and experiential learning strategies for teaching complex concepts of quantum information science.
Item Hearing the Voice of Software Practitioners on Technical Debt Monitoring: Understanding Monitoring Practices and the Practices' Avoidance Reasons (Brazilian Computing Society, 2024-08-30) Freire, Sávio; Rios, Nicolli; Pérez, Boris; Castellanos, Camilo; Correal, Darío; Ramač, Robert; Mandić, Vladimir; Taušan, Nebojša; López, Gustavo; Pacheco, Alexia; Mendonça, Manoel; Falessi, Davide; Izurieta, Clemente; Seaman, Carolyn; Spínola, Rodrigo
Context. Technical debt (TD) monitoring allows software professionals to track the evolution of debt incurred in their projects. The technical literature has listed several practices used in the software industry to monitor indebtedness. However, there is limited evidence on the use and on the reasons to avoid using these practices. Aims. This work aims to investigate, from the point of view of software practitioners, the practices used for monitoring TD items, and the practice avoidance reasons (PARs) curbing the monitoring of TD items. Method. We analyze quantitatively and qualitatively a set of 653 answers collected with a family of industrial surveys distributed in six countries. Results.
Practitioners are prone to monitor TD items, revealing 46 practices for monitoring the debt and 35 PARs for explaining TD non-monitoring. Both practices and PARs are strongly associated with planning and management issues. The study also shows the relationship found among practices, PARs and types of debt and presents a conceptual map that relates practices and PARs with their categories. Conclusion. The results of this study add to a practitioners’ capability to monitor TD items by revealing the monitoring practices, PARs and their relationship with different TD types.
Item Flood-ResNet50: Optimized Deep Learning Model for Efficient Flood Detection on Edge Device (IEEE, 2024-03-19) Khan, Md Azim; Ahmed, Nadeem; Padela, Joyce; Raza, Muhammad Shehrose; Gangopadhyay, Aryya; Wang, Jianwu; Foulds, James; Busart, Carl; Erbacher, Robert F.
Floods are highly destructive natural disasters that result in significant economic losses and endanger human and wildlife lives. Efficiently monitoring flooded areas through the utilization of deep learning models can contribute to mitigating these risks. This study focuses on the deployment of deep learning models specifically designed for classifying flooded and non-flooded areas in UAV images. In consideration of computational costs, we propose a modified version of ResNet50 called Flood-ResNet50. By incorporating additional layers and leveraging transfer learning techniques, Flood-ResNet50 achieves comparable performance to larger models like VGG16/19, AlexNet, DenseNet161, EfficientNetB7, Swin(small), and vision transformer. Experimental results demonstrate that the proposed modification of ResNet50, incorporating additional layers, achieves a classification accuracy of 96.43%, an F1 score of 86.36%, a recall of 81.11%, a precision of 92.41%, a model size of 98 MB, and 4.3 billion FLOPs for the FloodNet dataset.
When deployed on edge devices such as the Jetson Nano, our model demonstrates faster inference speed (820 ms), higher throughput (39.02 fps), and lower average power consumption (6.9 W) compared to larger ResNet101 and ResNet152 models.
Item TSSA: Two-Step Semi-Supervised Annotation for Radargrams on the Greenland Ice Sheet (IEEE, 2023-10-20) Jebeli, Atefeh; Tama, Bayu Adhi; Janeja, Vandana; Holschuh, Nicholas; Jensen, Claire; Morlighem, Mathieu; MacGregor, Joseph A.; Fahnestock, Mark A.
Ice-penetrating radar surveys have been conducted across the Greenland Ice Sheet since the 1960s, producing radargrams that measure ice thickness and detect the ice sheet’s radiostratigraphy. However, these radargrams are relatively under-explored and not yet fully annotated, mapped, or interpreted glaciologically. We aim to move towards automatic radargram annotation using deep learning-based methods. To provide a training set for these methods, we develop a two-step semi-supervised annotation (TSSA) approach that uses an existing unsupervised layer annotation (ARESELP) method and a deep learning-based segmentation approach (U-Net) to detect the surface and bottom-reflector (representing the bedrock) layers in radargrams. Here we focus on two evaluations of our approach: 1. Surface and bottom annotations; and 2. Data augmentation and transfer learning techniques for improving the performance of deep learning methods. Our study is a foundation for improving the efficacy of AI-based methods for auto-annotation of radargrams, where the training set is generated seamlessly through unsupervised learning.
Item A Comprehensive View on TD Prevention Practices and Reasons for not Preventing It (ACM, 2024-06-28) Freire, Sávio; Pacheco, Alexia; Rios, Nicolli; Pérez, Boris; Castellanos, Camilo; Correal, Darío; Ramač, Robert; Mandić, Vladimir; Taušan, Nebojša; López, Gustavo; Mendonça, Manoel; Falessi, Davide; Izurieta, Clemente; Seaman, Carolyn; Spínola, Rodrigo
Context.
Technical debt (TD) prevention allows software practitioners to apply practices to avoid potential TD items in their projects. Aims. To uncover and prioritize, from the point of view of software practitioners, the practices that could be used to avoid TD items, the relations between these practices and the causes of TD, and the practice avoidance reasons (PARs) that could explain the failure to prevent TD. Method. We analyze data collected from six replications of a global industrial family of surveys on TD, totaling 653 answers. We also conducted a follow-up survey to understand the importance level of analyzed data. Results. Most practitioners indicated that TD could be prevented, revealing 89 prevention practices and 23 PARs for explaining the failure to prevent TD. The paper identifies statistically significant relationships between preventive practices and certain causes of TD. Further, it prioritizes the list of practices, PARs, and relationships regarding their level of importance for TD prevention based on the opinion of software practitioners. Conclusion. This work organizes TD prevention practices and PARs in a conceptual map and the relationships between practices and causes of TD in a Sankey diagram to help the visualization of the body of knowledge reported in this study.
Item Development of chemically crosslinked PEG-PAA hydrogels suitable for engineering of the vascularized outer retina (ARVO, 2022-06-01) Pandala, Narendra; LaScola, Michael; Mulfaul, Kelly; Stone, Edwin M.; Mullins, Robert F.; Tucker, Budd A.; Lavik, Erin
To engineer a microphysiologic system that more accurately recapitulates the vascularized outer retina suitable for evaluating AMD pathology and development of novel therapeutics. A hydrogel library based on poly (ethylene glycol) (PEG), poly-L-lysine (PLL) and poly(allylamine) (PAA) was generated using succinimide and free amine reaction chemistry.
Cellular compatibility was evaluated using a rat endothelial cell line and human iPSC-derived choroidal endothelial cells generated via directed differentiation and CD31 magnetic bead immunopanning. Cell health and identity were evaluated using a series of live/dead assays and immunofluorescence staining. A library of 12 synthetic, chemically crosslinked hydrogels with tunable mechanical and degradation properties was developed. Hydrogels with a lower amine content were found to have superior endothelial cell compatibility. We hypothesize that this is due to the cell surface disrobing characteristics of the polycations present in the gels. Hydrogels with a higher polycation concentration showed relatively poor endothelial cell compatibility. Gels with optimal compatibility were found to promote endothelial cell spreading, migration, and capillary network-like formation. In this study novel hydrogels with unique mechanical and degradation properties were generated via chemical crosslinking of PEG, PLL and PAA. Low amine hydrogels were found to be superior for promoting endothelial cell spreading, migration and vascular tube formation. To create in vitro models that more accurately recapitulate the choriocapillaris, optimized hydrogels will be used as a bioink for screen-based printing of rat and human vascular endothelial cells. This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.
Item Adopting Foundational Data Science Curriculum with Diverse Institutional Contexts (ACM, 2024-03-07) Janeja, Vandana; Sanchez, Maria; Khoo, Yi Xuan; Von Vacano, Claudia; Chen, Lujie Karen
The prevalence of data across all disciplines and the large workforce demand from industry has led to the rise in interest in data science courses.
Educators are increasingly recognizing the value of building communities of practice and adapting and translating courses and programs that have been shown to be successful and sharing lessons learned in increasing diversity in data science education. We describe and analyze our experiences translating a lower-division data science curriculum from one university, University of California, Berkeley, to another setting with very different student populations and institutional context, University of Maryland, Baltimore County (UMBC). We present our findings from student interviews across two semesters of the course offering at UMBC specifically focusing on the challenges and positive experiences that the students had in the UMBC course. We highlight lessons learned to reflect on the existing large scale program at UC Berkeley, its adaptation and opportunities for increasing diversity in new settings. Our findings emphasize the importance of adapting courses and programs to existing curricula, student populations, cyberinfrastructure, and faculty and staff resources. Smaller class sizes open up the possibility of more individualized assignments, tailored to the majors, career interests, and social change motivations of diverse students. While students across institutional contexts may need varying degrees of support, we found that often students from diverse backgrounds, if engaged deeply, show significant enthusiasm for data science and its applications.
Item Wearable sensors and infrared cameras: Introducing UMBC’s User Studies Lab (UMBC News, 2020-02-05) Mastrola, Megan Hanks
Item Narrative to Trajectory (N2T+): Extracting Routes of Life or Death from Human Trafficking Text Corpora (2023-08-06) Karabatis, Saydeh N.; Janeja, Vandana
Climate change and political unrest in certain regions of the world are imposing extreme hardship on many communities and are forcing millions of vulnerable populations to abandon their homelands and seek refuge in safer lands.
As international laws are not fully set to deal with the migration crisis, people are relying on networks of exploiting smugglers to escape the devastation in order to live in stability. During the smuggling journey, migrants can become victims of human trafficking if they fail to pay the smuggler and may be forced into coerced labor. Government agencies and anti-trafficking organizations try to identify the trafficking routes based on stories of survivors in order to gain knowledge and help prevent such crimes. In this paper, we propose a system called Narrative to Trajectory (N2T⁺), which extracts trajectories of trafficking routes. N2T⁺ uses Data Science and Natural Language Processing techniques to analyze trafficking narratives, automatically extract relevant location names, disambiguate possible name ambiguities, and plot the trafficking route on a map. In a comparative evaluation we show that the proposed multi-dimensional approach offers significantly higher geolocation detection than other state-of-the-art techniques.
Item Audio deepfakes: A survey (Frontiers, 2023-01-09) Khanjani, Zahra; Watson, Gabrielle; Janeja, Vandana
A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods, to be passed off as real and can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI generated or AI manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated using AI-generated content in its entirety. Deepfakes have started to have a major impact on society with more generation mechanisms emerging every day. This article makes a contribution in understanding the landscape of deepfakes, and their detection and generation methods. We evaluate various categories of deepfakes especially in audio.
The purpose of this survey is to provide readers with a deeper understanding of (1) different deepfake categories; (2) how they could be created and detected; (3) more specifically, how audio deepfakes are created and detected in more detail, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that the majority of the focus is on video deepfakes, and, in particular, the generation of video deepfakes. We found that for text deepfakes, there are more generation methods but very few robust methods for detection, including fake news detection, which has become a controversial area of research because of the potential heavy overlaps with human generation of fake content. Our study reveals a clear need to research audio deepfakes and particularly detection of audio deepfakes. This survey has been conducted with a different perspective, compared to existing survey papers that mostly focus on just video and image deepfakes. This survey mainly focuses on audio deepfakes that are overlooked in most of the existing surveys. This article's most important contribution is to critically analyze and provide a unique source of audio deepfake research, mostly ranging from 2016 to 2021. To the best of our knowledge, this is the first survey focusing on audio deepfakes generation and detection in English.
Item Multi-domain Anomalous Relationships in Heterogeneous Temporal Data (2023-12-06) Ale, Tolulope; Janeja, Vandana
The Arctic region is crucial to global climate stability. However, recent years have witnessed periods of extreme snow and ice melt, with rising temperatures that double the global average. These are not isolated events. They are the result of intricate interconnections across distinct domains.
The challenge, therefore, lies not in understanding these individual domains, such as temperature and radiation, but in decoding the inter-domain relationships inducing these polar anomalies. To address this, our study presents a novel framework aimed at mining these inter-domain relationships to explain such anomalies and the relationship across time series features comprehensively. These features may be selected from the same or different domains. Such anomalous relationships across features could help detect interesting phenomena such as extreme snow melt and cloud cover, and help identify time periods of interest when such relationships are more prevalent. We extracted the anomalous intervals in each domain using the Poisson Distribution model of rSatScan, then leveraged the concept of Direct Overlap and Proximity of anomalies to identify the direct and time-delayed temporal association (delayed correlation) between anomalies across features. The concept helps us understand how events in one domain may be associated with events in another domain during specific time periods using association rule mining. We evaluated our approach using ERA5 reanalysis data, validated the identified anomalies against ground truth, and evaluated the strength of the generated association rules using metrics like confidence and lift. Notably, several of our identified rules were consistent with findings confirmed by domain experts.
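For readers unfamiliar with the rule-strength metrics named in the abstract above, the following is a minimal sketch of how support, confidence, and lift are conventionally computed for a rule "anomaly in A implies anomaly in B". It is not the authors' implementation: the function name `rule_metrics`, the per-window boolean encoding, and the sample data are all illustrative assumptions.

```python
from typing import Dict, Sequence


def rule_metrics(antecedent: Sequence[bool], consequent: Sequence[bool]) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Each sequence flags, per time window, whether an anomaly was present
    (a hypothetical encoding chosen for illustration).
    """
    if len(antecedent) != len(consequent) or len(antecedent) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(antecedent)
    n_a = sum(bool(a) for a in antecedent)   # windows with an A anomaly
    n_c = sum(bool(c) for c in consequent)   # windows with a B anomaly
    n_both = sum(bool(a) and bool(c) for a, c in zip(antecedent, consequent))

    support = n_both / n                          # P(A and B)
    confidence = n_both / n_a if n_a else 0.0     # P(B | A)
    # Lift compares P(B | A) with the baseline P(B); values above 1
    # suggest the two anomalies co-occur more often than chance.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Illustrative (made-up) anomaly flags over eight time windows.
temperature_anomaly = [True, True, False, False, True, False, True, False]
melt_anomaly = [True, True, False, False, False, False, True, True]
metrics = rule_metrics(temperature_anomaly, melt_anomaly)
```

A time-delayed association of the kind the abstract describes could be approximated under the same assumptions by shifting one sequence by a fixed number of windows before calling the function.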