Assurance of Machine Learning for Human-Robot Interaction

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

The incredible advancements in artificial intelligence over the past decade have enabled technologies that once lived in research labs to now interact with users from all walks of life. As these agents evolve digitally and expand their physical presence through robotics, the risks associated with human interaction grow—necessitating stronger assurances. These risks stem from the inherent difficulty of deploying machine learning models, which must sense and interpret dynamic environments and human behavior, in contrast to more predictable, classical software systems. This thesis explores how deep learning can enhance human-robot interaction (HRI) by enabling general, flexible representations that support robust and unconstrained language grounding. Through the development of a neural object representation system, I demonstrate improved performance over prior category-based methods on a challenging, crowd-sourced dataset. Building on this, I introduce joint language-vision modeling, which further enhances generalization and usability, and I extend the system to operate directly on speech—broadening accessibility for diverse user populations. However, the generalization power of deep learning introduces new challenges, especially in safety-critical scenarios involving physically embodied robots. To address this, I propose a data-centric threat model for adversarial attacks on vision systems, exposing the limitations of existing defenses. Extending this analysis to human-sensing systems, I identify disparities in adversarial robustness, particularly for users with diverse speech characteristics. Through a comprehensive case study, I show that while robustness training often entails performance trade-offs, rejection-based defenses—augmented through sampling—can achieve a better balance between robustness, performance, and equity. Finally, I revisit concept-based learning through the lens of assurance, introducing end-to-end differentiable neurosymbolic reasoning to align neural perception with symbolic tasks in both vision and speech. These methods improve interpretability, robustness, and fairness, while enabling alignment verification. Collectively, this work reflects a broader methodology: advancing capabilities, quantifying emerging risks, and designing mitigations that inform new paradigms for assured AI. This cycle—of innovation, analysis, and refinement—serves as a foundation for developing safe, equitable, and assured AI systems.