Speaker-Based Variability in Robotic Spoken Language Grounding
dc.contributor.advisor | Matuszek, Cynthia | |
dc.contributor.author | Kebe, Gaoussou Youssouf | |
dc.contributor.department | Computer Science and Electrical Engineering | |
dc.contributor.program | Computer Science | |
dc.date.accessioned | 2023-07-07T16:02:16Z | |
dc.date.available | 2023-07-07T16:02:16Z | |
dc.date.issued | 2022-01-01 | |
dc.description.abstract | Robots in human spaces need to be able to understand human-provided natural language instructions in the context of their physical environment. Learning to understand grounded language, which connects natural language to percepts, is a critical research area. However, most existing efforts rely on highly curated text and ignore the noise and variance present in end-user speech. Existing speech-based grounded language learning approaches require an extensive amount of speech data. Additionally, variation in speech characteristics can cause challenges for grounding models, and prior work does not investigate the difference in performance between demographic groups. In this thesis, I train and evaluate language grounding models on collected spoken and textual descriptions of common household objects. I leverage recent work in self-supervised speech representation models to learn groundings without the interference of transcriptions as an intermediate representation. The goal is to eliminate the effects of off-the-shelf speech-to-text models as a potential source of bias. The experimental results suggest that this approach can make language grounding systems more inclusive towards accented speakers and increase general performance. | |
dc.format | application/pdf | |
dc.genre | thesis | |
dc.identifier | doi:10.13016/m272dp-jkbx | |
dc.identifier.other | 12594 | |
dc.identifier.uri | http://hdl.handle.net/11603/28469 | |
dc.language | en | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Collection | |
dc.relation.ispartof | UMBC Theses and Dissertations Collection | |
dc.relation.ispartof | UMBC Graduate School Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.rights | This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu | |
dc.source | Original File Name: Kebe_umbc_0434M_12594.pdf | |
dc.subject | Grounded Language Learning | |
dc.subject | Multimodal Learning | |
dc.subject | Natural Language Processing | |
dc.subject | Spoken Language Grounding | |
dc.title | Speaker-Based Variability in Robotic Spoken Language Grounding | |
dc.type | Text | |
dcterms.accessRights | Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission. | |