Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech

dc.contributor.author: Kebe, Gaoussou Youssouf
dc.contributor.author: Richards, Luke E.
dc.contributor.author: Raff, Edward
dc.contributor.author: Ferraro, Francis
dc.contributor.author: Matuszek, Cynthia
dc.date.accessioned: 2022-01-25T15:36:21Z
dc.date.available: 2022-01-25T15:36:21Z
dc.date.issued: 2022-06-28
dc.description.abstract: Learning to understand grounded language, which connects natural language to percepts, is a critical research area. Prior work in grounded language acquisition has focused primarily on textual inputs. In this work we demonstrate the feasibility of performing grounded language acquisition on paired visual percepts and raw speech inputs. This will allow interactions in which language about novel tasks and environments is learned from end users, reducing dependence on textual inputs and potentially mitigating the effects of demographic bias found in widely available speech recognition systems. We leverage recent work in self-supervised speech representation models and show that learned representations of speech can make language grounding systems more inclusive towards specific groups while maintaining or even increasing general performance.
dc.description.sponsorship: This material is based in part upon work supported by the National Science Foundation under Grant Nos. 1813223, 1920079, 1940931, and 2024878.
dc.description.uri: https://ojs.aaai.org/index.php/AAAI/article/view/21335
dc.format.extent: 10 pages
dc.genre: journal articles
dc.identifier: doi:10.13016/m2wpwb-w7d1
dc.identifier.citation: Kebe, Gaoussou Youssouf, Luke E. Richards, Edward Raff, Francis Ferraro, and Cynthia Matuszek. 2022. “Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (10): 10884-93. https://doi.org/10.1609/aaai.v36i10.21335.
dc.identifier.uri: http://hdl.handle.net/11603/24077
dc.identifier.uri: https://doi.org/10.1609/aaai.v36i10.21335
dc.language.iso: en_US
dc.publisher: PKP
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: Attribution 4.0 International (CC BY 4.0)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: UMBC Interactive Robotics and Language Lab
dc.subject: UMBC Ebiquity Research Group
dc.title: Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech
dc.type: Text
dcterms.creator: https://orcid.org/0000-0001-5744-8736
dcterms.creator: https://orcid.org/0000-0002-9900-1972
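
The abstract above rests on a concrete technical idea: embed raw speech with a pretrained self-supervised model and train it jointly with visual percept features so that matched speech/percept pairs align in a shared embedding space. Below is a minimal sketch of that pairing, assuming wav2vec 2.0 as the speech representation; the projection heads, triplet-style loss, and placeholder visual features are illustrative assumptions, not the paper's exact method.

```python
# A minimal sketch, NOT the authors' released code: it illustrates grounding
# raw speech against visual percepts via a self-supervised speech model
# (wav2vec 2.0 assumed here as a representative choice).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

class GroundingHeads(nn.Module):
    """Hypothetical projection heads mapping each modality into a shared space."""
    def __init__(self, speech_dim=768, vision_dim=2048, shared_dim=256):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, shared_dim)
        self.vision_proj = nn.Linear(vision_dim, shared_dim)

    def forward(self, speech_frames, vision_feats):
        # Mean-pool wav2vec 2.0 frame-level features into one utterance vector.
        s = self.speech_proj(speech_frames.mean(dim=1))
        v = self.vision_proj(vision_feats)
        return s, v

def grounding_loss(s, v, margin=0.4):
    # Triplet-style objective: matched speech/percept pairs should score
    # higher than mismatched pairs from the same batch (in-batch negatives
    # obtained by shifting the batch). The paper's exact objective may differ.
    pos = nn.functional.cosine_similarity(s, v)
    neg = nn.functional.cosine_similarity(s, v.roll(1, dims=0))
    return torch.clamp(margin - pos + neg, min=0).mean()

# Example usage with dummy 16 kHz waveforms and precomputed visual features
# (e.g. pooled CNN features of an RGB-D percept; random placeholders here).
waveforms = [torch.randn(16000).numpy() for _ in range(4)]  # four 1 s clips
inputs = processor(waveforms, sampling_rate=16000,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    speech_frames = speech_encoder(inputs.input_values).last_hidden_state
vision_feats = torch.randn(4, 2048)
heads = GroundingHeads()
s, v = heads(speech_frames, vision_feats)
loss = grounding_loss(s, v)  # backpropagate through the heads to train
```

Mean-pooling the frame-level features into a single utterance vector is a simple baseline aggregation; other poolings or learned attention over frames are equally plausible under this setup.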

Files

Original bundle

Name: 21335-Article Text-25348-1-2-20220628.pdf
Size: 667.7 KB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission