A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning

dc.contributor.author: Kebe, Gaoussou Youssouf
dc.contributor.author: Higgins, Padraig
dc.contributor.author: Jenkins, Patrick
dc.contributor.author: Darvish, Kasra
dc.contributor.author: Sachdeva, Rishabh
dc.contributor.author: Barron, Ryan
dc.contributor.author: Winder, John
dc.contributor.author: Engel, Don
dc.contributor.author: Raff, Edward
dc.contributor.author: Ferraro, Francis
dc.contributor.author: Matuszek, Cynthia
dc.date.accessioned: 2023-06-09T18:08:20Z
dc.date.available: 2023-06-09T18:08:20Z
dc.date.issued: 2021-07-29
dc.description: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), virtual-only conference, December 6-14, 2021.
dc.description.abstract: Grounded language acquisition is a major area of research combining aspects of natural language processing, computer vision, and signal processing, compounded by domain issues requiring sample efficiency and other deployment constraints. In this work, we present a multimodal dataset of RGB+depth objects with spoken as well as textual descriptions. We analyze the differences between the two types of descriptive language and our experiments demonstrate that the different modalities affect learning. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results.
dc.description.sponsorship: This material is based in part upon work supported by the National Science Foundation under Grant Nos. 1637937, 1813223, 1940931, 2024878, and 1920079. This material is also based on research that is in part supported by the Air Force Research Laboratory (AFRL), DARPA, for the KAIROS program under agreement number FA8750-19-2-1003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of the Air Force Research Laboratory (AFRL), DARPA, or the U.S. Government.
dc.description.uri: https://openreview.net/forum?id=Yx9jT3fkBaD
dc.format.extent: 15 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m2vq9w-k2sk
dc.identifier.uri: http://hdl.handle.net/11603/28151
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Office for the Vice President of Research & Creative Achievement (ORCA)
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning
dc.type: Text
dcterms.creator: https://orcid.org/0000-0003-2838-0140
dcterms.creator: https://orcid.org/0000-0002-9900-1972
dcterms.creator: https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 41_a_spoken_language_dataset_of_d.pdf
Size: 1.59 MB
Format: Adobe Portable Document Format

Name: 41_a_spoken_language_dataset_of_d-Supplementary Material.zip
Size: 2.44 MB
Format: Unknown data format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission