A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning

Kebe, Gaoussou Youssouf; Higgins, Padraig; Jenkins, Patrick; Darvish, Kasra; Sachdeva, Rishabh; Barron, Ryan; Winder, John; Engel, Don; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia

dc.contributor.author	Kebe, Gaoussou Youssouf
dc.contributor.author	Higgins, Padraig
dc.contributor.author	Jenkins, Patrick
dc.contributor.author	Darvish, Kasra
dc.contributor.author	Sachdeva, Rishabh
dc.contributor.author	Barron, Ryan
dc.contributor.author	Winder, John
dc.contributor.author	Engel, Don
dc.contributor.author	Raff, Edward
dc.contributor.author	Ferraro, Francis
dc.contributor.author	Matuszek, Cynthia
dc.date.accessioned	2023-06-09T18:08:20Z
dc.date.available	2023-06-09T18:08:20Z
dc.date.issued	2021-07-29
dc.description	35th Conference on Neural Information Processing Systems (NeurIPS 2021), virtual-only conference, December 6-14, 2021.	en
dc.description.abstract	Grounded language acquisition is a major area of research combining aspects of natural language processing, computer vision, and signal processing, compounded by domain issues requiring sample efficiency and other deployment constraints. In this work, we present a multimodal dataset of RGB+depth objects with spoken as well as textual descriptions. We analyze the differences between the two types of descriptive language and our experiments demonstrate that the different modalities affect learning. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results.	en
dc.description.sponsorship	This material is based in part upon work supported by the National Science Foundation under Grant Nos. 1637937, 1813223, 1940931, 2024878, and 1920079. This material is also based on research that is in part supported by the Air Force Research Laboratory (AFRL), DARPA, for the KAIROS program under agreement number FA8750-19-2-1003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of the Air Force Research Laboratory (AFRL), DARPA, or the U.S. Government.	en
dc.description.uri	https://openreview.net/forum?id=Yx9jT3fkBaD	en
dc.format.extent	15 pages	en
dc.genre	conference papers and proceedings	en
dc.genre	preprints	en
dc.identifier	doi:10.13016/m2vq9w-k2sk
dc.identifier.uri	http://hdl.handle.net/11603/28151
dc.language.iso	en	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Office for the Vice President of Research & Creative Achievement (ORCA)
dc.relation.ispartof	UMBC Student Collection
dc.rights	Attribution 4.0 International	*
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.title	A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning	en
dc.type	Text	en
dcterms.creator	https://orcid.org/0000-0003-2838-0140	en
dcterms.creator	https://orcid.org/0000-0002-9900-1972	en
dcterms.creator	https://orcid.org/0000-0003-1383-8120	en