A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning
Links to Files: https://openreview.net/forum?id=Yx9jT3fkBaD
Type of Work: conference papers and proceedings; preprints
Citation of Original Publication: Kebe, Gaoussou Youssouf, et al. "A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning." NeurIPS 2021 Track Datasets and Benchmarks Round 1 Submission, 8 June 2021. https://openreview.net/forum?id=Yx9jT3fkBaD
Rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless it is covered by a Creative Commons license, contact the copyright holder or the author for uses protected by copyright.
Subjects: grounded language acquisition
natural language processing
Abstract: Grounded language acquisition is a major area of research that combines aspects of natural language processing, computer vision, and signal processing, and is further complicated by domain constraints such as sample efficiency and other deployment requirements. In this work, we present a multimodal dataset of RGB+depth images of objects paired with both spoken and textual descriptions. We analyze the differences between the two types of descriptive language, and our experiments demonstrate that the choice of modality affects learning. This dataset will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results.
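To make the abstract's description of the data concrete, the sketch below shows one plausible way to represent a single multimodal sample (RGB, depth, speech, transcription, and typed text) and to pair spoken with written descriptions for the kind of cross-modality comparison the abstract mentions. The record layout and field names here are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record for one object instance in a multimodal dataset
# of the kind described above. Field names are assumptions for
# illustration, not the published dataset's schema.
@dataclass
class ObjectSample:
    object_id: str                 # identifier for the physical object
    rgb_path: str                  # path to the RGB image
    depth_path: str                # path to the aligned depth image
    audio_paths: List[str]         # spoken descriptions (audio files)
    transcriptions: List[str]      # transcriptions of the spoken audio
    text_descriptions: List[str]   # independently typed descriptions

def description_pairs(sample: ObjectSample):
    """Pair every spoken transcription with every written description,
    e.g. to compare vocabulary between the two modalities."""
    return [(spoken, written)
            for spoken in sample.transcriptions
            for written in sample.text_descriptions]

# Example with made-up file names and descriptions.
sample = ObjectSample(
    object_id="apple_01",
    rgb_path="apple_01_rgb.png",
    depth_path="apple_01_depth.png",
    audio_paths=["apple_01_speaker1.wav"],
    transcriptions=["a shiny red apple"],
    text_descriptions=["red apple", "a piece of fruit"],
)
pairs = description_pairs(sample)
print(len(pairs))  # one transcription x two written descriptions
```

A structure like this keeps each modality addressable on its own while preserving the grouping by object, which is what an analysis of vernacular differences between speech and text would need.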