SPEECH VS TEXTUAL DATA FOR GROUNDED LANGUAGE LEARNING

dc.contributor.advisor: Matuszek, Cynthia
dc.contributor.author: Sachdeva, Rishabh
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:25Z
dc.date.available: 2021-09-01T13:55:25Z
dc.date.issued: 2020-01-20
dc.description.abstract: In this thesis, we describe the compatibility of audio data with the Grounded Learning system adopted from text-only systems. My thesis work lies at the junction of NLP, speech, and robotics. First, we conduct in-person user studies to collect audio descriptions of household objects in a controlled environment. In this work, we use the category-based Grounded Learning System~\cite{pillai2018}. This system learns the meaning of words used in crowd-sourced descriptions by grounding them in the physical representation of the objects that the workers describe. We compare the performance of the category-based model on the in-lab collected speech data and on crowd-sourced text data. We find that the system can learn color, object, and shape words with comparable performance. To expand the analysis, we collect natural language descriptions in both textual and speech format for various kitchen, office, and household items using a crowd-sourcing platform. Our work involves an in-depth comparative and qualitative analysis of crowd-sourced speech and textual data. We compare the F1 scores generated for learned tokens using the category-based model for speech and text data collected using Amazon Mechanical Turk (AMT). We find that the final averaged F1 scores of all the individual tokens learned are comparable in the two cases, with no significant difference.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2haoq-ziuq
dc.identifier.other: 12209
dc.identifier.uri: http://hdl.handle.net/11603/22835
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Sachdeva_umbc_0434M_12209.pdf
dc.subject: Grounded Language Acquisition
dc.subject: Machine Learning
dc.subject: NLP
dc.subject: Speech
dc.title: SPEECH VS TEXTUAL DATA FOR GROUNDED LANGUAGE LEARNING
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle

Name: Sachdeva_umbc_0434M_12209.pdf
Size: 10.36 MB
Format: Adobe Portable Document Format

License bundle

Name: Sachdeva-Rishabh_Open.pdf
Size: 475.65 KB
Format: Adobe Portable Document Format