Author: Jenkins, Patrick
Advisor: Matuszek, Cynthia
Date available: 2021-09-01
Date issued: 2020-01-01
URI: http://hdl.handle.net/11603/22874
Format: application/pdf
Subjects: Computer Vision; Domain Adaptation; Grounded Language; Natural Language Processing; Robotics; Transfer Learning
Title: Transfer Learning of Grounded Language Models For Use In Robotic Systems
Type: Text

Abstract: Grounded language acquisition is the modeling of language as it relates to physical objects in the world. Grounded language models are useful for creating a natural language interface between robots and humans, but they are ineffective when a robot enters a novel environment, due to a lack of training data. I create a novel grounded language dataset by capturing multi-angle, high-resolution color and depth images of household objects, then collecting natural language text and speech descriptions of those objects. This dataset is used to train a model that learns associations between the descriptions and the color and depth percepts. The vision and language domains are embedded into an intermediate, lower-dimensional space through manifold alignment. The model consists of two simultaneously trained neural networks, one each for vision and language. Triplet loss ensures that the two domains are closely aligned in the embedded space by attracting positive associations and repelling negative ones. First, separate models are trained on the University of Washington RGB-D and UMBC GLD datasets to establish baseline results for grounded language acquisition on domestic objects. Then the baseline model trained on the UW RGB-D data is fine-tuned through a second round of training on UMBC GLD. This fine-tuned model outperforms the model trained only on UMBC GLD, and in less training time. These experiments represent a first step toward transferring grounded language knowledge from models pretrained on large datasets to new models running on robots in novel domains.
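The abstract describes aligning vision and language embeddings with a triplet loss: matched description/percept pairs are pulled together in a shared space while mismatched pairs are pushed apart. The following is a minimal sketch of that idea, not the thesis's implementation; the dimensions, the linear projections standing in for the two neural networks, and the margin value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the thesis): raw vision and language feature
# dimensions, and the shared lower-dimensional embedding used for alignment.
VISION_DIM, LANG_DIM, EMBED_DIM = 64, 32, 16

# One linear projection per modality stands in for the two neural networks.
W_vision = rng.normal(scale=0.1, size=(VISION_DIM, EMBED_DIM))
W_lang = rng.normal(scale=0.1, size=(LANG_DIM, EMBED_DIM))

def embed(x, W):
    """Project a raw feature vector into the shared embedding space."""
    return x @ W

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    Minimizing this pulls the positive pair together and pushes the
    negative example at least `margin` farther away than the positive.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy triplet: a language description (anchor), the matching object's
# vision features (positive), and a mismatched object (negative).
desc = rng.normal(size=LANG_DIM)
matching_img = rng.normal(size=VISION_DIM)
other_img = rng.normal(size=VISION_DIM)

loss = triplet_loss(embed(desc, W_lang),
                    embed(matching_img, W_vision),
                    embed(other_img, W_vision))
print(loss)
```

In training, gradients of this loss would update both projections jointly, which is what keeps the two modalities aligned in the shared space; the fine-tuning experiment in the abstract amounts to continuing that optimization from weights already trained on UW RGB-D.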