Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities

Darvish, Kasra; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia

dc.contributor.author	Darvish, Kasra
dc.contributor.author	Raff, Edward
dc.contributor.author	Ferraro, Francis
dc.contributor.author	Matuszek, Cynthia
dc.date.accessioned	2024-09-24T08:59:50Z
dc.date.available	2024-09-24T08:59:50Z
dc.date.issued	2023-08-11
dc.description.abstract	Our study is motivated by robotics, where we often need to balance two competing concerns: relying on complex, multimodal data coming from a variety of sensors, and a general lack of large representative datasets. Despite the complexity of modern robotic platforms and the need for multimodal interaction, there has been little research on integrating more than two modalities in a low data regime with the real-world constraint that sensors fail due to obstructions or adverse conditions. In this work, we consider a case in which natural language is used as a retrieval query against objects, represented across multiple modalities, in a physical environment. We introduce extended multimodal alignment (EMMA), a method that learns to select the appropriate object while jointly refining modality-specific embeddings through a geometric (distance-based) loss. In contrast to prior work, our approach is able to incorporate an arbitrary number of views (modalities) of a particular piece of data. We demonstrate the efficacy of our model on a grounded language object retrieval scenario. We show that our model outperforms state-of-the-art baselines when little training data is available. Our code is available at https://github.com/kasraprime/EMMA.
dc.description.sponsorship	We would like to thank the anonymous reviewers for their helpful comments, questions, and suggestions. This material is also based on research that is in part supported by the NSF under Grant Nos. 2007290, 2024878, and 2145642; the Army Research Laboratory, Grant No. W911NF2120076; and by the Air Force Research Laboratory (AFRL), DARPA, for the KAIROS program under agreement number FA8750-19-2-1003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of the Air Force Research Laboratory (AFRL), DARPA, or the U.S. Government.
dc.description.uri	https://openreview.net/forum?id=cXa6Xdm0v7
dc.format.extent	17 pages
dc.genre	journal articles
dc.identifier	doi:10.13016/m2ldla-3lwi
dc.identifier.citation	Darvish, Kasra, Edward Raff, Francis Ferraro, and Cynthia Matuszek. “Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities.” Transactions on Machine Learning Research, August 11, 2023. https://openreview.net/forum?id=cXa6Xdm0v7.
dc.identifier.uri	http://hdl.handle.net/11603/36369
dc.language.iso	en_US
dc.publisher	OpenReview
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Data Science
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof	UMBC Student Collection
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	UMBC Discovery, Research, and Experimental Analysis of Malware Lab (DREAM Lab)
dc.subject	UMBC Interactive Robotics and Language Lab (IRAL Lab)
dc.subject	UMBC Interactive Robotics and Language Lab
dc.subject	UMBC Ebiquity Research Group
dc.title	Multimodal Language Learning for Object Retrieval in Low Data Regimes in the Face of Missing Modalities
dc.type	Text
dcterms.creator	https://orcid.org/0000-0002-9900-1972
dcterms.creator	https://orcid.org/0000-0003-1383-8120

Files

Original bundle

Name: 1437_Multimodal_Language_Learn.pdf
Size: 1.83 MB
Format: Adobe Portable Document Format

Collections

UMBC Faculty Collection
UMBC Computer Science and Electrical Engineering Department
UMBC Data Science
UMBC Student Collection