Practical Cross-modal Manifold Alignment for Grounded Language

Nguyen, Andre T.; Richards, Luke E.; Kebe, Gaoussou Youssouf; Raff, Edward; Darvish, Kasra; Ferraro, Frank; Matuszek, Cynthia

dc.contributor.author	Nguyen, Andre T.
dc.contributor.author	Richards, Luke E.
dc.contributor.author	Kebe, Gaoussou Youssouf
dc.contributor.author	Raff, Edward
dc.contributor.author	Darvish, Kasra
dc.contributor.author	Ferraro, Frank
dc.contributor.author	Matuszek, Cynthia
dc.date.accessioned	2021-01-26T17:56:59Z
dc.date.available	2021-01-26T17:56:59Z
dc.description	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Nashville, TN, USA 19-25 June 2021
dc.description.abstract	We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.	en
dc.description.uri	https://ieeexplore.ieee.org/document/9522916
dc.format.extent	9 pages	en
dc.genre	preprints
dc.genre	conference papers and proceedings	en
dc.identifier	doi:10.13016/m28ofw-oyoc
dc.identifier.citation	A. T. Nguyen et al., "Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 1613-1622, doi: 10.1109/CVPRW53098.2021.00177.	en
dc.identifier.uri	http://hdl.handle.net/11603/20619
dc.language.iso	en	en
dc.publisher	IEEE
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Student Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights	This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. If preprints are posted after publication, use the rights note for the accepted version. Postprint record must include citation, DOI, and rights statement: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.subject	computer vision and pattern recognition	en
dc.subject	machine learning	en
dc.subject	robotics	en
dc.title	Practical Cross-modal Manifold Alignment for Grounded Language	en
dc.type	Text	en
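
The abstract above describes learning embeddings from triples of anchor, positive, and negative data points drawn from RGB-depth images and their language descriptions. The following is a minimal sketch of that idea, not the authors' implementation. The cosine distance, the margin value of 0.4, the sampling scheme (positive taken from the opposite modality to the anchor), and the names `cosine_distance`, `triplet_loss`, and `sample_triplet` are all illustrative assumptions that do not come from this record.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between row vectors (assumed metric, not from the record)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-style triplet loss: pull the positive closer than the negative by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def sample_triplet(vision, language, labels, rng):
    """Hypothetical sampler: anchor from one modality, positive from the other,
    negative from a randomly chosen modality and a different class."""
    banks = (vision, language)
    m = rng.integers(2)                      # modality of the anchor
    i = rng.integers(len(labels))            # anchor index
    p = rng.choice(np.flatnonzero(labels == labels[i]))  # same class
    n = rng.choice(np.flatnonzero(labels != labels[i]))  # different class
    return banks[m][i], banks[1 - m][p], banks[rng.integers(2)][n]
```

In a training loop, the embeddings passed to `triplet_loss` would be the outputs of the vision and language encoders, so that minimizing the loss moves both modalities into a shared space.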

Files

Original bundle

Name: 2009.05147.pdf
Size: 1.63 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection
UMBC Student Collection