Practical Cross-modal Manifold Alignment for Grounded Language

dc.contributor.author: Nguyen, Andre T.
dc.contributor.author: Richards, Luke E.
dc.contributor.author: Kebe, Gaoussou Youssouf
dc.contributor.author: Raff, Edward
dc.contributor.author: Darvish, Kasra
dc.contributor.author: Ferraro, Frank
dc.contributor.author: Matuszek, Cynthia
dc.date.accessioned: 2021-01-26T17:56:59Z
dc.date.available: 2021-01-26T17:56:59Z
dc.description: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19-25 June 2021
dc.description.abstract: We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
dc.description.uri: https://ieeexplore.ieee.org/document/9522916
dc.format.extent: 9 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier: doi:10.13016/m28ofw-oyoc
dc.identifier.citation: A. T. Nguyen et al., "Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 1613-1622, doi: 10.1109/CVPRW53098.2021.00177.
dc.identifier.uri: http://hdl.handle.net/11603/20619
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. If preprints are posted after publication, use the rights note for the accepted version. Postprint record must include citation, DOI, and rights statement: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.subject: computer vision and pattern recognition
dc.subject: machine learning
dc.subject: robotics
dc.title: Practical Cross-modal Manifold Alignment for Grounded Language
dc.type: Text

Files

Original bundle

Name: 2009.05147.pdf
Size: 1.63 MB
Format: Adobe Portable Document Format