A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle

dc.contributor.author: Rubinstein, Jacob
dc.contributor.author: Ferraro, Francis
dc.contributor.author: Matuszek, Cynthia
dc.contributor.author: Engel, Don
dc.date.accessioned: 2024-03-27T13:26:15Z
dc.date.available: 2024-03-27T13:26:15Z
dc.date.issued: 2024-01-01
dc.description.abstract: Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently rely on ‘sim-to-real’ learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that apply to a wide variety of tasks. To understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image-and-language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
dc.description.uri: https://www.computer.org/csdl/proceedings-article/aixvr/2024/720200a281/1UUdQUVvKG4
dc.format.extent: 8 pages
dc.genre: conference papers and proceedings
dc.genre: postprints
dc.identifier: doi:10.13016/m2rtad-mili
dc.identifier.citation: Rubinstein, Jacob, Francis Ferraro, Cynthia Matuszek, and Don Engel. “A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle,” 281–88. IEEE Computer Society, 2024. https://doi.org/10.1109/AIxVR59861.2024.00047.
dc.identifier.uri: https://doi.ieeecomputersociety.org/10.1109/AIxVR59861.2024.00047
dc.identifier.uri: http://hdl.handle.net/11603/32688
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.title: A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-0696-3452
dcterms.creator: https://orcid.org/0000-0003-1383-8120
dcterms.creator: https://orcid.org/0000-0003-2838-0140

Files

Original bundle

Name: AIxVR_2024-Generating_and_Analyzing_Descriptions_of_3D_Models_From_Multiple_Perspectives_Using_CLIP.pdf
Size: 5.13 MB
Format: Adobe Portable Document Format