Language and Gesture in Virtual Reality: Is a Gesture Worth 1000 Words?

Citation of Original Publication

Higgins, Padraig, Cory J. Hayes, Stephanie Lukin, and Cynthia Matuszek. “Language and Gesture in Virtual Reality: Is a Gesture Worth 1000 Words?” Proceedings of the AAAI Symposium Series 7, no. 1 (2025): 658–62. https://doi.org/10.1609/aaaiss.v7i1.36947.

Rights

This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain

Abstract

Robots are increasingly incorporating multimodal information and human signals to resolve ambiguity in embodied human-robot interaction. Harnessing signals such as gestures may expedite robot exploration in large, outdoor urban environments for supporting disaster recovery operations, where speech may be unclear due to noise or the challenges of a dynamic and dangerous environment. Despite this potential, capturing human gesture and properly grounding it to crowded, outdoor environments remains a challenge. In this work, we propose a method to model human gesture and ground it to spoken language instructions given to a robot for execution in large spaces. We implement our method in virtual reality to develop a workflow for faster future data collection. We present a series of proposed experiments that compare a language-only baseline to our proposed language-supplemented-by-gesture approach, and discuss how our approach has the potential to reinforce the human's intent and detect discrepancies in gesture and spoken instructions in these large and crowded environments.