PropBank-Powered Data Creation: Utilizing Sense-Role Labelling to Generate Disaster Scenario Data

Citation of Original Publication

Shichman, Mollie Frances, Claire Bonial, Taylor A. Hudson, Austin Blodgett, Francis Ferraro, and Rachel Rudinger. “PropBank-Powered Data Creation: Utilizing Sense-Role Labelling to Generate Disaster Scenario Data.” In Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024, pages 1–10. Torino, Italia: ELRA and ICCL, 2024. https://aclanthology.org/2024.dmr-1.1.

Rights

This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain

Abstract

For human-robot dialogue in a search-and-rescue scenario, a robot needs strong knowledge of the conditions and objects it will face in order to interpret natural language instructions effectively. To harness the power of large language models without overwhelming a robot's limited storage capacity, we propose PropBank-Powered Data Creation, an expert-in-the-loop data generation pipeline that creates training data for disaster-specific language models. We leverage semantic role labeling and Rich Event Ontology resources to efficiently develop seed sentences for fine-tuning a smaller, targeted model that could operate onboard a robot for disaster relief. We developed 32 sentence templates, which we used to create two seed datasets of 175 instructions for earthquake search and rescue and train derailment response. We further use our seed datasets as evaluation data to test our baseline fine-tuned models.
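
As a rough illustration of how sentence templates might be expanded into seed instructions, the sketch below fills slots named after PropBank-style argument labels. The abstract does not specify the template format, so the template string, the slot names (`verb`, `ARG1`, `ARGM_LOC`), the filler phrases, and the `expand` helper are all hypothetical and are not taken from the authors' pipeline.

```python
from itertools import product

# Hypothetical template: slot names mimic PropBank-style argument labels.
TEMPLATE = "{verb} the {ARG1} {ARGM_LOC}."

# Hypothetical fillers for an earthquake search-and-rescue scenario.
FILLERS = {
    "verb": ["Photograph", "Inspect"],
    "ARG1": ["collapsed wall", "gas line"],
    "ARGM_LOC": ["near the stairwell", "in the basement"],
}


def expand(template, fillers):
    """Return one sentence for every combination of slot fillers."""
    slots = list(fillers)
    sentences = []
    for combo in product(*(fillers[slot] for slot in slots)):
        sentences.append(template.format(**dict(zip(slots, combo))))
    return sentences


seed_sentences = expand(TEMPLATE, FILLERS)
for sentence in seed_sentences:
    print(sentence)
```

In an expert-in-the-loop setting, a human reviewer would presumably filter or correct such candidates before they are used for fine-tuning; that review step is not modeled here.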