PropBank-Powered Data Creation: Utilizing Sense-Role Labelling to Generate Disaster Scenario Data
Links to Files
Permanent Link
Author/Creator ORCID
Date
Type of Work
Department
Program
Citation of Original Publication
Shichman, Mollie Frances, Claire Bonial, Taylor A. Hudson, Austin Blodgett, Francis Ferraro, and Rachel Rudinger. “PropBank-Powered Data Creation: Utilizing Sense-Role Labelling to Generate Disaster Scenario Data.” In Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024, pages 1–10. Torino, Italia: ELRA and ICCL, 2024. https://aclanthology.org/2024.dmr-1.1.
Rights
This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Public Domain
Public Domain
Subjects
Abstract
For human-robot dialogue in a search-and-rescue scenario, a strong knowledge of the conditions and objects a robot will face is essential for effective interpretation of natural language instructions. In order to utilize the power of large language models without overwhelming the limited storage capacity of a robot, we propose PropBank-Powered Data Creation. PropBank-Powered Data Creation is an expert-in-the-loop data generation pipeline which creates training data for disaster-specific language models. We leverage semantic role labeling and Rich Event Ontology resources to efficiently develop seed sentences for fine-tuning a smaller, targeted model that could operate onboard a robot for disaster relief. We developed 32 sentence templates, which we used to make 2 seed datasets of 175 instructions for earthquake search and rescue and train derailment response. We further leverage our seed datasets as evaluation data to test our baseline fine-tuned models.
