Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment
Citation of Original Publication
Delafuente, Patricia, Arya Honraopatil, and Lara J. Martin. “Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment.” Paper presented at Wordplay: When Language Meets Games Workshop, Suzhou, China. November 9, 2025. https://wordplay-workshop.github.io/pdfs/29.pdf.
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Abstract
This paper explores the application of Large Language Models (LLMs) and reasoning to predict Dungeons & Dragons (DnD) player actions and format them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, on command generation. Our findings highlight the importance of giving models specific instructions, show that even single-sentence changes to a prompt can greatly affect model output, and indicate that instruct models are sufficient for this task compared to reasoning models.
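The core task the abstract describes, mapping a player's natural-language action to an Avrae-style Discord bot command, can be illustrated with a minimal formatting sketch. The `PlayerAction` structure and the exact command flags below are hypothetical assumptions for demonstration, not taken from the paper or from Avrae's documentation.

```python
# Illustrative sketch (assumed structure, not from the paper):
# turning a parsed player action into an Avrae-style command string.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlayerAction:
    kind: str               # e.g. "attack" or "cast" (assumed categories)
    name: str               # weapon or spell name
    target: Optional[str]   # target name, if any

def to_avrae_command(action: PlayerAction) -> str:
    """Format a structured action as an Avrae-style bot command."""
    parts = [f'!{action.kind} "{action.name}"']
    if action.target:
        parts.append(f'-t "{action.target}"')
    return " ".join(parts)

print(to_avrae_command(PlayerAction("attack", "longsword", "goblin")))
# → !attack "longsword" -t "goblin"
```

In the paper's setting, the structured-action step is replaced by the LLM itself: the model is prompted with game context and asked to emit the command string directly.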
