Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment

Citation of Original Publication

Delafuente, Patricia, Arya Honraopatil, and Lara J. Martin. “Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment.” Paper presented at Wordplay: When Language Meets Games Workshop, Suzhou, China. November 9, 2025. https://wordplay-workshop.github.io/pdfs/29.pdf.

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

This paper explores the application of Large Language Models (LLMs) and reasoning to predicting Dungeons & Dragons (DnD) player actions and formatting them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, on command generation. Our findings highlight the importance of giving models specific instructions, show that even single-sentence changes to a prompt can greatly affect model output, and suggest that instruct models are sufficient for this task compared to reasoning models.