Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment

Citation of Original Publication

Delafuente, Patricia, Arya Honraopatil, and Lara J. Martin. “Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment.” Paper presented at Wordplay: When Language Meets Games Workshop, Suzhou, China. November 9, 2025. https://wordplay-workshop.github.io/pdfs/29.pdf.

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

This paper explores the application of Large Language Models (LLMs) and reasoning to predicting Dungeons & Dragons (DnD) player actions and formatting them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, on command generation. Our findings highlight the importance of giving models specific instructions, show that even single-sentence changes to a prompt can greatly affect model output, and suggest that instruct models are sufficient for this task compared to reasoning models.