Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks

dc.contributor.author: Shelat, Shlok
dc.contributor.author: Raval, Jay
dc.contributor.author: Roy, Souvik
dc.contributor.author: Gaur, Manas
dc.date.accessioned: 2026-02-12T16:44:23Z
dc.date.issued: 2026-01-19
dc.description.abstract: Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for deterministic finite automata (DFA) construction from regular languages, comprising factual knowledge questions, seen construction problems from public sources, and two types of unseen problems: hand-crafted instances with multiple interacting constraints and systematically generated problems via Arden's theorem. Models achieve perfect accuracy on factual questions and 84-90% on seen tasks. However, accuracy drops sharply on unseen problems (by 30-64%), with failures stemming from systematic misinterpretation of language constraints, incorrect handling of Kleene-star semantics, and a failure to preserve global consistency. We evaluate a three-stage hint protocol that enables correction of shallow errors but does not reliably resolve globally inconsistent or structurally flawed automata. Our analysis across multiple prompting strategies (direct, Chain-of-Thought, Tree-of-Thought) reveals that errors persist regardless of prompting approach, exposing a fundamental gap between LLMs' ability to generate syntactically plausible DFAs and their capacity for semantically correct formal reasoning.
dc.description.uri: http://arxiv.org/abs/2601.13392
dc.format.extent: 30 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier: doi:10.13016/m2krx1-xxb7
dc.identifier.uri: https://doi.org/10.48550/arXiv.2601.13392
dc.identifier.uri: http://hdl.handle.net/11603/41894
dc.language.iso: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: UMBC Ebiquity Research Group
dc.subject: Computer Science - Artificial Intelligence
dc.subject: Computer Science - Computation and Language
dc.subject: Computer Science - Formal Languages and Automata Theory
dc.title: Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-5411-2230
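The abstract distinguishes syntactically plausible DFAs from semantically correct ones. A minimal sketch of that distinction (not code from the paper; the DFA, target language, and bound are illustrative assumptions): verify a candidate DFA against a language predicate by exhaustive comparison over all short strings.

```python
# Sketch (not from the paper): test a candidate DFA's semantic correctness
# by comparing it against the target-language predicate on all short strings.
from itertools import product

def dfa_accepts(transitions, start, accepting, string):
    """Run the DFA; transitions maps (state, symbol) -> next state."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accepting

# Illustrative target language: binary strings with an even number of 0s.
target = lambda s: s.count("0") % 2 == 0

# Candidate DFA: q0 = even count of 0s seen so far (accepting), q1 = odd.
trans = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}

# Exhaustive check on all strings up to length 8 (a bounded proxy for
# full language equivalence).
ok = all(dfa_accepts(trans, "q0", {"q0"}, "".join(w)) == target("".join(w))
         for n in range(9) for w in product("01", repeat=n))
print(ok)  # True
```

A bounded check like this catches the failure modes the abstract names (misread constraints, wrong Kleene-star handling, global inconsistency) whenever they surface on short witnesses, though full equivalence checking requires comparing the candidate against a reference automaton.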

Files

Original bundle

Name: 2601.13392v1.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format