Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks
| dc.contributor.author | Shelat, Shlok |
| dc.contributor.author | Raval, Jay |
| dc.contributor.author | Roy, Souvik |
| dc.contributor.author | Gaur, Manas |
| dc.date.accessioned | 2026-02-12T16:44:23Z |
| dc.date.issued | 2026-01-19 |
| dc.description.abstract | Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for deterministic finite automata (DFA) construction from regular languages, comprising factual knowledge questions, seen construction problems from public sources, and two types of unseen problems: hand-crafted instances with multiple interacting constraints and systematically generated problems via Arden's theorem. Models achieve perfect accuracy on factual questions and 84-90% on seen tasks. However, accuracy drops sharply on unseen problems (by 30-64%), with failures stemming from systematic misinterpretation of language constraints, incorrect handling of Kleene-star semantics, and a failure to preserve global consistency. We evaluate a three-stage hint protocol that enables correction of shallow errors but does not reliably resolve globally inconsistent or structurally flawed automata. Our analysis across multiple prompting strategies (direct, Chain-of-Thought, Tree-of-Thought) reveals that errors persist regardless of prompting approach, exposing a fundamental gap between LLMs' ability to generate syntactically plausible DFAs and their capacity for semantically correct formal reasoning. |
| dc.description.uri | http://arxiv.org/abs/2601.13392 |
| dc.format.extent | 30 pages |
| dc.genre | journal articles |
| dc.genre | preprints |
| dc.identifier | doi:10.13016/m2krx1-xxb7 |
| dc.identifier.uri | https://doi.org/10.48550/arXiv.2601.13392 |
| dc.identifier.uri | http://hdl.handle.net/11603/41894 |
| dc.language.iso | en |
| dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) |
| dc.relation.ispartof | UMBC Faculty Collection |
| dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department |
| dc.rights | Attribution 4.0 International |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
| dc.subject | UMBC Ebiquity Research Group |
| dc.subject | Computer Science - Artificial Intelligence |
| dc.subject | Computer Science - Computation and Language |
| dc.subject | Computer Science - Formal Languages and Automata Theory |
| dc.title | Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks |
| dc.type | Text |
| dcterms.creator | https://orcid.org/0000-0002-5411-2230 |