Executable World Models for ARC-AGI-3 in the Era of Coding Agents

May 6, 2026

arXiv:2605.05138

cs.AI

TLDR

This paper evaluates a coding agent for ARC-AGI-3 that maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions, and plans through it before acting.

Key contributions

Introduces a coding agent for ARC-AGI-3 that maintains an executable Python world model.
Verifies the world model against previous observations and refactors it toward simpler abstractions.
Fully solves 7 of the 25 public games, exceeds 75% Relative Human Action Efficiency (RHAE) on 6 games, and reaches a mean per-game RHAE of 32.58%.
Uses no game-specific code, so it can serve as a game-general baseline for ARC-AGI-3.

Why it matters

The system uses no hand-coded game-specific logic, so its results on the 25 public ARC-AGI-3 games give later agents a game-general baseline to compare against. The evidence is preliminary: most games have a single recorded playthrough, and performance on the private validation set has not been tested.

Original Abstract

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency (RHAE) greater than 75% on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.
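The verify-then-plan loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the 4x4 grid, the state and action types, and the names `model_step`, `verify`, and `plan` are all hypothetical, chosen only to show what it means for a verifier to replay recorded observations against an executable model and for a planner to search through that model before acting. The refactoring step is not shown.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Hypothetical toy types; the paper's actual interfaces are not given here.
State = Tuple[int, int]                    # (x, y) position on a small grid
Action = str
Transition = Tuple[State, Action, State]   # (state, action, observed next state)
Model = Callable[[State, Action], State]

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
SIZE = 4  # assumed 4x4 grid


def model_step(state: State, action: Action) -> State:
    """Candidate world model: move one cell, clamped to the grid bounds."""
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    return (x, y)


def verify(model: Model, history: List[Transition]) -> List[Transition]:
    """Replay every recorded transition; return those the model mispredicts."""
    return [(s, a, s2) for (s, a, s2) in history if model(s, a) != s2]


def plan(model: Model, start: State, goal: State) -> Optional[List[Action]]:
    """Breadth-first search through the model; returns a shortest action list."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action in ACTIONS:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Observations recorded so far (made up for this example).
history = [
    ((0, 0), "right", (1, 0)),
    ((1, 0), "up", (1, 0)),     # bumping into the top wall leaves the state unchanged
    ((1, 0), "down", (1, 1)),
]

# Plan only when the model reproduces every past observation.
mismatches = verify(model_step, history)
actions = plan(model_step, (1, 1), (3, 2)) if not mismatches else None
```

In this sketch a non-empty `mismatches` list would be the signal to revise the model rather than act on it, which is the role the abstract assigns to its verifier programs.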
