TDD Governance for Multi-Agent Code Generation via Prompt Engineering

April 29, 2026 · arXiv:2604.26615

Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal, Pyry Kotilainen, Tommi Mikkonen + 1 more

cs.SE, cs.AI

TLDR

An AI-native TDD framework uses prompt engineering to enforce TDD principles, improving stability and reproducibility in LLM-assisted software development.

Key contributions

Presents an AI-native TDD framework operationalizing TDD principles via prompt and workflow governance.
Formalizes TDD principles into a machine-readable manifesto for structured enforcement across development stages.
Introduces a layered architecture separating LLM proposals from deterministic engine authority.
Enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control for stability.
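
The split between model proposal and engine authority listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Phase`, `ALLOWED`, `Engine`, `advance`, `green`, and `max_repairs` are all hypothetical, and the model is stood in for by a plain callable. The sketch only shows how a deterministic engine might own phase ordering, cap the repair loop, and gate acceptance on a validation check.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    RED = "red"            # a failing test must exist first
    GREEN = "green"        # code is proposed until the test passes
    REFACTOR = "refactor"  # cleanup is allowed only after green


# Hypothetical transition table: the engine, not the model, owns it.
ALLOWED = {
    Phase.RED: {Phase.GREEN},
    Phase.GREEN: {Phase.REFACTOR},
    Phase.REFACTOR: {Phase.RED},
}


@dataclass
class Engine:
    max_repairs: int = 3              # assumed repair budget
    phase: Phase = Phase.RED
    log: List[str] = field(default_factory=list)

    def advance(self, target: Phase) -> None:
        """Phase ordering: reject any transition not in the table."""
        if target not in ALLOWED[self.phase]:
            raise ValueError(
                f"illegal transition {self.phase.value} -> {target.value}"
            )
        self.log.append(f"{self.phase.value}->{target.value}")
        self.phase = target

    def green(
        self,
        propose: Callable[[int], str],
        passes: Callable[[str], bool],
    ) -> str:
        """Bounded repair loop: at most max_repairs proposals are tried."""
        self.advance(Phase.GREEN)
        for attempt in range(self.max_repairs):
            candidate = propose(attempt)   # model proposal (stubbed here)
            if passes(candidate):          # validation gate
                return candidate
        raise RuntimeError("repair budget exhausted")
```

In this sketch the model can only return candidates; whether a candidate is accepted, and whether the workflow may move to the next phase, is decided by ordinary deterministic code.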

Why it matters

In unconstrained workflows, LLMs often produce unstable, non-deterministic output and adhere weakly to development discipline. The paper addresses this by turning classical TDD principles into enforceable process constraints inside LLM workflows, rather than using tests only as auxiliary inputs. The enforced process is intended to make code generation more reliable and reproducible, which matters for production use of AI-assisted development.

Original Abstract

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine authority. The system enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control to improve stability and reproducibility. We describe the architecture and discuss how encoding software engineering discipline directly into prompt orchestration offers a promising direction for reliable LLM-assisted development.
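
Two ideas from the abstract, a machine-readable manifesto distributed across stages and atomic mutation control, can be made concrete with a small sketch. Everything here is an assumption for illustration: the JSON layout, the rule wording, and the helpers `rules_for` and `apply_atomically` are hypothetical and are not taken from the paper. Only the four stage names come from the abstract.

```python
import copy
import json
from typing import Callable, Dict, List

# Hypothetical manifesto: stage names follow the abstract, rule text is invented.
MANIFESTO_JSON = """
{
  "planning":   ["write a failing test before any implementation"],
  "generation": ["change only what the failing test requires"],
  "repair":     ["stop after the repair budget is spent"],
  "validation": ["all tests must pass before a change is committed"]
}
"""


def rules_for(stage: str) -> List[str]:
    """Return the rules a prompt for the given stage would be built from."""
    manifesto = json.loads(MANIFESTO_JSON)
    return manifesto[stage]


def apply_atomically(
    workspace: Dict[str, str],
    patch: Dict[str, str],
    gate: Callable[[Dict[str, str]], bool],
) -> bool:
    """Apply a multi-file patch all at once, or not at all.

    The patch is staged on a copy; the real workspace is only replaced
    when the validation gate accepts the staged result.
    """
    staged = copy.deepcopy(workspace)
    staged.update(patch)
    if not gate(staged):
        return False          # workspace is left untouched
    workspace.clear()
    workspace.update(staged)
    return True
```

Staging on a copy keeps a rejected proposal from leaving the workspace half-modified, which is one simple way to read "atomic mutation control"; the paper's actual mechanism may differ.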
