arXiv TLDR

From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

arXiv:2605.06365

Josh Rosen, Seth Rosen

cs.AI cs.MA cs.SE

TLDR

Execution lineage represents AI-native workflows as a DAG of artifact-producing computations with explicit dependencies and identity-based replay, keeping AI-generated work reproducible and maintainable as it evolves.

Key contributions

  • Introduces "execution lineage," a DAG model for AI-native work with explicit dependencies and stable intermediate boundaries.
  • Enables identity-based replay, making AI-generated work maintainable and reproducible under change.
  • Outperforms loop-centric baselines in preserving work products and preventing unrelated context contamination.
  • Achieves perfect upstream preservation and cross-artifact consistency when editing intermediate artifacts.
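The contributions above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Node` class, the name-as-content convention, and the `replay` cache are all assumptions made for the sketch. The key idea is that a node's identity hashes its own specification together with everything upstream, so replay recomputes only what an edit actually invalidates.

```python
import hashlib

class Node:
    """One artifact-producing step in the lineage DAG, with explicit dependencies."""
    def __init__(self, name, fn, deps=()):
        self.name, self.fn, self.deps = name, fn, list(deps)

    def identity(self):
        # Identity = this node's name (standing in for its spec/content)
        # hashed together with the identities of all upstream dependencies.
        h = hashlib.sha256(self.name.encode())
        for d in self.deps:
            h.update(d.identity().encode())
        return h.hexdigest()[:12]

def replay(node, cache, log):
    """Identity-based replay: recompute a node only if its identity is uncached."""
    key = node.identity()
    if key not in cache:
        inputs = [replay(d, cache, log) for d in node.deps]
        cache[key] = node.fn(*inputs)
        log.append(node.name)
    return cache[key]

# A memo branch and an unrelated branch.
src = Node("sources-v1", lambda: "facts")
summary = Node("summarize", lambda s: f"summary({s})", [src])
memo = Node("draft-memo", lambda s: f"memo({s})", [summary])
notes = Node("unrelated-notes-v1", lambda: "notes")

cache, log = {}, []
memo_v1 = replay(memo, cache, log)   # first run executes all three memo-branch nodes
replay(notes, cache, log)

# Unrelated-branch update: only the edited node re-executes; the memo is
# returned from cache byte-for-byte, with zero churn.
notes_v2 = Node("unrelated-notes-v2", lambda: "notes, revised")
log.clear()
memo_v2 = replay(memo, cache, log)
replay(notes_v2, cache, log)
```

After the unrelated edit, `log` contains only the revised notes node and `memo_v2` is identical to `memo_v1`, mirroring the zero-churn, zero-contamination behavior reported for DAG replay.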

Why it matters

This paper addresses the critical challenge of maintaining and evolving AI-generated work, which current agentic systems handle poorly because they rely on implicit conversational state. By introducing execution lineage, it offers a robust foundation for reproducible AI-native workflows.

Original Abstract

Large language model systems are increasingly deployed as agentic workflows that interleave reasoning, tool use, memory, and iterative refinement. These systems are effective at producing answers, but they often rely on implicit conversational state, making it difficult to preserve stable work products, isolate irrelevant updates, or propagate changes through intermediate artifacts. We introduce execution lineage: an execution model in which AI-native work is represented as a directed acyclic graph (DAG) of artifact-producing computations with explicit dependencies, stable intermediate boundaries, and identity-based replay. The goal is not to make the model a better one-shot writer, but to make evolving AI-generated work maintainable under change. We compare execution-lineage replay against loop-centric update baselines on two controlled policy-memo update tasks. In an unrelated-branch update, DAG replay preserved the final memo exactly in all runs, with zero churn and zero unrelated-branch contamination, while loop baselines regenerated the memo and frequently imported unrelated context. In an intermediate-artifact edit, all systems reflected the new constraint in the final memo, but only DAG replay achieved perfect upstream preservation, downstream propagation, unaffected-artifact preservation, and cross-artifact consistency. These results show that final answer quality and maintained-state quality are distinct. Strong loop baselines can remain competitive at producing polished final outputs when the task is a bounded synthesis/update problem and all current sources fit in context, but immediate task success can mask partial state inconsistency that may compound over future revisions. Execution lineage provides stronger guarantees about what should change, what should remain stable, and how work evolves across revisions.
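The abstract's second experiment, the intermediate-artifact edit, can also be made concrete. The sketch below is a hypothetical illustration (the `key` and `build` helpers are not from the paper): content-addressed keys show which artifacts an edit should invalidate and which should stay stable, the distinction the abstract draws between final-answer quality and maintained-state quality.

```python
import hashlib

# An artifact's key is derived from its own content plus its inputs' keys,
# so identity changes exactly when the artifact or something upstream changes.
def key(content, dep_keys=()):
    h = hashlib.sha256(content.encode())
    for k in dep_keys:
        h.update(k.encode())
    return h.hexdigest()[:10]

def build(sources, constraint):
    k_src = key(sources)
    k_con = key(constraint)
    k_memo = key("memo", (k_src, k_con))
    return {"sources": k_src, "constraint": k_con, "memo": k_memo}

v1 = build("facts", "budget<=1M")
v2 = build("facts", "budget<=2M")   # intermediate-artifact edit: new constraint
```

Comparing `v1` and `v2`, the `sources` key is unchanged (perfect upstream preservation) while the `memo` key differs (downstream propagation): exactly the guarantees the abstract claims for DAG replay but that loop baselines only partially satisfy.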
