The Last Human-Written Paper: Agent-Native Research Artifacts
Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu + 32 more
TLDR
Ara is a new protocol for machine-executable research packages, enhancing AI's ability to understand, reproduce, and extend scientific work by preserving full research context.
Key contributions
- Introduces Agent-Native Research Artifact (Ara) to replace narrative papers with machine-executable research packages.
- Ara is structured in four layers: scientific logic, executable code with full specifications, an exploration graph that preserves failed branches, and evidence grounding every claim in raw outputs.
- Supported by a Live Research Manager, Ara Compiler for legacy PDFs, and an Ara-native review system.
- Improves AI question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4% on PaperBench and RE-Bench.
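The four-layer package described above could be sketched as a data model. The sketch below is a minimal, hypothetical schema: all class and field names (`Ara`, `ExplorationNode`, `Claim`, `failed_branches`, etc.) are illustrative assumptions, not the paper's actual protocol definitions.

```python
from dataclasses import dataclass, field

@dataclass
class ExplorationNode:
    """One branch of the research process, including dead ends (hypothetical)."""
    hypothesis: str
    outcome: str  # e.g. "confirmed", "rejected", "inconclusive"
    children: list["ExplorationNode"] = field(default_factory=list)

@dataclass
class Claim:
    """A scientific claim grounded in raw output files (hypothetical)."""
    statement: str
    evidence_paths: list[str]  # raw outputs backing the claim

@dataclass
class Ara:
    """Sketch of a four-layer Agent-Native Research Artifact."""
    scientific_logic: list[Claim]  # layer 1: claims and reasoning
    code_entrypoint: str           # layer 2: executable code / full spec
    exploration: ExplorationNode   # layer 3: branching graph, failures kept
    evidence_dir: str              # layer 4: raw outputs grounding claims

    def failed_branches(self) -> list[str]:
        """Walk the exploration graph and collect rejected hypotheses."""
        out, stack = [], [self.exploration]
        while stack:
            node = stack.pop()
            if node.outcome == "rejected":
                out.append(node.hypothesis)
            stack.extend(node.children)
        return out

# Usage: an agent extending the work can query what was already ruled out,
# rather than rediscovering dead ends the narrative paper would have discarded.
root = ExplorationNode("baseline works", "confirmed",
                       [ExplorationNode("bigger model helps", "rejected")])
ara = Ara([Claim("X improves Y", ["results/run1.json"])],
          "run.sh", root, "evidence/")
print(ara.failed_branches())  # → ['bigger model helps']
```

The point of the sketch is the contrast with a PDF: every layer is structured data an agent can traverse, and the failure traces are first-class rather than edited out.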
Why it matters
Traditional papers discard crucial research details, hindering AI agents. Ara addresses this by providing a complete, machine-executable research package. This approach significantly boosts AI's ability to understand, reproduce, and extend scientific findings, making research more accessible and efficient for automated systems.
Original Abstract
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (Ara), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an Ara Compiler that translates legacy PDFs and repos into Aras; and an Ara-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, Ara raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in Ara accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities.