ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
TLDR
The paper introduces ObjectGraph, a new file format for LLM agents that treats documents as traversable knowledge graphs to improve retrieval efficiency and preserve task accuracy.
Key contributions
- New .og file format treats documents as typed, directed knowledge graphs for LLM agents.
- Strict superset of Markdown (every .md file is a valid .og file), readable by both humans and agents, and requiring no infrastructure beyond a two-primitive query protocol.
- Formalizes the Document Consumption Problem and satisfies all six structural properties that no existing format meets simultaneously.
- Reduces agent token consumption by up to 95.3% without accuracy degradation.
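The contributions above describe documents as typed, directed graphs queried through a two-primitive protocol rather than injected wholesale. A minimal sketch of that idea is below; the node types, edge labels, and primitive names (`peek`, `expand`) are illustrative assumptions, not the paper's actual .og schema.

```python
# Hypothetical sketch: a document as a typed, directed graph queried
# via two primitives. "peek" and "expand" are assumed names -- the
# real .og protocol is not reproduced in this summary.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    node_type: str          # e.g. "section", "table", "assertion"
    summary: str            # cheap metadata an agent can inspect
    body: str               # full content, fetched only on demand
    edges: dict = field(default_factory=dict)  # label -> [target ids]

class ObjectGraphDoc:
    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}

    # Primitive 1: peek -- return type, summary, and edge counts
    # without injecting the node body into the agent's context.
    def peek(self, node_id):
        n = self.nodes[node_id]
        return {"type": n.node_type, "summary": n.summary,
                "edges": {lbl: len(tgts) for lbl, tgts in n.edges.items()}}

    # Primitive 2: expand -- follow one typed edge, paying tokens
    # only for the nodes actually traversed.
    def expand(self, node_id, edge_label):
        targets = self.nodes[node_id].edges.get(edge_label, [])
        return [self.nodes[t].body for t in targets]

doc = ObjectGraphDoc([
    Node("root", "section", "API overview", "Full overview text...",
         {"details": ["auth"]}),
    Node("auth", "section", "Auth setup", "Step-by-step auth instructions..."),
])

print(doc.peek("root"))               # summary + edge counts only
print(doc.expand("root", "details"))  # fetches just the auth node body
```

An agent would peek first to decide which edges are worth expanding, which is how the format avoids injecting irrelevant content.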
Why it matters
Existing document formats are designed for linear human reading, which is inefficient for LLM agents: they must inject entire documents into their context windows. ObjectGraph addresses this with a native format designed for agent retrieval rather than linear reading, cutting token usage while preserving task accuracy.
Original Abstract
Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.