WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation
Harish Santhanalakshmi Ganesan
TLDR
WorldDB is a novel memory engine built on recursive, immutable 'worlds' and programmable edges, achieving state-of-the-art long-term memory benchmark results for AI agents.
Key contributions
- Introduces recursive 'worlds' where each node contains its own subgraph, ontology, and composed embedding.
- Utilizes content-addressed, immutable nodes, creating a Merkle-style audit trail for all knowledge edits.
- Implements programmable edges with write-time handlers for ontology-aware reconciliation (e.g., supersession, contradiction).
- Achieves 97.11% task-averaged accuracy on LongMemEval-s, outperforming previous SOTA by +5.61pp.
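The content-addressing idea in the second contribution can be sketched in a few lines. This is a hypothetical illustration, not WorldDB's actual API: each world's hash covers its own payload plus the hashes of its child worlds, so an edit anywhere in the subgraph changes the hash of every ancestor, yielding the Merkle-style audit trail the paper describes.

```python
import hashlib
import json

def world_hash(payload: dict, child_hashes: list) -> str:
    # Canonical serialization so the same content always hashes identically.
    canonical = json.dumps(
        {"payload": payload, "children": sorted(child_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

# A leaf world nested inside a parent world.
leaf_v1 = world_hash({"fact": "user lives in Paris"}, [])
root_v1 = world_hash({"topic": "user profile"}, [leaf_v1])

# "Editing" the leaf produces a new leaf hash AND a new root hash;
# the old root hash still names the old state, so history is never lost.
leaf_v2 = world_hash({"fact": "user lives in Lyon"}, [])
root_v2 = world_hash({"topic": "user profile"}, [leaf_v2])

assert root_v1 != root_v2
```

Because nodes are immutable, both `root_v1` and `root_v2` remain valid names for their respective states of the knowledge graph.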
Why it matters
WorldDB marks a significant advance in persistent memory for AI agents, a key step beyond stateless chatbots. Its architecture addresses core limitations of flat vector stores and knowledge graphs, enabling the robust long-term memory, temporal reasoning, and knowledge updates that complex agentic systems require.
Original Abstract
Persistent memory is the bottleneck separating stateless chatbots from long-running agentic systems. Retrieval-augmented generation (RAG) over flat vector stores fragments facts into chunks, loses cross-session identity, and has no first-class notion of supersession or contradiction. Recent bitemporal knowledge-graph systems (Graphiti, Memento, Hydra DB) add typed edges and valid-time metadata, but the graph itself remains flat: no recursive composition, no content-addressed invariants on nodes, and edge types carry no behavior beyond a label. We present WorldDB, a memory engine built on three commitments: (i) every node is a world -- a container with its own interior subgraph, ontology scope, and composed embedding, recursive to arbitrary depth; (ii) nodes are content-addressed and immutable, so any edit produces a new hash at the node and every ancestor, giving a Merkle-style audit trail for free; (iii) edges are write-time programs -- each edge type ships on_insert/on_delete/on_query_rewrite handlers (supersession closes validity, contradicts preserves both sides, same_as stages a merge proposal), so no raw append path exists. On LongMemEval-s (500 questions, ~115k-token conversational stacks), WorldDB with Claude Opus 4.7 as answerer achieves 96.40% overall / 97.11% task-averaged accuracy, a +5.61pp improvement over the previously reported Hydra DB state-of-the-art (90.79%) and +11.20pp over Supermemory (85.20%), with perfect single-session-assistant recall and robust performance on temporal reasoning (96.24%), knowledge update (98.72%), and preference synthesis (96.67%). Ablations show that the engine's graph layer -- resolver-unified entities and typed refers_to edges -- contributes +7.0pp task-averaged independently of the underlying answerer.
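The abstract's "edges are write-time programs" claim can be made concrete with a minimal sketch. All names here (`Fact`, `supersedes`, `contradicts`, `insert_edge`) are illustrative assumptions, not WorldDB's real interface: each edge type carries an `on_insert` handler, and every write goes through a handler rather than a raw append.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    text: str
    valid: bool = True
    flags: list = field(default_factory=list)

def on_insert_supersedes(old: Fact, new: Fact) -> None:
    # Supersession closes the old fact's validity instead of deleting it.
    old.valid = False

def on_insert_contradicts(a: Fact, b: Fact) -> None:
    # Contradiction preserves both sides but marks the conflict.
    a.flags.append("contradicted")
    b.flags.append("contradicted")

HANDLERS = {
    "supersedes": on_insert_supersedes,
    "contradicts": on_insert_contradicts,
}

def insert_edge(edge_type: str, a: Fact, b: Fact) -> None:
    # No raw append path: inserting an edge always runs its type's handler.
    HANDLERS[edge_type](a, b)

paris = Fact("user lives in Paris")
lyon = Fact("user lives in Lyon")
insert_edge("supersedes", paris, lyon)   # new fact closes the old one
assert not paris.valid and lyon.valid

x = Fact("meeting is on Monday")
y = Fact("meeting is on Tuesday")
insert_edge("contradicts", x, y)         # both survive, both flagged
assert x.valid and y.valid and "contradicted" in x.flags
```

In this framing, supersession and contradiction are behaviors attached to edge types, which is what distinguishes the design from graphs where an edge type is only a label.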