From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Alex Petrov, Alexander Gusak, Denis Mukha, Dima Korolev
TLDR
This paper proposes schema-grounded memory with an iterative, schema-aware write path for reliable AI, outperforming retrieval-based baselines on both extraction and end-to-end memory benchmarks.
Key contributions
- Proposes schema-grounded AI memory for reliable facts and state, moving beyond simple retrieval.
- Introduces an iterative, schema-aware write path for robust memory ingestion and validation.
- Achieves 90.42% object accuracy on extraction and 97.10% F1 on end-to-end memory benchmarks.
- Shows that for workloads requiring stable facts and stateful computation, architecture matters more than model strength alone.
Why it matters
This paper addresses the critical need for reliable, structured AI memory in production environments. It offers a robust alternative to traditional retrieval, enabling agents to handle exact facts and stateful operations. The findings highlight that architectural design is crucial for stable AI memory, surpassing the impact of model scale alone.
Original Abstract
Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns. These operations require memory to behave less like search and more like a system of record. This paper argues that reliable external AI memory must be schema-grounded. Schemas define what must be remembered, what may be ignored, and which values must never be inferred. We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control. The result shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose. We evaluate this design on structured extraction and end-to-end memory benchmarks. On the extraction benchmark, the judge-in-the-loop configuration reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines. On our end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines. On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses. The results show that, for memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
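The write path the abstract describes — object detection, then field detection, then field-value extraction, gated by schema validation with local retries — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the schema, the `detect_*`/`extract_value` stubs (which stand in for model calls), and all names are hypothetical.

```python
# Hypothetical schema: which fields must be captured and which may be skipped.
# In the paper's framing, the schema also marks values that must never be inferred.
SCHEMA = {
    "contact": {
        "required": {"name"},
        "optional": {"email", "phone"},
    }
}

def detect_objects(text):
    # Stub for a model call that lists which schema object types the text mentions.
    return ["contact"] if "Alice" in text else []

def detect_fields(text, obj_type):
    # Stub for a model call proposing which schema fields the text supports.
    fields = []
    if "Alice" in text:
        fields.append("name")
    if "@" in text:
        fields.append("email")
    return fields

def extract_value(text, obj_type, field):
    # Stub for per-field value extraction from the source text.
    if field == "name":
        return "Alice"
    if field == "email":
        return text.split()[-1]
    return None

def validate(obj_type, record):
    # Validation gate: a record passes only if every required field is present.
    missing = SCHEMA[obj_type]["required"] - record.keys()
    return not missing

def write_path(text, max_retries=2):
    """Iterative write path: objects -> fields -> values, with a validation gate
    and local retries before a record is committed to memory."""
    records = []
    for obj_type in detect_objects(text):
        for _attempt in range(max_retries + 1):
            record = {}
            for field in detect_fields(text, obj_type):
                value = extract_value(text, obj_type, field)
                if value is not None:
                    record[field] = value
            if validate(obj_type, record):
                records.append((obj_type, record))
                break  # gate passed; no retry needed
    return records

print(write_path("Alice can be reached at alice@example.com"))
```

Because interpretation happens here on the write path, a later read is just a constrained lookup over the validated records rather than fresh inference over retrieved prose.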