Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

May 11, 20262605.09863

cs.CRcs.AIcs.CLcs.IRcs.LG

TLDR

Nautilus Compass detects persona drift in black-box LLM agents using prompt-text analysis, offering an efficient and accessible memory solution.

Key contributions

Detects persona drift in black-box LLM agents using prompt-text analysis, bypassing model weight access.
Leverages BGE-m3 embeddings and cosine similarity between user prompts and behavioral anchor texts.
Uniquely avoids LLM calls at index time by directly embedding raw conversation text for memory.
Achieves ROC AUC 0.83 for drift detection and is 14x cheaper than comparable LLM-judged systems.

Why it matters

LLM agents in production suffer from persona drift, especially with closed-source APIs. Nautilus Compass provides a novel, black-box solution that is both effective and significantly more cost-efficient than existing methods. Its direct embedding approach makes it uniquely accessible and practical for real-world deployment.

Original Abstract

Production LLM coding agents drift over long sessions: they forget user-specified constraints, slip into mistakes the user already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to closed APIs (Claude, GPT-4) that most users actually interact with. We present Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer: cosine similarity between user prompts and behavioral anchor texts, aggregated by a weighted top-k mean using BGE-m3 embeddings. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, smrti verified May 2026) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system ships as a Claude Code plugin, an MCP 2024-11-05 A2A server (Cursor, Cline, Hermes), a CLI, and a REST API on one daemon, with a Merkle-chained audit log for tamper-evident anchor updates. On a held-out test set built from real Claude Code session traces and labeled by an independent LLM judge, Compass reaches ROC AUC 0.83 for drift detection. The embedded retrieval pipeline scores 56.6% on LongMemEval-S v0.8 and 44.4% on EverMemBench-Dynamic (n=500), topping the four published EverMemBench Table 4 baselines. LongMemEval-S 56.6% is ~30 points below recent white-box leaders (90+%); we treat that as the architectural ceiling of the no-extraction design. End-to-end reproduction cost is $3.50 (~14x cheaper than GPT-4o-judged stacks). A paired cross-vendor behavior A/B accompanies these numbers as preliminary system-level evidence. Code, anchors, frozen test data, and audit-log tooling are MIT-licensed at github.com/chunxiaoxx/nautilus-compass.

View on arXiv Download PDF

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.

TLDR

Key contributions

Why it matters

Original Abstract

📬 Weekly AI Paper Digest

Related papers