Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

arXiv:2604.28138

Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang

cs.OS, cs.AI

TLDR

Crab is a runtime that bridges the agent-OS semantic gap for efficient and correct checkpoint/restore in agent sandboxes.

Key contributions

  • Bridges the agent-OS semantic gap for correct and efficient checkpoint/restore.
  • Uses eBPF to classify OS effects, deciding checkpoint granularity per agent turn.
  • A coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time.
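  • On shell-intensive and code-repair workloads, raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.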
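
The per-turn granularity decision can be illustrated with a small sketch. This is not Crab's implementation: the paper summary does not list the effect categories or granularity levels, so the `TurnEffects` fields, the three `Granularity` levels, and the `choose_granularity` rule are all hypothetical, and the effect counts are passed in as plain numbers rather than collected with eBPF.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    """Hypothetical checkpoint levels; the source does not name Crab's actual levels."""
    SKIP = "skip"              # turn left no recovery-relevant state
    FILESYSTEM = "filesystem"  # only on-disk state changed
    FULL = "full"              # live process state must be captured too


@dataclass(frozen=True)
class TurnEffects:
    """Assumed summary of one turn's OS-visible effects (illustrative fields)."""
    files_written: int = 0
    processes_left_running: int = 0


def choose_granularity(effects: TurnEffects) -> Granularity:
    """Pick the cheapest checkpoint that still covers what the turn changed."""
    if effects.processes_left_running > 0:
        return Granularity.FULL
    if effects.files_written > 0:
        return Granularity.FILESYSTEM
    return Granularity.SKIP


if __name__ == "__main__":
    turns = [
        TurnEffects(),                          # e.g. a read-only command
        TurnEffects(files_written=3),           # e.g. an edit to source files
        TurnEffects(processes_left_running=1),  # e.g. a background server
    ]
    for turn in turns:
        print(turn, "->", choose_granularity(turn).value)
```

Under this assumed rule, a turn that only reads state is skipped entirely, which is the kind of saving the sparsity observation in the abstract points to.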

Why it matters

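Autonomous agents need robust checkpoint/restore for fault tolerance and efficiency, but existing methods either miss OS-side effects (application-level recovery) or cost too much (full per-turn checkpointing). Both failures stem from an agent-OS semantic gap. Crab closes this gap transparently: it observes each turn's OS-visible effects from the host, without modifying agents or C/R backends, and checkpoints only what recovery actually needs. This makes state recovery both correct and cheap, which matters for scalable and reliable agent deployments.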

Original Abstract

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-OS semantic gap: agent frameworks see tool calls but not their OS effects; the OS sees state changes but lacks turn-level context to judge recovery relevance. This gap hides massive sparsity: over 75% of agent turns produce no recovery-relevant state, so most checkpoints are unnecessary. Crab (Checkpoint-and-Restore for Agent SandBoxes) is a transparent host-side runtime that bridges this gap without modifying agents or C/R backends. An eBPF-based inspector classifies each turn's OS-visible effects to decide checkpoint granularity; a coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time; and a host-scoped engine schedules checkpoint traffic across co-located sandboxes. On shell-intensive and code-repair workloads, Crab raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.
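
The idea of overlapping C/R with LLM wait time can be sketched with plain `asyncio`. This is a toy model under stated assumptions, not the coordinator described in the paper: `fake_llm_call` and `fake_checkpoint` are hypothetical stand-ins that only sleep, and the delays are arbitrary.

```python
import asyncio
import time


async def fake_llm_call(delay_s: float) -> str:
    """Hypothetical stand-in for waiting on the model's next action."""
    await asyncio.sleep(delay_s)
    return "next-action"


async def fake_checkpoint(delay_s: float) -> str:
    """Hypothetical stand-in for writing a sandbox checkpoint."""
    await asyncio.sleep(delay_s)
    return "checkpoint-done"


async def turn_sequential(llm_s: float, ckpt_s: float) -> float:
    """Checkpoint first, then wait for the LLM: the two costs add up."""
    start = time.perf_counter()
    await fake_checkpoint(ckpt_s)
    await fake_llm_call(llm_s)
    return time.perf_counter() - start


async def turn_overlapped(llm_s: float, ckpt_s: float) -> float:
    """Start the checkpoint at the turn boundary and hide it behind the LLM wait."""
    start = time.perf_counter()
    await asyncio.gather(fake_checkpoint(ckpt_s), fake_llm_call(llm_s))
    return time.perf_counter() - start


if __name__ == "__main__":
    seq = asyncio.run(turn_sequential(0.2, 0.2))
    ovl = asyncio.run(turn_overlapped(0.2, 0.2))
    print(f"sequential: {seq:.2f}s, overlapped: {ovl:.2f}s")
```

In this toy model the overlapped turn takes roughly as long as the slower of the two operations, so a checkpoint that finishes within the LLM wait adds little to the turn's latency.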