RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark
Huashuo Lei, Wenxuan Song, Huarui Zhang, Jieyuan Pei, Jiayi Chen + 8 more
TLDR
RoboMemArena is a new, large-scale robotic memory benchmark with 26 tasks, real-world evaluation, and VLM-generated annotations, alongside the PrediMem VLA.
Key contributions
- Introduces RoboMemArena, a large-scale robotic memory benchmark with 26 complex, long-horizon tasks.
- Features VLM-generated multimodal annotations and supports both simulated and real-world evaluations.
- Presents PrediMem, a dual-system VLA with a VLM planner and predictive coding for improved task dynamics.
- PrediMem outperforms baselines on RoboMemArena, providing insights into memory systems and scaling.
Why it matters
This paper fills a critical gap in robotic memory benchmarks with RoboMemArena, a challenging platform for developing and evaluating memory-dependent robotic intelligence in real-world tasks. The PrediMem model offers a novel approach to memory management, advancing the state of the art for long-horizon tasks.
Original Abstract
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system VLA in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.
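The abstract describes a memory bank split into a recent buffer and a keyframe buffer, with a predictive-coding head tracking task dynamics. A minimal sketch of how such a split could work, assuming (hypothetically) that keyframes are selected by prediction error — the class name, buffer sizes, and threshold below are illustrative, not the paper's actual implementation:

```python
from collections import deque

class MemoryBank:
    """Illustrative sketch of a recent/keyframe memory split.

    Recent buffer: a fixed-size sliding window over the latest observations.
    Keyframe buffer: retains the observations with the highest prediction
    error, a plausible way a predictive-coding signal could drive memory
    formation. All names and parameters here are hypothetical.
    """

    def __init__(self, recent_size=8, keyframe_size=16, error_threshold=0.5):
        self.recent = deque(maxlen=recent_size)  # last N observations
        self.keyframes = []                      # list of (error, step, obs)
        self.keyframe_size = keyframe_size
        self.error_threshold = error_threshold

    def add(self, step, obs, prediction_error):
        """Record an observation; promote it to a keyframe if surprising."""
        self.recent.append((step, obs))
        if prediction_error >= self.error_threshold:
            self.keyframes.append((prediction_error, step, obs))
            # Keep only the highest-error keyframes.
            self.keyframes.sort(key=lambda kf: kf[0], reverse=True)
            del self.keyframes[self.keyframe_size:]

    def context(self):
        """Observations handed to the planner: keyframes (in step order),
        followed by the recent window."""
        ordered_keyframes = sorted((s, o) for _, s, o in self.keyframes)
        return ordered_keyframes + list(self.recent)
```

In this toy version, a high-level planner would call `context()` each step to condition on both long-range surprising events and the immediate past; the real system presumably uses learned features rather than raw observations.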