ArXiv TLDR

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

arXiv:2605.00702

Derong Xu, Shuochen Liu, Pengfei Luo, Pengyue Jia, Yingyi Zhang + 6 more

cs.CL

TLDR

MemCoE is a cognition-inspired two-stage optimization framework that learns how LLM memory should be organized and what information to update, improving personalization.

Key contributions

  • MemCoE: A cognition-inspired two-stage framework for learning LLM memory organization and updates.
  • Stage 1: Memory Guideline Induction optimizes a global memory guideline via contrastive feedback.
  • Stage 2: Guideline-Aligned Memory Policy Optimization learns memory policy using structured rewards and multi-turn RL.

Why it matters

LLM agents need long-term memory to personalize consistently, but current systems rely on static, hand-crafted update rules or on RL with sparse rewards. MemCoE, a cognition-inspired two-stage optimization framework, learns both how memory is organized and what to update, improving personalization, robustness, transferability, and efficiency.

Original Abstract

Large language model (LLM) agents require long-term user memory for consistent personalization, but limited context windows hinder tracking evolving preferences over long interactions. Existing memory systems mainly rely on static, hand-crafted update rules; although reinforcement learning (RL)-based agents learn memory updates, sparse outcome rewards provide weak supervision, resulting in unstable long-horizon optimization. Drawing on memory schema theory and the functional division between prefrontal regions and hippocampus regions, we introduce MemCoE, a cognition-inspired two-stage optimization framework that learns how memory should be organized and what information to update. In the first stage, we propose Memory Guideline Induction to optimize a global guideline via contrastive feedback interpreted as textual gradients; in the second stage, Guideline-Aligned Memory Policy Optimization uses the induced guideline to define structured process rewards and performs multi-turn RL to learn a guideline-following memory evolution policy. We evaluate on three personalization memory benchmarks, covering explicit/implicit preference and different sizes and noise, and observe consistent improvements over strong baselines with favorable robustness, transferability, and efficiency.
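To make the abstract's two stages concrete, here is a deliberately toy sketch. Every name, score, and the keyword-matching reward below is invented for illustration; the paper's actual method uses contrastive feedback interpreted as textual gradients (Stage 1) and multi-turn RL with structured process rewards (Stage 2), neither of which is reproduced here.

```python
def induce_guideline(candidates, score):
    # Stage 1 (hypothetical toy): keep the candidate guideline with the
    # best contrastive score, standing in for the paper's optimization
    # of a global memory guideline via textual-gradient feedback.
    return max(candidates, key=score)

def process_reward(update_action, guideline_rules):
    # Stage 2 (hypothetical toy): a structured process reward — the
    # fraction of guideline rules a memory-update action follows —
    # standing in for the paper's guideline-aligned RL reward.
    followed = sum(rule in update_action for rule in guideline_rules)
    return followed / len(guideline_rules)

# Made-up candidate guidelines with made-up contrastive scores.
candidates = {
    "store preferences; dedupe entries; decay stale items": 0.8,
    "store everything verbatim": 0.2,
}
guideline = induce_guideline(candidates, candidates.get)
reward = process_reward(
    "dedupe entries then store preferences", guideline.split("; ")
)
```

The point of the sketch is only the division of labor: a slow outer loop that settles on *how* memory should be organized, and an inner policy loop rewarded for following that guideline when deciding *what* to update.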
