ArXiv TLDR

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

2604.19572

Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu + 6 more

cs.CL

TLDR

TACO is a self-evolving framework that efficiently compresses observational context for terminal agents, reducing token costs and improving performance.

Key contributions

  • Introduces TACO, a self-evolving framework for efficient terminal agent context compression.
  • Automatically discovers and refines compression rules from agent interaction trajectories.
  • Reduces token overhead by ~10% while consistently improving agent performance.
  • Achieves consistent 1%-4% gains on TerminalBench across strong agentic models, with ~2%-3% higher accuracy under the same token budget.

Why it matters

Long-horizon agent tasks are bottlenecked by token costs that grow with redundant environmental feedback retained across turns. By learning compression rules dynamically rather than relying on fixed heuristics, TACO offers a generalizable way to make terminal agents both cheaper and more effective, which matters for scaling agents in complex, multi-turn environments.

Original Abstract

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.
