Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen
TLDR
AggAgent uses an aggregation agent to efficiently scale long-horizon agentic tasks by synthesizing information from parallel trajectories.
Key contributions
- Proposes AggAgent, an aggregation agent for parallel test-time scaling of long-horizon tasks.
- Treats parallel trajectories as an environment, using lightweight tools for inspection and search.
- Outperforms existing aggregation methods by up to 5.3% absolute on average and up to 10.3% on two deep research tasks.
- Adds minimal overhead, with aggregation cost bounded by a single agentic rollout.
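The "trajectories as an environment" idea can be sketched in Python as follows. The sketch assumes each rollout is stored as a list of text steps plus a final answer. The class and tool names (`TrajectoryEnv`, `list_candidates`, `read_steps`, `search`) and the `max_chars` cap are hypothetical illustrations, not the paper's actual tool interface. What the sketch shows is that the aggregator pulls small slices of trajectories through tool calls instead of reading every trajectory at once.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One parallel rollout: its step-by-step text and its final answer."""
    steps: list[str]
    final_answer: str


@dataclass
class TrajectoryEnv:
    """Hypothetical environment exposing parallel rollouts to an aggregation agent.

    Tool names are illustrative assumptions, not the paper's actual API.
    """
    trajectories: list[Trajectory]
    max_chars: int = 2000  # assumed cap on text returned by one tool call

    def list_candidates(self) -> list[tuple[int, str]]:
        # Cheap overview: only the final answer of each rollout.
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def read_steps(self, traj_id: int, start: int, end: int) -> str:
        # Inspect a slice of one trajectory instead of loading it whole.
        steps = self.trajectories[traj_id].steps[start:end]
        return "\n".join(steps)[: self.max_chars]

    def search(self, query: str) -> list[tuple[int, int]]:
        # Case-insensitive keyword search across all trajectories;
        # returns (trajectory id, step index) pairs where the query occurs.
        q = query.lower()
        return [
            (i, j)
            for i, t in enumerate(self.trajectories)
            for j, step in enumerate(t.steps)
            if q in step.lower()
        ]


env = TrajectoryEnv([
    Trajectory(["search: capital of Australia", "found: Canberra is the capital"], "Canberra"),
    Trajectory(["search: largest city in Australia", "found: Sydney"], "Sydney"),
])
print(env.list_candidates())  # [(0, 'Canberra'), (1, 'Sydney')]
print(env.search("capital"))  # [(0, 0), (0, 1)]
```

Capping each tool response (here via the assumed `max_chars` parameter) is one simple way to keep a single call from flooding the aggregator's context.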
Why it matters
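AggAgent makes parallel test-time scaling workable for long-horizon agentic tasks. Aggregating only final answers discards most of what each trajectory contains, and concatenating every trajectory into one prompt overflows the context window. AggAgent avoids both problems by letting an agent inspect and search the parallel trajectories on demand. This improves results on complex tasks such as deep research, by up to 10.3% absolute over prior aggregation methods on two such benchmarks. It also keeps aggregation cost within that of a single agentic rollout, so parallel test-time scaling stays cost-effective for advanced AI agents.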
Original Abstract
We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique challenges: trajectories are long, multi-turn, and tool-augmented, and outputs are often open-ended. Aggregating only final answers discards rich information from trajectories, while concatenating all trajectories exceeds the model's context window. To address this, we propose AggAgent, an aggregation agent that treats parallel trajectories as an environment. We equip it with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand. Across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5), AggAgent outperforms all existing aggregation methods (by up to 5.3% absolute on average and 10.3% on two deep research tasks) while adding minimal overhead, as the aggregation cost remains bounded by a single agentic rollout. Our findings establish agentic aggregation as an effective and cost-efficient approach to parallel test-time scaling.
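The two naive strategies the abstract argues against can be sketched in a few lines of Python, assuming answers and trajectories are plain strings. `majority_vote` stands in for final-answer-only aggregation. Majority voting is one common instance, and the paper's exact baselines may differ. `fits_in_context` is a rough check of whether full concatenation fits a context window. It uses an assumed 4-characters-per-token heuristic rather than a real tokenizer. The rollout sizes and window size in the usage lines are hypothetical.

```python
from collections import Counter


def majority_vote(final_answers: list[str]) -> str:
    """Final-answer-only aggregation: normalize answers, return the most common.

    Everything in the trajectories except the last answer is ignored.
    """
    normalized = [a.strip().lower() for a in final_answers]
    answer, _count = Counter(normalized).most_common(1)[0]
    return answer


def fits_in_context(trajectories: list[str], context_tokens: int,
                    chars_per_token: float = 4.0) -> bool:
    """Rough check for full concatenation: estimate tokens from characters.

    The chars_per_token ratio is a crude assumed heuristic, not a tokenizer.
    """
    total_chars = sum(len(t) for t in trajectories)
    return total_chars / chars_per_token <= context_tokens


answers = ["Canberra", "canberra ", "Sydney"]
print(majority_vote(answers))  # canberra

# Hypothetical: eight rollouts of ~400k characters (~100k tokens) each
# against a 128k-token window.
rollouts = ["x" * 400_000] * 8
print(fits_in_context(rollouts, context_tokens=128_000))  # False
```

Voting reads almost nothing from the trajectories, and concatenation tries to read everything at once. Agentic aggregation sits between the two by reading selectively.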