Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

April 13, 2026

arXiv:2604.11753

Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen

cs.CL

TLDR

AggAgent is an aggregation agent that scales long-horizon agentic tasks at test time by inspecting and synthesizing information from parallel trajectories.

Key contributions

Proposes AggAgent, an aggregation agent for parallel test-time scaling of long-horizon tasks.
Treats parallel trajectories as an environment, using lightweight tools for inspection and search.
Outperforms all existing aggregation methods by up to 5.3% absolute on average, and by up to 10.3% on two deep research tasks.
Adds minimal overhead, with aggregation cost bounded by a single agentic rollout.
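
The idea of treating parallel trajectories as an environment can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the tool names (list_candidates, search, read_step), and the substring-based search are hypothetical and are not taken from the paper, which does not specify its tool interface in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One parallel rollout: its recorded steps and its final answer."""
    steps: list[str]
    final_answer: str


class TrajectoryEnvironment:
    """Hypothetical environment that exposes rollouts through small tools."""

    def __init__(self, trajectories: list[Trajectory]):
        self.trajectories = trajectories

    def list_candidates(self) -> list[tuple[int, str]]:
        """Inspect candidate solutions as (trajectory index, final answer)."""
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def search(self, query: str, max_hits: int = 5) -> list[tuple[int, int, str]]:
        """Case-insensitive substring search over every step of every rollout.

        Returns up to max_hits tuples of (trajectory index, step index, step text).
        """
        needle = query.lower()
        hits = [
            (ti, si, step)
            for ti, traj in enumerate(self.trajectories)
            for si, step in enumerate(traj.steps)
            if needle in step.lower()
        ]
        return hits[:max_hits]

    def read_step(self, traj_index: int, step_index: int) -> str:
        """Read a single step on demand instead of loading whole trajectories."""
        return self.trajectories[traj_index].steps[step_index]


# Toy data, invented for this example only.
env = TrajectoryEnvironment([
    Trajectory(
        ["search: capital of Australia", "observation: Canberra is the capital"],
        "Canberra",
    ),
    Trajectory(
        ["search: largest city in Australia", "observation: Sydney is the largest city"],
        "Sydney",
    ),
])

print(env.list_candidates())
print(env.search("capital"))
```

In a setup like this, the aggregation agent would call such tools over several turns and pull only the relevant steps into its context, rather than receiving every trajectory at once.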

Why it matters

AggAgent makes parallel test-time scaling practical for long-horizon agentic tasks. Concatenating all trajectories exceeds the model's context window, and keeping only final answers discards trajectory information; AggAgent instead navigates and synthesizes the trajectories on demand. The reported gains are largest on deep research tasks, and the aggregation cost stays bounded by a single agentic rollout.

Original Abstract

We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique challenges: trajectories are long, multi-turn, and tool-augmented, and outputs are often open-ended. Aggregating only final answers discards rich information from trajectories, while concatenating all trajectories exceeds the model's context window. To address this, we propose AggAgent, an aggregation agent that treats parallel trajectories as an environment. We equip it with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand. Across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5), AggAgent outperforms all existing aggregation methods, by up to 5.3% absolute on average and 10.3% on two deep research tasks, while adding minimal overhead, as the aggregation cost remains bounded by a single agentic rollout. Our findings establish agentic aggregation as an effective and cost-efficient approach to parallel test-time scaling.
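
For contrast, answer-only aggregation can be sketched as a plain majority vote over final answers. This is an illustrative stand-in chosen for simplicity, not necessarily one of the baselines evaluated in the paper; the function name and the normalization rule are assumptions.

```python
from collections import Counter


def majority_vote(final_answers: list[str]) -> str:
    """Return the most frequent final answer after light normalization.

    Only the final answers are used; the steps that produced them are ignored.
    """
    normalized = [answer.strip().lower() for answer in final_answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


# Toy answers, invented for this example only.
print(majority_vote(["Canberra", "Sydney", "sydney "]))  # prints "sydney"
```

A vote like this never looks at the evidence inside each rollout, and it relies on answers matching after normalization, which is hard to guarantee when outputs are open-ended.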
