ArXiv TLDR

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

arXiv:2605.04036

Yuwen Du, Rui Ye, Shuo Tang, Keduan Huang, Xinyu Zhu, et al.

cs.AI, cs.CL

TLDR

OpenSeeker-v2 achieves state-of-the-art search-agent performance through simple SFT on informative, high-difficulty trajectories, surpassing industrial models trained with far heavier pipelines.

Key contributions

  • Demonstrates that simple SFT, when fueled with informative, high-difficulty trajectories, can train frontier search agents to SOTA.
  • Introduces three data synthesis modifications: scaling the knowledge graph for richer exploration, expanding the tool set for broader functionality, and strict low-step filtering.
  • Achieves state-of-the-art results across 4 benchmarks (BrowseComp, BrowseComp-ZH, Humanity's Last Exam, xbench), surpassing Tongyi DeepResearch's heavier CPT+SFT+RL pipeline.
  • The first state-of-the-art search agent at its scale and paradigm (30B, ReAct) developed by a purely academic team using only SFT.
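To make the third modification concrete, here is a minimal sketch of what "strict low-step filtering" might look like, assuming it means dropping synthesized trajectories the agent solves in only a few tool calls (i.e., ones too easy to be informative). The threshold, the `Trajectory` schema, and the filter direction are all assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of strict low-step filtering: keep only trajectories
# that require at least `min_steps` tool calls, on the assumption that
# low-step trajectories are too easy to be informative training data.
# Schema and threshold are illustrative, not from the paper.
from typing import TypedDict


class Trajectory(TypedDict):
    question: str
    steps: list[str]  # one entry per ReAct tool call


def filter_low_step(trajectories: list[Trajectory],
                    min_steps: int = 5) -> list[Trajectory]:
    """Drop trajectories solved in fewer than `min_steps` tool calls."""
    return [t for t in trajectories if len(t["steps"]) >= min_steps]


data: list[Trajectory] = [
    {"question": "easy lookup", "steps": ["search"]},
    {"question": "multi-hop question", "steps": ["search"] * 8},
]
kept = filter_low_step(data, min_steps=5)
print(len(kept))  # 1
```

Under this reading, only the multi-hop trajectory survives; a real pipeline would presumably tune the threshold against trajectory difficulty statistics.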

Why it matters

This paper shows that advanced search agents don't require resource-intensive industrial pipelines. A simple SFT approach, when fueled with high-quality data, can achieve state-of-the-art results. This democratizes frontier LLM agent research, making it accessible to academic teams.

Original Abstract

Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach could be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and strict low-step filtering, we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (30B-sized agents with ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch trained with heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.