ArXiv TLDR

BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine

arXiv:2605.05985

Remigiusz Kinas, Joanna Krawczyk, Rafał Powalski, Przemysław Pietrzak, Agnieszka Kowalewska + 4 more

cs.AI cs.MA q-bio.QM

TLDR

BioResearcher is a multi-agent AI system that uses scenario-guided playbooks and specialized tools to automate complex translational medicine research.

Key contributions

  • Scenario-guided multi-agent system for translational medicine, using versioned research playbooks.
  • Delegates to 30+ specialized subagents, tools, and ML endpoints for diverse data sources.
  • Integrates structured database access with sandboxed code for genome-scale analyses.
  • Leads evaluated baselines across unit-level tests (83.49% pass rate), open-ended biomedical reasoning (89.33% on BixBench-Verified-50), and end-to-end clinical discovery (74.7% positive hit rate).
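The scenario-guided routing described above can be pictured as a small dispatch layer: a query is matched to a versioned playbook, which lists the subagents to delegate to in order. The sketch below is purely illustrative — the playbook names, subagent names, and keyword-based routing are assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Playbook:
    """A versioned research playbook: an ordered list of subagent steps."""
    name: str
    version: str
    steps: list  # ordered subagent names to delegate to

# Hypothetical registry keyed by scenario; the paper's real playbooks are not public.
PLAYBOOKS = {
    "target_discovery": Playbook(
        name="target_discovery", version="1.2",
        steps=["literature_agent", "trials_agent", "omics_agent", "reconciler"],
    ),
    "repurposing": Playbook(
        name="repurposing", version="0.9",
        steps=["patents_agent", "trials_agent", "reconciler"],
    ),
}

def route(query: str) -> Playbook:
    """Map a query to a scenario playbook (toy keyword match, illustrative only)."""
    scenario = "repurposing" if "repurpos" in query.lower() else "target_discovery"
    return PLAYBOOKS[scenario]

def run(query: str) -> list:
    """Delegate the query through each playbook step; each step records its provenance,
    mirroring the auditable, versioned workflows the system emphasizes."""
    pb = route(query)
    return [
        {"playbook": f"{pb.name}@{pb.version}", "agent": step, "query": query}
        for step in pb.steps
    ]
```

A design point worth noting: pinning the playbook version in every provenance record is what makes a run auditable after the playbook itself is revised.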

Why it matters

Translational medicine requires auditable, scenario-specific AI workflows that general models lack. BioResearcher provides a specialized multi-agent system, validated by strong benchmark performance, to accelerate clinical discovery.

Original Abstract

Translational medicine turns underspecified development goals into evidence synthesis that must combine literature, trials, patents, and quantitative multi-omics analysis while preserving identifiers, uncertainty, and retrievable provenance. General-purpose foundation models and off-the-shelf tool-augmented or multi-agent systems are not built for this: they tend to produce single-shot answers or run open-endedly, and fall short on the auditable, scenario-specific workflows that heterogeneous biomedical sources demand. This paper introduces Ingenix BioResearcher, a scenario-guided multi-agent system that maps queries to versioned research playbooks, delegates to specialized subagents over 30+ tools and machine-learning endpoints, mixes structured database access with sandboxed code for genome-scale analyses, and applies claim-level multi-model reconciliation before editorial assembly. We evaluate BioResearcher across unit-level capabilities, open-ended biomedical reasoning, and end-to-end clinical discovery. It leads evaluated baselines on 109 single-step tests (83.49% pass rate; 0.892 average score), achieves strong biomedical benchmark performance (89.33% on BixBench-Verified-50 and the top 0.758 mean score on BaisBench Scientific Discovery), and leads on a 30-query clinical end-to-end benchmark with the highest positive hit rate (74.7% ± 3.3%) and negative clear rate (96.8% ± 0.2%). These results show broad, competitive performance across unit-level, open-ended, and end-to-end clinical evaluations.
