Towards Autonomous Mechanistic Reasoning in Virtual Cells

April 13, 2026 · arXiv 2604.11661

Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini + 1 more

cs.LG, cs.AI

TLDR

This paper introduces VCR-Agent, a multi-agent framework for autonomous, verifiable mechanistic reasoning in virtual cells. Training on its verified explanations improves downstream gene expression prediction.

Key contributions

Introduces a structured explanation formalism (mechanistic action graphs) for biological reasoning in virtual cells.
Proposes VCR-Agent, a multi-agent framework for autonomous generation and validation of mechanistic reasoning.
Releases VC-TRACES dataset of verified mechanistic explanations derived from the Tahoe-100M atlas.
Demonstrates that training with these verified explanations improves factual precision and downstream gene expression prediction.
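To make the idea of a verifiable mechanistic action graph more concrete, here is a minimal Python sketch. It is not the authors' implementation. The `Action` and `ActionGraph` classes, the relation names, the `support` score, and the simple set-membership lookup that stands in for the verifier are all hypothetical assumptions chosen for illustration. The entity names (`drug_A`, `KINASE_1`, `GENE_2`) are placeholders, not real biology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One mechanistic step: `source` acts on `target` via `relation` (hypothetical schema)."""
    source: str
    relation: str  # e.g. "inhibits", "activates" (illustrative vocabulary, an assumption)
    target: str


@dataclass
class ActionGraph:
    """A hypothetical mechanistic action graph explaining one perturbation."""
    perturbation: str
    actions: list[Action] = field(default_factory=list)

    def nodes(self) -> set[str]:
        # Every entity that appears as a source or target of some action.
        return {n for a in self.actions for n in (a.source, a.target)}


def support(graph: ActionGraph, knowledge: set[tuple[str, str, str]]) -> float:
    """Fraction of the graph's actions found in a reference knowledge set.

    A plain set lookup is an assumed stand-in for the paper's verifier.
    """
    if not graph.actions:
        return 0.0
    hits = sum((a.source, a.relation, a.target) in knowledge for a in graph.actions)
    return hits / len(graph.actions)


def filter_graphs(graphs: list[ActionGraph],
                  knowledge: set[tuple[str, str, str]],
                  threshold: float = 1.0) -> list[ActionGraph]:
    """Keep graphs whose support meets `threshold` (1.0 means every step is verified)."""
    return [g for g in graphs if support(g, knowledge) >= threshold]


# Usage with placeholder entities (hypothetical knowledge base).
kb = {("drug_A", "inhibits", "KINASE_1"), ("KINASE_1", "activates", "GENE_2")}
grounded = ActionGraph("drug_A", [Action("drug_A", "inhibits", "KINASE_1"),
                                  Action("KINASE_1", "activates", "GENE_2")])
ungrounded = ActionGraph("drug_A", [Action("drug_A", "activates", "KINASE_1")])
kept = filter_graphs([grounded, ungrounded], kb)
```

Because each explanation is a set of discrete, typed steps rather than free text, every step can be checked and a graph can be rejected (falsified) as soon as one step lacks support. Lowering `threshold` would trade precision for coverage.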

Why it matters

LLMs often lack factual grounding in open-ended scientific domains such as biology. This work addresses that gap by making mechanistic reasoning in virtual cells both autonomous and verifiable. It contributes a framework (VCR-Agent) and a dataset (VC-TRACES), and training on the verified explanations yields higher factual precision and a more effective supervision signal for gene expression prediction.

Original Abstract

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building upon this, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with a verifier-based filtering approach to generate and validate mechanistic reasoning autonomously. Using this framework, we release VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of multi-agent and rigorous verification.
