Towards Autonomous Mechanistic Reasoning in Virtual Cells
Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini + 1 more
TLDR
This paper introduces VCR-Agent, a multi-agent framework for autonomous, verifiable mechanistic reasoning in virtual cells, and shows that training on its verified explanations improves gene expression prediction.
Key contributions
- Introduces a structured explanation formalism (mechanistic action graphs) for biological reasoning in virtual cells.
- Proposes VCR-Agent, a multi-agent framework for autonomous generation and validation of mechanistic reasoning.
- Releases the VC-TRACES dataset of verified mechanistic explanations derived from the Tahoe-100M atlas.
- Shows that training on these verified explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction.
Why it matters
LLMs often lack factual grounding in open-ended scientific domains such as biology. This work addresses that gap by enabling autonomous, verifiable mechanistic reasoning in virtual cells. It contributes a framework and a dataset of verified explanations that improve factual precision and downstream gene expression prediction, a step toward using LLMs to accelerate scientific discovery.
Original Abstract
Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building upon this, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with a verifier-based filtering approach to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of multi-agent collaboration and rigorous verification.
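The abstract describes explanations as mechanistic action graphs that can be systematically verified and falsified, plus a verifier that filters generated explanations. The sketch below is a toy illustration of that idea under stated assumptions, not the paper's formalism or VCR-Agent's implementation. `ActionEdge`, `MechanisticActionGraph`, the four-verb action vocabulary, the set-of-triples knowledge base, the sign-propagation falsification test, and all entity names (`drug_X`, `KINASE_Y`, `GENE_Z`, ...) are hypothetical. A candidate graph is kept only if every edge is supported by the knowledge base and its predicted direction of effect on each measured gene agrees with the observed log fold change.

```python
from dataclasses import dataclass, field

# Hypothetical action vocabulary mapped to the sign of its effect.
SIGN = {"activates": 1, "upregulates": 1, "inhibits": -1, "downregulates": -1}


@dataclass(frozen=True)
class ActionEdge:
    source: str  # e.g. a drug or protein (hypothetical placeholder names below)
    action: str  # one verb from SIGN
    target: str  # e.g. a protein or gene


@dataclass
class MechanisticActionGraph:
    edges: list = field(default_factory=list)

    def add(self, source, action, target):
        # Reject actions outside the assumed vocabulary.
        if action not in SIGN:
            raise ValueError(f"unknown action: {action}")
        self.edges.append(ActionEdge(source, action, target))
        return self  # allow chaining


def verify(graph, knowledge_base):
    """Return the edges NOT supported by the knowledge base (empty list = verified)."""
    return [e for e in graph.edges if (e.source, e.action, e.target) not in knowledge_base]


def predicted_effects(graph, perturbation):
    """Propagate effect signs outward from the perturbation.

    Assumes a tree-shaped explanation: each node is reached by a single path,
    so the first sign assigned to a node is kept.
    """
    effect = {perturbation: 1}
    frontier = [perturbation]
    while frontier:
        node = frontier.pop()
        for e in graph.edges:
            if e.source == node and e.target not in effect:
                effect[e.target] = effect[node] * SIGN[e.action]
                frontier.append(e.target)
    return effect


def falsified(graph, perturbation, observed_lfc):
    """True if any predicted direction contradicts the sign of an observed log fold change."""
    effects = predicted_effects(graph, perturbation)
    for gene, lfc in observed_lfc.items():
        if gene in effects and lfc != 0 and (effects[gene] > 0) != (lfc > 0):
            return True
    return False


def filter_explanations(candidates, perturbation, knowledge_base, observed_lfc):
    """Keep non-empty graphs that are fully supported and not falsified."""
    return [
        g for g in candidates
        if g.edges
        and not verify(g, knowledge_base)
        and not falsified(g, perturbation, observed_lfc)
    ]


# Toy usage with placeholder entities (no real biological claims).
kb = {
    ("drug_X", "inhibits", "KINASE_Y"),
    ("KINASE_Y", "upregulates", "GENE_Z"),
    ("drug_X", "activates", "KINASE_W"),
    ("KINASE_W", "upregulates", "GENE_Z"),
}
observed = {"GENE_Z": -1.2}  # hypothetical log fold change after drug_X

good = MechanisticActionGraph().add("drug_X", "inhibits", "KINASE_Y").add("KINASE_Y", "upregulates", "GENE_Z")
unsupported = MechanisticActionGraph().add("drug_X", "activates", "GENE_Z")
contradicted = MechanisticActionGraph().add("drug_X", "activates", "KINASE_W").add("KINASE_W", "upregulates", "GENE_Z")

kept = filter_explanations([good, unsupported, contradicted], "drug_X", kb, observed)
print(len(kept))  # prints 1
```

Keeping verification (is each step supported?) separate from falsification (does the predicted outcome match the measurement?) mirrors the distinction the abstract draws. In this toy, `contradicted` passes verification yet is still rejected because it predicts `GENE_Z` goes up. A real verifier would presumably consult curated databases or literature rather than a hard-coded set.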