ArXiv TLDR

AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling

arXiv:2604.20263

Zhenyu Wang, Geyan Ye, Wei Liu, Man Tat Alexander Ng

q-bio.QM · cs.AI · cs.LG

TLDR

AROMA is a multimodal architecture that models genetic perturbations in virtual cells with accurate, interpretable predictions, outperforming prior methods.

Key contributions

  • Integrates textual evidence, graph topology, and protein sequence features.
  • Uses a two-stage optimization for accurate and interpretable predictions.
  • Outperforms existing methods across cell lines and in zero-shot scenarios.
  • Introduces the PerturbReason dataset (over 498k samples) and two knowledge graphs as reusable resources for the domain.

Why it matters

Virtual cell modeling is crucial for understanding biological mechanisms. AROMA addresses key limitations of current methods by providing more reliable and interpretable predictions. This advancement could accelerate drug discovery and personalized medicine by better simulating genetic interventions.

Original Abstract

Virtual cell modeling predicts molecular state changes under genetic perturbations in silico, which is essential for biological mechanism studies. However, existing approaches suffer from unconstrained reasoning, uninterpretable predictions, and retrieval signals that are weakly aligned with regulatory topology. To address these limitations, we propose AROMA, an Augmented Reasoning Over a Multimodal Architecture for virtual cell genetic perturbation modeling. AROMA integrates textual evidence, graph-topology information, and protein sequence features to model perturbation-target dependencies, and is trained with a two-stage optimization strategy to yield predictions that are both accurate and interpretable. We also construct two knowledge graphs and a perturbation reasoning dataset, PerturbReason, containing more than 498k samples, as reusable resources for the virtual cell domain. Experiments show that AROMA outperforms existing methods across multiple cell lines, and remains robust under zero-shot evaluation on an unseen cell line, as well as in knowledge-sparse, long-tail scenarios. Overall, AROMA demonstrates that combining knowledge-driven multimodal modeling with evidence retrieval provides a promising pathway toward more reliable and interpretable virtual cell perturbation prediction. Model weights are available at https://huggingface.co/blazerye/AROMA. Code is available at https://github.com/blazerye/AROMA.
