ArXiv TLDR

NeuroClaw Technical Report

2604.24696

Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu + 3 more

cs.CV

TLDR

NeuroClaw is a multi-agent AI system that makes neuroimaging research more executable and reproducible by handling heterogeneous data modalities and long multi-stage analysis pipelines.

Key contributions

  • Processes diverse neuroimaging data (sMRI, fMRI, dMRI, EEG) using BIDS metadata.
  • Manages execution environments with pinned Python, Docker, and automated tool installers.
  • Enhances reproducibility through checkpointing, verification, and structured audit traces.
  • Introduces NeuroBench, a system-level benchmark for evaluating neuroimaging AI agents.
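To make the first bullet concrete: BIDS (Brain Imaging Data Structure) datasets carry a required `dataset_description.json` at the root and organize scans under per-subject modality folders (`anat`, `func`, `dwi`, `eeg`). The sketch below shows, in plain Python, how an agent could ground itself in that metadata before touching any images. It is an illustration of the BIDS layout only, not NeuroClaw's actual code; the function names are invented for this example.

```python
import json
from pathlib import Path

def read_bids_description(dataset_root):
    """Load dataset_description.json, required at the root of every BIDS dataset."""
    path = Path(dataset_root) / "dataset_description.json"
    with open(path) as f:
        return json.load(f)

def list_modalities(dataset_root):
    """Infer which modalities are present by scanning standard BIDS subfolders."""
    # Map BIDS folder names to the modality labels used in the paper.
    known = {"anat": "sMRI", "func": "fMRI", "dwi": "dMRI", "eeg": "EEG"}
    found = set()
    for sub in Path(dataset_root).glob("sub-*"):
        for folder in sub.rglob("*"):
            if folder.is_dir() and folder.name in known:
                found.add(known[folder.name])
    return sorted(found)
```

A dataset-aware check like this is what lets a system skip curated inputs: the directory layout itself declares which pipelines (structural, functional, diffusion, EEG) are applicable.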

Why it matters

This paper addresses two persistent obstacles to applying AI agents in neuroimaging: data heterogeneity and reproducibility. NeuroClaw offers a framework that streamlines complex workflows while keeping them transparent. Its environment management, checkpointing, and post-execution verification make agent-driven analyses easier to audit and to rerun.
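The "structured audit traces" mentioned above can be pictured as machine-readable records of what ran, in which environment, and what it produced. The sketch below is a minimal, hypothetical version of such a record using only the Python standard library; the schema is invented for illustration and is not NeuroClaw's actual trace format.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def audit_record(command, output_path):
    """Build one audit entry: the command, the interpreter/platform it ran on,
    and a content hash of the artifact it produced."""
    with open(output_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "artifact": {"path": str(output_path), "sha256": digest},
    }

# Serializing each record (e.g. json.dumps(audit_record(...))) yields an
# append-only log that a later verification pass can replay and check.
```

Hashing the output artifact is what makes post-execution verification cheap: rerunning a pipeline step and comparing digests detects silent divergence without inspecting the images themselves.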

Original Abstract

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent research assistant for executable and reproducible neuroimaging research. NeuroClaw operates directly on raw neuroimaging data across formats and modalities, grounding decisions in dataset semantics and BIDS metadata so users need not prepare curated inputs or bespoke model code. The platform combines harness engineering with end-to-end environment management, including pinned Python environments, Docker support, automated installers for common neuroimaging tools, and GPU configuration. In practice, this layer emphasizes checkpointing, post-execution verification, structured audit traces, and controlled runtime setup, making toolchains more transparent while improving reproducibility and auditability. A three-tier skill/agent hierarchy separates user-facing interaction, high-level orchestration, and low-level tool skills to decompose complex workflows into safe, reusable units. Alongside the NeuroClaw framework, we introduce NeuroBench, a system-level benchmark for executability, artifact validity, and reproducibility readiness. Across multiple multimodal LLMs, NeuroClaw-enabled runs yield consistent and substantial score improvements compared with direct agent invocation. Project homepage: https://cuhk-aim-group.github.io/NeuroClaw/index.html
