MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration
Chenmu Zhang, Boris I. Yakobson
TLDR
MatClaw is a code-first LLM agent that autonomously writes and executes Python for complex materials science workflows, achieving guided autonomy.
Key contributions
- MatClaw is a code-first LLM agent that autonomously writes and executes Python for multi-code materials science workflows.
- Uses a four-layer memory architecture and retrieval-augmented generation for sustained, accurate multi-day workflows.
- Demonstrates end-to-end materials exploration: code generation is reliable, but the agent lacks tacit domain knowledge (e.g., simulation timescales, equilibration protocols, sampling strategies).
- Achieves "guided autonomy" through literature self-learning and expert constraints, bridging critical knowledge gaps.
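The "code-first" idea in the contributions above can be sketched minimally: rather than calling a fixed set of hand-written tool functions, the agent executes LLM-generated Python directly and feeds the captured output (or error) back into its context. This is a hypothetical illustration, not MatClaw's actual implementation; the function name and sandbox details are assumptions.

```python
import io
import contextlib

def run_generated_code(code: str, namespace: dict) -> str:
    """Execute agent-written Python, capturing stdout for the next LLM turn."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # any installed domain library is reachable
    except Exception as exc:
        # Errors are returned as text, not raised, so the agent can self-correct
        return f"ERROR: {exc!r}"
    return buffer.getvalue()

# Example: "generated" code composes the stdlib directly, with no tool wrapper
ns: dict = {}
out = run_generated_code("import math\nprint(math.sqrt(2))", ns)
print(out.strip())
```

Returning exceptions as strings is the key design choice: a failed step becomes an observation for the next LLM turn rather than a crash, which is what allows multi-day workflows to recover from individual errors.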
Why it matters
This paper introduces MatClaw, an LLM agent that advances autonomous computational materials science by writing and executing Python directly and orchestrating multi-code workflows, removing the hand-written tool functions that constrained previous agents. Its "guided autonomy" model, pairing LLM-driven execution with expert-supplied domain knowledge, promises to accelerate materials discovery beyond what manual workflows can achieve.
Original Abstract
Existing LLM agents for computational materials science are constrained by pipeline-bounded architectures tied to specific simulation codes and by dependence on manually written tool functions that grow with task scope. We present MatClaw, a code-first agent that writes and executes Python directly, composing any installed domain library to orchestrate multi-code workflows on remote HPC clusters without predefined tool functions. To sustain coherent execution across multi-day workflows, MatClaw uses a four-layer memory architecture that prevents progressive context loss, and retrieval-augmented generation over domain source code that raises per-step API-call accuracy to ~99%. Three end-to-end demonstrations on ferroelectric CuInP2S6 (machine-learning force field training via active learning, Curie temperature prediction, and heuristic parameter-space search) reveal that the agent handles code generation reliably but struggles with tacit domain knowledge. The missing knowledge, such as appropriate simulation timescales, equilibration protocols, and sampling strategies, is the kind that researchers accumulate through experience but rarely formalize. Two lightweight interventions, literature self-learning and expert-specified constraints, bridge these gaps, defining a guided autonomy model in which the researcher provides high-level domain knowledge while the agent handles workflow execution. Our results demonstrate that the gap between guided and fully autonomous computational materials research is narrower than ever before: LLMs already handle code generation and scientific interpretation reliably, and the rapid improvement in their capabilities will accelerate materials discovery beyond what manual workflows can achieve. All code and benchmarks are open-source.
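The abstract's claim that retrieval-augmented generation over domain source code raises API-call accuracy can be pictured with a toy retriever: index API docstrings, score them by token overlap with the agent's query, and prepend the best match to the prompt so generated calls match real signatures. This is a minimal sketch under assumed names (the corpus entries and functions below are hypothetical, not MatClaw's API); real systems would use embeddings rather than token overlap.

```python
def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization, adequate for a toy retriever."""
    return set(text.lower().split())

def retrieve(query: str, corpus: dict) -> str:
    """Return the API snippet whose docstring shares the most tokens with the query."""
    overlap = lambda name: len(tokenize(query) & tokenize(corpus[name]))
    best = max(corpus, key=overlap)
    return f"{best}: {corpus[best]}"

# Toy corpus of docstrings (hypothetical function names, for illustration only)
corpus = {
    "relax_structure(atoms, fmax)": "relax atomic structure until forces below fmax",
    "train_mlff(dataset, epochs)": "train machine-learning force field on dataset",
}

context = retrieve("train a force field from active-learning data", corpus)
prompt = f"Relevant API:\n{context}\n\nTask: write the training call."
print(context)
```

Grounding generation in retrieved source text is what pushes per-step API accuracy up: the model copies real signatures from context instead of guessing them from pretraining.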