ArXiv TLDR

Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs

2605.13788

Eszter Varga-Umbrich, Zachary Weller-Davies, Paul Duckworth, Jules Tilly, Olivier Peltre + 1 more

cs.LG

TLDR

This paper introduces force-aware Neural Tangent Kernels and a scalable acquisition framework for robust active learning of MLIPs.

Key contributions

  • Introduces a linearly scaling acquisition framework for screening ~200k structures in hours.
  • Extends Neural Tangent Kernels (NTK) to be force-aware via mixed parameter-coordinate derivatives.
  • Achieves state-of-the-art energy and force accuracy on OC20 using the joint energy-force NTK.
  • Demonstrates improved efficiency and robustness over committee-based active learning methods.
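The first contribution — posterior-variance shortlisting that scales linearly in the pool size — can be sketched in feature space: for a model that is linear in its features, the predictive variance needs only a d×d matrix built from the training features, never an n×n kernel, so a large candidate pool can be scored in chunks. A minimal sketch (the function name, regularizer `lam`, and chunk size are illustrative, not from the paper):

```python
import numpy as np

def shortlist_by_posterior_variance(train_feats, cand_feats, k, lam=1e-3, chunk=4096):
    """Score candidates by feature-space posterior variance; keep the top-k.

    Linear in the pool size: only the d x d matrix
    A = Phi_train^T Phi_train + lam * I is formed, never an n x n kernel.
    """
    d = train_feats.shape[1]
    A = train_feats.T @ train_feats + lam * np.eye(d)
    A_inv = np.linalg.inv(A)  # d x d; cheap for modest feature dimensions
    scores = np.empty(len(cand_feats))
    for start in range(0, len(cand_feats), chunk):
        phi = cand_feats[start:start + chunk]
        # Posterior variance of a linear-in-features GP: lam * phi^T A^{-1} phi
        scores[start:start + chunk] = lam * np.einsum("ij,jk,ik->i", phi, A_inv, phi)
    order = np.argsort(-scores)
    return order[:k], scores
```

Because each chunk is scored independently, the loop parallelizes trivially and memory stays bounded by the chunk size — which is what makes screening pools of ~200k structures tractable.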

Why it matters

This work makes active learning for machine-learning interatomic potentials (MLIPs) practical by addressing key challenges in scalability, force supervision, and robustness. It enables efficient and accurate fine-tuning of foundation models, which is crucial for advancing materials science and chemistry research.

Original Abstract

Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting. By avoiding materialisation of the candidate and train set kernels, this approach enables screening of ~200k structures within hours and applies broadly to acquisition strategies that score candidates based on molecular similarity metrics. We then extend the Neural Tangent Kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK that provide natural similarity metrics for vector-field prediction. We demonstrate the effectiveness of the joint energy-force NTK on the OC20 dataset, where force-aware acquisition is crucial: it achieves the lowest energy and force MAE and RMSE across all metrics and distribution splits. Across T1x, PMechDB, and RGD benchmarks, our force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. Under a controlled candidate-pool shift case study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods exhibit higher variance. Overall, these results show that a single pretrained MLIP can enable scalable, force-aware, and distribution-robust active learning for foundation-model fine-tuning.
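The mixed parameter-coordinate derivative construction from the abstract can be illustrated on a toy model. For a model that is linear in its parameters, E(x; θ) = θ·φ(x), the energy NTK is φ(x)·φ(x′); since forces are F = −∇ₓE, the parameter-gradient of each force component is a mixed derivative −∂²E/∂θ∂xᵢ, giving force and joint energy-force kernel blocks. The sketch below uses a random-feature model φ(x) = tanh(Wx) purely for illustration (this is not the paper's MLIP architecture, and all names are assumptions):

```python
import numpy as np

def features(x, W):
    # Toy linear-in-parameters model: E(x; theta) = theta . phi(x), phi = tanh(W x)
    return np.tanh(W @ x)

def feature_jacobian(x, W):
    # Mixed derivative d phi_a / d x_i = (1 - tanh^2(w_a . x)) * W[a, i]
    s = 1.0 - np.tanh(W @ x) ** 2
    return s[:, None] * W  # shape (p, d)

def joint_energy_force_ntk(x1, x2, W):
    """Joint energy-force NTK for the toy model, as a (1+d) x (1+d) block matrix:

      k_EE = grad_theta E(x1) . grad_theta E(x2)        (scalar)
      k_EF = grad_theta E(x1) . grad_theta F(x2)        (1 x d)
      k_FF = grad_theta F(x1)^T grad_theta F(x2)        (d x d)

    with F = -dE/dx, so grad_theta F_i = -(d phi / d x_i) (mixed derivative).
    """
    phi1, phi2 = features(x1, W), features(x2, W)
    G1, G2 = feature_jacobian(x1, W), feature_jacobian(x2, W)
    k_EE = phi1 @ phi2
    k_EF = -(phi1 @ G2)      # (d,)
    k_FE = -(G1.T @ phi2)    # (d,)
    k_FF = G1.T @ G2         # (d, d); the force NTK
    top = np.concatenate([[k_EE], k_EF])
    bottom = np.column_stack([k_FE, k_FF])
    return np.vstack([top, bottom])
```

Note that the force block equals the second mixed derivative of the energy kernel, ∂²k_E/∂x₁∂x₂, which is what makes the joint kernel a natural similarity metric for vector-field (force) prediction.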
