EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers
Yi-Lun Liao, Alexander J. Hoffman, Sabrina C. Shen, Alexandre Duval, Sam Walton Norwood, et al.
TLDR
EquiformerV3 enhances SE(3)-equivariant graph attention Transformers for 3D atomistic modeling, boosting efficiency, expressivity, and generality.
Key contributions
- Achieves a 1.75× speedup over EquiformerV2 through an optimized software implementation.
- Introduces equivariant merged layer normalization and smooth radius cutoff attention.
- Proposes SwiGLU-$S^2$ activations for many-body interactions and strict equivariance.
- Enables accurate modeling of smoothly varying potential energy surfaces (PES).
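To make the "smooth radius cutoff" idea concrete: a standard way to keep the PES smooth is to multiply each edge's attention weight by an envelope that decays to zero, with zero derivative, at the cutoff radius. The sketch below uses a cosine envelope (a common choice in atomistic ML); the function names and the exact envelope are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smooth envelope: 1 at r=0, 0 for r >= r_cut, with zero slope at the
    cutoff. Illustrative choice; EquiformerV3's exact envelope may differ."""
    r = np.asarray(r, dtype=float)
    env = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0)
    return np.where(r < r_cut, env, 0.0)

def smooth_cutoff_attention(scores, distances, r_cut):
    """Scale unnormalized attention weights by the envelope so an edge's
    contribution vanishes smoothly as a neighbor crosses the cutoff,
    keeping the predicted energy continuously differentiable."""
    weights = np.exp(scores - scores.max())        # softmax numerators
    weights = weights * cosine_cutoff(distances, r_cut)
    return weights / weights.sum()                 # renormalize
```

Without such an envelope, an atom entering or leaving the neighbor list causes a discontinuity in the energy, which corrupts forces and higher-order derivatives of the PES.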
Why it matters
This paper significantly advances SE(3)-equivariant GNNs, crucial for large-scale 3D atomistic modeling. By improving efficiency, expressivity, and generality, EquiformerV3 enables more accurate and faster simulations. Its ability to model complex energy surfaces and achieve state-of-the-art results makes it a powerful tool for materials science and drug discovery.
Original Abstract
As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: efficiency, expressivity, and generality. Building on EquiformerV2, we have the following three key advances. First, we optimize the software implementation, achieving $1.75\times$ speedup. Second, we introduce simple and effective modifications to EquiformerV2, including equivariant merged layer normalization, improved feedforward network hyper-parameters, and attention with smooth radius cutoff. Third, we propose SwiGLU-$S^2$ activations to incorporate many-body interactions for better theoretical expressivity and to preserve strict equivariance while reducing the complexity of sampling $S^2$ grids. Together, SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces (PES), generalizing EquiformerV3 to tasks requiring energy-conserving simulations and higher-order derivatives of PES. With these improvements, EquiformerV3 trained with the auxiliary task of denoising non-equilibrium structures (DeNS) achieves state-of-the-art results on OC20, OMat24, and Matbench Discovery.
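For readers unfamiliar with the SwiGLU building block referenced in the abstract, it is a gated feed-forward layer: a Swish-activated gate branch multiplies a linear value branch elementwise before the output projection. The minimal sketch below shows plain SwiGLU only; the paper's SwiGLU-$S^2$ variant additionally applies this on samples of equivariant features over the sphere $S^2$, which is not reproduced here.

```python
import numpy as np

def swish(x):
    """Swish (SiLU) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu(x, W_gate, W_val, W_out):
    """Standard SwiGLU feed-forward block:
    out = (Swish(x @ W_gate) * (x @ W_val)) @ W_out.
    Weight names are illustrative; biases omitted for brevity."""
    return (swish(x @ W_gate) * (x @ W_val)) @ W_out
```

The elementwise gating is what gives the block its multiplicative (many-body-style) interactions between features, which the abstract credits for improved theoretical expressivity.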