arXiv TLDR

Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

2604.08894

Zecheng Hao, Shenghao Xie, Kang Chen, Wenxuan Liu, Zhaofei Yu + 1 more

cs.NE cs.AI cs.CV

TLDR

Ge$^2$mS-T introduces multi-dimensional grouping for Spiking Transformers, achieving ultra-high energy efficiency and superior performance.

Key contributions

  • Introduces Ge$^2$mS-T, a Spiking Transformer architecture with multi-dimensional grouped computation.
  • Proposes ExpG-IF for lossless ANN-SNN conversion and precise spike pattern regulation.
  • Develops Group-wise Spiking Self-Attention (GW-SSA) for reduced complexity via multi-scale token grouping.
  • Achieves superior performance and ultra-high energy efficiency in Spiking Vision Transformers.
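The paper's GW-SSA operates on spike trains with multiplication-free operations inside a hybrid attention-convolution framework; the sketch below shows only the generic token-grouping idea it builds on, in plain dense arithmetic. The function name `group_wise_attention` and the contiguous-group split are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def group_wise_attention(q, k, v, group_size):
    """Illustrative group-wise attention (not the paper's exact GW-SSA):
    tokens are split into contiguous groups and attention is computed only
    within each group, cutting cost from O(N^2 d) to O(N * group_size * d)."""
    n, d = q.shape
    assert n % group_size == 0, "sketch assumes N divisible by group size"
    g = n // group_size
    qg = q.reshape(g, group_size, d)
    kg = k.reshape(g, group_size, d)
    vg = v.reshape(g, group_size, d)
    scores = qg @ kg.transpose(0, 2, 1) / np.sqrt(d)   # (g, s, s) per-group logits
    scores -= scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ vg                                     # (g, s, d)
    return out.reshape(n, d)

# Toy usage: 8 tokens, dim 4, groups of 4 (one-group case recovers full attention)
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = group_wise_attention(q, k, v, group_size=4)
```

With `group_size` equal to the sequence length this reduces to ordinary softmax attention, which is why grouping at multiple scales trades a controllable amount of global mixing for a linear-in-`group_size` cost.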

Why it matters

Spiking Vision Transformers face a three-way tension between memory overhead, accuracy, and energy consumption. This paper presents what the authors describe as the first systematic multi-dimensional grouping approach to optimize all three concurrently, a meaningful step toward energy-efficient SNNs.

Original Abstract

Spiking Neural Networks (SNNs) offer superior energy efficiency over Artificial Neural Networks (ANNs). However, they encounter significant deficiencies in training and inference metrics when applied to Spiking Vision Transformers (S-ViTs). Existing paradigms including ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP) suffer from inherent limitations, precluding concurrent optimization of memory, accuracy and energy consumption. To address these issues, we propose Ge$^\text{2}$mS-T, a novel architecture implementing grouped computation across temporal, spatial and network structure dimensions. Specifically, we introduce the Grouped-Exponential-Coding-based IF (ExpG-IF) model, enabling lossless conversion with constant training overhead and precise regulation for spike patterns. Additionally, we develop Group-wise Spiking Self-Attention (GW-SSA) to reduce computational complexity via multi-scale token grouping and multiplication-free operations within a hybrid attention-convolution framework. Experiments confirm that our method can achieve superior performance with ultra-high energy efficiency on challenging benchmarks. To our best knowledge, this is the first work to systematically establish multi-dimensional grouped computation for resolving the triad of memory overhead, learning capability and energy budget in S-ViTs.
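The abstract does not spell out ExpG-IF's dynamics. A standard route to near-lossless ANN-SNN conversion in few timesteps is exponential (binary-weighted) spike coding, where the spike at step t carries weight 2^(-t); the sketch below illustrates that generic idea under this assumption. The function `exp_coding_if` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def exp_coding_if(activation, num_steps=8, v_th=1.0):
    """Hypothetical exponential-coding IF neuron: the membrane is preloaded
    with the (clipped) ANN activation, and each step t greedily emits a spike
    of weight v_th * 2^-(t+1), i.e. a binary expansion of the activation."""
    v = np.clip(activation, 0.0, v_th)
    spikes = np.zeros((num_steps,) + np.shape(activation))
    for t in range(num_steps):
        w = v_th * 2.0 ** -(t + 1)      # weight carried by a spike at step t
        fire = v >= w
        spikes[t] = fire.astype(float)
        v = v - fire * w                # subtract the emitted charge
    return spikes

# Decoding the weighted spike train reconstructs the activation
# to within 2**-num_steps of the original value.
a = np.array([0.3, 0.71, 0.95])
s = exp_coding_if(a)
weights = np.array([2.0 ** -(t + 1) for t in range(8)])
recon = (s * weights[:, None]).sum(axis=0)
```

Because the residual halves every step, T timesteps give T bits of precision, which is how such schemes keep conversion error (and latency) bounded by a constant rather than growing with network depth.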
