Integrated electro-optic attention nonlinearities for transformers
Luis Mickeler, Kai Lion, Alfonso Nardi, Jost Kellner, Pierre Didier + 3 more
TLDR
Electro-optic TFLN Mach-Zehnder modulators replace the digital Softmax in Transformer attention, drastically cutting the latency of the nonlinear step while maintaining accuracy.
Key contributions
- Introduces thin-film lithium niobate (TFLN) Mach-Zehnder modulators for analog nonlinear computation.
- Implements electro-optic alternatives to Softmax and Sigmoid functions for attention mechanisms.
- Achieves competitive accuracy in Vision Transformers and LLMs, even with 4-bit input-output quantization.
- Demonstrates high-speed (10 GBaud) and energy-efficient nonlinear computation for AI hardware.
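The substitution the contributions describe can be sketched in a few lines: scaled dot-product attention where the score nonlinearity is a pluggable function, so Softmax can be swapped for an elementwise Sigmoid of the kind an analog unit would compute. This is an illustrative sketch, not the authors' implementation; the function names and the plain (single-head, unbatched) attention layout are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    # Elementwise non-negative mapping; a candidate analog-friendly
    # replacement for the row-wise softmax.
    return 1.0 / (1.0 + np.exp(-x))

def attention(q, k, v, nonlinearity=softmax):
    # Scaled dot-product attention with a pluggable score nonlinearity.
    # q, k, v: (seq_len, d) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = nonlinearity(scores)
    return weights @ v
```

Passing `nonlinearity=sigmoid` keeps the non-negativity the attention weights need while removing the row-wise normalization, which is what makes an elementwise electro-optic transfer function a plausible drop-in.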
Why it matters
This paper addresses a critical bottleneck in Transformer inference by leveraging novel electro-optic hardware. It paves the way for faster, more energy-efficient AI models, crucial for deploying large language models and vision transformers at scale.
Original Abstract
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping using the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators (MZMs) as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations. We implement electro-optic alternatives to digital Softmax and Sigmoid, and evaluate their performance in Vision Transformers and Large Language Models. Our system maintains highly competitive accuracy, even under aggressive 4-bit input-output quantization of the analog units. We further characterize system noise at encoding speeds up to 10 GBaud and assess model robustness under various noise conditions. Our findings suggest that TFLN modulators can serve as nonlinear function units within hybrid co-packaged hardware, enabling high-speed and energy-efficient nonlinear computation.
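The "4-bit input-output quantization of the analog units" mentioned in the abstract can be modeled digitally by wrapping the nonlinearity in a uniform quantizer on both its input and its output, mimicking the converters around an analog unit. A minimal sketch, assuming a uniform quantizer and an input clip range of [-4, 4] (both assumptions; the paper's actual encoding is not specified here):

```python
import numpy as np

def quantize_uniform(x, bits=4, lo=-4.0, hi=4.0):
    # Uniform quantizer: clip to [lo, hi], snap to 2**bits levels.
    levels = 2**bits - 1
    xc = np.clip(x, lo, hi)
    idx = np.round((xc - lo) / (hi - lo) * levels)
    return lo + idx / levels * (hi - lo)

def quantized_sigmoid(x, bits=4):
    # Model an analog sigmoid unit seen through b-bit input and
    # output converters: quantize the drive signal, apply the ideal
    # transfer function, then quantize the readout into [0, 1].
    xq = quantize_uniform(x, bits, lo=-4.0, hi=4.0)
    y = 1.0 / (1.0 + np.exp(-xq))
    return quantize_uniform(y, bits, lo=0.0, hi=1.0)
```

At 4 bits the unit can emit at most 16 distinct output values, which is the regime in which the paper reports accuracy remaining competitive.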