ArXiv TLDR

Spiking Sequence Machines and Transformers

arXiv:2605.00662

Joy Bose

cs.NE, cs.LG

TLDR

This paper shows that Spiking Sequence Machines (2007) and Transformers (2017) independently implement the same five functional operations, with cosine similarity as the shared retrieval primitive in both.
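
As a rough illustration of that shared primitive, here is a minimal sketch, not code from the paper: the `cosine_retrieve` helper, the dimensions, and the temperature `tau` are all assumptions. It shows similarity-based retrieval in the form both architectures share: a noisy query cue is matched against stored keys by cosine similarity and reads out a blend of the stored values.

```python
import numpy as np

def cosine_retrieve(query, keys, values, tau=0.1):
    """Soft associative readout: weight stored values by the (softmaxed)
    cosine similarity between the query and each stored key. Hypothetical
    helper for illustration, not code from the paper."""
    q = query / np.linalg.norm(query)
    K = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    w = np.exp(K @ q / tau)          # one similarity score per stored item
    w /= w.sum()
    return w @ values                # convex combination of stored values

rng = np.random.default_rng(0)
keys = rng.standard_normal((5, 16))     # 5 stored key vectors
values = rng.standard_normal((5, 8))    # their associated values
noisy_cue = keys[2] + 0.05 * rng.standard_normal(16)
out = cosine_retrieve(noisy_cue, keys, values)
# The noisy cue should read back (approximately) the value stored under key 2.
print(np.argmin(np.linalg.norm(values - out, axis=1)))  # -> 2
```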

Key contributions

  • Shows Spiking Sequence Machines (2007) and Transformers (2017) share five functional operations.
  • Formalizes a Phase-Latency Isomorphism linking sinusoidal positional phase and spike timing linearly.
  • Proves dot-product attention is invariant to this phase-latency mapping up to a global scale factor (see the numerical sketch after this list).
  • Empirically demonstrates that a learned rank-based embedding matches or exceeds sinusoidal encoding on positionally demanding tasks.
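
To make the phase-latency claim concrete, below is a small numerical sketch. It is my own illustration, not the paper's code: it assumes the isomorphism takes the simple form latency = phase / frequency, and that Lemma 1's scale factor enters as a uniform scaling c of the positional component.

```python
import numpy as np

d, n_pos = 64, 50
# Sinusoidal positional encoding: phase(pos, k) = pos * omega_k,
# with omega_k = 1 / 10000**(2k/d).
omegas = 1.0 / 10000 ** (np.arange(d // 2) * 2 / d)
phases = np.arange(n_pos)[:, None] * omegas[None, :]
pe = np.concatenate([np.sin(phases), np.cos(phases)], axis=-1)

# Assumed phase-latency map: spike latency t = phase / omega (linear in phase).
latencies = phases / omegas[None, :]
# Inverting the map recovers identical phases, hence identical encodings,
# so every dot product computed over them is unchanged.
recovered = latencies * omegas[None, :]
pe_roundtrip = np.concatenate([np.sin(recovered), np.cos(recovered)], axis=-1)
assert np.allclose(pe, pe_roundtrip)

# Lemma-1-style check: a uniform scale c on the positional component
# rescales every positional dot-product logit by the single factor c**2.
c = 3.0
assert np.allclose((c * pe) @ (c * pe).T, c**2 * (pe @ pe.T))
print("phase-latency round trip and global-scale check both pass")
```

The first check shows the latency representation round-trips to the same encoding, so dot products over it are unchanged; the second shows that a uniform scale on the positional component rescales all positional logits by a single global factor.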

Why it matters

This paper unifies two seemingly disparate sequence models, Spiking Sequence Machines and Transformers, by exposing the computational principles they share. It provides a theoretical framework and empirical evidence challenging the conventional view of positional encoding: what matters is distance discriminability under dot-product similarity, not sinusoidal form. This could inform the design of more efficient and biologically plausible sequence models.

Original Abstract

Sequence learning reduces to similarity-based retrieval over a temporally indexed representation space, a constraint on any sequence model, not a property of a specific architecture. We show that a spiking Sparse Distributed Memory sequence machine (2007) and the transformer (2017) independently instantiate the same five functional operations (encoding, context maintenance, associative retrieval, storage, and decoding), with cosine similarity as the shared retrieval primitive in both. We formalise a Phase-Latency Isomorphism showing that sinusoidal positional phase and spike timing are linearly related, and prove that dot product attention is invariant to this mapping up to a global scale factor on the positional component (Lemma 1). Empirically, frequency-compressed positional encoding fails to converge on a positionally demanding copy task, while a learned rank-based embedding matches or exceeds sinusoidal encoding, indicating that the critical property for positional representation is distance discriminability under dot-product similarity, not sinusoidal form. Time, phase, and rank are three instantiations of the same computational primitive, an ordered index whose structure survives similarity-based retrieval.
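
The abstract's "distance discriminability" point can be sanity-checked numerically. The sketch below is an illustration under my own assumptions, not the paper's experiment: "frequency compression" is modeled as uniformly shrinking all frequencies, and a random embedding stands in for a learned rank embedding. It compares how well three positional schemes separate positions under cosine similarity.

```python
import numpy as np

def sinusoidal_pe(n_pos, d, freq_scale=1.0, base=10000.0):
    """Standard transformer sinusoidal encoding; freq_scale=1 is the default."""
    freqs = freq_scale / base ** (np.arange(d // 2) * 2 / d)
    phases = np.arange(n_pos)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(phases), np.cos(phases)], axis=-1)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

n_pos, d = 128, 64
schemes = {
    "sinusoidal": sinusoidal_pe(n_pos, d),
    # "Frequency-compressed" stand-in (assumption): shrinking every frequency
    # squeezes all positions into a narrow band of nearly identical phases.
    "compressed": sinusoidal_pe(n_pos, d, freq_scale=1e-3),
    # Stand-in for a learned rank embedding: random high-dimensional vectors
    # keep distinct ranks nearly orthogonal under dot products.
    "rank (random)": np.random.default_rng(0).standard_normal((n_pos, d)) / np.sqrt(d),
}
for name, E in schemes.items():
    sims = [round(cosine_sim(E[0], E[k]), 3) for k in (1, 8, 64)]
    print(f"{name:>14}: sim(pos 0, pos 1/8/64) = {sims}")
```

Sinusoidal similarities decay with distance, the compressed variant leaves all positions nearly identical (similarity close to 1), and the random rank embedding keeps distinct positions near-orthogonal, consistent with the claim that discriminability, not sinusoidal form, is what similarity-based retrieval needs.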
