ArXiv TLDR

Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework

arXiv: 2604.26762

Zhangzhi Xiong, Haoyi Wu, You Wu, Shuqi Gu, Kan Ren, et al.

cs.LG, cs.AI

TLDR

Introduces the Spatial-Temporal Probabilistic Transformer (ST-PT), a programmable factor-graph framework for robust time series modeling.

Key contributions

  • Lifts the Probabilistic Transformer (PT) to a Spatial-Temporal PT (ST-PT) for time series, adding a channel axis and per-step semantics.
  • Demonstrates injecting symbolic time-series priors via structural graph modifications, improving performance under data scarcity and noise.
  • Shows how an external condition can program the CRF's factor matrices per sample, making conditional generation structural rather than feature-level.
  • Explores MFVI as a principled Bayesian posterior update for latent autoregressive (AR) forecasting, with CRF-teacher distillation to counter cumulative error (see the sketch after this list).
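
To make the MFVI-as-posterior-update idea concrete, here is a minimal NumPy sketch of one mean-field iteration on a fully connected pairwise CRF. It is a toy illustration under simplifying assumptions (a single shared pairwise potential, no channel axis), not the paper's parameterization: PT structures these potentials so that the update reproduces self-attention plus the feed-forward block, and ST-PT further extends the graph with a channel dimension.

```python
import numpy as np

def mfvi_step(q, unary, pairwise):
    """One mean-field update over T positions, each with a K-state latent.

    q        : (T, K) current marginals q_j(y_j)
    unary    : (T, K) unary potentials u_i(y_i)
    pairwise : (K, K) shared pairwise potential p(y_i, y_j)
    """
    # Row j of msgs is the expected pairwise potential E_{q_j}[p(., y_j)]
    # -- an attention-like aggregation over the other positions.
    msgs = q @ pairwise.T                               # (T, K)
    incoming = msgs.sum(axis=0, keepdims=True) - msgs   # sum over j != i
    logits = unary + incoming
    logits -= logits.max(axis=1, keepdims=True)         # stabilize the softmax
    q_new = np.exp(logits)
    return q_new / q_new.sum(axis=1, keepdims=True)     # normalized posterior

# A fixed number of MFVI iterations plays the role of stacked Transformer layers.
T, K = 8, 16
rng = np.random.default_rng(0)
unary = rng.normal(size=(T, K))
pairwise = 0.1 * rng.normal(size=(K, K))
q = np.full((T, K), 1.0 / K)
for _ in range(3):
    q = mfvi_step(q, unary, pairwise)
```

The programmable primitives are visible even in this sketch: masking which positions j contribute to `incoming` edits the graph topology (RQ1), and generating `pairwise` from an external condition programs the potentials on a per-sample basis (RQ2).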

Why it matters

This paper transforms the black-box Transformer into an interpretable, programmable factor graph for time series. By explicitly engineering its components, ST-PT offers a novel way to inject domain knowledge and achieve more robust, conditional, and principled time-series forecasting, especially in challenging data scenarios.

Original Abstract

The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the Transformer ceases to be a black-box neural network and becomes a programmable factor graph: graph topology, factor potentials, and the message-passing schedule are all explicit and inspectable primitives that can be engineered. PT was originally developed for natural language, and in this report we investigate its potential for time series. We first lift PT into the Spatial-Temporal Probabilistic Transformer (ST-PT) to repair PT's missing channel axis and weak per-step semantics, and adopt ST-PT as a shared cornerstone backbone. We then identify three distinct properties that PT/ST-PT offers as a factor-graph model and derive three Research Questions, one per property, that probe how each property can be exploited in time series:

  • RQ1. The graph topology and potentials are directly programmable primitives. Can this be used to inject symbolic time-series priors into ST-PT through structural graph modifications, especially under data scarcity and noise?
  • RQ2. The CRF's factor matrices are the operator's potentials. Can an external condition program these factor matrices on a per-sample basis, so that conditional generation becomes structural rather than feature-level modulation of a fixed model?
  • RQ3. Each MFVI iteration is a Bayesian posterior update on the factor graph. Can this turn the latent transition of latent-space AutoRegressive (AR) forecasting from an opaque MLP into a principled posterior update, and can a CRF teacher distill its latents into the AR student to counter cumulative error?

We give one empirical study per question. Together, these three studies position ST-PT as a programmable framework for time-series modeling.
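
For reference, the generic mean-field update behind the claimed equivalence, written here in standard pairwise-CRF notation (our rendering; the paper's exact factorization may differ), is:

```latex
q_i^{(t+1)}(y_i) \;\propto\; \exp\!\Big( u_i(y_i) \;+\; \sum_{j \neq i} \sum_{y_j} q_j^{(t)}(y_j)\, p_{ij}(y_i, y_j) \Big)
```

Each iteration re-estimates the marginal q_i of position i from its unary potential u_i and the expected pairwise potentials with all other positions. When p_{ij} is factored through query/key/value-style projections, one such iteration matches a self-attention-plus-feed-forward layer, which is what makes the topology (which pairs (i, j) interact), the potentials p_{ij}, and the iteration schedule engineerable primitives.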
