Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework
Ziling Ma, Ángel López Oriona, Hernando Ombao, Ying Sun
TLDR
This paper proposes a validation-driven clustering framework for adaptive pooling in multivariate time series forecasting under predictive heterogeneity.
Key contributions
- Proposes a validation-driven framework for adaptive pooling in high-dimensional multivariate time series.
- Clusters time series based on out-of-sample predictive performance, not representation similarity.
- Iteratively updates cluster assignments using validation losses for both point and probabilistic forecasts.
- Includes a leakage-free fallback mechanism to revert to a global model if specialization fails.
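The loop described above — fit cluster models, reassign each series by its own validation loss, then fall back to a global model if specialization does not help — can be sketched under toy assumptions. Here a pooled-mean forecaster stands in for the actual forecasting model, only the Huber loss is used, and all function names are illustrative, not from the paper:

```python
import numpy as np

def huber(resid, delta=1.0):
    # Huber loss: quadratic near zero, linear in the tails (robust to heavy-tailed errors)
    a = np.abs(resid)
    return float(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)).mean())

def fit_mean(series_list):
    # toy stand-in for a cluster-level forecasting model: pooled training mean
    return float(np.mean([s.mean() for s in series_list]))

def validation_driven_clustering(train, val, K, n_iters=5):
    """Assign each series to the cluster whose model minimizes that series'
    validation loss; revert to one global model if specialization does not
    beat global pooling on validation (the fallback safeguard)."""
    N = len(train)
    labels = np.arange(N) % K  # simple deterministic initialization
    for _ in range(n_iters):
        models = []
        for k in range(K):
            members = [train[i] for i in range(N) if labels[i] == k]
            models.append(fit_mean(members) if members else fit_mean(train))
        # reassignment is driven by out-of-sample loss, not feature similarity
        labels = np.array([int(np.argmin([huber(val[i] - m) for m in models]))
                           for i in range(N)])
    global_model = fit_mean(train)
    spec_loss = np.mean([huber(val[i] - models[labels[i]]) for i in range(N)])
    glob_loss = np.mean([huber(val[i] - global_model) for i in range(N)])
    if spec_loss >= glob_loss:  # fallback: specialization failed to help
        return np.zeros(N, dtype=int), [global_model]
    return labels, models
```

On data with two clearly separated groups, the reassignment step recovers the groups within a few iterations; on homogeneous data, the final comparison is what protects against negative transfer.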
Why it matters
This framework offers a principled and reliable way to handle predictive heterogeneity in complex time series forecasting. It consistently improves performance over baselines while safeguarding against degradation when heterogeneity is weak.
Original Abstract
We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.
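The two validation losses named in the abstract — Huber for point forecasts, pinball for probabilistic (quantile) forecasts — have standard textbook forms. As a quick reference (standard definitions, not code from the paper):

```python
import numpy as np

def pinball(y, q_pred, tau):
    # pinball (quantile) loss at level tau; penalizes under- and
    # over-prediction asymmetrically, scoring a quantile forecast
    diff = np.asarray(y) - q_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

def huber(y, pred, delta=1.0):
    # Huber loss: quadratic for small residuals, linear beyond delta,
    # which limits the influence of local anomalies
    a = np.abs(np.asarray(y) - pred)
    return float(np.mean(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))))
```

At tau = 0.5 the pinball loss reduces to half the absolute error, which is why it generalizes the median to arbitrary quantiles.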