ArXiv TLDR

Order-Agnostic Autoregressive Modelling with Missing Data

arXiv:2605.06355

Ignacio Peis, Pablo M. Olmos, Jes Frellsen

cs.LG, stat.ML

TLDR

This paper introduces MO-ARM (Missingness-Aware Order-Agnostic Autoregressive Model), an order-agnostic autoregressive model for robust imputation and active information acquisition on datasets with missing values.

Key contributions

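  • Shows that standard training of order-agnostic models on fully observed data implicitly performs imputation under a missing completely at random (MCAR) mechanism, yielding robust out-of-sample imputation under high missingness.
  • Introduces the first principled framework for training these models directly on incomplete datasets under general missingness mechanisms.
  • Uses the models' amortized conditional density estimation for active information acquisition, i.e., sequentially selecting the most informative missing variables for downstream prediction or inference.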
  • MO-ARM consistently outperforms established imputation baselines on real-world benchmarks.
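As a rough illustration of the first contribution, here is one common form of the order-agnostic training objective: draw a random ordering σ and a step t uniformly from {1, ..., D}, treat the first t − 1 variables of σ as observed, and score the remaining D − t + 1 variables, reweighted by D / (D − t + 1). Because the mask is drawn without looking at the data values, each training step resembles imputation under an MCAR mechanism. This is a sketch under stated assumptions, not the authors' implementation: the reweighting follows the autoregressive diffusion model formulation as we understand it, and `THETA`, `toy_cond_log_prob`, `sample_mcar_mask`, and the other names are hypothetical. The toy model ignores its context, so the exact average over all orderings and steps should equal log p(x).

```python
import itertools
import math
import random

# Hypothetical per-variable Bernoulli parameters for a toy model.
THETA = [0.9, 0.2, 0.6, 0.35]


def toy_cond_log_prob(x, mask, j):
    """log p(x_j | observed x) for a toy model that ignores the observed context.

    A real order-agnostic ARM would condition on the entries where mask is True.
    """
    p = THETA[j]
    return math.log(p if x[j] == 1 else 1.0 - p)


def sample_mcar_mask(num_vars, rng):
    """Draw a random ordering and step t; the first t - 1 variables are observed.

    The data values are never inspected, so the mask is MCAR by construction.
    """
    order = list(range(num_vars))
    rng.shuffle(order)
    t = rng.randint(1, num_vars)  # t ~ Uniform{1, ..., D}, inclusive
    observed = set(order[: t - 1])
    return [j in observed for j in range(num_vars)], t


def oa_arm_term(x, mask, t, cond_log_prob):
    """D / (D - t + 1) times the summed log-probs of the D - t + 1 missing variables."""
    d = len(x)
    missing = [j for j in range(d) if not mask[j]]
    total = sum(cond_log_prob(x, mask, j) for j in missing)
    return d / (d - t + 1) * total


def exact_objective(x, cond_log_prob):
    """Average the term over every ordering and every step t (feasible only for tiny D)."""
    d = len(x)
    terms = []
    for order in itertools.permutations(range(d)):
        for t in range(1, d + 1):
            observed = set(order[: t - 1])
            mask = [j in observed for j in range(d)]
            terms.append(oa_arm_term(x, mask, t, cond_log_prob))
    return sum(terms) / len(terms)


def mc_objective(x, cond_log_prob, num_samples, seed=0):
    """Monte Carlo average of single-sample terms (a training step would use one sample)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        mask, t = sample_mcar_mask(len(x), rng)
        total += oa_arm_term(x, mask, t, cond_log_prob)
    return total / num_samples


x = [1, 0, 1, 1]
log_px = sum(toy_cond_log_prob(x, [False] * len(x), j) for j in range(len(x)))
print(round(log_px, 4), round(exact_objective(x, toy_cond_log_prob), 4))
```

The weighting matters: a missing set of size D − t + 1 contains each variable with probability (D − t + 1) / D, so multiplying by D / (D − t + 1) makes every step t an unbiased estimate of the same quantity.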

Why it matters

This work extends order-agnostic autoregressive models, which perform strongly in deep generative modeling, to incomplete data, a pervasive challenge in real-world datasets. It provides a single framework for imputation and active information acquisition that consistently outperforms established imputation baselines on real-world benchmarks, making these generative models more practical for data with missing values.

Original Abstract

Order-Agnostic autoregressive models have demonstrated strong performance in deep generative modeling, yet their use in settings with incomplete data remains largely unexplored. In this work, we reinterpret them through the lens of missing data. First, we show that their standard training procedure on fully observed data implicitly performs imputation under a missing completely at random mechanism, resulting in robust out-of-sample imputation performance in settings with high missingness. Second, we introduce the first principled framework for training them directly on incomplete datasets under general missingness mechanisms. Third, we leverage their amortized conditional density estimation to perform active information acquisition, i.e., sequentially selecting the most informative missing variables for downstream prediction or inference. Across a suite of real-world benchmarks, our Missingness-Aware Order-Agnostic Autoregressive Model (MO-ARM) consistently outperforms established imputation baselines.
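The abstract describes active information acquisition as sequentially selecting the most informative missing variables. One common way to make "most informative" concrete is a greedy rule that picks the missing variable X_j maximizing the conditional mutual information I(Y; X_j | x_o) with a target Y. The sketch below assumes that criterion (the abstract does not state the paper's exact one) and replaces the model's amortized conditionals with exact enumeration over a small hypothetical joint distribution; `build_toy_joint`, `info_gain`, `greedy_acquisition`, and the test instance are illustrative only.

```python
import itertools
import math

TARGET = 3  # index of the target y in each (x1, x2, x3, y) tuple


def build_toy_joint():
    """Hypothetical joint p(x1, x2, x3, y) over binary variables."""
    joint = {}
    for x1, x2, x3, y in itertools.product([0, 1], repeat=4):
        p = 0.5                       # p(y): fair coin
        p *= 0.9 if x1 == y else 0.1  # x1 is a strong noisy copy of y
        p *= 0.5                      # x2 is independent noise
        p *= 0.6 if x3 == y else 0.4  # x3 is a weak noisy copy of y
        joint[(x1, x2, x3, y)] = p
    return joint


def conditional(joint, observed, var):
    """p(var | observed) by enumeration; `observed` maps index -> value."""
    weights = {0: 0.0, 1: 0.0}
    for assignment, p in joint.items():
        if all(assignment[i] == v for i, v in observed.items()):
            weights[assignment[var]] += p
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def info_gain(joint, observed, candidate, target=TARGET):
    """I(Y; X_c | x_o) = H(Y | x_o) - E_{x_c ~ p(x_c | x_o)}[H(Y | x_o, x_c)]."""
    prior = entropy(conditional(joint, observed, target))
    expected_posterior = 0.0
    for value, p_value in conditional(joint, observed, candidate).items():
        if p_value == 0:
            continue  # an impossible value contributes nothing
        posterior = conditional(joint, {**observed, candidate: value}, target)
        expected_posterior += p_value * entropy(posterior)
    return prior - expected_posterior


def greedy_acquisition(joint, true_values, candidates):
    """Acquire features one at a time, always taking the highest information gain."""
    observed, remaining, order = {}, list(candidates), []
    while remaining:
        best = max(remaining, key=lambda j: info_gain(joint, observed, j))
        order.append(best)
        remaining.remove(best)
        observed[best] = true_values[best]  # "measure" the chosen feature
    return order


joint = build_toy_joint()
true_values = {0: 1, 1: 0, 2: 1}  # hypothetical test instance
acquisition_order = greedy_acquisition(joint, true_values, [0, 1, 2])
print(acquisition_order)  # [0, 2, 1]
```

The strong proxy x1 is acquired first, the weak proxy x3 second, and the pure-noise x2 last. In the paper's setting, the `conditional` calls would presumably be answered by the trained model's amortized conditional densities, which handle arbitrary observed subsets without retraining.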
