Statistical Machine Learning
Statistical approaches to machine learning, Bayesian methods, and theoretical foundations.
stat.ML · 377 papers

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
This paper gives the first sufficient conditions for sparse recovery from mixed-quality data, revealing a gap between the information-theoretic and algorithmic recovery thresholds.
Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
This paper shows Natural Policy Gradient is a doubly smoothed policy iteration, proving its global geometric convergence and optimal complexity.
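A classical special case helps make the "smoothed policy iteration" reading concrete (this is standard NPG folklore, not the paper's Bellman-operator framework): for softmax policies on a bandit, the natural policy gradient step reduces to a multiplicative-weights update, i.e. a softened greedy improvement step. A minimal sketch:

```python
import math

# Classical fact (not the paper's construction): with a softmax policy on a
# fixed-value bandit, one NPG step is multiplicative weights,
#   pi_{t+1}(a)  ∝  pi_t(a) * exp(eta * Q(a)),
# which interpolates between the current policy (eta -> 0) and the greedy
# policy-iteration update (eta -> infinity) -- a "smoothed" improvement step.
def npg_step(pi, q, eta):
    w = [p * math.exp(eta * qa) for p, qa in zip(pi, q)]
    z = sum(w)
    return [x / z for x in w]

q = [1.0, 0.5, 0.2]          # fixed action values (bandit setting)
pi = [1 / 3, 1 / 3, 1 / 3]   # uniform initial policy
for _ in range(200):
    pi = npg_step(pi, q, eta=0.1)
# The policy's mass concentrates geometrically on the best action (index 0).
```

The geometric concentration on the greedy action is the toy analogue of the global geometric convergence the paper proves in the general setting.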
When Can Digital Personas Reliably Approximate Human Survey Findings?
This paper evaluates when LLM-powered digital personas can reliably substitute for human survey respondents, finding that they match aggregate response distributions but struggle to predict individual answers.
Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks
An amortized, in-context learning method for causal sensitivity analysis drastically speeds up computation compared to per-instance methods.
Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
This paper introduces a framework that jointly performs long-tailed recognition and multi-modal fusion on highly imbalanced multi-modal data, outperforming single-modal methods.
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
This paper reveals that common semi-simulated benchmarks and counterfactual metrics for treatment effect estimation don't align with real-world performance.
Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
This paper reveals sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks, unifying two distinct learning regimes.
Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs
This paper introduces a novel regret analysis framework for guided-diffusion black-box optimization, explaining its strong performance on structured inputs.
Multifidelity Gaussian process regression for solving nonlinear partial differential equations
This paper proposes a multifidelity Gaussian process regression method using cokriging for learning optimal kernels to solve nonlinear PDEs efficiently.
Uncertainty in Physics and AI: Taxonomy, Quantification, and Validation
This paper offers a unified taxonomy, quantification, and validation framework for uncertainty in ML for physics applications.
Quantifying the Risk-Return Tradeoff in Forecasting
This paper introduces a framework that quantifies forecast reliability via risk-adjusted financial performance measures, showing that professional forecasters perform robustly under this lens.
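As a generic illustration of the idea (the paper's specific measure is not given here), one standard risk-adjusted performance measure is a Sharpe-style ratio applied to the returns of acting on a forecast; all names and data below are hypothetical:

```python
import statistics

# Generic sketch, not the paper's framework: score a point-forecast series
# by the risk-adjusted performance of a simple rule that goes long when the
# forecast is positive and short otherwise.
def sharpe_ratio(returns, eps=1e-12):
    # Mean return divided by return volatility: reward per unit of risk.
    return statistics.mean(returns) / (statistics.pstdev(returns) + eps)

def strategy_returns(forecasts, outcomes):
    # Signed position (+1/-1) from the forecast sign, payoff from the outcome.
    return [(1 if f > 0 else -1) * y for f, y in zip(forecasts, outcomes)]

outcomes = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3]   # realized values (toy data)
good_fc  = [0.4, -0.2, 0.3, -0.5, 0.2, 0.1]   # gets every sign right
bad_fc   = [-0.4, 0.2, -0.3, 0.5, -0.2, 0.1]  # gets most signs wrong

good_score = sharpe_ratio(strategy_returns(good_fc, outcomes))
bad_score = sharpe_ratio(strategy_returns(bad_fc, outcomes))
```

A forecaster whose sign errors cluster in volatile periods scores worse than one with the same hit rate in calm periods, which is exactly what a risk-adjusted (rather than accuracy-only) measure is meant to capture.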
A Note on Non-Negative $L_1$-Approximating Polynomials
This paper proves the existence of non-negative $L_1$-approximating polynomials for Gaussian distributions, matching optimal degree bounds.
Semiparametric Efficient Test for Interpretable Distributional Treatment Effects
DR-ME is a new semiparametrically efficient test that identifies specific locations where treatment effects alter outcome distributions, unlike global tests.
Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems
This paper introduces penalty-based first-order methods for bilevel optimization with minimax lower-level problems, achieving improved complexity bounds.
Flow Matching for Count Data
Introduces count-FM, a novel flow-matching framework for high-dimensional count data, achieving better sample quality and efficiency.
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
Feedforward neural networks definable in o-minimal structures, including MLPs, CNNs, and transformers, possess finite PAC sample complexity.
Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks
Causal EpiNets use an anchored neural architecture and Epistemic Neural Networks to provide precision-corrected, valid bounds on individual treatment effects.
Response Time Enhances Alignment with Heterogeneous Preferences
This paper shows that using user response times can accurately align LLMs with diverse human preferences, overcoming limitations of standard choice-only methods.
Online Bayesian Calibration under Gradual and Abrupt System Changes
BRPC is an online Bayesian calibration method that adapts to gradual and abrupt system changes, improving accuracy and robustness in digital twin applications.
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
This paper traces the attention sink in LLMs to three structural causes — variance discrepancy, super neurons, and dimension disparity — and proposes `head-wise RMSNorm` as a mitigation.
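A plausible reading of "head-wise RMSNorm" (an assumption about the paper's design, not its verified implementation) is to RMS-normalize each head's slice of the hidden state separately, so that one head's outsized variance or a few super-activated dimensions cannot dominate the shared normalization. A minimal sketch:

```python
import numpy as np

# Hypothetical sketch of a head-wise RMSNorm: instead of one RMS over the
# full hidden vector, each head's slice is normalized on its own, so every
# head ends up with unit RMS regardless of cross-head variance discrepancy.
def headwise_rmsnorm(x, n_heads, eps=1e-6):
    # x: (..., n_heads * head_dim)
    *lead, d = x.shape
    h = x.reshape(*lead, n_heads, d // n_heads)
    rms = np.sqrt((h ** 2).mean(axis=-1, keepdims=True) + eps)
    return (h / rms).reshape(*lead, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
x[:, :4] *= 100.0                 # one "head" with much larger variance
y = headwise_rmsnorm(x, n_heads=2)
```

After normalization both head slices have RMS ≈ 1, whereas a single vector-wide RMSNorm would let the high-variance head crush the other head's activations.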