ArXiv TLDR

Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes

arXiv:2604.27394

Eichi Uehara

stat.ML · cs.LG

TLDR

Bayesian X-Learner provides calibrated posterior inference for heterogeneous treatment effects while remaining robust to heavy-tailed outcomes, a combination that existing methods do not offer simultaneously.

Key contributions

  • Introduces Bayesian X-Learner for CATE, combining heterogeneous effects, calibrated uncertainty, and heavy-tail robustness.
  • Leverages cross-fitted doubly robust pseudo-outcomes and a full MCMC posterior for τ(x).
  • Utilizes a Welsch redescending pseudo-likelihood for robust inference under heavy-tailed data.
  • Demonstrates competitive performance on IHDP and robust recovery on contaminated "whale" data-generating processes (DGPs).
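
The two core ingredients above can be sketched briefly. This is an illustrative implementation, not the paper's code: the function names are mine, the doubly robust pseudo-outcome follows Kennedy's standard formula, and the default `c = 2.9846` is the commonly cited 95%-efficiency tuning constant for the Welsch loss (an assumption, since the paper's default is not stated here).

```python
import numpy as np

def dr_pseudo_outcomes(y, t, e_hat, mu0_hat, mu1_hat):
    """Doubly robust (Kennedy-style) pseudo-outcome for the CATE:

        phi_i = mu1(x_i) - mu0(x_i)
                + (t_i - e(x_i)) / (e(x_i) * (1 - e(x_i))) * (y_i - mu_{t_i}(x_i))

    The nuisances (e_hat, mu0_hat, mu1_hat) should be predictions from
    models fit on the *other* cross-fitting folds than the points scored.
    """
    mu_t = np.where(t == 1, mu1_hat, mu0_hat)
    return (mu1_hat - mu0_hat
            + (t - e_hat) / (e_hat * (1.0 - e_hat)) * (y - mu_t))

def welsch_loss(r, c=2.9846):
    """Welsch redescending loss: (c^2 / 2) * (1 - exp(-(r / c)^2)).

    Bounded in r, so any single extreme residual contributes at most
    c^2 / 2; exp(-welsch_loss) serves as a robust pseudo-likelihood kernel.
    """
    r = np.asarray(r, dtype=float)
    return (c ** 2 / 2.0) * (1.0 - np.exp(-(r / c) ** 2))
```

Because the Welsch loss is bounded, residuals from heavy-tailed contamination cannot dominate the pseudo-likelihood the way they would under a Gaussian likelihood, which is the mechanism behind the robustness claim.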

Why it matters

This paper addresses a critical gap in CATE estimation by providing a method that simultaneously delivers heterogeneous effects, calibrated uncertainty, and robustness to heavy-tailed outcomes. This matters for real-world applications, where outcome data are often contaminated by heavy tails; handling all three properties at once yields more reliable and practical causal inference.

Original Abstract

Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects $\tau(x)$, calibrated uncertainty over them, and robustness to the heavy tails that contaminate real outcome data. Meta-learners (Künzel et al., 2019) give (i); causal forests and BART give (i)-(ii) with Gaussian-tail assumptions; no widely used tool gives all three. We present Bayesian X-Learner, an X-Learner built on cross-fitted doubly robust pseudo-outcomes (Kennedy, 2020) with a full MCMC posterior over $\tau(x)$ via a Welsch redescending pseudo-likelihood. On Hill's IHDP benchmark the default configuration attains mean $\sqrt{\varepsilon_{\mathrm{PEHE}}} = 0.56$ on 5 replications (lowest mean; differences from S-/T-/X-learners, full-config Causal BART, and a causal forest baseline are not significant at $\alpha=0.05$, and rank ordering is unstable at 10 replications -- IHDP comparisons are competitive rather than dominant). On contaminated "whale" DGPs with up to 20-25% tail density, a one-flag extension (contamination_severity) that selects a Huber-$\delta$ nuisance loss per Huber's minimax-$\delta$ relation recovers RMSE $\approx 0.13$ with tight credible intervals (single-cross-fit 30-seed coverage 83% [Wilson 66%, 93%] at 20% density; modular-Bayes pooling with Bayesian-bootstrap nuisance draws restores nominal 95% coverage).
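
The "Huber minimax-$\delta$ relation" referenced in the abstract is Huber's classical least-favorable-contamination result: for an assumed contamination fraction $\varepsilon$, the tuning constant $\delta$ solves $2\varphi(\delta)/\delta - 2\Phi(-\delta) = \varepsilon/(1-\varepsilon)$, where $\varphi$ and $\Phi$ are the standard normal pdf and cdf. A minimal sketch of solving this by bisection (function names are mine; the paper's mapping from contamination_severity to $\varepsilon$ is not specified here):

```python
import math

def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def huber_minimax_delta(eps, lo=1e-6, hi=10.0, iters=200):
    """Solve Huber's minimax relation for the tuning constant delta:

        2 * phi(delta) / delta - 2 * Phi(-delta) = eps / (1 - eps)

    where eps is the assumed contamination fraction. The left-hand side
    decreases monotonically in delta, so simple bisection suffices.
    """
    target = eps / (1.0 - eps)
    f = lambda d: 2.0 * normal_pdf(d) / d - 2.0 * normal_cdf(-d) - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:   # still above target: delta is too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Heavier assumed contamination yields a smaller $\delta$ (a more aggressive clip of the nuisance loss); for example, $\varepsilon = 0.05$ gives $\delta \approx 1.40$, matching Huber's published tables.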
