Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

April 15, 2026
arXiv:2604.13410

stat.ME, cs.LG, stat.ML

TLDR

A two-stage kernel ridge regression method estimates continuous treatment effects, correcting for confounding and adapting to the simpler structure of the covariate-averaged effect function.

Key contributions

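Proposes a two-stage kernel ridge regression method for estimating continuous treatment effects.
The first stage models the response as a function of treatment and covariates; the second stage builds pseudo-outcomes from that model to correct for confounding bias and fits a second model for the effect.
The estimator adapts to the structure of the effect function, which is obtained by averaging over covariates and is typically much simpler than the full response function.
Introduces a fully data-driven model selection procedure that is provably adaptive to the unknown degree of overlap and to the kernel's regularity (eigenvalue decay).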
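
The quantities named in these contributions can be written out as follows. The notation is an assumption made here for illustration (the summary does not fix symbols): T is the treatment, X the covariates, Y the response, and the identity for the effect function relies on the standard no-unmeasured-confounding condition, which the summary does not state explicitly.

```latex
% Response function learned in the first stage
\mu(t, x) = \mathbb{E}\left[ Y \mid T = t,\; X = x \right]

% Effect function: average the response function over the marginal law of X
\theta(t) = \mathbb{E}_{X}\left[ \mu(t, X) \right] = \int \mu(t, x)\, dP_X(x)

% Direct regression of Y on T averages over the conditional law of X instead
\mathbb{E}\left[ Y \mid T = t \right] = \int \mu(t, x)\, dP_{X \mid T = t}(x)
```

The last two lines differ whenever the distribution of X changes with T, which is the distribution shift that the second stage is meant to correct.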

Why it matters

Confounding biases direct regression of an outcome on a continuous treatment, a common problem in fields like medicine and economics. The proposed two-stage kernel ridge regression method corrects for this bias, and its data-driven model selection is provably adaptive to the unknown degree of overlap and to the kernel's eigenvalue decay, so neither has to be known in advance.

Original Abstract

We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the regularity (eigenvalue decay) of the underlying kernel.
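
The two-stage idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's estimator: the pseudo-outcome used here is a simple plug-in average of the first-stage fit over the observed covariates, the Gaussian kernel and all hyperparameters are fixed by hand rather than chosen by the data-driven selection procedure, and the helper names (`rbf_kernel`, `krr_fit`, `two_stage_effect`, `demo`) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def krr_fit(Z, y, bandwidth, ridge):
    """Fit kernel ridge regression and return a prediction function."""
    n = Z.shape[0]
    K = rbf_kernel(Z, Z, bandwidth)
    alpha = np.linalg.solve(K + n * ridge * np.eye(n), y)
    return lambda Z_new: rbf_kernel(Z_new, Z, bandwidth) @ alpha


def two_stage_effect(T, X, Y, t_grid, bw1=1.0, ridge1=1e-3, bw2=0.5, ridge2=1e-3):
    """Two-stage sketch: response model, pseudo-outcomes, effect model."""
    n = T.shape[0]
    # Stage 1: model the response as a function of treatment and covariates.
    mu_hat = krr_fit(np.column_stack([T, X]), Y, bw1, ridge1)
    # Pseudo-outcomes (assumed plug-in form): for each observed treatment
    # value, average the stage-1 prediction over ALL observed covariates,
    # which replaces the law of X given T by the marginal law of X.
    pseudo = np.empty(n)
    for i in range(n):
        pseudo[i] = mu_hat(np.column_stack([np.full(n, T[i]), X])).mean()
    # Stage 2: regress the pseudo-outcomes on treatment alone.
    theta_hat = krr_fit(T[:, None], pseudo, bw2, ridge2)
    return theta_hat(t_grid[:, None])


def demo(seed=0, n=300):
    """Synthetic confounded data where the true effect function is sin(t)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)                      # confounder
    T = 0.5 * X + rng.normal(size=n)            # treatment depends on X
    Y = np.sin(T) + 2.0 * X + 0.3 * rng.normal(size=n)
    t_grid = np.linspace(-1.0, 1.0, 21)
    truth = np.sin(t_grid)
    est = two_stage_effect(T, X, Y, t_grid)
    # Baseline: direct regression of Y on T, which ignores confounding.
    naive = krr_fit(T[:, None], Y, 0.5, 1e-3)(t_grid[:, None])
    return np.mean((est - truth) ** 2), np.mean((naive - truth) ** 2)


if __name__ == "__main__":
    err_two_stage, err_naive = demo()
    print("two-stage MSE:", err_two_stage)
    print("naive MSE:    ", err_naive)
```

In this synthetic setup the direct regression of Y on T absorbs the effect of X through its correlation with T, while the second stage only has to fit a one-dimensional function of the treatment. That mirrors the abstract's point that the covariate-averaged effect function is simpler than the full response function.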
