ArXiv TLDR

Measuring Differences between Conditional Distributions using Kernel Embeddings

arXiv:2505.02260

Peter Moskvichev, Siu Lun Chau, Dino Sejdinovic

stat.ML, cs.LG

TLDR

This paper unifies kernel-based methods for comparing conditional distributions by introducing the Conditional Maximum Mean Discrepancy (CMMD) framework.

Key contributions

  • Introduces CMMD, a unified framework for kernel-based comparison of conditional distributions.
  • Defines the CMMD levels (CMMD₀, CMMD₁, CMMD₂) and establishes the mathematical connections between them (sketched in the equations after this list).
  • Proposes a novel doubly robust estimator for CMMD that remains consistent provided at least one of the underlying models is correctly specified.
  • Demonstrates CMMD's effectiveness in capturing complex conditional dependencies for statistical testing.
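
As a rough formal sketch of the three special cases (the notation below is illustrative; the precise definitions and required assumptions are those of the paper): write μ^P_{Y|X=x} for the conditional mean embedding of P(Y | X = x), C^P_{Y|X} for the conditional mean operator, and μ^P_{XY} for the joint mean embedding.

```latex
% Illustrative notation only; exact definitions and assumptions are in the paper.
% \nu is a reference distribution over X; HS denotes the Hilbert--Schmidt norm.
\begin{align*}
\mathrm{CMMD}_0(P,Q) &= \big\| C^{P}_{Y|X} - C^{Q}_{Y|X} \big\|_{\mathrm{HS}}
  && \text{(conditional mean operators)} \\
\mathrm{CMMD}_1(P,Q) &= \Big( \mathbb{E}_{X \sim \nu}\,
  \big\| \mu^{P}_{Y|X} - \mu^{Q}_{Y|X} \big\|_{\mathcal{H}_Y}^{2} \Big)^{1/2}
  && \text{(conditional mean embeddings)} \\
\mathrm{CMMD}_2(P,Q) &= \big\| \mu^{P}_{XY} - \mu^{Q}_{XY} \big\|_{\mathcal{H}_X \otimes \mathcal{H}_Y}
  && \text{(joint mean embeddings)}
\end{align*}
```

Per the abstract, the levels are related through the lens of operator-based smoothing: moving from level 0 towards level 2 applies increasing amounts of smoothing, which relaxes the assumptions needed for the quantity to be well defined.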

Why it matters

Comparing conditional distributions is crucial in many ML applications, but existing methods are fragmented. This paper provides a much-needed unified theoretical framework and a robust estimator, advancing non-parametric techniques for statistical testing.

Original Abstract

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While proposed methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods to measure divergence between conditional distributions through what we refer to as conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics which we call levels, with three special cases each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level $s$ CMMD, clarifying the required assumptions, and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.
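
To make the estimation idea concrete, here is a minimal plug-in sketch of a CMMD₁-style statistic. It uses the standard kernel-ridge-regression construction of conditional mean embeddings, not the paper's doubly robust estimator; all function names, the RBF kernel, and the regularization choices are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between rows of A (n, d) and rows of B (m, d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def cmmd1_sq(Xp, Yp, Xq, Yq, Xt, lam=1e-3, gamma=1.0):
    """Plug-in CMMD_1^2 sketch: average squared RKHS distance between
    kernel-ridge-regression conditional mean embeddings of P and Q,
    evaluated at the test inputs Xt (all inputs are 2-D arrays)."""
    n, m = len(Xp), len(Xq)
    # Ridge weights beta(x) = (K + n*lam*I)^{-1} k(X, x), one column per test point.
    Ap = np.linalg.solve(rbf(Xp, Xp, gamma) + n * lam * np.eye(n),
                         rbf(Xp, Xt, gamma))            # shape (n, t)
    Aq = np.linalg.solve(rbf(Xq, Xq, gamma) + m * lam * np.eye(m),
                         rbf(Xq, Xt, gamma))            # shape (m, t)
    Kpp, Kqq, Kpq = rbf(Yp, Yp, gamma), rbf(Yq, Yq, gamma), rbf(Yp, Yq, gamma)
    # ||mu_P(x) - mu_Q(x)||^2 = b_p'Kpp b_p - 2 b_p'Kpq b_q + b_q'Kqq b_q,
    # computed for every test point at once, then averaged.
    t1 = np.einsum('it,ij,jt->t', Ap, Kpp, Ap)
    t2 = np.einsum('it,ij,jt->t', Ap, Kpq, Aq)
    t3 = np.einsum('it,ij,jt->t', Aq, Kqq, Aq)
    return float(np.mean(t1 - 2 * t2 + t3))
```

In practice a statistic like this would be calibrated with a permutation test over the pooled sample, which is how kernel statistics are typically turned into the kind of conditional two-sample tests the paper evaluates.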
