Convolutional Maximum Mean Discrepancy for Inference in Noisy Data
Ritwik Vashistha, Jeff M. Phillips, Abhra Sarkar, Arya Farahi
TLDR
This paper introduces Convolutional Maximum Mean Discrepancy (convMMD) for robust statistical inference in data contaminated by measurement noise.
Key contributions
- Introduces Convolutional MMD (convMMD) for robust inference with noisy, heteroscedastic data.
- Establishes finite-sample deviation bounds for convMMD, unaffected by measurement error.
- Proves an equivalence between testing under noise and kernel smoothing techniques.
- Presents a consistent and asymptotically normal convMMD estimator with efficient SGD implementation.
Why it matters
Measurement noise degrades statistical inference, and existing correction techniques are often computationally costly. This paper introduces Convolutional MMD, an efficient framework for robust, distribution-free inference with noisy, heteroscedastic data, offering a practical tool for scientific applications such as astronomy and the social sciences.
Original Abstract
Modern data analyses frequently encounter settings where samples of variables are contaminated by measurement error. Ignoring measurement noise can substantially degrade statistical inference, while existing correction techniques are often computationally costly and inefficient. Recent advances in kernel methods, particularly those based on Maximum Mean Discrepancy (MMD), have enabled flexible, distribution-free inference, yet typically assume precise data and overlook contamination by measurement error. In this work, we introduce a novel framework for inference with samples corrupted by potentially heteroscedastic noise from a known distribution. Central to our approach is the convolutional MMD (convMMD), which compares distributions after noise convolution and retains metric validity under standard kernel conditions. We establish finite-sample deviation bounds that are unaffected by measurement error and prove an equivalence between testing under noise and kernel smoothing. Leveraging these insights, we introduce a convMMD-based estimator for inference with noisy, heteroscedastic observations. We establish its consistency and asymptotic normality, and provide an efficient implementation using stochastic gradient descent. We demonstrate the practical effectiveness of our approach through simulations and applications in astronomy and social sciences.
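To make the core idea concrete, here is a minimal, hypothetical sketch of a convMMD-style two-sample statistic in one dimension. It assumes a Gaussian RBF kernel and additive Gaussian measurement noise with known, possibly heteroscedastic, per-observation variances; under those assumptions the noise-convolved kernel has a closed form via the standard Gaussian convolution identity. This is an illustrative construction based on the abstract's description, not the paper's actual estimator (which also covers the SGD-based implementation and the theoretical guarantees).

```python
import numpy as np

def conv_gauss_kernel(x, y, sx2, sy2, gamma2=1.0):
    """Gaussian RBF kernel analytically convolved with additive Gaussian
    measurement noise of (possibly heteroscedastic) variances sx2, sy2.

    For k(x, y) = exp(-(x - y)^2 / (2 * gamma2)) and independent noise
    N(0, sx2), N(0, sy2), the Gaussian convolution identity gives
    E[k(x + eps, y + delta)]
        = sqrt(gamma2 / s2) * exp(-(x - y)^2 / (2 * s2)),
    where s2 = gamma2 + sx2 + sy2.
    """
    d2 = (x[:, None] - y[None, :]) ** 2
    s2 = gamma2 + sx2[:, None] + sy2[None, :]
    return np.sqrt(gamma2 / s2) * np.exp(-d2 / (2.0 * s2))

def conv_mmd2(x, y, sx2, sy2, gamma2=1.0):
    """Unbiased (U-statistic) estimate of the squared MMD between the
    latent distributions of x and y, using the noise-convolved kernel."""
    m, n = len(x), len(y)
    kxx = conv_gauss_kernel(x, x, sx2, sx2, gamma2)
    kyy = conv_gauss_kernel(y, y, sy2, sy2, gamma2)
    kxy = conv_gauss_kernel(x, y, sx2, sy2, gamma2)
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()
```

With zero noise variances the kernel reduces to the plain Gaussian RBF; as the noise variances grow, the effective bandwidth widens and the kernel peak shrinks, which is one way to see the equivalence between testing under noise and kernel smoothing that the paper proves.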