ArXiv TLDR

Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

arXiv:2604.17984

Junyoung Yang, Kyungmin Kim, Sangdon Park

cs.LG, stat.ML

TLDR

This paper proposes a new online conformal prediction method for the adversarial semi-bandit feedback setting, where the true label is revealed only when it falls inside the prediction set, and establishes long-run coverage guarantees via regret minimization.

Key contributions

  • Introduces online conformal prediction for adversarial semi-bandit feedback, where the true label is observed only when it falls inside the constructed prediction set.
  • Formulates the problem as an adversarial bandit, treating each candidate prediction set as an arm.
  • Establishes long-run coverage guarantees by explicitly linking them to the learner's regret.
  • Empirically demonstrates effective miscoverage rate control and reasonable prediction set sizes.
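The bandit formulation in the contributions above can be sketched in a few lines. The code below is an illustrative EXP3-style learner, not the paper's exact algorithm: arms are assumed to be candidate score thresholds, the prediction set at each round is all labels whose conformity score falls below the chosen threshold, and the only feedback is whether the true label landed inside the set. The class name, loss shape, and size-penalty weight are all made-up choices for illustration.

```python
import math
import random

class Exp3ConformalSketch:
    """EXP3-style learner over candidate thresholds (arms).

    Illustrative sketch only: arm = score threshold tau; the prediction
    set at round t is {y : score(y) <= tau}. Loss scaling and tuning are
    simplified relative to what a real adversarial-bandit analysis needs.
    """

    def __init__(self, thresholds, eta=0.1, size_penalty=0.05):
        self.thresholds = thresholds          # candidate thresholds (arms)
        self.eta = eta                        # learning rate
        self.size_penalty = size_penalty      # coverage vs. set-size trade-off
        self.weights = [1.0] * len(thresholds)

    def probs(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select_arm(self, rng):
        # sample an arm index from the exponential-weights distribution
        p = self.probs()
        r, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i, pi
        return len(p) - 1, p[-1]

    def prediction_set(self, scores, arm):
        # all labels whose conformity score is below the chosen threshold
        tau = self.thresholds[arm]
        return {y for y, s in scores.items() if s <= tau}

    def round(self, scores, true_label, rng):
        """Play one round; the label is revealed only via set membership."""
        arm, prob = self.select_arm(rng)
        pred_set = self.prediction_set(scores, arm)
        covered = true_label in pred_set      # the only feedback observed
        # loss: miscoverage is costly; larger sets pay a size penalty
        loss = (0.0 if covered else 1.0) + self.size_penalty * len(pred_set)
        # importance-weighted estimate, updating only the pulled arm
        self.weights[arm] *= math.exp(-self.eta * loss / prob)
        return covered, len(pred_set)
```

Run over a stream of rounds, the empirical miscoverage rate can be tracked against a target level; the importance weighting is what lets the learner update from the single pulled arm despite never seeing losses for the other candidate sets.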

Why it matters

Uncertainty quantification is vital for safety-critical systems. Current online conformal prediction methods assume full feedback, limiting their real-world applicability. This work extends these methods to handle partial, adversarial feedback, making them more robust and practical for dynamic, uncertain environments.

Original Abstract

Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a full feedback setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with partial feedback from an adaptive adversary, a more challenging setup where the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings, showing that it successfully controls the miscoverage rate while maintaining a reasonable size of the prediction set.
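The coverage-regret link mentioned in the abstract can be illustrated schematically. Suppose the per-round loss of an arm is just its miscoverage indicator, and some fixed comparator arm achieves long-run miscoverage at most the target level α. The display below is only a shape sketch under those assumptions; the paper's actual statement, losses, and constants differ:

```latex
% Let L_t(k) = \mathbf{1}\{Y_t \notin C_t(k)\} be the miscoverage loss of arm k,
% and R_T = \sum_{t=1}^{T} L_t(k_t) - \min_k \sum_{t=1}^{T} L_t(k) the regret.
% If a comparator arm k^* satisfies \frac{1}{T}\sum_t L_t(k^*) \le \alpha, then
\frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{Y_t \notin C_t\}
  \;\le\; \frac{1}{T} \sum_{t=1}^{T} L_t(k^*) + \frac{R_T}{T}
  \;\le\; \alpha + \frac{R_T}{T}.
```

With sublinear regret, R_T / T vanishes, so the empirical miscoverage rate approaches the target level; this is the sense in which minimizing regret yields a long-run coverage guarantee.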
