Michal Valko
22 papers
Conditional outlier detection for clinical alerting
This paper presents a data-driven method for detecting anomalous patient-management actions in EHRs to alert for potential errors.
Bandits attack function optimization
This paper introduces Simultaneous Optimistic Optimization (SOO), a bandit-inspired algorithm for efficient function optimization under budget constraints.
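The core loop of SOO-style optimistic optimization can be sketched in a few lines. The following is a minimal 1-D illustration, not the paper's algorithm: it keeps a tree of cells, and in each round expands, at every depth, the best-valued cell whose value beats the best expanded so far. The function names, the ternary split, and the budget accounting are illustrative assumptions.

```python
import math

def soo_maximize(f, lo, hi, budget, K=3):
    """Minimal 1-D sketch of optimistic optimization in the spirit of SOO.

    Cells are intervals evaluated at their center; each round sweeps depths
    and expands the best cell per depth whose value exceeds the running max.
    """
    mid = (lo + hi) / 2.0
    leaves = [(0, lo, hi, f(mid))]  # (depth, left, right, value at center)
    evals = 1
    best_x, best_v = mid, leaves[0][3]
    while evals < budget:
        vmax = -math.inf
        max_depth = max(d for d, *_ in leaves)
        for h in range(max_depth + 1):
            at_h = [c for c in leaves if c[0] == h]
            if not at_h:
                continue
            cell = max(at_h, key=lambda c: c[3])
            if cell[3] > vmax:  # optimistic: only expand if it beats shallower cells
                vmax = cell[3]
                leaves.remove(cell)
                d, a, b, v = cell
                w = (b - a) / K
                for i in range(K):
                    ca, cb = a + i * w, a + (i + 1) * w
                    cm = (ca + cb) / 2.0
                    if i == K // 2:
                        cv = v  # middle child shares the parent's center: reuse
                    else:
                        cv = f(cm)
                        evals += 1
                    leaves.append((d + 1, ca, cb, cv))
                    if cv > best_v:
                        best_v, best_x = cv, cm
                if evals >= budget:
                    break
    return best_x, best_v

x_star, v_star = soo_maximize(lambda x: -(x - 0.7) ** 2, 0.0, 1.0, budget=100)
```

On this smooth unimodal example the expanded cells drill down toward the maximizer at 0.7, using only function evaluations within the budget.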
Adaptive graph-based algorithms for conditional anomaly detection and semi-supervised learning
Adaptive graph-based algorithms are introduced for semi-supervised learning and conditional anomaly detection, with an online approximation and clinical application.
Bandits on graphs and structures
This thesis explores graph and structured bandit problems, addressing practical challenges in sequential decision-making with large action spaces.
Middle-mile logistics through the lens of goal-conditioned reinforcement learning
This paper applies goal-conditioned reinforcement learning and graph neural networks to optimize parcel routing in middle-mile logistics networks.
Evolutionary feature selection for spiking neural network pattern classifiers
This paper extends evolutionary feature selection to JASTAP spiking neural networks, enabling smaller, more robust classifiers for noisy data.
Spectral bandits
This paper introduces "spectral bandits," an online learning framework for graph-based problems like recommendations, exploiting payoff smoothness on the graph and a notion of effective dimension.
Online learning with Erdős-Rényi side-observation graphs
This paper introduces two novel algorithms for multi-armed bandits with probabilistic side observations, achieving near-optimal regret bounds for unknown observation rates.
Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
SQUEAK is a new algorithm for kernel ridge regression that uses adaptive dictionary learning to achieve efficient Nyström approximations with reduced space complexity.
Pliable rejection sampling
Pliable Rejection Sampling (PRS) learns proposals via kernel estimation, providing i.i.d. samples with high probability and guaranteed acceptance rates.
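PRS learns its proposal from data via kernel estimation; the accept/reject test it builds on is classical rejection sampling, sketched below with a fixed uniform proposal and a constant envelope. The target density and envelope constant here are illustrative assumptions, not the paper's construction.

```python
import math
import random

def rejection_sample(target_pdf, n, lo, hi, envelope, seed=0):
    """Draw n i.i.d. samples from target_pdf (up to normalization) on [lo, hi],
    using a uniform proposal and a constant envelope >= max of target_pdf."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)          # propose uniformly on the support
        u = rng.uniform(0.0, envelope)   # uniform height under the envelope
        if u <= target_pdf(x):           # accept iff the point falls under the curve
            samples.append(x)
    return samples

# Example: standard normal shape restricted to [-3, 3]; the mode value is 1.
pdf = lambda x: math.exp(-x * x / 2.0)
xs = rejection_sample(pdf, 1000, -3.0, 3.0, envelope=1.0)
```

The acceptance rate is the ratio of the area under the target to the area under the envelope; PRS's contribution is learning a tighter proposal so that this rate is provably high.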
A single algorithm for both restless and rested rotting bandits
RAW-UCB is a novel algorithm that achieves near-optimal regret in both restless and rested rotting bandit settings, unifying previously distinct problems.
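For context on the index policies this line of work adapts, here is a minimal sketch of classical UCB1 on Bernoulli arms, which assumes stationary means; rotting-bandit algorithms such as RAW-UCB replace the full-history averages below with averages over recent pulls (details in the paper, not reproduced here). Arm means and horizon are illustrative.

```python
import math
import random

def ucb1(true_means, rounds, seed=0):
    """Classical UCB1: play each arm once, then always pull the arm with the
    highest empirical mean plus confidence bonus sqrt(2 log t / n_i)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.3, 0.5, 0.9], rounds=3000)
```

With stationary arms the index concentrates the pulls on the best arm; when means decay with pulls (the rotting setting), full-history averages mislead the index, which is the failure mode adaptive windows address.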
On two ways to use determinantal point processes for Monte Carlo integration
This paper explores and generalizes two determinantal point process (DPP) methods for Monte Carlo integration, offering improved variance rates.
Planning in entropy-regularized Markov decision processes and games
SmoothCruiser is a new planning algorithm for entropy-regularized MDPs and games, achieving $\widetilde{O}(1/\epsilon^4)$ sample complexity.
Budgeted Online Influence Maximization
This paper introduces a new budgeted framework for online influence maximization, measuring the budget by total campaign cost rather than by the number of selected influencers.
Adaptive multi-fidelity optimization with fast learning rates
Kometo is a new adaptive multi-fidelity optimization algorithm that achieves fast learning rates and improves upon prior guarantees without needing problem-specific knowledge.
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
This paper establishes sample complexity bounds for learning ε-optimal policies in Stochastic Shortest Path problems, revealing challenges when minimum costs are zero.
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
New algorithms achieve optimal last-iterate convergence rates for uncoupled learning in zero-sum games with bandit feedback, where each player observes only its own realized payoffs.
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
This paper uses log-barrier regularization to achieve optimal $\widetilde{O}(t^{-1/4})$ last-iterate convergence in zero-sum matrix games.
Best of both worlds: Stochastic & adversarial best-arm identification
This paper introduces an algorithm for best-arm identification that performs optimally in stochastic bandit problems while remaining robust to adversarial rewards.
Online learning with noisy side observations
This paper introduces a new online learning model with noisy side observations and an efficient, parameter-free algorithm achieving $\widetilde{O}(\sqrt{\alpha^* T})$ regret.
Spectral Thompson sampling
SpectralTS efficiently solves graph bandit problems by leveraging an effective dimension, achieving comparable regret with improved computational performance.
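SpectralTS adapts Thompson sampling to the graph setting; the underlying posterior-sampling idea is easiest to see in the classical Bernoulli case, sketched below. This is the textbook algorithm with Beta(1, 1) priors, not the paper's spectral variant; arm means and horizon are illustrative.

```python
import random

def thompson_sampling(true_means, rounds, seed=0):
    """Bernoulli Thompson sampling: maintain a Beta posterior per arm,
    sample a mean estimate from each posterior, and play the argmax."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1
    beta = [1] * k   # posterior failures + 1
    pulls = [0] * k
    for _ in range(rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
```

Randomizing over the posterior balances exploration and exploitation automatically; the spectral version replaces the per-arm Beta posteriors with a posterior over smooth functions on the graph.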
The Llama 3 Herd of Models
Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.