ArXiv TLDR

The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems

arXiv:2604.26651

Pedro R. Pires, Gregorio F. Azevedo, Rafael T. Sereicikas, Pietro L. Campos, Tiago A. Almeida

cs.IR, cs.LG

TLDR

User state representation in CMAB recommender systems is more critical than the bandit algorithm itself, often yielding greater performance improvements.

Key contributions

  • Investigated the impact of various embedding-based user state representations on CMAB recommender systems.
  • Demonstrated that varying the user state representation can improve performance more than changing the bandit algorithm.
  • Revealed that no single embedding or aggregation strategy consistently outperforms the others across diverse datasets.
  • Highlighted the need to prioritize embedding quality and state construction in bandit-based recommenders.

Why it matters

This paper exposes a critical blind spot in bandit-based recommender systems: the overlooked importance of user state representation. It demonstrates that improving the state representation can yield greater performance gains than optimizing the bandit algorithm alone. This provides practical guidance for advancing personalized recommendation by shifting focus toward representation quality alongside algorithmic innovation.

Original Abstract

With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Due to the continuous aspect of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized and real-time suggestions. A critical yet underexplored component in these systems is the representation of user state, which typically encapsulates the user's interaction history and is deeply correlated with the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that variations in state representation can lead to improvements greater than those achieved by changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available on https://github.com/UFSCar-LaSID/bandits_blind_spot.
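To make the setup concrete, here is a minimal sketch of the pipeline the abstract describes: a user state built by aggregating matrix-factorization item embeddings, fed as the context vector to a standard CMAB algorithm (disjoint LinUCB is used here as a representative example). All names, shapes, and the mean-aggregation choice are illustrative assumptions, not the paper's actual implementation; see the linked repository for the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: item embeddings as produced by a matrix-factorization
# model (dimensions chosen arbitrarily for illustration).
n_items, dim = 50, 8
item_embeddings = rng.normal(size=(n_items, dim))

def user_state(history, embeddings):
    """One possible aggregation: mean of the embeddings of items the
    user has interacted with (the paper compares several strategies)."""
    if not history:
        return np.zeros(embeddings.shape[1])
    return embeddings[history].mean(axis=0)

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # UCB score = estimated reward + exploration bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy interaction: the "user state" fed to the bandit is the aggregated
# embedding of the user's (hypothetical) interaction history.
bandit = LinUCB(n_arms=n_items, dim=dim)
history = [3, 17, 42]                       # items the user clicked before
x = user_state(history, item_embeddings)    # user state = context vector
arm = bandit.select(x)                      # recommended item index
bandit.update(arm, x, reward=1.0)
```

The paper's point is that swapping the `user_state` function (different embeddings, different aggregations) can matter more than swapping `LinUCB` for another bandit.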
