ArXiv TLDR

Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

arXiv: 2604.07739

Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio + 5 more

cs.IR, cs.LG

TLDR

This paper proposes efficient data selection, using gradient-based representations combined with distribution matching, to continually adapt generative recommenders under temporal data drift.

Key contributions

  • Explores targeted data selection to counter performance degradation from temporal data drift.
  • Evaluates representation choices and sampling strategies for curating small, informative data subsets.
  • Shows gradient-based representations with distribution-matching boost model performance.
  • Achieves training-efficiency gains while preserving robustness to drift in generative recommenders.
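The summary does not include pseudocode, but the selection idea can be illustrated with a minimal sketch: represent each interaction example by a gradient embedding, then greedily pick a small subset whose mean embedding matches a target (e.g., recent-window) distribution. All names here (`select_subset`, `grad_embeds`, `target_mean`) are hypothetical, and this herding-style greedy matcher is one plausible instantiation of "distribution matching", not the paper's exact algorithm.

```python
import numpy as np

def select_subset(grad_embeds: np.ndarray, target_mean: np.ndarray, k: int) -> list:
    """Greedily pick k examples whose mean gradient embedding
    best approximates target_mean (herding-style distribution matching).

    grad_embeds: (n, d) array of per-example gradient representations.
    target_mean: (d,) mean embedding of the distribution to match.
    """
    selected = []
    running_sum = np.zeros_like(target_mean, dtype=float)
    remaining = set(range(len(grad_embeds)))
    for step in range(k):
        best_i, best_dist = None, np.inf
        for i in remaining:
            # Mean of the subset if example i were added next.
            cand_mean = (running_sum + grad_embeds[i]) / (step + 1)
            dist = np.linalg.norm(cand_mean - target_mean)
            if dist < best_dist:
                best_i, best_dist = i, dist
        selected.append(best_i)
        running_sum += grad_embeds[best_i]
        remaining.remove(best_i)
    return selected

# Toy usage: match the mean of a "recent" window of interactions.
rng = np.random.default_rng(0)
embeds = rng.normal(size=(50, 8))          # stand-in for gradient embeddings
target = embeds[:10].mean(axis=0)          # stand-in for the drifted target
subset = select_subset(embeds, target, k=5)
```

In practice, per-example gradients are often approximated cheaply (e.g., last-layer gradients) so that selection stays far less costly than full retraining.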

Why it matters

Full retraining of recommendation systems is impractical due to data volume. This work offers a scalable solution via efficient data curation, enabling continuous adaptation and robust performance in production environments.

Original Abstract

Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection can mitigate performance degradation caused by temporal distributional drift while maintaining scalability. We evaluate a range of representation choices and sampling strategies for curating small but informative subsets of user interaction data. Our results demonstrate that gradient-based representations, coupled with distribution-matching, improve downstream model performance, achieving training efficiency gains while preserving robustness to drift. These findings highlight data curation as a practical mechanism for scalable monitoring and adaptive model updates in production-scale recommendation systems.
