ArXiv TLDR

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

2604.20749

Dongding Lin, Jian Wang, Yongqi Li, Wenjie Li

cs.AI

TLDR

SiPeR is a new framework for situated conversational recommendation that reasons about dynamic, implicit user preferences using scene transition estimation and Bayesian inverse inference.

Key contributions

  • Addresses dynamic and implicit user preferences in situated conversational recommendation (SCR).
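  • Introduces SiPeR, a framework whose scene transition estimation judges whether the current scene meets the user's needs and, when it does not, guides the user to a more suitable scene.
  • Uses Bayesian inverse inference, based on the likelihoods of multimodal large language models (MLLMs), to predict user preferences over candidate items within a scene.
  • Shows superior recommendation accuracy and response generation quality on two representative benchmarks.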
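The Bayesian inverse inference idea can be illustrated with a small sketch. Rather than asking a model to name the preferred item directly, Bayes' rule scores each candidate by how well it explains the observed dialogue: P(item | dialogue) ∝ P(dialogue | item) · P(item). This is an illustration under stated assumptions, not SiPeR's implementation: the hypothetical `toy_log_likelihood` function (a word-overlap heuristic) stands in for an MLLM's log-likelihood of the dialogue given an item in the scene, and the prior is uniform unless one is supplied.

```python
import math
import re
from typing import Callable, Dict, List, Optional


def _words(text: str) -> set:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_log_likelihood(dialogue: str, item: str) -> float:
    """HYPOTHETICAL stand-in for an MLLM's log P(dialogue | item).

    Each item word mentioned in the dialogue costs 0; each unmentioned word costs -2.
    """
    mentioned = _words(dialogue)
    return sum(0.0 if w in mentioned else -2.0 for w in _words(item))


def item_posterior(
    candidates: List[str],
    log_likelihood: Callable[[str], float],
    prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Bayes' rule: P(item | dialogue) is proportional to P(dialogue | item) * P(item)."""
    if prior is None:
        prior = {c: 1.0 / len(candidates) for c in candidates}  # uniform prior
    log_joint = {c: log_likelihood(c) + math.log(prior[c]) for c in candidates}
    top = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


if __name__ == "__main__":
    dialogue = "It's getting cold; I'd like something red to wear, maybe a jacket."
    items = ["red jacket", "blue jeans", "white sneakers"]
    posterior = item_posterior(items, lambda item: toy_log_likelihood(dialogue, item))
    # Print candidates from most to least probable.
    for item, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{item}: {p:.3f}")
```

To use real model scores, `toy_log_likelihood` would be replaced by an MLLM-derived score; one plausible choice (an assumption here) is the summed token log-probabilities of the dialogue conditioned on the scene and the candidate item. How SiPeR actually conditions the MLLM is described in the paper and its repository.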

Why it matters

This paper tackles a core difficulty in situated conversational recommendation: user preferences are often implicit, shaped by the surrounding scene, and liable to change over the course of a conversation. By combining scene awareness with Bayesian inverse inference, SiPeR improves both recommendation accuracy and response generation quality on two benchmarks, a step toward more context-aware conversational assistants.

Original Abstract

Situated conversational recommendation (SCR), which utilizes visual scenes grounded in specific environments and natural language dialogue to deliver contextually appropriate recommendations, has emerged as a promising research direction due to its close alignment with real-world scenarios. Compared to traditional recommendations, SCR requires a deeper understanding of dynamic and implicit user preferences, as the surrounding scene often influences users' underlying interests, while both may evolve across conversations. This complexity significantly impacts the timing and relevance of recommendations. To address this, we propose situated preference reasoning (SiPeR), a novel framework that integrates two core mechanisms: (1) Scene transition estimation, which estimates whether the current scene satisfies user needs, and guides the user toward a more suitable scene when necessary; and (2) Bayesian inverse inference, which leverages the likelihood of multimodal large language models (MLLMs) to predict user preferences about candidate items within the scene. Extensive experiments on two representative benchmarks demonstrate SiPeR's superiority in both recommendation accuracy and response generation quality. The code and data are available at https://github.com/DongdingLin/SiPeR.
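Scene transition estimation can likewise be sketched as a simple threshold test. In this illustration, which is an assumption rather than the paper's method, a scene "satisfies" the user if its best-matching item scores at or above a threshold; otherwise the function suggests the scene whose best item scores highest. The scene names, the word-overlap scorer `make_overlap_scorer`, and the `threshold=0.5` default are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional


def make_overlap_scorer(need: str) -> Callable[[str], float]:
    """HYPOTHETICAL scorer: fraction of an item's words that appear in the user's stated need."""
    need_words = set(re.findall(r"[a-z]+", need.lower()))

    def score(item: str) -> float:
        item_words = re.findall(r"[a-z]+", item.lower())
        if not item_words:
            return 0.0
        return sum(w in need_words for w in item_words) / len(item_words)

    return score


def estimate_scene_transition(
    current_scene: str,
    scenes: Dict[str, List[str]],
    match_score: Callable[[str], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return None to stay in the current scene, or the name of a better-suited scene."""

    def scene_score(name: str) -> float:
        # A scene is only as good as its best-matching item.
        return max((match_score(item) for item in scenes[name]), default=0.0)

    if scene_score(current_scene) >= threshold:
        return None  # the current scene already satisfies the need
    best = max(scenes, key=scene_score)
    # Only suggest a move if another scene actually scores higher.
    return best if scene_score(best) > scene_score(current_scene) else None


if __name__ == "__main__":
    scenes = {
        "outerwear aisle": ["red jacket", "rain coat"],
        "footwear aisle": ["running shoes", "leather boots"],
    }
    scorer = make_overlap_scorer("I need running shoes for the gym")
    print(estimate_scene_transition("outerwear aisle", scenes, scorer))
```

Run as a script, the example prints `footwear aisle`: nothing in the outerwear aisle matches the stated need, so the sketch suggests moving. A full system would also have to phrase that suggestion as a natural dialogue turn, which this sketch leaves out.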
