ArXiv TLDR

Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems

arXiv:2604.23640

Yuchuan Zhao, Tong Chen, Junliang Yu, Zongwei Wang, Lizhen Cui + 1 more

cs.IR

TLDR

This paper introduces PUDA, a black-box attack framework that promotes target items in LLM-based sequential recommender systems (LLM-SRSs) without access to either the victim model or its system prompt.

Key contributions

  • Addresses item promotion attacks on LLM-SRSs under a realistic full black-box setting.
  • Proposes PUDA, which infers the hidden system prompt via an LLM-based evolutionary refinement strategy (sketched after this list).
  • Trains a surrogate model to enable adversarial text revision and poisoning sequence generation.
  • Demonstrates superior performance in boosting unpopular items compared to state-of-the-art methods.
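To make the prompt-inference step concrete, here is a minimal, self-contained Python sketch of an evolutionary refinement loop. Everything in it is a toy assumption: `victim_recommend`, the probe sequences, the fragment pool, and the `mutate` operator are hypothetical stand-ins (the paper uses an attacker LLM to propose refined prompts and a real LLM-SRS as the victim). Only the loop structure reflects the described strategy: score candidate prompts by how often a surrogate driven by them agrees with the victim's observable outputs, keep the best, and mutate.

```python
import random

random.seed(0)

def victim_recommend(sequence: list[str]) -> str:
    # Toy "victim": recommends the most frequent item in the interaction
    # sequence, standing in for a prompt-driven LLM-SRS whose prompt is hidden.
    return max(set(sequence), key=sequence.count)

# Candidate prompt fragments the search recombines; in the paper, an
# attacker LLM would generate these refinements instead.
FRAGMENTS = [
    "You are a sequential recommender.",
    "Rank candidate items for the user.",
    "Focus on the most recent interactions.",
    "Prefer items the user engaged with repeatedly.",
    "Output a single item id.",
]

def surrogate_recommend(prompt: list[str], sequence: list[str]) -> str:
    # Toy surrogate: matches the victim only when the candidate prompt
    # captures the "repetition" behavior; otherwise it echoes the last item.
    if any("repeatedly" in frag for frag in prompt):
        return max(set(sequence), key=sequence.count)
    return sequence[-1]

def fitness(prompt: list[str], probes: list[list[str]]) -> float:
    # Agreement between the surrogate (under the candidate prompt) and the
    # victim's observable recommendations on the probe sequences.
    hits = sum(surrogate_recommend(prompt, s) == victim_recommend(s) for s in probes)
    return hits / len(probes)

def mutate(prompt: list[str]) -> list[str]:
    # Swap one fragment at random; the paper instead asks an LLM to rewrite
    # weak parts of the candidate prompt.
    child = prompt.copy()
    child[random.randrange(len(child))] = random.choice(FRAGMENTS)
    return child

probes = [["a", "b", "a", "c"], ["x", "x", "y"], ["m", "n", "n", "n"]]
population = [[random.choice(FRAGMENTS) for _ in range(3)] for _ in range(8)]

for _ in range(20):  # evolutionary refinement loop
    population.sort(key=lambda p: fitness(p, probes), reverse=True)
    elite = population[: len(population) // 2]  # keep the best half
    population = elite + [mutate(random.choice(elite)) for _ in elite]

best = max(population, key=lambda p: fitness(p, probes))
print(f"best fitness: {fitness(best, probes):.2f}")
print("inferred prompt:", " ".join(best))
```

In the full framework, the distilled prompt then supervises the training of a surrogate model that mimics the victim's behavior.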

Why it matters

This paper uncovers significant security vulnerabilities in modern LLM-based sequential recommender systems, even when their internal prompts and models are fully protected. It highlights the urgent need for more robust defensive mechanisms against sophisticated black-box attacks like PUDA.

Original Abstract

Large language model-powered sequential recommender systems (LLM-SRSs) have recently demonstrated remarkable performance, enabling recommendations through prompt-driven inference over user interaction sequences. However, this paradigm also introduces new security vulnerabilities, particularly text-level manipulations, rendering them appealing targets for promotion attacks that purposely boost the ranking of specific target items. Although such security risks have been receiving increasing attention, existing studies typically rely on an unrealistic assumption of access to either the victim model or prompt to unveil attack mechanisms. In this work, we investigate the item promotion attack in LLM-SRSs under a more realistic setting where both the system prompt and victim model are unknown to the attacker, and propose a Prompt-Unknown Dual-poisoning Attack (PUDA) framework. To simulate attacks under this full black-box setting, we introduce an LLM-based evolutionary refinement strategy that infers discrete system prompts, enabling the training of an effective surrogate model that mimics the behaviors of the victim model. Leveraging the distilled prompt and surrogate model, we devise a promotion attack that adversarially revises target item texts under semantic constraints, which is further complemented by the highly plausible, surrogate-generated poisoning sequences to enable cost-effective target item promotion. Extensive experiments on real-world datasets demonstrate that PUDA consistently outperforms state-of-the-art competitors in boosting the exposure of unpopular target items. Our findings reveal critical security risks in modern LLM-SRSs even when both prompts and models are protected, and highlight the need for more robust defensive means.
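For intuition about the dual-poisoning stage, below is a hedged sketch of the two components the abstract describes: a semantically constrained revision of the target item's text, and poisoning sequences that end at the target item. The `surrogate_score` function, the character-level similarity proxy, the token pool, and the item catalog are all illustrative assumptions, not the paper's actual components.

```python
import difflib
import random

random.seed(0)

TARGET_TITLE = "indie folk album"
POPULAR_TOKENS = ["bestselling", "award-winning", "trending", "classic"]

def surrogate_score(title: str) -> float:
    # Hypothetical promotion score: here it simply rewards popular tokens;
    # the real surrogate is a trained model mimicking the victim LLM-SRS.
    return float(sum(tok in title for tok in POPULAR_TOKENS))

def semantically_close(original: str, revised: str, threshold: float = 0.5) -> bool:
    # Cheap proxy for the semantic constraint: character-level similarity.
    # A real attack would use an embedding-based similarity instead.
    return difflib.SequenceMatcher(None, original, revised).ratio() >= threshold

# Adversarial text revision: greedily accept an edit only if it raises the
# surrogate's score while staying semantically close to the original title.
revised = TARGET_TITLE
for token in POPULAR_TOKENS:
    candidate = f"{token} {revised}"
    if (surrogate_score(candidate) > surrogate_score(revised)
            and semantically_close(TARGET_TITLE, candidate)):
        revised = candidate

# Poisoning sequences: plausible fake interaction histories that end at the
# (revised) target item, injected to reinforce the promotion.
catalog = ["rock album", "jazz record", "pop single", "folk EP"]
poison_sequences = [random.sample(catalog, k=3) + [revised] for _ in range(3)]

print("revised title:", revised)
for seq in poison_sequences:
    print("poison sequence:", " -> ".join(seq))
```

Note how the similarity threshold eventually rejects further edits; that is the role the paper's semantic constraint plays, keeping the promoted text recognizably the same item.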

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.