ArXiv TLDR

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models

arXiv: 2605.01591

Amin Bigdeli, Amir Khosrojerdi, Radin Hamidi Rad, Morteza Zihayat, Charles L. A. Clarke + 1 more

cs.IR cs.CL

TLDR

CRAFT is an LLM-powered black-box framework for adversarial content-injection attacks on Neural Ranking Models; it outperforms state-of-the-art baselines on MS MARCO and TREC benchmarks and transfers across diverse ranker architectures.

Key contributions

  • Introduces CRAFT, an LLM-powered black-box framework for adversarial attacks on Neural Ranking Models.
  • CRAFT operates in three stages for content injection: retrieval-augmented generation with self-refinement, supervised fine-tuning on curated adversarial examples, and preference-guided optimization (sketched below, after this list).
  • Significantly outperforms state-of-the-art baselines on MS MARCO and TREC benchmarks.
  • Demonstrates effective transferability across diverse NRM architectures, including LLM-based rankers.
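
To make the three-stage pipeline concrete, here is a minimal sketch reconstructed from the abstract alone. All function names, prompts, and the toy `generate`/`score` stand-ins are assumptions for illustration, not the authors' released code; Stage 2 (standard supervised fine-tuning of the generator on the Stage 1 examples) is summarized in a comment.

```python
from typing import Callable, List, Tuple

# The attack assumes two black-box callables: a generator LLM the attacker
# controls and a target ranker that can only be queried for scores.
Generate = Callable[[str], str]       # prompt -> candidate passage
Score = Callable[[str, str], float]   # (query, passage) -> relevance score


def stage1_generate(query: str, passage: str,
                    retrieve: Callable[[str], List[str]],
                    generate: Generate, score: Score,
                    refine_steps: int = 3) -> str:
    """Stage 1: draft an injected rewrite grounded in retrieved evidence
    (RAG), then self-refine while the target ranker's score improves."""
    evidence = " ".join(retrieve(query))
    best = generate(f"query: {query} | evidence: {evidence} | rewrite: {passage}")
    best_score = score(query, best)
    for _ in range(refine_steps):
        revised = generate(f"refine for query '{query}': {best}")
        revised_score = score(query, revised)
        if revised_score > best_score:
            best, best_score = revised, revised_score
    return best


# Stage 2 (not shown): supervised fine-tuning of the generator on the
# curated (prompt, best rewrite) pairs produced by Stage 1.


def stage3_preference_pairs(query: str, candidates: List[str],
                            score: Score) -> List[Tuple[str, str]]:
    """Stage 3 data: (preferred, rejected) pairs ordered by the ranker's
    score, ready for a DPO-style preference-optimization trainer."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return [(ranked[0], other) for other in ranked[1:]]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real attack would
    # call an LLM for `generate` and query the target NRM for `score`.
    retrieve = lambda q: ["evidence snippet about " + q]
    generate = lambda prompt: prompt.split(": ")[-1]            # dummy "LLM"
    score = lambda q, d: sum(w in d.split() for w in q.split())  # term overlap
    adv = stage1_generate("neural ranking attack", "a benign passage",
                          retrieve, generate, score)
    cands = [adv, "passage on neural ranking attack methods", "off topic"]
    print(stage3_preference_pairs("neural ranking attack", cands, score))
```

The key structural point the sketch captures is that the target ranker appears only behind the `score` interface: the attacker never needs gradients or model internals, which is what makes the setting black-box.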

Why it matters

This paper offers a principled framework for studying adversarial threats in Neural Ranking Models. It highlights the risks of generative AI in rank manipulation and lays a foundation for building more robust retrieval systems.

Original Abstract

Neural Ranking Models (NRMs) are central to modern information retrieval but remain highly vulnerable to adversarial manipulation. Existing attacks often rely on heuristics or surrogate models, limiting effectiveness and transferability. We propose CRAFT, a supervised framework for black-box adversarial rank attacks powered by large language models (LLMs). CRAFT operates in three stages: adversarial dataset generation via retrieval-augmented generation and self-refinement, supervised fine-tuning on curated adversarial examples, and preference-guided optimization to align generations with rank-promotion objectives. Extensive experiments on the MS MARCO passage dataset, TREC Deep Learning 2019, and TREC Deep Learning 2020 benchmarks show that CRAFT significantly outperforms state-of-the-art baselines, achieving higher promotion rates and rank boosts while preserving fluency and semantic fidelity. Moreover, CRAFT transfers effectively across diverse ranking architectures, including cross-encoder, embedding-based, and LLM-based rankers, underscoring vulnerabilities in real-world retrieval systems. This work provides a principled framework for studying adversarial threats in NRMs, underscores the risks of generative AI in rank manipulation, and provides a foundation for developing more robust retrieval systems. To support reproducibility, we publicly release our source code, trained models, and prompt templates.
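
As a reading aid for the reported results, below is a minimal sketch of the two outcome measures the abstract cites, promotion rate and rank boost. The definitions are common-sense readings (a lower rank number means a better position), not necessarily the paper's exact formulas, and the example numbers are invented.

```python
from typing import List, Tuple

def promotion_rate(rank_pairs: List[Tuple[int, int]]) -> float:
    """Fraction of attacked documents whose rank strictly improved
    (lower rank number = better position in the result list)."""
    return sum(after < before for before, after in rank_pairs) / len(rank_pairs)

def mean_rank_boost(rank_pairs: List[Tuple[int, int]]) -> float:
    """Average number of positions gained after injection."""
    return sum(before - after for before, after in rank_pairs) / len(rank_pairs)

# Invented example: three attacked passages move 50->3, 80->40, 60->60.
pairs = [(50, 3), (80, 40), (60, 60)]
print(promotion_rate(pairs))   # 2 of 3 improved -> 0.666...
print(mean_rank_boost(pairs))  # (47 + 40 + 0) / 3 = 29.0
```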
