Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
Haochun Tang, Yuliang Yan, Jiahua Lu, Huaxiao Liu, Enyan Dai
TLDR
R$^2$A is a black-box attack that appends optimized adversarial suffixes to queries so that LLM routers select expensive models, exposing a cost-inflation vulnerability in cost-aware routing.
Key contributions
- Introduces R$^2$A, a black-box adversarial suffix optimization attack on LLM routers.
- Proposes a hybrid ensemble surrogate router to effectively mimic black-box routing systems.
- Adapts a suffix optimization algorithm for the ensemble surrogate to generate attack suffixes.
- Demonstrates R$^2$A's effectiveness on open-source and commercial LLM routing systems.
Why it matters
Cost-aware routing only saves money if easy queries actually reach cheap models. This paper shows that an adversary with nothing more than black-box access can steer queries to the expensive model, which earlier attacks could not do because they required white-box access or relied on heuristic prompts. The results give developers of routing systems a concrete reason to harden their routers against malicious cost manipulation.
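To make the cost impact concrete, here is a minimal arithmetic sketch of expected per-query cost under a two-model router. The function name, the routing rates, and the prices are all hypothetical placeholders chosen for illustration; none of them come from the paper.

```python
def expected_cost_per_query(strong_rate: float, strong_cost: float, weak_cost: float) -> float:
    """Expected cost when a fraction `strong_rate` of queries reaches the expensive model."""
    if not 0.0 <= strong_rate <= 1.0:
        raise ValueError("strong_rate must be in [0, 1]")
    return strong_rate * strong_cost + (1.0 - strong_rate) * weak_cost


# Hypothetical prices in arbitrary units (NOT figures from the paper).
STRONG_COST = 10.0
WEAK_COST = 1.0

# Hypothetical routing rates before and after an attack (NOT results from the paper).
baseline = expected_cost_per_query(0.2, STRONG_COST, WEAK_COST)
attacked = expected_cost_per_query(0.9, STRONG_COST, WEAK_COST)
inflation = attacked / baseline  # how many times more the operator pays per query
```

Because the expected cost is linear in the routing rate, any increase in the share of queries sent to the expensive model translates directly into higher spend, scaled by the price gap between the two models.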
Original Abstract
Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the routing strategy introduces a new security concern that adversaries may manipulate the router to consistently select expensive high-capability models. Existing routing attacks depend on either white-box access or heuristic prompts, rendering them ineffective in real-world black-box scenarios. In this work, we propose R$^2$A, which aims to mislead black-box LLM routers to expensive models via adversarial suffix optimization. Specifically, R$^2$A deploys a hybrid ensemble surrogate router to mimic the black-box router. A suffix optimization algorithm is further adapted for the ensemble-based surrogate. Extensive experiments on multiple open-source and commercial routing systems demonstrate that R$^2$A significantly increases the routing rate to expensive models on queries of different distributions. Code and examples: https://github.com/thcxiker/R2A-Attack.
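The abstract's two ingredients, an ensemble of surrogate routers and a suffix search run against that ensemble, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the two scorers are hypothetical hand-written heuristics standing in for learned surrogate routers, and the search is a simple gradient-free random coordinate search swapped in for the paper's adapted suffix optimization algorithm, whose details are not given in this summary.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A surrogate scorer maps a prompt to a value in [0, 1]: its estimate of how
# likely the router is to pick the expensive model. (Hypothetical interface.)
Scorer = Callable[[str], float]


def length_scorer(text: str) -> float:
    """Toy surrogate: longer prompts are treated as harder."""
    return min(len(text.split()) / 40.0, 1.0)


def keyword_scorer(text: str) -> float:
    """Toy surrogate: counts words that hint at a demanding task."""
    cues = {"prove", "derive", "optimize", "theorem", "rigorous"}
    hits = sum(1 for word in text.lower().split() if word in cues)
    return min(hits / 4.0, 1.0)


def ensemble_score(text: str, scorers: Sequence[Scorer], weights: Sequence[float]) -> float:
    """Weighted average of the surrogate scores."""
    total = sum(weights)
    return sum(w * s(text) for s, w in zip(scorers, weights)) / total


def optimize_suffix(query: str, scorers: Sequence[Scorer], weights: Sequence[float],
                    vocab: List[str], suffix_len: int = 6, iters: int = 200,
                    seed: int = 0) -> Tuple[str, float]:
    """Random coordinate search: mutate one suffix token at a time and keep
    the mutation only if the ensemble score improves."""
    rng = random.Random(seed)
    suffix = [rng.choice(vocab) for _ in range(suffix_len)]
    best = ensemble_score(query + " " + " ".join(suffix), scorers, weights)
    for _ in range(iters):
        candidate = list(suffix)
        candidate[rng.randrange(suffix_len)] = rng.choice(vocab)
        score = ensemble_score(query + " " + " ".join(candidate), scorers, weights)
        if score > best:
            suffix, best = candidate, score
    return " ".join(suffix), best


# Usage with made-up inputs (all values here are illustrative assumptions).
SCORERS = [length_scorer, keyword_scorer]
WEIGHTS = [0.5, 0.5]
VOCAB = ["prove", "hello", "derive", "please", "theorem", "rigorous", "thanks", "optimize"]
QUERY = "What is 2 + 2?"

clean_score = ensemble_score(QUERY, SCORERS, WEIGHTS)
adv_suffix, adv_score = optimize_suffix(QUERY, SCORERS, WEIGHTS, VOCAB)
```

The search only ever queries the surrogates, which is the point of the surrogate design: the suffix is tuned offline and then appended to the query sent to the real black-box router, in the hope that what fools the ensemble also transfers to the target.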