RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta
TLDR
RUBEN is an interactive tool that uses minimal rule sets and novel pruning to explain retrieval-augmented LLM outputs and test their safety and resilience.
Key contributions
- Introduces RUBEN, an interactive tool for explaining retrieval-augmented LLM outputs.
- Leverages novel pruning strategies to find minimal, subsuming rule sets efficiently.
- Applies discovered rules to test LLM safety, including training resilience and adversarial prompts.
Why it matters
Explaining retrieval-augmented LLMs is crucial for trust and debugging. RUBEN offers a novel approach to uncover the underlying logic, enhancing transparency and providing a powerful method to rigorously test LLM safety against adversarial attacks. This improves the reliability and robustness of LLM systems.
Original Abstract
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.