RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

May 11, 20262605.10862

Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta

cs.CL

TLDR

RUBEN is an interactive tool that uses minimal rule sets and novel pruning to explain retrieval-augmented LLM outputs and test their safety and resilience.

Key contributions

Introduces RUBEN, an interactive tool for explaining retrieval-augmented LLM outputs.
Leverages novel pruning strategies to find minimal, subsuming rule sets efficiently.
Applies discovered rules to test LLM safety, including training resilience and adversarial prompts.

Why it matters

Explaining retrieval-augmented LLMs is crucial for trust and debugging. RUBEN offers a novel approach to uncover the underlying logic, enhancing transparency and providing a powerful method to rigorously test LLM safety against adversarial attacks. This improves the reliability and robustness of LLM systems.

Original Abstract

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.

View on arXiv Download PDF

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.

TLDR

Key contributions

Why it matters

Original Abstract

📬 Weekly AI Paper Digest

Related papers