ArXiv TLDR

Decoupling Knowledge and Task Subspaces for Composable Parametric Retrieval Augmented Generation

arXiv: 2604.26768

Weihang Su, Hanwen Zhang, Qingyao Ai, Yiqun Liu

cs.CL

TLDR

This paper introduces Orthogonal Subspace Decomposition (OSD), which separates reusable task behavior from document-specific knowledge in Parametric RAG adapters, making multi-document adapter composition more robust.

Key contributions

  • Identifies entanglement of task behavior and document knowledge as a PRAG composition issue.
  • Proposes Orthogonal Subspace Decomposition (OSD) to separate task and document knowledge in adapters.
  • Trains a Task LoRA for reusable behavior, then trains document LoRAs in an orthogonal subspace (see the sketch after this list).
  • Demonstrates improved compositional robustness in multi-document PRAG through orthogonalization.
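The paper's reference code is not reproduced here, so the following is a minimal sketch of the orthogonality idea, assuming PyTorch and assuming the constraint is enforced by projecting each document LoRA's down-projection rows out of the task LoRA's row space; the paper's exact mechanism may differ, and the names (`project_out`, `a_task`, `a_doc`) are hypothetical.

```python
import torch

def project_out(a_doc: torch.Tensor, a_task: torch.Tensor) -> torch.Tensor:
    """Remove the task LoRA's row space from a document LoRA's A matrix.

    A LoRA update is dW = B @ A. Here:
      a_doc:  (r_doc, d)  trainable document-LoRA down-projection A
      a_task: (r_task, d) frozen task-LoRA down-projection A
    """
    # Orthonormal basis for the task subspace: QR on A_task^T gives
    # q whose columns span the row space of a_task.
    q, _ = torch.linalg.qr(a_task.T)            # q: (d, r_task)
    # Subtract each row's component that lies inside the task subspace.
    return a_doc - (a_doc @ q) @ q.T

# Toy check: after projection, the document update acts on input
# directions orthogonal to the task update, so merging cannot
# accumulate overlapping task behavior along those directions.
d, r_task, r_doc = 64, 4, 4
a_task = torch.randn(r_task, d)
a_doc = torch.randn(r_doc, d)
a_doc_orth = project_out(a_doc, a_task)
print(torch.allclose(a_doc_orth @ a_task.T,
                     torch.zeros(r_doc, r_task), atol=1e-5))  # True
```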

Why it matters

Current PRAG methods struggle with reliable adapter composition because each document adapter entangles document knowledge with task-solving behavior. By disentangling these components, OSD yields merged adapters that stay stable and focused on the intended document knowledge, a step toward scalable, robust parametric knowledge integration.

Original Abstract

Parametric Retrieval-Augmented Generation (PRAG) encodes external documents into lightweight parameter modules that can be retrieved and merged at inference time, offering a promising alternative to in-context retrieval augmentation. Despite its potential, many PRAG implementations train document adapters with task-supervised objectives, which may cause each adapter to encode both document-specific facts and reusable task-solving behavior. This entanglement may make adapter composition less reliable: when multiple adapters are merged at inference time, their overlapping task behaviors can accumulate together with document-specific updates, potentially making the merged adapter less stable and less focused on the intended document knowledge. To examine this issue, we explore Orthogonal Subspace Decomposition (OSD), an adapter-training setup that separates reusable task behavior from document-specific knowledge adapters. Concretely, we first train a Task LoRA to capture reusable task behavior, and then train document LoRAs to encode document-specific knowledge in an orthogonal subspace. This setup provides a controlled way to examine how orthogonalizing task and document LoRA updates affects adapter composition in multi-document PRAG. Experiments across multiple knowledge-intensive tasks and model scales suggest that this orthogonalization strategy can improve compositional robustness in parametric RAG, especially when multiple document adapters are merged.
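For concreteness, here is a hedged sketch of how inference-time composition might look under this setup. The merging rule (a shared task update applied once, plus the average of the retrieved document updates) is an assumption for illustration, not taken from the abstract, and `merged_delta` is a hypothetical helper.

```python
import torch

def merged_delta(task_lora, doc_loras):
    """Compose one shared task LoRA with retrieved document LoRAs.

    task_lora: tuple (B_task, A_task) with shapes (d_out, r), (r, d_in)
    doc_loras: list of (B_i, A_i) pairs for the retrieved documents
    Returns the weight update to add to the frozen base weight W0.
    """
    b_t, a_t = task_lora
    delta = b_t @ a_t                  # reusable task behavior, applied once
    if doc_loras:
        # Average the document-specific updates so their scale does not
        # grow with the number of retrieved documents.
        delta = delta + sum(b @ a for b, a in doc_loras) / len(doc_loras)
    return delta
```

Because the document updates were trained in a subspace orthogonal to the task update, this composition avoids stacking the same task behavior once per retrieved document, which is the instability the paper attributes to standard task-supervised PRAG adapters.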
