DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation
Valeriy Kovalskiy, Nikita Belov, Nikita Miteyko, Igor Reshetnikov, Max Maximov
TLDR
DCD introduces a domain-oriented design for RAG systems, using hierarchical knowledge and multi-stage routing to improve accuracy and relevance.
Key contributions
- Structures RAG knowledge hierarchically (Domain-Collection-Document) for better organization.
- Employs multi-stage routing to progressively restrict retrieval and generation scopes.
- Integrates smart chunking, hybrid retrieval, and generation guardrails for enhanced performance.
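To make the first two contributions concrete, here is a minimal sketch of a Domain-Collection-Document hierarchy with two routing stages that narrow the retrieval scope step by step. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `route` and `pick` functions, the sample `DOMAINS` data, and the word-overlap scoring are all hypothetical. In the paper, routing relies on structured model outputs, so `pick` is only a stand-in for an LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Collection:
    name: str
    description: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    description: str
    collections: list[Collection] = field(default_factory=list)


def overlap(query: str, description: str) -> int:
    """Toy relevance score: number of lowercase words shared by both strings."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def pick(query, candidates):
    """Hypothetical routing stage: choose the best-matching candidate or None.

    A real system would ask a language model for a structured choice here;
    word overlap is used only to keep the sketch self-contained.
    """
    best = max(candidates, key=lambda c: overlap(query, c.description), default=None)
    if best is None or overlap(query, best.description) == 0:
        return None
    return best


def route(query: str, domains: list[Domain]) -> list[Document]:
    """Stage 1 selects a domain, stage 2 selects a collection inside it."""
    domain = pick(query, domains)
    if domain is None:
        return []  # out of scope: nothing is retrieved
    collection = pick(query, domain.collections)
    if collection is None:
        return []
    return collection.documents


# Illustrative data only; not taken from the paper.
DOMAINS = [
    Domain("hr", "employee vacation leave payroll policies", [
        Collection("leave", "vacation and sick leave rules",
                   [Document("hr-1", "Employees accrue 20 vacation days per year.")]),
        Collection("payroll", "salary payroll and bonus schedule",
                   [Document("hr-2", "Payroll runs on the 25th.")]),
    ]),
    Domain("it", "laptop vpn password software support", [
        Collection("access", "vpn and password reset guides",
                   [Document("it-1", "Reset your password in the portal.")]),
    ]),
]

if __name__ == "__main__":
    for doc in route("how many vacation days do I get", DOMAINS):
        print(doc.doc_id, doc.text)
```

Each stage sees only the children of the node chosen by the previous stage, so a query that matches no domain never reaches retrieval at all. That early exit is one simple way to express the "progressive restriction" of scope described above.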
Why it matters
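Naive RAG pipelines tend to degrade on heterogeneous corpora and multi-step queries because knowledge is stored flat and no explicit workflow governs query processing. DCD addresses this with a hierarchical, domain-oriented design that narrows retrieval and generation scopes stage by stage, without modifying the underlying language model. The authors discuss its impact on robustness, factual accuracy, and answer relevance, based on results from a synthetic evaluation dataset.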
Original Abstract
Retrieval-Augmented Generation (RAG) is widely used to ground large language models in external knowledge sources. However, when applied to heterogeneous corpora and multi-step queries, Naive RAG pipelines often degrade in quality due to flat knowledge representations and the absence of explicit workflows. In this work, we introduce DCD (Domain-Collection-Document), a domain-oriented design to structure knowledge and control query processing in RAG systems without modifying the underlying language model. The proposed approach relies on a hierarchical decomposition of the information space and multi-stage routing based on structured model outputs, enabling progressive restriction of both retrieval and generation scopes. The architecture is complemented by smart chunking, hybrid retrieval, and integrated validation and generation guardrail mechanisms. We describe the DCD architecture and workflow and discuss evaluation results on a synthetic evaluation dataset, highlighting their impact on robustness, factual accuracy, and answer relevance in applied RAG scenarios.
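The abstract mentions hybrid retrieval but this summary does not say how lexical and dense results are combined. As a hedged illustration only, the sketch below uses reciprocal rank fusion (RRF), a common general-purpose way to merge ranked lists; it is a swapped-in technique, not necessarily what DCD uses. The function name, the constant `k=60`, and the sample rankings are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (ranks start at 1); higher total score ranks first. Ties are broken
    alphabetically by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical outputs of a keyword retriever and an embedding retriever.
    lexical = ["a", "b", "c"]
    dense = ["b", "d", "a"]
    print(reciprocal_rank_fusion([lexical, dense]))
```

In a hierarchical setup like the one described, such a fusion step would run only over the documents left in scope after routing, which keeps both retrievers focused on the selected collection.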