H-RAG at SemEval-2026 Task 8: Hierarchical Parent-Child Retrieval for Multi-Turn RAG Conversations
Passant Elchafei, Hossam Emam, Mohamed Alansary, Monorama Swain, Markus Schedl
TLDR
H-RAG is a hierarchical parent-child retrieval pipeline for multi-turn RAG conversations, submitted to SemEval-2026 Task 8 for both retrieval (Task A) and generation (Task C).
Key contributions
- Proposes H-RAG, a hierarchical parent-child RAG pipeline for multi-turn conversational settings.
- Separates fine-grained child-level retrieval from parent-level context reconstruction for generation.
- Retrieves child chunks with hybrid dense-sparse search (tunable weighting) and embedding-based similarity rescoring, then aggregates the evidence at the parent level.
- Scores nDCG@5 of 0.4271 on Task A and a harmonic mean of 0.3241 on Task C of SemEval-2026 Task 8, highlighting the importance of retrieval configuration and parent-level aggregation.
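As a rough illustration of the "retrieve children, return parents" flow listed above, here is a minimal, dependency-free Python sketch. Everything in it is an assumption rather than the authors' code: the regex sentence splitter, the `window` and `stride` values, min-max normalisation before the weighted dense-sparse sum (`alpha`), cosine rescoring of the top fused candidates, and max-pooling of child scores into parent scores. The dense and sparse scores and the embeddings are toy numbers standing in for a real encoder and a BM25-style index.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Naive regex splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def make_child_chunks(parent_id: str, text: str, window: int = 3, stride: int = 2) -> list[dict]:
    # Overlapping sentence windows; each child records the parent document it came from.
    sents = split_sentences(text)
    if not sents:
        return []
    chunks, start = [], 0
    while True:
        chunks.append({"parent": parent_id, "text": " ".join(sents[start:start + window])})
        if start + window >= len(sents):
            break
        start += stride
    return chunks


def min_max(scores: dict[int, float]) -> dict[int, float]:
    # Rescale scores to [0, 1] so dense and sparse scores are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}


def hybrid_scores(dense: dict[int, float], sparse: dict[int, float], alpha: float = 0.5) -> dict[int, float]:
    # Weighted sum of normalised scores; alpha is the (hypothetical) tunable dense weight.
    d, s = min_max(dense), min_max(sparse)
    return {k: alpha * d.get(k, 0.0) + (1 - alpha) * s.get(k, 0.0) for k in d.keys() | s.keys()}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rescore(fused: dict[int, float], query_vec: list[float],
            child_vecs: dict[int, list[float]], top_n: int = 20) -> dict[int, float]:
    # Keep the top_n fused candidates and replace their scores with query-chunk cosine similarity.
    top = sorted(fused, key=fused.get, reverse=True)[:top_n]
    return {k: cosine(query_vec, child_vecs[k]) for k in top}


def aggregate_to_parents(chunks: list[dict], child_scores: dict[int, float],
                         top_k: int = 5) -> list[tuple[str, float]]:
    # Max-pool child scores per parent document, then rank the parents.
    best: dict[str, float] = {}
    for idx, score in child_scores.items():
        pid = chunks[idx]["parent"]
        best[pid] = max(best.get(pid, float("-inf")), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy usage: two parent documents, three child chunks (indices 0, 1 from d1; 2 from d2).
docs = {
    "d1": "Parents keep whole documents. Children are short windows. Retrieval scores children.",
    "d2": "Sparse search matches terms. Dense search matches meaning.",
}
chunks = [c for pid, text in docs.items() for c in make_child_chunks(pid, text, window=2, stride=1)]
dense = {0: 0.2, 1: 0.9, 2: 0.4}   # toy dense scores
sparse = {0: 3.0, 1: 1.0, 2: 5.0}  # toy sparse scores
fused = hybrid_scores(dense, sparse, alpha=0.5)
print(aggregate_to_parents(chunks, fused))  # ranks d2 above d1
child_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
print(aggregate_to_parents(chunks, rescore(fused, [0.0, 1.0], child_vecs)))  # rescoring puts d1 first
```

Max-pooling is only one plausible aggregation rule; summing or averaging child scores per parent would instead reward documents with many moderately matching chunks over a document with one strong match.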
Why it matters
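Multi-turn RAG has to retrieve precise evidence for each turn while still giving the generator enough surrounding context to answer faithfully. H-RAG addresses this by matching queries against small child chunks but passing whole parent documents to the generator, and its SemEval-2026 Task 8 results suggest that retrieval configuration and parent-level aggregation matter for multi-turn RAG performance.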
Original Abstract
We present H-RAG, our submission to SemEval-2026 Task 8 (MTRAGEval), addressing both Task A (Retrieval) and Task C (Generation with Retrieved Passages). Task A evaluates standalone retrieval quality, while Task C assesses end-to-end retrieval-augmented generation (RAG) in multi-turn conversational settings, requiring both accurate answer generation and faithful grounding in retrieved evidence. Our approach implements a hierarchical parent-child RAG pipeline that separates fine-grained child-level retrieval from parent-level context reconstruction during generation. Documents are segmented into overlapping sentence-based child chunks, while full documents are preserved as parent units to provide coherent context. Retrieval combines hybrid dense-sparse search, tunable weighting, and embedding-based similarity rescoring over child chunks. Retrieved evidence is aggregated at the parent level and supplied to an instruction-tuned language model for response generation. H-RAG achieves an nDCG@5 score of 0.4271 on Task A and a harmonic mean score of 0.3241 on Task C (RB_agg: 0.2488, RL_F: 0.2703, RB_llm: 0.6508), underscoring the importance of retrieval configuration and parent-level aggregation in multi-turn RAG performance.
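The two headline numbers can be made concrete with a short Python sketch. `ndcg_at_k` is the textbook nDCG@k definition with graded relevance, and the relevance judgements are hypothetical; the official MTRAGEval scorer may differ in details such as tie handling. The last lines assume the Task C score is a plain harmonic mean of RB_agg, RL_F and RB_llm; plugging in the reported values does reproduce 0.3241.

```python
import math
import statistics


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 5) -> float:
    # DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1); enumerate starts at 0, hence i + 2.
    dcg = sum(relevance.get(doc, 0.0) / math.log2(i + 2) for i, doc in enumerate(ranked_ids[:k]))
    # Ideal DCG: the same sum over the best possible ordering of the judged documents.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy judgements (hypothetical): two relevant passages, one non-relevant.
relevance = {"p1": 0.0, "p2": 1.0, "p3": 1.0}
print(round(ndcg_at_k(["p1", "p2", "p3"], relevance), 4))  # relevant items at ranks 2 and 3
print(ndcg_at_k(["p2", "p3", "p1"], relevance))            # ideal order → 1.0

# Task C, assuming a plain harmonic mean of the three reported sub-scores.
task_c = statistics.harmonic_mean([0.2488, 0.2703, 0.6508])
print(round(task_c, 4))  # → 0.3241
```

Because a harmonic mean is dominated by its smallest inputs, the combined Task C score (0.3241) sits much closer to RB_agg and RL_F than to RB_llm (0.6508).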