ArXiv TLDR

A Reproducibility Study of Metacognitive Retrieval-Augmented Generation

arXiv:2604.19899

Gabriel Iturra-Bocaz, Petra Galuscakova

cs.IR

TLDR

This paper reproduces MetaRAG, confirming its benefits but finding lower absolute scores, and shows it improves with reranking.

Key contributions

  • Reproduced MetaRAG, confirming its relative benefits over standard RAG.
  • Observed lower absolute scores than originally reported, attributed to closed-source LLM updates, missing implementation details, and unreleased prompts.
  • Showed MetaRAG substantially improves with PointWise and ListWise rerankers.
  • Found MetaRAG more robust than SIM-RAG when extended with retrieval features.
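
To make the distinction between the two reranker families concrete, here is a minimal sketch. It does not reproduce the rerankers used in the paper: `token_overlap` and `greedy_coverage_order` are hypothetical toy stand-ins for a learned scoring model. The point is only the interface difference: a pointwise reranker scores each passage independently, while a listwise reranker sees the whole candidate list and returns an ordering.

```python
from typing import Callable, List


def token_overlap(query: str, doc: str) -> float:
    """Toy relevance score (assumption): fraction of query tokens found in doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def pointwise_rerank(query: str, docs: List[str],
                     score: Callable[[str, str], float] = token_overlap) -> List[str]:
    """Pointwise: each passage is scored on its own, then sorted by score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


def greedy_coverage_order(query: str, docs: List[str]) -> List[int]:
    """Toy listwise orderer (assumption): repeatedly pick the passage that
    covers the most query tokens not yet covered by earlier picks."""
    q = set(query.lower().split())
    covered: set = set()
    remaining = list(range(len(docs)))
    order: List[int] = []
    while remaining:
        best = max(remaining,
                   key=lambda i: len((q - covered) & set(docs[i].lower().split())))
        order.append(best)
        covered |= q & set(docs[best].lower().split())
        remaining.remove(best)
    return order


def listwise_rerank(query: str, docs: List[str],
                    order_fn: Callable[[str, List[str]], List[int]] = greedy_coverage_order
                    ) -> List[str]:
    """Listwise: the orderer sees all candidates and returns a permutation."""
    order = order_fn(query, docs)
    if sorted(order) != list(range(len(docs))):
        raise ValueError("order_fn must return a permutation of the indices")
    return [docs[i] for i in order]


query = "who directed inception and when"
docs = [
    "inception is a film",
    "christopher nolan directed inception",
    "nolan directed inception in 2010",
    "when and who",
]
pointwise = pointwise_rerank(query, docs)
listwise = listwise_rerank(query, docs)
```

In this toy run both methods agree on the top two passages but order the remaining two differently, because the listwise orderer discounts passages whose query tokens are already covered by earlier picks.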

Why it matters

This reproducibility study shows how MetaRAG performs when re-implemented independently and where the implementation gaps lie. It highlights how external factors, such as updates to closed-source LLMs, affect research reproducibility. The findings also point to a practical way to improve MetaRAG's effectiveness: adding a reranking step.

Original Abstract

Recently, Retrieval Augmented Generation (RAG) has shifted focus to multi-retrieval approaches to tackle complex tasks such as multi-hop question answering. However, these systems struggle to decide when to stop searching once enough information has been gathered. To address this, Zhou et al. (2024) introduced Metacognitive Retrieval Augmented Generation (MetaRAG), a framework inspired by metacognition that enables Large Language Models to critique and refine their reasoning. In this reproducibility paper, we reproduce MetaRAG following its original experimental setup and extend it in two directions: (i) by evaluating the effect of PointWise and ListWise rerankers, and (ii) by comparing with SIM-RAG, which employs a lightweight critic model to stop retrieval. Our results confirm MetaRAG's relative improvements over standard RAG and reasoning-based baselines, but also reveal lower absolute scores than reported, reflecting challenges with closed-source LLM updates, missing implementation details, and unreleased prompts. We show that MetaRAG is partially reproduced, gains substantially from reranking, and is more robust than SIM-RAG when extended with additional retrieval features.
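
The stopping problem described in the abstract can be illustrated with a generic multi-round retrieval loop. This is a sketch under stated assumptions, not the MetaRAG or SIM-RAG implementation: `iterative_rag`, the toy corpus, and the three toy callables are all hypothetical. In a real system the retriever, generator, and critic would be backed by a search index and language models.

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str, List[str]], List[str]]
Generator = Callable[[str, List[str]], str]
Critic = Callable[[str, List[str], str], bool]


def iterative_rag(question: str, retrieve: Retriever, generate: Generator,
                  is_sufficient: Critic, max_rounds: int = 3) -> Tuple[str, int]:
    """Retrieve, answer, and ask a critic whether to stop; repeat until the
    critic is satisfied or the round budget is exhausted."""
    context: List[str] = []
    answer = ""
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        context.extend(retrieve(question, context))
        answer = generate(question, context)
        if is_sufficient(question, context, answer):
            break
    return answer, rounds


# Toy components (assumptions for illustration only).
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]


def toy_retrieve(question: str, context: List[str]) -> List[str]:
    """Return one passage that has not been retrieved yet."""
    unseen = [d for d in CORPUS if d not in context]
    return unseen[:1]


def toy_generate(question: str, context: List[str]) -> str:
    """Stand-in for an LLM: just concatenates the evidence."""
    return " ".join(context)


def toy_critic(question: str, context: List[str], answer: str) -> bool:
    """Stand-in for a critic: stop once both hops of the question are covered."""
    return "Nolan" in answer and "London" in answer


answer, rounds = iterative_rag(
    "Where was the director of Inception born?",
    toy_retrieve, toy_generate, toy_critic,
)
```

Here the critic rejects the first-round answer because the second hop (the birthplace) is still missing, and the loop stops after the second round instead of spending the full budget of three.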
