ArXiv TLDR

R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability

arXiv: 2604.22432

Yifei Wang, Jacky Keung, Xiaoxue Ma, Zhenyu Mao, Kehui Chen + 1 more

cs.SE

TLDR

R2Code is an LLM framework that improves requirement-to-code traceability (average F1 gain of 7.4%) while cutting inference cost (up to 41.7% fewer tokens) via semantic alignment and adaptive context retrieval.

Key contributions

  • Decomposition-enhanced Bidirectional Alignment Network (BAN) aligns requirement semantics with code structures.
  • Self-Reflective Consistency Verification (SRCV) module improves link reliability through explanation-guided checking.
  • Dynamic Context-Adaptive Retrieval (DCAR) mechanism efficiently filters contexts, reducing token consumption.
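To make the third component concrete, here is a minimal sketch of overlap-weighted context filtering in the spirit of DCAR. The paper does not publish its weighting formula, so this uses a simple lexical Jaccard overlap as a stand-in for the semantic-overlap measure; the tokenizer, the 0.2 threshold, and all function names are illustrative assumptions.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word/identifier tokens, splitting camelCase and snake_case."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)}

def overlap_weight(requirement: str, context: str) -> float:
    """Jaccard-style overlap weight between a requirement and a code context."""
    req, ctx = tokens(requirement), tokens(context)
    return len(req & ctx) / len(req | ctx) if req | ctx else 0.0

def filter_contexts(requirement: str, contexts: list[str],
                    threshold: float = 0.2) -> list[str]:
    """Keep only contexts whose weight clears the threshold, shrinking
    the prompt (and token count) sent to the LLM for link verification."""
    return [c for c in contexts if overlap_weight(requirement, c) >= threshold]

req = "The system shall export the user report as PDF"
candidates = [
    "def export_user_report_pdf(user): ...",
    "def reset_password(user): ...",
]
kept = filter_contexts(req, candidates)  # only the export function survives
```

The point is the shape of the mechanism, not the metric: DCAR reportedly also adapts retrieval granularity, which a real implementation would layer on top of a filter like this.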

Why it matters

Accurate requirement-to-code traceability is vital for software maintenance, yet existing IR- and embedding-based methods lean on lexical similarity and break down when requirement wording diverges from code identifiers. R2Code offers a more accurate and cost-effective LLM-based alternative: its alignment, self-verification, and adaptive-retrieval components improve link reliability while reducing computational overhead, making it a practical advance.

Original Abstract

Accurate requirement-to-code traceability is crucial for software maintenance. However, existing IR- and embedding-based methods are heavily dependent on lexical similarity, often yielding incomplete or inconsistent links across projects and languages and incurring high cost from long-context retrieval and prompting. This paper presents R2Code, an LLM-based semantic traceability framework designed to improve trace link accuracy while reducing inference cost. R2Code integrates three components: 1) a decomposition-enhanced Bidirectional Alignment Network (BAN) that aligns four-layer requirement semantics with corresponding code structures to support cross-level semantic matching; 2) a Self-Reflective Consistency Verification (SRCV) module that conducts explanation-guided consistency checking to calibrate link reliability; and 3) a Dynamic Context-Adaptive Retrieval (DCAR) mechanism that adjusts retrieval granularity and filters contexts using semantic-overlap weighting for efficient context utilization. Experiments on five public datasets spanning multiple domains and two programming languages demonstrate that R2Code consistently outperforms the strongest baselines, achieving an average F1 gain of 7.4%, while reducing token consumption by up to 41.7% through adaptive context control.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.