CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference
TLDR
CoDe-R refines decompiler output using LLMs with rationale guidance and adaptive inference, achieving state-of-the-art re-executability among lightweight models.
Key contributions
- Introduces CoDe-R, a two-stage framework for refining decompiler output using LLMs to overcome logical hallucinations.
- Semantic Cognitive Enhancement (SCE) trains the model to recover high-level algorithmic intent via rationale guidance.
- Dynamic Dual-Path Fallback (DDPF) adaptively balances semantic recovery and syntactic stability during inference.
- Achieves a new SOTA on HumanEval-Decompile, making it the first 1.3B-parameter model to exceed 50% average re-executability.
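The paper does not publish implementation details for Semantic Cognitive Enhancement beyond the description above. As a purely illustrative sketch, rationale-guided semantic injection can be read as training the model to emit its recovered algorithmic intent before the refined code; the function name, section markers, and format below are all hypothetical, not the authors' actual training format.

```python
def build_sce_target(rationale: str, refined_code: str) -> str:
    """Hypothetical training target for rationale-guided semantic injection:
    the model learns to state the recovered high-level intent (rationale)
    before emitting the refined source, so semantics guide code generation.
    The section markers here are illustrative placeholders."""
    return (
        "### Rationale\n"
        f"{rationale}\n"
        "### Refined Code\n"
        f"{refined_code}\n"
    )
```

A target built this way interleaves intent and code in one sequence, which is one plausible reading of "recover high-level algorithmic intent alongside code."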
Why it matters
Binary decompilation is crucial for reverse engineering, yet LLMs struggle to generate re-executable code from stripped binaries. By markedly improving the reliability of LLM-based decompilation, CoDe-R helps bridge the gap between lightweight LLMs and expert-level performance on this critical task.
Original Abstract
Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise, they often suffer from "logical hallucinations" and "semantic misalignment" due to the irreversible semantic loss during compilation, resulting in generated code that fails to re-execute. In this study, we propose Cognitive Decompiler Refinement with Robustness (CoDe-R), a lightweight two-stage code refinement framework. The first stage introduces Semantic Cognitive Enhancement (SCE), a Rationale-Guided Semantic Injection strategy that trains the model to recover high-level algorithmic intent alongside code. The second stage introduces a Dynamic Dual-Path Fallback (DDPF) mechanism during inference, which adaptively balances semantic recovery and syntactic stability via a hybrid verification strategy. Evaluation on the HumanEval-Decompile benchmark demonstrates that CoDe-R (using a 1.3B backbone) establishes a new State-of-the-Art (SOTA) in the lightweight regime. Notably, it is the first 1.3B model to exceed an Average Re-executability Rate of 50.00%, significantly outperforming the baseline and effectively bridging the gap between efficient models and expert-level performance. Our code is available at https://github.com/Theaoi/CoDe-R.
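The abstract describes DDPF only at a high level: at inference time, a semantically enriched candidate is preferred when it passes a hybrid verification step, with a fallback to a syntactically stable candidate otherwise. The sketch below is a minimal, hypothetical rendering of that control flow; the function names are invented here, and the verifier is left abstract (in practice it would wrap compilation and re-execution checks, which the paper does not specify in detail).

```python
from typing import Callable


def dual_path_fallback(
    semantic_candidate: str,
    syntactic_candidate: str,
    verify: Callable[[str], bool],
) -> str:
    """Illustrative sketch of a dynamic dual-path fallback: prefer the
    semantically enriched refinement when it passes verification, and
    fall back to the syntactically conservative candidate otherwise.

    `verify` stands in for the paper's hybrid verification strategy
    (e.g. compile and re-execution checks); its exact form is an
    assumption, not taken from the paper.
    """
    if verify(semantic_candidate):
        return semantic_candidate
    # Semantic path failed verification: fall back to the stable path.
    return syntactic_candidate
```

Under this reading, "adaptively balancing semantic recovery and syntactic stability" amounts to gating the riskier, more semantic output behind a verifier rather than always emitting it.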