DuET: Dual Execution for Test Output Prediction with Generated Code and Pseudocode
Hojae Han, Jaejin Kim, Seung-won Hwang, Yu Jin Kim, Moontae Lee
TLDR
DuET introduces a dual-execution framework for test output prediction, combining direct code execution and LLM-based pseudocode reasoning for improved reliability.
Key contributions
- Proposes DuET, a dual-execution framework for reliable test output prediction.
- Combines direct code execution with LLM-based pseudocode reasoning for grounding.
- Uses functional majority voting across both execution paths to overcome code errors and pseudocode hallucinations (see the sketch after this list).
- Achieves state-of-the-art performance on LiveCodeBench, improving Pass@1 by 13.6 pp.
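The digest does not specify how dual execution or the vote is implemented, so the following is only a minimal sketch under stated assumptions: `run_code` (a sandboxed runner for the generated code), `simulate_llm` (an LLM prompted to step through the pseudocode), and the sampling count `n_llm_samples` are hypothetical stand-ins, not the paper's actual components.

```python
from collections import Counter
from typing import Callable, Optional

def functional_majority_vote(candidates: list[Optional[str]]) -> Optional[str]:
    """Return the output predicted most often across execution paths.

    Failed runs (None) are dropped; outputs are normalized so that
    incidental whitespace differences do not split the vote.
    """
    valid = [c.strip() for c in candidates if c is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]

def predict_test_output(
    code: str,
    pseudocode: str,
    test_input: str,
    run_code: Callable[[str, str], Optional[str]],      # hypothetical sandbox runner
    simulate_llm: Callable[[str, str], Optional[str]],  # hypothetical LLM "executor"
    n_llm_samples: int = 4,                             # assumed; the paper's count is not given here
) -> Optional[str]:
    """Dual execution: pool candidates from both grounding strategies, then vote."""
    candidates: list[Optional[str]] = []
    # Path 1: direct execution of the generated code.
    # Precise when it runs, but even minor code errors cause failures.
    candidates.append(run_code(code, test_input))
    # Path 2: LLM-simulated execution of error-resilient pseudocode.
    # Survives small code errors but can hallucinate intermediate states.
    for _ in range(n_llm_samples):
        candidates.append(simulate_llm(pseudocode, test_input))
    return functional_majority_vote(candidates)
```

The complementarity the paper reports falls out of the pooling: when direct execution fails, the pseudocode samples still fill the vote, and when pseudocode reasoning hallucinates, the direct-execution output can anchor it.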
Why it matters
Reliable expected outputs are a bottleneck in LLM-based test case generation: a generated test is only as good as the output it asserts. By grounding predictions in two complementary execution strategies and voting across them, DuET lifts Pass@1 on LiveCodeBench by 13.6 percentage points, a concrete step toward trustworthy automated test generation.
Original Abstract
This work addresses test output prediction, a key challenge in test case generation. To improve the reliability of outputs predicted by LLMs, prior approaches generate code first to ground predictions. One grounding strategy is direct execution of the generated code, but even minor errors can cause failures. To address this, we introduce LLM-based pseudocode execution, which grounds prediction on more error-resilient pseudocode and simulates execution via LLM reasoning. We further propose DuET, a dual-execution framework that combines both approaches via functional majority voting. Our analysis shows the two approaches are complementary, overcoming the limitations of direct execution (code errors) and pseudocode reasoning (hallucination). On LiveCodeBench, DuET achieves state-of-the-art performance, improving Pass@1 by 13.6 pp.
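The abstract's "simulates execution via LLM reasoning" suggests a prompting setup in which the model acts as an interpreter over the pseudocode. A plausible prompt shape, illustrative only and not the paper's actual prompt, might look like:

```python
# Illustrative only: the paper's actual prompt is not given in this digest.
PSEUDOCODE_EXEC_PROMPT = """\
You are a precise interpreter. Execute the pseudocode below on the given
input, step by step, tracking every variable update. After the trace,
output only the final result on the last line.

Pseudocode:
{pseudocode}

Input:
{test_input}
"""

def build_prompt(pseudocode: str, test_input: str) -> str:
    """Fill the template for one (pseudocode, test input) pair."""
    return PSEUDOCODE_EXEC_PROMPT.format(pseudocode=pseudocode, test_input=test_input)
```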