ArXiv TLDR

DuET: Dual Execution for Test Output Prediction with Generated Code and Pseudocode

arXiv:2604.11514

Hojae Han, Jaejin Kim, Seung-won Hwang, Yu Jin Kim, Moontae Lee

cs.SE, cs.CL

TLDR

DuET introduces a dual-execution framework for test output prediction, combining direct code execution and LLM-based pseudocode reasoning for improved reliability.

Key contributions

  • Proposes DuET, a dual-execution framework for reliable test output prediction.
  • Combines direct code execution with LLM-based pseudocode reasoning for grounding.
  • Uses functional majority voting to overcome code errors and pseudocode hallucinations.
  • Achieves state-of-the-art performance on LiveCodeBench, improving Pass@1 by 13.6 pp.

Why it matters

This paper targets a central difficulty in test case generation: test outputs predicted by LLMs are often unreliable. DuET pairs direct execution of generated code with LLM-simulated pseudocode execution and reconciles the two by functional majority voting, so that code errors on one side and hallucinations on the other can offset each other. On LiveCodeBench, the authors report state-of-the-art results, improving Pass@1 by 13.6 pp.

Original Abstract

This work addresses test output prediction, a key challenge in test case generation. To improve the reliability of predicted outputs by LLMs, prior approaches generate code first to ground predictions. One grounding strategy is direct execution of generated code, but even minor errors can cause failures. To address this, we introduce LLM-based pseudocode execution, which grounds prediction on more error-resilient pseudocode and simulates execution via LLM reasoning. We further propose DuET, a dual-execution framework that combines both approaches by functional majority voting. Our analysis shows the two approaches are complementary in overcoming the limitations of direct execution suffering from code errors, and pseudocode reasoning from hallucination. On LiveCodeBench, DuET achieves the state-of-the-art performance, improving Pass@1 by 13.6 pp.
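
To make the idea of combining the two execution paths concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not spell out how functional majority voting works, so this sketch assumes a plain majority vote over output strings. The names `run_candidate` and `dual_execution_vote` are hypothetical, generated programs are stood in for by ordinary Python callables, and the LLM's pseudocode-reasoning predictions are stood in for by a list of strings.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def run_candidate(program: Callable[[str], str], test_input: str) -> Optional[str]:
    """Directly execute one generated program; return None if it crashes."""
    try:
        return program(test_input)
    except Exception:
        # A crashing program contributes no vote (assumption of this sketch).
        return None


def dual_execution_vote(
    programs: Sequence[Callable[[str], str]],
    simulated_outputs: Sequence[str],
    test_input: str,
) -> Optional[str]:
    """Pool outputs from both paths and return the most frequent one."""
    votes: Counter = Counter()
    # Path 1: direct execution of generated code.
    for program in programs:
        output = run_candidate(program, test_input)
        if output is not None:
            votes[output.strip()] += 1
    # Path 2: outputs predicted by LLM reasoning over pseudocode
    # (represented here as precomputed strings).
    for output in simulated_outputs:
        votes[output.strip()] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy task: print the sum of space-separated integers.
def good(s: str) -> str:
    return str(sum(int(x) for x in s.split()))


def crashing(s: str) -> str:
    # Splits on the wrong delimiter, so int("1 2 3") raises ValueError.
    return str(sum(int(x) for x in s.split(",")))


def off_by_one(s: str) -> str:
    return str(sum(int(x) for x in s.split()) + 1)


predicted = dual_execution_vote([good, crashing, off_by_one], ["6", "5"], "1 2 3")
```

In this toy run the crashing program casts no vote, the off-by-one program and one simulated output each cast a lone wrong vote, and the correct answer wins because it is the only output both paths agree on. A real system would also need sandboxed execution of untrusted generated code, which this sketch omits.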
