ArXiv TLDR

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

arXiv:2604.08083

Li Hu, Xiuwei Shang, Jieke Shi, Shaoyin Cheng, Junqi Zhang + 4 more

cs.SE

TLDR

LLMs can deobfuscate binary code, but performance relies on reasoning and task-specific fine-tuning, not just model size.

Key contributions

  • Introduces BinDeObfBench, the first comprehensive benchmark for LLM-based binary deobfuscation.
  • Deobfuscation performance depends more on LLM reasoning and domain expertise than model scale.
  • Task-specific fine-tuning consistently outperforms broad domain pre-training for this task.
  • Reasoning models remain robust under severe obfuscation and generalize across instruction set architectures (ISAs) and optimization levels.

Why it matters

This paper provides the first systematic evaluation of LLMs for binary code deobfuscation, a critical challenge in reverse engineering. It shows that specialized training and reasoning capability matter more than model size, guiding future LLM development in this domain.

Original Abstract

Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have shown promise in recovering semantics from obfuscated binaries, a systematic evaluation of their effectiveness is still lacking. In this work, we present BinDeObfBench, the first comprehensive benchmark for assessing LLM-based binary deobfuscation across diverse transformations spanning pre-compilation, compile-time, and post-compilation stages. Our evaluation shows that deobfuscation performance depends more on reasoning capability and domain expertise than on model scale, and that task-specific supervised fine-tuning consistently outperforms broad domain pre-training. Reasoning models maintain robustness under severe obfuscation and generalize across different instruction set architectures (ISAs) and optimization levels. In-context learning benefits standard models but yields limited gains for reasoning models. Overall, our study highlights the importance of task-specific fine-tuning and reasoning-driven strategies, and positions BinDeObfBench as a basis for future work in binary deobfuscation.
