
Adding Compilation Metadata To Binaries To Make Disassembly Decidable

arXiv:2604.19628

Daniel Engel, Freek Verbeek, Pranav Kumar, Binoy Ravindran

cs.CR, cs.PL

TLDR

This paper proposes adding compiler-intent metadata to binaries to improve software safety, analysis, and re-compilation without performance impact.

Key contributions

  • Introduces a binary format augmented with compiler-intent metadata.
  • Provides a tool to generate and insert this metadata into binaries.
  • Enables reliable lifting, analysis, instrumentation, and recompilation.
  • Metadata is roughly 17% of the size of DWARF debug information and has no effect on runtime behavior or performance.

Why it matters

Stripped binaries are opaque, which makes analysis and security hardening unreliable. This paper aims to improve software safety and maintainability by embedding compiler-intent metadata, so that lifting, analysis, and recompilation can rely on recorded facts rather than heuristics. It offers a practical middle ground between stripped binaries and open source.

Original Abstract

The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramatically improve the safety and maintainability of software. In this paper, we propose a binary format that is a middle ground between a stripped black-box binary and open source. We provide a tool that generates metadata capturing the compiler's intent and inserts it into the binary. This metadata enables lifting to a correct and recompilable higher-level representation and makes analysis and instrumentation more reliable. Our evaluation shows that adding metadata does not affect runtime behavior or performance. Compared to DWARF, our metadata is roughly 17% of its size. We validate correctness by compiling a comprehensive set of real-world C and C++ binaries and demonstrating that they can be lifted, instrumented, and recompiled without altering their behavior.
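To make the idea concrete, here is a minimal Python sketch of how such metadata could be consulted by an analysis tool. It is not the paper's format or tool: the `Range` and `IntentMetadata` names, the interval layout, and the example addresses are all hypothetical, chosen only to illustrate the two kinds of information the abstract mentions (which bytes the compiler intended as instructions, and how memory regions are bounded).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open address interval [start, end). Hypothetical layout."""
    start: int
    end: int


class IntentMetadata:
    """Hypothetical container for compiler-intent facts about one binary."""

    def __init__(self, code_ranges, data_regions):
        # Sort once so that each lookup is a binary search.
        self._code = sorted(code_ranges, key=lambda r: r.start)
        self._data = sorted(data_regions, key=lambda r: r.start)
        self._code_starts = [r.start for r in self._code]
        self._data_starts = [r.start for r in self._data]

    @staticmethod
    def _find(ranges, starts, addr):
        # Rightmost range whose start is <= addr, if it also contains addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and ranges[i].start <= addr < ranges[i].end:
            return ranges[i]
        return None

    def is_code(self, addr):
        """True if the compiler recorded this byte as part of an instruction."""
        return self._find(self._code, self._code_starts, addr) is not None

    def access_in_bounds(self, addr, size):
        """True if [addr, addr + size) lies inside one recorded data region."""
        region = self._find(self._data, self._data_starts, addr)
        return region is not None and addr + size <= region.end


# Example: two instruction ranges with a gap between them (for instance, an
# inline jump table), plus one 64-byte data region. Addresses are made up.
meta = IntentMetadata(
    code_ranges=[Range(0x1000, 0x1010), Range(0x1020, 0x1030)],
    data_regions=[Range(0x2000, 0x2040)],
)

code_hit = meta.is_code(0x1008)               # inside the first code range
gap_hit = meta.is_code(0x1014)                # in the gap: data, not code
ok_access = meta.access_in_bounds(0x2038, 8)  # ends exactly at the bound
bad_access = meta.access_in_bounds(0x203C, 8) # runs past the region
```

With recorded ranges like these, a disassembler would not need to guess whether bytes in an executable section are instructions or embedded data; the question reduces to a lookup. How the actual metadata is encoded and stored is described in the paper, not in this summary.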
