ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs
Hongyan Gao, Yibiao Yang, Maolin Sun, Jiangchang Wu, Yuming Zhou + 1 more
TLDR
ClozeMaster masks bracketed code snippets in Rust test programs taken from historical bug reports and has an LLM fill them in, producing new test programs that exposed previously unknown bugs in `rustc` and `mrustc`.
Key contributions
- Introduces `clozeMask`, a bracket-based masking and LLM infilling strategy for Rust compiler fuzzing.
- Extracts test code from historical bug reports and masks specific structures to guide LLM generation.
- Implemented as CLOZEMASTER, which found 27 confirmed bugs (10 fixed) in `rustc` and `mrustc`.
- CLOZEMASTER outperforms existing fuzzers in terms of code coverage and effectiveness.
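The mask-and-fill idea behind the contributions above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names (`bracket_spans`, `mask_span`, `fill`), the `<MASK>` placeholder, and the choice of `()`, `[]`, `{}` as the bracket set are all assumptions, and the LLM call is replaced by a plain string argument. The scanner is deliberately naive and does not skip brackets inside string literals or comments.

```python
# Hypothetical sketch of bracket-based masking and infilling.
# None of these names come from the paper; they are illustrative only.

PAIRS = {"(": ")", "[": "]", "{": "}"}  # assumed bracket set
MASK = "<MASK>"                          # assumed placeholder token


def bracket_spans(src):
    """Return sorted (open_index, close_index) pairs for matched brackets.

    Naive scan: brackets inside string literals or comments are not skipped.
    """
    stack, spans = [], []
    for i, ch in enumerate(src):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in PAIRS.values():
            # Pair the closer only with a matching opener on top of the stack.
            if stack and PAIRS[stack[-1][0]] == ch:
                _, start = stack.pop()
                spans.append((start, i))
    return sorted(spans)


def mask_span(src, span):
    """Replace the text between one bracket pair with the mask token."""
    start, end = span
    return src[: start + 1] + MASK + src[end:]


def fill(masked, infill):
    """Substitute generated text for the mask (the LLM call is stubbed out)."""
    return masked.replace(MASK, infill, 1)


seed = "fn main() { let v = vec![1, 2]; }"
masked = mask_span(seed, (24, 29))   # mask the contents of vec![...]
candidate = fill(masked, "0u8; 4")   # stand-in for an LLM completion
print(masked)
print(candidate)
```

In a real pipeline each candidate would then be handed to the compiler under test and the result checked for crashes or other misbehavior; that step is omitted here because it depends on the local toolchain setup.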
Why it matters
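Rust is increasingly used to build critical systems, so bugs in its compilers carry real risk. Asking an LLM to write Rust test programs from scratch tends to yield many invalid programs because of the language's complex syntax and strict requirements. This paper sidesteps that problem by starting from programs that triggered historical compiler bugs and letting the LLM regenerate only the masked portions. The approach found 27 confirmed bugs in `rustc` and `mrustc` and outperformed existing fuzzers in code coverage and effectiveness.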
Original Abstract
Ensuring the reliability of the Rust compiler is of paramount importance, given increasing adoption of Rust for critical systems development, due to its emphasis on memory and thread safety. However, generating valid test programs for the Rust compiler poses significant challenges, given Rust's complex syntax and strict requirements. With the growing popularity of large language models (LLMs), much research in software testing has explored using LLMs to generate test cases. Still, directly using LLMs to generate Rust programs often results in a large number of invalid test cases. Existing studies have indicated that test cases triggering historical compiler bugs can assist in software testing. Our investigation into Rust compiler bug issues supports this observation. Inspired by existing work and our empirical research, we introduce a bracket-based masking and filling strategy called clozeMask. The clozeMask strategy involves extracting test code from historical issue reports, identifying and masking code snippets with specific structures, and using an LLM to fill in the masked portions for synthesizing new test programs. This approach harnesses the generative capabilities of LLMs while retaining the ability to trigger Rust compiler bugs. It enables comprehensive testing of the compiler's behavior, particularly exploring edge cases. We implemented our approach as a prototype CLOZEMASTER. CLOZEMASTER has identified 27 confirmed bugs for rustc and mrustc, of which 10 have been fixed by developers. Furthermore, our experimental results indicate that CLOZEMASTER outperforms existing fuzzers in terms of code coverage and effectiveness.