MASFuzzer: Fuzz Driver Generation and Adaptive Scheduling via Multidimensional API Sequences
Xingyu Liu, Zengqin Huang, Xiang Gao, Hailong Sun
TLDR
MASFuzzer automates fuzz driver generation using multidimensional API sequences and adaptive scheduling to find more bugs and improve code coverage.
Key contributions
- Synthesizes context-relevant API call sequences using usage examples and semantic-aware mining.
- Uses generated API sequences as a basis for LLMs to create effective initial fuzz drivers.
- Prioritizes promising drivers and mutates them to systematically explore untested code regions.
- Achieves 8.54% higher code coverage than state-of-the-art techniques and uncovers 16 previously unknown vulnerabilities, 14 confirmed by developers and 9 assigned CVE identifiers.
Why it matters
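Manually written fuzz drivers are time-consuming to produce and often fail to exercise complex program behaviors, while LLM-generated drivers often miss deep program branches. MASFuzzer addresses both limitations by grounding driver generation in mined API sequences and by scheduling and mutating drivers based on coverage. On 12 widely used open-source libraries, it achieves 8.54% higher code coverage than state-of-the-art techniques and uncovers 16 previously unknown vulnerabilities, which suggests the approach is efficient and practical for library fuzzing.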
Original Abstract
Fuzz testing of software libraries relies on fuzz drivers to invoke library APIs. Traditionally, these drivers are written manually by developers - a process that is time-consuming and often inadequate for exercising complex program behaviors. While recent studies have explored the use of Large Language Models (LLMs) to automate fuzz driver generation, the resulting drivers often fail to cover deep program branches. To address these challenges, we propose MASFUZZER, a fuzzing framework that integrates multidimensional API sequence construction with adaptive fuzzing scheduling strategies to improve library testing. At its core, MASFUZZER synthesizes context-relevant API call sequences by referring to API usage examples from the codebase and applying mutation-propagation-based and semantic-aware API sequence mining. These multidimensional API sequences serve as the basis for LLMs to generate effective initial drivers. In addition, MASFUZZER incorporates a coverage-guided scheduler that prioritizes testing time for the most promising drivers, along with a driver mutation strategy to evolve them. This enables systematic generation of fuzz drivers to explore previously untested code regions. We evaluate MASFUZZER on 12 widely used open-source libraries. The results show that MASFUZZER achieves 8.54 percent higher code coverage than state-of-the-art techniques. Moreover, MASFUZZER uncovers 16 previously unknown vulnerabilities in extensively tested libraries, with 14 confirmed by developers and 9 assigned CVE identifiers. These results indicate that MASFUZZER provides an efficient and practical approach for fuzzing software libraries.
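To make the scheduling idea in the abstract concrete, here is a minimal sketch of a coverage-guided driver scheduler. The abstract does not specify the actual prioritization rule, so the greedy "largest recent coverage gain" policy below is an assumption chosen for illustration, and the names `Driver`, `pick_driver`, `schedule`, and the `run_slice` callback are hypothetical rather than part of MASFuzzer. Driver mutation is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    covered: set = field(default_factory=set)  # branch ids this driver has reached
    recent_gain: int = 0  # globally new branches found in its last slice


def pick_driver(drivers):
    """Return the driver with the largest recent gain (ties go to the first one)."""
    return max(drivers, key=lambda d: d.recent_gain)


def schedule(drivers, run_slice, rounds):
    """Give every driver one warm-up slice, then spend `rounds` slices greedily.

    `run_slice(driver)` is a hypothetical callback that fuzzes the driver for
    one time slice and returns the set of branch ids it hit.
    """
    global_cov = set()
    history = []

    def run(driver):
        hit = run_slice(driver)
        new = hit - global_cov  # branches no driver has reached before
        driver.covered |= hit
        driver.recent_gain = len(new)
        global_cov.update(new)
        history.append(driver.name)

    for d in drivers:  # warm-up: measure every driver once
        run(d)
    for _ in range(rounds):  # assumed policy: favor the most productive driver
        run(pick_driver(drivers))
    return global_cov, history
```

In this sketch a driver that stops discovering new branches sees its gain drop to zero and loses its time slices to others. A real scheduler would likely also reserve some budget for exploration so that a driver with one unlucky slice is not starved.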