FunFuzz: An LLM-Powered Evolutionary Fuzzing Framework

May 4, 2026 | arXiv:2605.02789

Mario Rodríguez Béjar, B. Romera-Paredes, Jose L. Hernández-Ramos

cs.CR, cs.CL

TLDR

FunFuzz is an LLM-powered evolutionary fuzzer that runs multiple isolated searches (islands) in parallel to improve exploration, achieving higher compiler coverage and more unique failures than previous LLM-driven baselines.

Key contributions

Introduces FunFuzz, a multi-island evolutionary fuzzing framework powered by LLMs.
Addresses the sensitivity of LLM-driven fuzzing to prompt initialization and sampling variance by running parallel, isolated searches with periodic candidate migration.
Dynamically adapts LLM prompts using documentation-derived initial prompts and feedback-guided selection.
Achieves higher compiler coverage and discovers more unique failures on GCC and Clang than baselines.
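
The multi-island idea in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Candidate`, `Island`, `trim`, `migrate`, and `evolve` names, the ring migration topology, and the `generate`/`evaluate` callbacks (stand-ins for the LLM call and the coverage measurement) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    program: str   # source program fed to the compiler
    score: float   # hypothetical fitness, e.g. coverage gained


@dataclass
class Island:
    topic: str                                  # topic-specific instruction for this island
    population: list = field(default_factory=list)


def trim(island, capacity):
    """Keep only the `capacity` highest-scoring candidates, best first."""
    island.population.sort(key=lambda c: c.score, reverse=True)
    del island.population[capacity:]


def migrate(islands, k=1):
    """Ring migration (an assumed topology): copy each island's top-k to the next island."""
    # Snapshot migrants first so arrivals are not forwarded again in the same round.
    migrants = [
        sorted(isl.population, key=lambda c: c.score, reverse=True)[:k]
        for isl in islands
    ]
    for i, group in enumerate(migrants):
        islands[(i + 1) % len(islands)].population.extend(group)


def evolve(islands, generate, evaluate, generations=10, migrate_every=5, capacity=4):
    """Run isolated searches and migrate high-value candidates periodically."""
    for gen in range(1, generations + 1):
        for island in islands:
            # Select the current best candidate as the parent, if any exists.
            parent = max(island.population, key=lambda c: c.score, default=None)
            program = generate(island.topic, parent)   # stand-in for an LLM call
            island.population.append(Candidate(program, evaluate(program)))
            trim(island, capacity)
        if gen % migrate_every == 0:
            migrate(islands)
            for island in islands:
                trim(island, capacity)
    return islands
```

Islands evolve independently between migrations, so an unlucky prompt or sample on one island does not stall the others; migration then shares the best candidates without merging the populations.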

Why it matters

LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can lead to redundant inputs. FunFuzz addresses these limitations with a multi-island evolutionary approach that improves exploration efficiency and failure discovery in compilers, making LLM-powered fuzzing more robust for complex software systems.

Original Abstract

Modern fuzzers increasingly use Large Language Models (LLMs) to generate structured inputs, but LLM-driven fuzzing is sensitive to prompt initialization and sampling variance, which can reduce exploration efficiency and lead to redundant inputs. We present FunFuzz, a multi-island evolutionary fuzzing framework that runs several isolated searches in parallel and periodically migrates high-value candidates to maintain diversity. FunFuzz derives initial generation prompts from documentation and initializes islands with topic-specific instructions, then continuously adapts prompts using feedback-guided selection. During fuzzing, candidates are prioritized by incremental compiler coverage, while compiler-internal failure signals are used to identify crash-inducing inputs. We evaluate FunFuzz on compiler fuzzing, where inputs are source programs and success is measured by compiler coverage and unique compiler-internal failures. Across repeated 24-hour campaigns on GCC and Clang, FunFuzz achieves higher compiler coverage than previous LLM-driven baselines and discovers more unique failure-triggering inputs.
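
The two feedback signals in the abstract, incremental coverage and compiler-internal failures, can be illustrated with a small sketch. The `CoverageTracker` class, the `is_internal_failure` helper, and the two diagnostic patterns are assumptions for the example; the abstract does not specify how FunFuzz measures coverage or which failure messages it matches.

```python
import re

# Illustrative patterns only (an assumption of this sketch): GCC reports
# "internal compiler error" on an ICE, and Clang crash output asks the user
# to submit a bug report.
FAILURE_PATTERNS = [
    re.compile(r"internal compiler error", re.IGNORECASE),
    re.compile(r"PLEASE submit a bug report", re.IGNORECASE),
]


def is_internal_failure(stderr):
    """Return True if compiler stderr looks like an internal failure, not a user error."""
    return any(p.search(stderr) for p in FAILURE_PATTERNS)


class CoverageTracker:
    """Scores a candidate by the coverage it adds beyond what was already seen."""

    def __init__(self):
        self.seen = set()

    def gain(self, edges):
        # Only edges not covered by any earlier candidate count toward the score.
        new_edges = set(edges) - self.seen
        self.seen |= new_edges
        return len(new_edges)


tracker = CoverageTracker()
first = tracker.gain({1, 2, 3})    # all three edges are new
second = tracker.gain({2, 3, 4})   # only edge 4 is new
```

Scoring by incremental rather than total coverage means a candidate that re-exercises already covered compiler code receives a score of zero, which discourages redundant inputs.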

