Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
P Akilesh, Leuson Da Silva, Foutse Khomh, Sridhar Chimalakonda
TLDR
This paper combines reinforcement learning with dynamic fuzzing to substantially reduce false positives in static memory safety analysis of Rust programs.
Key contributions
- Developed an RL agent that learns to suppress false warnings in Rust static analysis using MIR features (see the sketch after this list).
- Incorporated dynamic validation via cargo-fuzz to provide auxiliary feedback and improve RL agent decisions.
- Achieved 65.2% accuracy and an F1 score of 0.659, a 17.1% improvement over the best LLM baseline.
- Boosted precision on raw Rudra output from 25.6% to 59.0% while retaining 74.6% recall for true bugs.
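To make the first two contributions more concrete, here is a minimal, dependency-free Rust sketch of what the per-warning decision step could look like. Everything in it is an illustrative assumption: the feature names, the three-action space (keep / suppress / fuzz-validate), and the epsilon-greedy linear scoring are stand-ins rather than the paper's actual MIR feature set or RL algorithm, and the reward-driven weight updates are omitted.

```rust
// Hypothetical sketch only: feature names, action space, and linear scoring
// are illustrative assumptions, not the paper's implementation.

/// Contextual features that might be extracted from MIR for a single warning.
struct WarningFeatures {
    unsafe_block_count: f64,  // unsafe blocks in the flagged function
    raw_ptr_derefs: f64,      // raw-pointer dereferences on the flagged path
    generic_bound_count: f64, // generic/trait-bound complexity
    crosses_ffi: f64,         // 1.0 if the path crosses an FFI boundary
}

/// Actions the agent could take on a single static-analysis warning.
#[derive(Debug, Clone, Copy)]
enum Action {
    Keep,         // report the warning to the developer
    Suppress,     // classify it as a false positive
    FuzzValidate, // spend budget on targeted cargo-fuzz validation
}

/// Toy linear policy: one weight vector per action, epsilon-greedy selection.
struct Policy {
    weights: [[f64; 4]; 3],
    epsilon: f64,
    rng_state: u64,
}

impl Policy {
    fn score(&self, action: usize, f: &WarningFeatures) -> f64 {
        let x = [f.unsafe_block_count, f.raw_ptr_derefs, f.generic_bound_count, f.crosses_ffi];
        self.weights[action].iter().zip(x).map(|(w, v)| w * v).sum()
    }

    fn choose(&mut self, f: &WarningFeatures) -> Action {
        const ACTIONS: [Action; 3] = [Action::Keep, Action::Suppress, Action::FuzzValidate];
        // Tiny LCG keeps the sketch dependency-free; a real agent would also
        // update `weights` from rewards (ground-truth labels or fuzz outcomes).
        self.rng_state = self.rng_state.wrapping_mul(6364136223846793005).wrapping_add(1);
        if (self.rng_state >> 33) as f64 / (1u64 << 31) as f64 < self.epsilon {
            return ACTIONS[(self.rng_state % 3) as usize];
        }
        let best = (0..3)
            .max_by(|&a, &b| self.score(a, f).partial_cmp(&self.score(b, f)).unwrap())
            .unwrap();
        ACTIONS[best]
    }
}

fn main() {
    let mut policy = Policy {
        weights: [[0.4, 0.6, 0.1, 0.5], [0.1, -0.2, 0.3, -0.1], [0.3, 0.4, 0.2, 0.6]],
        epsilon: 0.1,
        rng_state: 42,
    };
    let warning = WarningFeatures {
        unsafe_block_count: 2.0,
        raw_ptr_derefs: 1.0,
        generic_bound_count: 3.0,
        crosses_ffi: 0.0,
    };
    println!("decision for this warning: {:?}", policy.choose(&warning));
}
```

In the paper's setting, the reward signal for training such a policy comes from the static analysis outputs it interacts with, and for the fuzz-validate action from cargo-fuzz outcomes used as auxiliary feedback.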
Why it matters
High false positive rates in Rust static analysis tools diminish developer trust and increase manual review effort. By substantially reducing false positives, this RL-based hybrid approach improves the practicality and usability of these tools, making memory safety verification more dependable for safety-critical Rust applications.
Original Abstract
Static analysis tools are essential for ensuring memory safety in Rust programs, particularly as Rust gains adoption in safety-critical domains. However, existing tools such as Rudra and MirChecker suffer from high false positive rates, which diminish developer trust, increase manual review effort, and may obscure genuine vulnerabilities. This paper presents a novel reinforcement learning (RL)-based approach for automatically classifying and suppressing spurious warnings in static memory safety analysis for Rust. To achieve this, we design an RL agent that learns a warning suppression policy by extracting contextual features from Rust's Mid-level Intermediate Representation (MIR) and optimizing its decisions through interaction with static analysis outputs. To improve decision quality, we integrate dynamic validation via cargo-fuzz as an auxiliary feedback mechanism, allowing the agent to selectively validate suspicious warnings through targeted fuzz testing. Our evaluation shows that the proposed approach significantly outperforms state-of-the-art LLM-based baselines, achieving 65.2% accuracy and an F1 score of 0.659, an improvement of 17.1% over the best LLM baseline. With a recall of 74.6%, our method successfully identifies nearly three-quarters of true bugs while substantially reducing false positives, improving precision from 25.6% in raw Rudra output to 59.0%. Incorporating dynamic fuzzing further boosts performance, yielding additional improvements of 10.7 percentage points in accuracy and 8.6 percentage points in F1 score over the RL-only variant. Overall, our work demonstrates that combining reinforcement learning with hybrid static-dynamic analysis can substantially reduce false positives and improve the practical usability of memory safety verification tools for Rust.
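For readers unfamiliar with the dynamic-validation step described above, the following is a hypothetical cargo-fuzz harness of the kind that could be used to exercise a flagged API; `my_crate::flagged_function` and the input decoding are placeholders, and the paper's actual harness construction may differ.

```rust
// fuzz/fuzz_targets/validate_warning.rs -- hypothetical harness layout,
// e.g. created with `cargo fuzz add validate_warning` in the analyzed crate.
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Drive the function the static analyzer flagged with fuzz-derived input.
    // A crash, panic, or sanitizer report confirms the warning as a true
    // positive; sustained silence is weaker evidence of a false positive.
    if let Ok(s) = std::str::from_utf8(data) {
        let _ = my_crate::flagged_function(s); // placeholder for the flagged API
    }
});
```

Running such a harness with `cargo fuzz run validate_warning` is the kind of targeted dynamic check that provides the auxiliary feedback signal described in the abstract.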