CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits
Tommy Ho, Khashayar Etemadi, Zhendong Su
TLDR
CppPerf provides an automated pipeline and dataset of 347 real-world C++ performance-improving commits to benchmark and advance performance bug repair.
Key contributions
- CppPerf-Mine: A configurable pipeline that mines execution-time-improving C++ commits from GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage.
- CppPerf-DB: A benchmark of 347 manually verified C++ performance patches from 42 mature repositories, 39% of them multi-file, which supports the evaluation of repository-level repair tools.
- Provides a fully reproducible Docker image for each mined patch.
- Shows that OpenHands correctly fixes only 13.5% of CppPerf-DB patches in a preliminary study, indicating that real-world C++ performance repair remains an open challenge.
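To make the headline fix rate concrete: assuming the reported 13.5% is rounded to one decimal place, it corresponds to exactly one whole number of patches out of 347, since 46/347 ≈ 13.3% and 48/347 ≈ 13.8%.

```latex
0.135 \times 347 \approx 46.8, \qquad \frac{47}{347} \approx 0.1354 \approx 13.5\%
```

In other words, roughly 47 of the 347 patches were fixed correctly.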
Why it matters
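Existing C++ performance benchmarks are built largely from competitive programming submissions, while recent real-world benchmarks target mainly Python and .NET. This paper fills that gap with a pipeline and dataset of real-world C++ performance-improving commits. It enables more realistic benchmarking and development of automated repair tools, and the low fix rate of a current agent shows that C++ performance repair remains a significant open challenge.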
Original Abstract
Recent progress in automated repair of performance bugs demands realistic, executable benchmarks. However, existing C++ performance benchmarks are largely built from competitive programming submissions, and recent real-world benchmarks predominantly target Python and .NET. To fill this gap, we present CppPerf-Mine, a configurable pipeline that mines execution-time-improving patches from open-source C++ repositories on GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage that produces fully reproducible Docker images for each patch. Using CppPerf-Mine, we build CppPerf-DB, a benchmark comprising 347 manually verified patches from 42 mature C++ repositories, 39% of which are multi-file, enabling the evaluation of repository-level repair tools. In our preliminary study, OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB, confirming that real-world C++ performance repair remains an open challenge. CppPerf-Mine and CppPerf-DB are open-source and publicly available at: https://doi.org/10.5281/zenodo.20097425. In addition, a demonstration video is available at: https://www.youtube.com/watch?v=nixlupIgSdM.
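The abstract names the three stages of CppPerf-Mine but not their internals. The sketch below shows one way such stages could be chained as successive filters over candidate commits. It is an illustration under stated assumptions, not the authors' implementation: the `Commit` record, the file-extension and file-count rules in `structural_filter`, the keyword-based `keyword_classifier` (a simple stand-in for the LLM classifier), and the median-speedup threshold `min_speedup=1.05` in `improves_runtime` are all hypothetical. The containerized build & test stage is abstracted as a `measure` callback that returns runtime samples before and after the patch.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterable, List, Tuple


@dataclass
class Commit:
    """Hypothetical minimal record of a candidate commit."""
    sha: str
    message: str
    changed_files: List[str]


# Hypothetical stage 1: structural filtering (assumed criteria).
CPP_EXTENSIONS = (".cpp", ".cc", ".cxx", ".h", ".hpp")


def structural_filter(commit: Commit, max_files: int = 10) -> bool:
    """Keep commits touching at least one C++ file and at most max_files files."""
    cpp_files = [f for f in commit.changed_files if f.endswith(CPP_EXTENSIONS)]
    return len(cpp_files) > 0 and len(commit.changed_files) <= max_files


# Hypothetical stage 2: a keyword stand-in for the LLM-based commit classifier.
PERF_KEYWORDS = ("speed up", "faster", "optimiz", "performance", "avoid copy")


def keyword_classifier(commit: Commit) -> bool:
    """Return True if the commit message looks performance-related."""
    msg = commit.message.lower()
    return any(k in msg for k in PERF_KEYWORDS)


# Hypothetical stage 3: decide from runtime samples (e.g. gathered in a container).
def improves_runtime(before: List[float], after: List[float],
                     min_speedup: float = 1.05) -> bool:
    """Accept the patch if the median runtime speedup reaches min_speedup."""
    return median(before) / median(after) >= min_speedup


def mine(commits: Iterable[Commit],
         classify: Callable[[Commit], bool],
         measure: Callable[[Commit], Tuple[List[float], List[float]]]) -> List[str]:
    """Run the stages in order, cheapest first; return SHAs of accepted commits."""
    kept = []
    for c in commits:
        if not structural_filter(c):
            continue
        if not classify(c):
            continue
        before, after = measure(c)  # only survivors reach the costly build & test step
        if improves_runtime(before, after):
            kept.append(c.sha)
    return kept


if __name__ == "__main__":
    commits = [
        Commit("a1", "Avoid copy in hot loop", ["src/parser.cpp"]),
        Commit("b2", "Fix typo in README", ["README.md"]),
        Commit("c3", "Optimize hash lookup", ["src/map.hpp", "src/map.cpp"]),
    ]
    # Fake runtime samples in seconds, standing in for containerized measurements.
    timings = {
        "a1": ([1.0, 1.1, 0.9], [0.5, 0.6, 0.55]),
        "c3": ([1.0, 1.0, 1.0], [0.99, 1.0, 1.01]),
    }
    print(mine(commits, keyword_classifier, lambda c: timings[c.sha]))  # ['a1']
```

Ordering the stages from cheapest to most expensive means the slow build & test step runs only on commits that survive the structural and classifier filters. In the usage example, `c3` passes both filters but shows no measurable speedup, so it is rejected.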