CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits

2605.10890

Tommy Ho, Khashayar Etemadi, Zhendong Su

cs.SE

TLDR

CppPerf provides an automated pipeline and dataset of 347 real-world C++ performance-improving commits to benchmark and advance performance bug repair.

Key contributions

  • CppPerf-Mine: An automated pipeline to mine performance-improving C++ commits from GitHub using LLMs and containerized testing.
  • CppPerf-DB: A benchmark of 347 manually verified C++ performance patches from 42 mature repositories; 39% of the patches are multi-file, which supports evaluating repository-level repair tools.
  • Provides reproducible Docker images for each performance-improving C++ patch found.
  • Shows, in a preliminary study, that OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB, indicating that real-world C++ performance repair remains an open challenge.
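
The abstract names three stages for CppPerf-Mine: structural commit filtering, an LLM-based commit classifier, and a containerized build and test stage. A minimal sketch of how such stages could be chained is shown below. It is an illustration only, not the authors' implementation: the `Commit` type, the file-count limit, the keyword classifier standing in for the LLM, the `measure` callback standing in for the containerized run, and the speedup threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed set of C++ file extensions; the paper's actual filter rules are not given here.
CPP_EXTENSIONS = (".cpp", ".cc", ".cxx", ".h", ".hpp")


@dataclass
class Commit:
    """Hypothetical, simplified view of a commit."""
    sha: str
    message: str
    changed_files: List[str]


def passes_structural_filter(commit: Commit, max_files: int = 10) -> bool:
    """Stage 1 (assumed rule): keep small commits that touch C++ sources."""
    cpp_files = [f for f in commit.changed_files if f.endswith(CPP_EXTENSIONS)]
    return len(cpp_files) > 0 and len(commit.changed_files) <= max_files


def keyword_classifier(commit: Commit) -> bool:
    """Stage 2 stand-in: a keyword check where the real pipeline uses an LLM."""
    text = commit.message.lower()
    return any(k in text for k in ("perf", "speed", "faster", "optimi"))


def mine(
    commits: Iterable[Commit],
    classify: Callable[[Commit], bool],
    measure: Callable[[Commit], Tuple[float, float]],
    min_speedup: float = 1.05,
) -> List[Commit]:
    """Run the three stages in order, cheapest first.

    `measure` is a hypothetical callback returning (seconds_before,
    seconds_after); in a real setup it would build and test both
    versions inside a container.
    """
    kept = []
    for commit in commits:
        if not passes_structural_filter(commit):
            continue
        if not classify(commit):
            continue
        before, after = measure(commit)  # Stage 3: the expensive step runs last.
        if after > 0 and before / after >= min_speedup:
            kept.append(commit)
    return kept
```

Ordering the stages from cheapest to most expensive means the build and test step only runs on commits that survived the earlier filters.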

Why it matters

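Existing C++ performance benchmarks are largely built from competitive programming submissions, while recent real-world benchmarks mostly target Python and .NET. This paper addresses that gap with a pipeline and dataset of real-world C++ performance-improving commits, each packaged as a reproducible Docker image. That allows automated repair tools to be benchmarked on repository-level patches, and the low 13.5% fix rate reported for OpenHands suggests C++ performance repair is far from solved.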

Original Abstract

Recent progress in automated repair of performance bugs demands realistic, executable benchmarks. However, existing C++ performance benchmarks are largely built from competitive programming submissions, and recent real-world benchmarks predominantly target Python and .NET. To fill this gap, we present CppPerf-Mine, a configurable pipeline that mines execution-time-improving patches from open-source C++ repositories on GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage that produces fully reproducible Docker images for each patch. Using CppPerf-Mine, we build CppPerf-DB, a benchmark comprising 347 manually verified patches from 42 mature C++ repositories, 39% of which are multi-file, enabling the evaluation of repository-level repair tools. In our preliminary study, OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB, confirming that real-world C++ performance repair remains an open challenge. CppPerf-Mine and CppPerf-DB are open-source and publicly available at: https://doi.org/10.5281/zenodo.20097425. In addition, a demonstration video is available at: https://www.youtube.com/watch?v=nixlupIgSdM.