A Universal Textual Merge Strategy Based on Tokens for Version Control Systems
TLDR
Summer is a universal token-based merge algorithm for version control systems that reduces conflicts and improves merge accuracy across various document formats.
Key contributions
- Introduces Summer, a novel token-based merge algorithm for version control systems.
- Operates independently of document formats and programming languages.
- Formulates token-level changes in one branch as string-rewriting rules and move rules, then applies them to the other branch's text to construct the merge.
- Achieved the highest verbatim accuracy (36%) on ConflictBench among Summer and the five merge tools it was compared with, and ranked second in semantic accuracy.
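To make the token-level rewriting idea concrete, here is a minimal Python sketch. It is not Summer's actual algorithm, which the abstract does not specify in detail: the tokenizer regex, the `CONTEXT` window, and the helper names `tokenize`, `derive_rules`, `find_all`, and `merge` are all assumptions made for illustration, and move rules are omitted entirely. The sketch turns each changed token span in one branch into a rule with a little unchanged context, then applies a rule to the other branch only when its left-hand side occurs exactly once there.

```python
import difflib
import re

CONTEXT = 2  # assumption: unchanged tokens kept on each side of a rule


def tokenize(text):
    # Words, whitespace runs, and single punctuation characters.
    # Joining the tokens reproduces the text exactly, so formatting survives.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def derive_rules(base, branch):
    """Turn token-level changes base -> branch into (lhs, rhs) rewrite rules."""
    rules = []
    matcher = difflib.SequenceMatcher(a=base, b=branch, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left = base[max(0, i1 - CONTEXT):i1]
        right = base[i2:i2 + CONTEXT]
        rules.append((left + base[i1:i2] + right, left + branch[j1:j2] + right))
    return rules


def find_all(tokens, pattern):
    """Return every start index at which pattern occurs in tokens."""
    n = len(pattern)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == pattern]


def merge(base_text, ours_text, theirs_text):
    """Apply the rules derived from base -> ours to the tokens of theirs."""
    base, ours = tokenize(base_text), tokenize(ours_text)
    merged = tokenize(theirs_text)
    conflicts = []
    for lhs, rhs in derive_rules(base, ours):
        hits = find_all(merged, lhs)
        if len(hits) != 1:
            # Missing or ambiguous context: report it instead of guessing.
            conflicts.append((lhs, rhs))
            continue
        start = hits[0]
        merged[start:start + len(lhs)] = rhs
    return "".join(merged), conflicts


# Both branches edit the same line, which a line-based merge reports as a conflict.
base = "int total = price * qty; // sum\n"
ours = "int subtotal = price * qty; // sum\n"
theirs = "int total = price * qty; // product\n"
merged_text, conflicts = merge(base, ours, theirs)
print(merged_text, end="")  # int subtotal = price * qty; // product
```

Because rules are applied one after another, two edits closer together than `CONTEXT` tokens can invalidate each other's context and be reported as conflicts. That is a limitation of this sketch, not a statement about Summer's behavior.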
Why it matters
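Traditional line-based merges often create spurious conflicts, particularly under refactorings or parallel edits. Syntax- and semantics-aware methods reduce such conflicts but can lose formatting, depend on language-specific parsers, and adapt poorly to heterogeneous artifacts. Summer offers a format-independent alternative that, on ConflictBench, reproduced developers' merges verbatim more often than the five tools it was compared with. That could mean less manual conflict resolution for developers.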
Original Abstract
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm independent of document formats. Dividing text into tokens, our approach formulates token-level changes in one branch into string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Despite being independent of programming languages, our move rules model extracting and inlining functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest accuracy (36%) in reproducing merges verbatim identical to developers', and ranked second in semantic accuracy.