Investigating Code Reuse in Software Redesign: A Case Study
Xiaowen Zhang, Huaien Zhang, Shin Hwei Tan
TLDR
This paper investigates code reuse in software redesign, identifying challenges and proposing a clone detection approach to improve migration efficiency.
Key contributions
- Reveals non-linear migration, deferred reuse, neglected test porting, and bug propagation in redesigns.
- Identifies tracking corresponding code and tests as a key challenge in software redesign.
- Proposes Semantic Alignment Heuristics and hierarchical clone detection for code mapping.
- Reduces likely irrelevant clones by an average of 33-99% at a SAS threshold of 0.5 and improves precision up to 86% on a benchmark of 1,749 samples, evaluated on Soot/SootUp and FindBugs/SpotBugs.
Why it matters
Manual code reuse in software redesign is costly and error-prone, especially when the redesign lives in a separate repository. This paper provides empirical insights into common challenges and offers a practical, evaluated approach to mapping code and tests between the original and redesigned projects. Its findings and tooling could help streamline complex redesign efforts.
Original Abstract
Software redesign preserves functionality while improving quality attributes, but manual reuse of code and tests is costly and error-prone, especially in cross-repository redesigns. Focusing on static analyzers where cross-repo redesign needs often arise, we conduct a bidirectional study of the ongoing Soot/SootUp redesign case using an action research methodology that combines empirical investigation with validated open-source contributions. Our study reveals: (1) non-linear migration which necessitates bidirectional reuse, (2) deferred reuse via TODOs, (3) neglected test porting, and (4) residual bug propagation during migrations. We identify tracking corresponding code and tests as the key challenge, and address it by retrofitting clone detection to derive code mappings between original and redesigned projects. Guided by semantic reuse patterns derived in our study, we propose Semantic Alignment Heuristics and a scalable hierarchical detection strategy. Evaluations on two redesigned project pairs (Soot/SootUp and FindBugs/SpotBugs) show that our approach achieves an average reduction of 33-99% in likely irrelevant clones at a SAS threshold of 0.5 across all tool results, and improves precision up to 86% on our benchmark of 1,749 samples. Moreover, we contribute to the redesigned projects by submitting five issues and 10 pull requests, of which eight have been merged.
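Illustrative sketch
The abstract describes filtering clone-detector output with a score threshold of 0.5 to drop likely irrelevant clones. The Python sketch below shows only the general shape of such a filter. It is not the paper's method: the abstract does not define how SAS is computed, so `placeholder_score` swaps in a plain Jaccard overlap of name tokens. The function names, the project and class names, and the choice to keep pairs scoring at or above the threshold are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Set, Tuple

# (qualified name in the original project, qualified name in the redesign)
ClonePair = Tuple[str, str]


def name_tokens(qualified_name: str) -> Set[str]:
    """Split a dotted, camel-cased name into lowercase word tokens."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", qualified_name)
    return {w.lower() for w in words}


def placeholder_score(original: str, redesigned: str) -> float:
    """Jaccard overlap of name tokens.

    ASSUMPTION: a stand-in for the paper's SAS, whose definition is not
    given in the abstract.
    """
    a, b = name_tokens(original), name_tokens(redesigned)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_pairs(
    pairs: List[ClonePair],
    score: Callable[[str, str], float] = placeholder_score,
    threshold: float = 0.5,
) -> List[ClonePair]:
    """Keep only the reported clone pairs that score at or above the threshold."""
    return [(a, b) for a, b in pairs if score(a, b) >= threshold]


# Hypothetical clone-detector output; these names are invented.
reported = [
    ("old.analysis.CallGraphBuilder", "redesign.analysis.CallGraphBuilder"),
    ("old.util.StringHelper", "redesign.analysis.CallGraphBuilder"),
]
kept = filter_pairs(reported)
```

With these invented inputs, the first pair shares four of six distinct tokens and is kept, while the second shares none and is dropped. A real pipeline would replace `placeholder_score` with whatever alignment score the paper defines.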