Zero-shot Evaluation of Deep Learning for Java Code Clone Detection
TLDR
This paper finds that deep learning models for Java code clone detection show limited generalizability in zero-shot evaluations and are often outperformed by conventional tools.
Key contributions
- Evaluated five state-of-the-art DL-based Java code clone detectors in a zero-shot setting.
- Found deep learning models exhibit limited generalizability to unseen code and functionalities.
- Demonstrated that a conventional tool, NiCad, outperformed DL models in zero-shot scenarios.
Why it matters
This paper challenges the perceived superiority of deep learning in code clone detection, revealing significant generalization issues in zero-shot scenarios. It highlights the critical need for DL models to improve their ability to handle unseen code effectively.
Original Abstract
Deep Learning (DL) is becoming increasingly widespread in clone detection, motivated by near-perfect performance on this task. In particular, in the case of semantic code clones, which share only limited syntax but implement the same or similar functionality, Deep Learning appears to outperform conventional tools. In this paper, we want to investigate the generalizability of DL-based clone detectors for Java. We therefore replicate and evaluate the performance of five state-of-the-art DL-based clone detectors, including Transformers like CodeBERT and single-task models like FA-AST+GMN, in a zero-shot evaluation scenario, where we train/fine-tune and evaluate on different datasets and functionalities. Our experiments demonstrate that the models' generalizability to unseen code is limited. Further analysis reveals that the conventional clone detector NiCad even outperforms the DL-based clone detectors in such a zero-shot evaluation scenario.
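The zero-shot protocol described above hinges on keeping evaluation functionalities disjoint from training functionalities. A minimal sketch of such a split is shown below; the `ClonePair` fields and functionality IDs are illustrative assumptions, not the paper's actual dataset format:

```python
# Hypothetical sketch of a zero-shot split for clone detection: train on
# clone pairs drawn from some functionalities, evaluate only on pairs whose
# functionality IDs were never seen during training. The data model here is
# an assumption for illustration, not the paper's dataset schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class ClonePair:
    functionality_id: int  # which functionality both snippets implement
    code_a: str
    code_b: str
    is_clone: bool

def zero_shot_split(pairs, held_out_functionalities):
    """Partition pairs so held-out functionalities never occur in training."""
    train = [p for p in pairs if p.functionality_id not in held_out_functionalities]
    test = [p for p in pairs if p.functionality_id in held_out_functionalities]
    return train, test

pairs = [
    ClonePair(1, "int add(int a,int b){...}", "int sum(int x,int y){...}", True),
    ClonePair(1, "int add(int a,int b){...}", "void log(String s){...}", False),
    ClonePair(2, "String rev(String s){...}", "String flip(String t){...}", True),
]
train, test = zero_shot_split(pairs, held_out_functionalities={2})
```

Evaluating on `test` then measures whether a detector trained on `train` generalizes to code implementing functionality it has never seen, which is the scenario in which the paper reports DL models falling behind NiCad.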