ArXiv TLDR

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

arXiv:2605.04894

Kishanthan Thangarajah, Boyuan Chen, Ahmed E. Hassan

cs.SE

TLDR

SynConfRoute uses syntax-aware routing and token confidence to efficiently combine small local CodeLLMs with larger models for high-quality, private code completion.

Key contributions

  • A 3B code-specialized model matches a 32B model, showing that model family and specialized training matter more than size.
  • SynConfRoute: a training-free method using token confidence and syntax validation for routing.
  • Improves pass@1 over confidence-only routing by 6.4% on routine completions and by up to 31% on harder tasks; the full pipeline beats the 480B model alone by 7.4%.
  • Reduces accelerator usage by 58% and generalizes across Python, Java, and C++.
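The core routing idea is simple to prototype: keep the local completion only if it parses as valid code and the small model's token-level confidence clears a threshold, otherwise escalate to the larger model. Below is a minimal sketch of that decision logic; the function names, the mean-probability confidence measure, and the 0.8 threshold are illustrative assumptions, not the paper's exact implementation (which also handles fill-in-the-middle context and multiple languages).

```python
import ast
import math


def is_valid_python(code: str) -> bool:
    """Syntax gate: a completion that does not even parse cannot be correct.
    (Assumption: for FIM, you would parse the full file with the completion
    spliced in, not the fragment alone.)"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def mean_confidence(token_logprobs: list[float]) -> float:
    """Average per-token probability of the local model's completion
    (one possible confidence measure; the paper's exact score may differ)."""
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def route(completion: str, token_logprobs: list[float],
          conf_threshold: float = 0.8) -> str:
    """Return 'local' to keep the small model's output,
    'escalate' to re-query the larger self-hosted model."""
    if not is_valid_python(completion):
        return "escalate"  # syntax check catches invalid completions
    if mean_confidence(token_logprobs) < conf_threshold:
        return "escalate"  # low confidence suggests a hard request
    return "local"
```

Because both checks are cheap and training-free, the router adds negligible latency on top of the local model, which is what makes the 58% accelerator-usage reduction possible: most requests never reach the large model.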

Why it matters

This paper offers a practical solution for enterprises seeking high-quality, private code completion without the high cost of large models. SynConfRoute enables efficient, secure, and high-performance code completion by intelligently combining small local LLMs with larger self-hosted ones. It's immediately deployable.

Original Abstract

Enterprises want AI code completion that is both high-quality and private, but they face a tension: proprietary models yield better results yet risk exposing proprietary code, while self-hosting large models is expensive and hard to maintain. As a lighter alternative, small CodeLLMs (1B-3B) can run on a developer's workstation accelerator with code never leaving the machine, but they fail on harder tasks. A practical solution is to use the small model for most requests and selectively route difficult ones to a larger self-hosted model. In this study, we evaluate 29 code specialized LLMs (0.5B-480B) from 12 families on execution-based fill-in-the-middle (FIM) code completion benchmarks across Python, Java, and C++, and find that model family and code specialized training matter more than size: a 3B model matches a 32B model despite being 10x smaller. Analyzing the 3B model's failures, we discover that 46% of its incorrect completions are not valid code. To enable efficient code completion, we propose SynConfRoute, a training-free method that combines token confidence with syntax validation to automatically decide per-request whether to keep the local completion or escalate to a larger self-hosted model. SynConfRoute improves pass@1 by 6.4% over confidence only routing on routine completions and by up to 31% on harder multi-language tasks, and the resulting pipeline achieves 78.9% on routine completions, 7.4% higher than always using the 480B model alone, while reducing accelerator usage by 58%. SynConfRoute generalizes across Python, Java, and C++, improving over confidence only routing on all three languages without ever rejecting a correct local completion. The pipeline uses off-the-shelf models with no custom training, making it immediately deployable in practice.
