ArXiv TLDR

Compact Constraint Encoding for LLM Code Generation: An Empirical Study of Token Economics and Constraint Compliance

2604.07192

Hanzhang Tang

cs.SE

TLDR

Compact constraint encoding substantially reduces prompt tokens for LLM code generation without affecting constraint compliance.

Key contributions

  • Compact constraint headers reduce full-prompt tokens by 25-30% and constraint-portion tokens by ~71% (see the illustrative sketch after this list).
  • No statistically significant difference in constraint satisfaction rate (CSR) across three encoding forms or four propagation modes.
  • Constraint type (conventional vs. counter-intuitive) and task domain are the largest observed sources of compliance variance.
  • Model self-assessments systematically overestimate compliance, revealing a gap between constraint understanding and execution.
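
To make the token-reduction claim concrete, the sketch below compares a verbose natural-language constraint block against a compact structured header by counting tokens with the tiktoken library (assuming it is installed). The two encodings shown are invented for illustration; they are not the paper's actual prompt text or header schema.

```python
# Illustrative only: the verbose/compact encodings below are assumptions,
# not the paper's actual prompts or header schema.
import tiktoken  # pip install tiktoken

VERBOSE_CONSTRAINTS = """\
Please use only vanilla JavaScript and do not introduce any third-party
dependencies. The component must follow the BEM naming convention for all
CSS classes, avoid inline styles entirely, and keep the project free of any
build-time tooling such as bundlers or transpilers.
"""

COMPACT_HEADER = """\
[constraints]
lang: vanilla-js
deps: none
css: BEM, no-inline-styles
tooling: none
"""

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens using an OpenAI-style BPE tokenizer."""
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

if __name__ == "__main__":
    verbose = count_tokens(VERBOSE_CONSTRAINTS)
    compact = count_tokens(COMPACT_HEADER)
    print(f"verbose constraint block:  {verbose} tokens")
    print(f"compact constraint header: {compact} tokens")
    print(f"reduction: {1 - compact / verbose:.0%}")
```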

Why it matters

Compact constraint encoding saves tokens in LLM code generation but does not improve compliance. Engineering effort is better spent on robust constraint design, especially for rules that oppose model defaults, than on prompt formatting, which has little effect on adherence.

Original Abstract

LLMs used for code generation are typically guided by engineering constraints--technology choices, dependency restrictions, and architectural patterns--expressed in verbose natural language. We investigate whether compact, structured constraint headers can reduce prompt token consumption without degrading constraint compliance. Across six experimental rounds spanning 11 models, 16 benchmark tasks, and over 830 LLM invocations, we find that compact headers reduce constraint-portion tokens by approximately 71% and full-prompt tokens by 25--30%, replicated across three independent rounds. However, we detect no statistically significant differences in constraint satisfaction rate (CSR) across three encoding forms or four propagation modes; observed effect sizes are negligible (Cliff's $\delta$ < 0.01, 95% CI spanning $\pm$2.6 percentage points). This null pattern holds across two models from different capability tiers. A supplementary experiment with four non-CSS tasks provides additional cross-domain support for the encoding null result. The largest observed sources of compliance variance are constraint type ($\Delta$ = 9 percentage points between normal and counter-intuitive constraints) and task domain: counter-intuitive constraints opposing model defaults fail at 10--100%, while conventional constraints achieve 99%+ compliance regardless of encoding. Model self-assessments systematically overestimate compliance relative to rule-based scoring, revealing a gap between constraint understanding and execution. Under the tested conditions, the primary benefit of compact constraint encoding is token reduction rather than compliance improvement, and engineering effort toward compliance is better directed at constraint design than prompt formatting.
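
The negligible effect sizes in the abstract are reported as Cliff's delta, a non-parametric effect size that compares two groups pairwise. Below is a minimal sketch of how it can be computed over per-invocation constraint satisfaction rates; the sample values are invented for illustration and are not the paper's data.

```python
# Sketch of Cliff's delta over two groups of per-invocation compliance scores.
# The example numbers are invented; the paper's raw data is not reproduced here.
from itertools import product

def cliffs_delta(xs: list[float], ys: list[float]) -> float:
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical constraint satisfaction rates (1.0 = all constraints met)
verbose_csr = [1.0, 0.95, 1.0, 0.90, 1.0, 1.0]
compact_csr = [1.0, 0.95, 1.0, 0.95, 1.0, 0.90]

print(f"Cliff's delta = {cliffs_delta(compact_csr, verbose_csr):+.3f}")
# By a common convention, |delta| < 0.147 is negligible, consistent with
# the paper's reported |delta| < 0.01.
```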
