ArXiv TLDR

Babbling Suppression: Making LLMs Greener One Token at a Time

2604.06755

Lola Solovyeva, Fernando Castor

cs.SE

TLDR

Babbling Suppression reduces LLM energy consumption and extraneous output in code generation by terminating generation once tests pass, making AI greener.

Key contributions

  • Introduces Babbling Suppression (BS) to reduce extraneous LLM output in code generation.
  • BS integrates test execution, terminating generation once code passes tests, saving tokens.
  • Reduces energy consumption by up to 65% (Python) and 62% (Java) in models prone to babbling.
  • Significantly decreases generated token count with minimal GPU overhead, yielding net energy savings.

Why it matters

LLMs often "babble" in code generation, wasting energy and time. Babbling Suppression integrates test execution to halt generation once tests pass, cutting energy and token output, making AI programming more efficient and sustainable.

Original Abstract

Context: Large Language Models (LLMs) are increasingly used in modern software development, aiding in code generation, code completion, and refactoring through AI-powered assistants. While they accelerate development workflows, they often produce extraneous output, referred to as "babbling", which incurs additional cognitive, economic, and energy costs.

Objective: This work investigates the problem of babbling in LLM-based code generation and proposes a practical, model-agnostic approach to reduce unnecessary output without compromising solution accuracy.

Method: We introduce Babbling Suppression (BS), a method that integrates test execution into the LLM generation process by evaluating intermediate outputs and terminating generation once a solution passes all tests. This prevents excessive token generation while having no impact on model accuracy. An empirical study was conducted across two Python and two Java benchmarks, targeting four 3-4B parameter models and six 6-7B parameter models.

Results: Our findings show that babbling occurs across all tested models, with higher frequency in Java than in Python. Applying BS significantly reduces energy consumption by up to 65% for Python and 62% for Java in models prone to babbling. Across 40 model-benchmark pairs, 29 showed reduced mean energy consumption, with reductions exceeding 20% in 22 cases. Generated token count decreased in 35 pairs, while the GPU energy-per-token overhead of BS remained below 10% for 26 pairs, decreased for 2, and reached a maximum of 24%, yielding net energy savings in most cases.

Implications: BS can make AI-assisted programming more efficient and sustainable by reducing energy consumption and minimizing cognitive effort by developers. Its model-agnostic design allows easy integration, suggesting broad applicability.
