Make Your LVLM KV Cache More Lightweight
Xihao Chen, Yangyang Guo, Roger Zimmermann
TLDR
LightKV significantly reduces the KV cache size and computation for Large Vision-Language Models by compressing redundant vision tokens.
Key contributions
- LightKV reduces LVLM KV cache size by exploiting vision token redundancy.
- Employs prompt-aware cross-modality message passing for progressive compression.
- Halves vision-token KV cache size and cuts computation by up to 40%.
- Preserves general-purpose performance while retaining only 55% of the original vision tokens.
Why it matters
Large Vision-Language Models (LVLMs) suffer from high GPU memory overhead because the prefill stage caches keys and values for a large number of vision tokens. This paper introduces LightKV, a method that substantially reduces both this memory overhead and the associated computation, enabling more efficient deployment and scaling of powerful LVLMs.
Original Abstract
Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.
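The core idea — scoring vision tokens by how much attention they receive from the text prompt, then keeping only the most informative fraction — can be sketched as follows. This is a minimal illustration of prompt-guided token selection, not the paper's actual algorithm: the function name, single-pass top-k selection, and 55% keep ratio are simplifying assumptions (LightKV compresses progressively during prefill via message passing).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_vision_tokens(vision_emb, prompt_emb, keep_ratio=0.55):
    """Hypothetical sketch: rank vision tokens by attention mass
    received from the text prompt, keep the top `keep_ratio` fraction."""
    d = vision_emb.shape[1]
    # Cross-modality affinity, shape (num_prompt_tokens, num_vision_tokens).
    attn = softmax(prompt_emb @ vision_emb.T / np.sqrt(d))
    # Aggregate each vision token's importance across all prompt tokens.
    importance = attn.sum(axis=0)
    k = max(1, int(round(keep_ratio * vision_emb.shape[0])))
    keep = np.sort(np.argsort(importance)[-k:])  # keep original token order
    return vision_emb[keep], keep

rng = np.random.default_rng(0)
vision = rng.standard_normal((100, 64))  # 100 vision tokens, dim 64
prompt = rng.standard_normal((8, 64))    # 8 text-prompt tokens
kept, idx = compress_vision_tokens(vision, prompt)
print(kept.shape)  # (55, 64) -- 55% of vision tokens survive
```

With a 55% keep ratio, the KV entries for the dropped 45% of vision tokens are never cached, which is where the memory and compute savings in the abstract come from.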