ArXiv TLDR

FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

arXiv:2604.24391

Zihao Zheng, Xingyue Zhou, Zhihao Mao, Songyu Sun, Lingyue Zhang + 5 more

cs.RO

TLDR

FreqCache accelerates Vision-Language-Navigation (VLN) models via adaptive frequency-guided token caching, achieving a 1.59x speedup with negligible overhead.

Key contributions

  • Identifies limitations of visual-domain token caching for VLN, including viewpoint migration and temporal variation.
  • Proposes FreqCache, a novel frequency-guided framework for optimal token cache management in VLN.
  • Utilizes frequency domain properties for adaptive cache establishment, refreshment, and adjustment.
  • Achieves a 1.59x speedup in VLN models with negligible overhead, demonstrating practical efficiency.

Why it matters

VLN models are powerful but computationally intensive. FreqCache delivers a 1.59x speedup with negligible overhead, making these models more practical to deploy. It pioneers frequency-domain analysis for token caching, opening new optimization avenues in embodied AI.

Original Abstract

Vision-Language-Navigation (VLN) models exhibit excellent navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy to reduce this cost by reusing token computation results; however, existing token caching approaches rely on visual domain methods for cacheable token selection, leading to challenges when adapted to VLN models. 1) Visual domain methods become invalid when there is viewpoint migration. 2) Visual domain methods neglect critical edge information without the aid of additional algorithms. 3) Visual domain methods overlook the temporal variation of scenarios and lack adjustability in cache budgets. In this paper, we develop detailed analyses and find that the impacts of these challenges exhibit invariance and analyzability in the frequency domain. Based on these, we propose a frequency-guided token caching framework, called FreqCache. Utilizing the inherent properties of the frequency domain, FreqCache achieves optimal token cache establishment, refreshment, and adaptive adjustment. Experiments show that FreqCache achieves 1.59x speedup with ignorable overhead, showing the effect of integrating frequency domain methods in VLN token caching.
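The abstract's core idea, deciding which tokens to cache based on their frequency-domain behavior, can be sketched in a toy form. This is an illustration, not the paper's actual criterion: the `cacheable_tokens` function, the FFT-over-time energy measure, and the `low_cutoff`/`energy_thresh` parameters are all assumptions made here for clarity. The intuition it captures is that a token whose feature trajectory is dominated by low frequencies is temporally stable, so its cached computation can be reused.

```python
import numpy as np

def cacheable_tokens(token_history, low_cutoff=2, energy_thresh=0.9):
    """Select tokens whose feature trajectories concentrate spectral
    energy in low frequencies, i.e. are temporally stable.

    token_history: (T, N, D) array — features of N tokens over T steps.
    Returns a boolean mask of shape (N,): True = reuse cached result.
    """
    # Spectrum of each token's feature trajectory along the time axis.
    spectrum = np.abs(np.fft.rfft(token_history, axis=0))  # (T//2+1, N, D)
    energy = spectrum ** 2
    total = energy.sum(axis=(0, 2)) + 1e-12                # per-token energy
    low = energy[:low_cutoff].sum(axis=(0, 2))             # low-frequency part
    # Cache tokens whose low-frequency share of spectral energy is high.
    return (low / total) >= energy_thresh

# Toy usage: one static token vs. one rapidly varying token.
T, D = 16, 8
rng = np.random.default_rng(0)
static = np.tile(rng.normal(size=(1, 1, D)), (T, 1, 1))    # constant over time
dynamic = rng.normal(size=(T, 1, D))                        # changes every step
history = np.concatenate([static, dynamic], axis=1)         # (T, 2, D)
mask = cacheable_tokens(history)
print(mask)  # the static token should be cacheable, the dynamic one not
```

In this sketch, a time-varying cache budget (the paper's "adaptive adjustment") would correspond to tuning `energy_thresh` per step; the real FreqCache policy for establishment, refreshment, and adjustment is more involved than this single threshold.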

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.