Roger Zimmermann

2 papers · Latest: May 5, 2026

Audio-Visual Intelligence in Large Foundation Models

This survey provides the first comprehensive review of Audio-Visual Intelligence (AVI) in large foundation models, unifying tasks, methods, and challenges.

2605.04045 · May 5, 2026

Computer Vision

Make Your LVLM KV Cache More Lightweight

LightKV significantly reduces the KV cache size and computation for Large Vision-Language Models by compressing redundant vision tokens.

2605.00789 · May 1, 2026
