Roger Zimmermann
2 papers · Latest:
Computer Vision
Audio-Visual Intelligence in Large Foundation Models
This survey provides the first comprehensive review of Audio-Visual Intelligence (AVI) in large foundation models, unifying tasks, methods, and challenges.
2605.04045
Computer Vision
Make Your LVLM KV Cache More Lightweight
LightKV significantly reduces the KV cache size and computation for Large Vision-Language Models by compressing redundant vision tokens.
2605.00789