What Matters in Practical Learned Image Compression

May 6, 2026 · arXiv 2605.05148

Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang + 3 more

cs.CV · cs.AI · cs.LG

TLDR

This paper introduces a new learned image codec that achieves a better tradeoff between perceptual quality and runtime than both traditional and learned alternatives.

Key contributions

Comprehensive study of modeling choices for practical, perceptual learned image codecs.
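Novel performance-aware neural architecture search over millions of backbone configurations finds models that meet a target on-device runtime while maximizing perceptual compression performance.
New codec offers 2.3-3x bitrate savings over AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% over the best learned codecs, based on subjective user studies.
Achieves fast on-device performance: on an iPhone 17 Pro Max, 12MP images encode in as little as 230ms and decode in 150ms.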
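The headline figures mix two conventions: a multiplicative ratio (2.3-3x) and a percentage (20-40%). The minimal sketch below assumes that "2.3-3x bitrate savings" means the baseline needs 2.3 to 3 times as many bits at matched subjective quality. Under that assumption it converts both figures into the fraction of baseline bits the new codec would spend, and turns the iPhone timings into throughput. The helper names are hypothetical and are not taken from the paper.

```python
# Hedged arithmetic on the summary's headline numbers.
# Assumption: "Nx bitrate savings" = baseline uses N times the bits at equal quality.

def bits_fraction_from_ratio(ratio: float) -> float:
    """Fraction of the baseline's bits used when the baseline needs `ratio` times as many."""
    return 1.0 / ratio

def bits_fraction_from_percent(savings_pct: float) -> float:
    """Fraction of the baseline's bits used after an X% bitrate saving."""
    return 1.0 - savings_pct / 100.0

def megapixels_per_second(megapixels: float, millis: float) -> float:
    """Throughput implied by processing `megapixels` in `millis` milliseconds."""
    return megapixels / (millis / 1000.0)

if __name__ == "__main__":
    for r in (2.3, 3.0):
        print(f"{r}x savings -> {bits_fraction_from_ratio(r):.0%} of baseline bits")
    for p in (20, 40):
        print(f"{p}% savings -> {bits_fraction_from_percent(p):.0%} of baseline bits")
    print(f"encode: {megapixels_per_second(12, 230):.1f} MP/s")
    print(f"decode: {megapixels_per_second(12, 150):.1f} MP/s")
```

Under this reading, the codec spends roughly 33-43% of the bits of the traditional codecs and 60-80% of the bits of the best learned codec, and the reported timings correspond to roughly 52 MP/s for encoding and 80 MP/s for decoding.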

Why it matters

This paper addresses the need for learned image codecs that are both practical and perceptually optimized. It delivers a new codec that significantly advances the state of the art in compression efficiency and real-world speed, making high-quality learned image compression viable on mobile devices.

Original Abstract

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime -- including within the ablations several novel techniques. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics. We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images as fast as 230ms, and decodes them in 150ms -- faster than most top ML-based codecs run on a V100 GPU.
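The abstract describes performance-aware architecture search as three steps: enumerate backbone configurations, keep only those that meet an on-device runtime target, and pick the one that maximizes a perceptual metric. The toy sketch below applies that constrained selection by exhaustive search over a tiny hypothetical space. Every piece of it is a made-up stand-in: the configuration fields (`channels`, `num_blocks`, `kernel_size`), the latency model, and the quality proxy. The paper searches millions of configurations using its own runtime and perceptual measurements, and those details are not given here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

@dataclass(frozen=True)
class BackboneConfig:
    # Hypothetical search knobs; not the paper's actual search space.
    channels: int
    num_blocks: int
    kernel_size: int

def predicted_latency_ms(cfg: BackboneConfig) -> float:
    # Toy cost model: latency scales with compute (channels^2 * blocks * kernel^2).
    return 1e-4 * cfg.channels ** 2 * cfg.num_blocks * cfg.kernel_size ** 2

def perceptual_score(cfg: BackboneConfig) -> float:
    # Toy quality proxy with diminishing returns in capacity; higher is better.
    capacity = cfg.channels * cfg.num_blocks * cfg.kernel_size
    return capacity / (capacity + 2000.0)

def search(budget_ms: float) -> Optional[BackboneConfig]:
    """Return the highest-scoring config whose predicted latency fits the budget."""
    space = [BackboneConfig(c, b, k)
             for c, b, k in product((32, 64, 96, 128), (2, 4, 6, 8), (3, 5))]
    feasible = [cfg for cfg in space if predicted_latency_ms(cfg) <= budget_ms]
    if not feasible:
        return None  # No configuration meets the runtime target.
    return max(feasible, key=perceptual_score)

if __name__ == "__main__":
    for budget in (10.0, 150.0):
        best = search(budget)
        print(budget, best, predicted_latency_ms(best) if best else None)
```

Exhaustive enumeration only works for a toy space like this one. At the scale of millions of candidates, one common option is to replace on-device timing with a cheap latency predictor or a measured lookup table, although the summary does not say which approach the authors use.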
