ArXiv TLDR

Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

arXiv:2605.02849

Amirhosein Javadi, Shirin Saeedi Bidokhti, Tara Javidi

cs.CV

TLDR

ActDiff-VC is a diffusion-based framework for ultra-low-bitrate video compression that conditions a diffusion decoder on sparse signals (keyframes and tracked point trajectories) for perceptually realistic reconstruction.

Key contributions

  • Introduces ActDiff-VC, a diffusion-based video compression framework for ultra-low bitrates.
  • Uses variable-length segments, transmitting keyframes only when necessary.
  • Summarizes temporal dynamics via compact tracked point trajectories.
  • Employs content-adaptive keyframe and budget-aware trajectory selection.
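The paper does not detail its selection criterion in this summary, so here is a minimal, hypothetical sketch of content-adaptive keyframe selection: emit a new keyframe (starting a variable-length segment) whenever a frame drifts too far from the last transmitted keyframe. The function name `select_keyframes`, the mean-absolute-difference drift measure, and the `threshold` parameter are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def select_keyframes(frames, threshold=0.1):
    """Greedy content-adaptive keyframe selection (illustrative sketch).

    Emit a keyframe whenever the mean absolute difference from the
    last keyframe exceeds `threshold`, yielding variable-length
    segments. Non-keyframes would be synthesized by a conditional
    diffusion decoder rather than transmitted.
    """
    keyframe_idx = [0]                      # first frame is always a keyframe
    last_key = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        frame = frames[i].astype(np.float32)
        drift = np.mean(np.abs(frame - last_key))
        if drift > threshold:               # content changed enough: new segment
            keyframe_idx.append(i)
            last_key = frame
    return keyframe_idx

# Toy example: a static clip with an abrupt scene change at frame 5.
video = np.zeros((10, 4, 4), dtype=np.float32)
video[5:] = 1.0
print(select_keyframes(video, threshold=0.1))  # [0, 5]
```

Static stretches produce long segments with a single keyframe, while rapid content changes trigger more frequent keyframes, which is the intuition behind transmitting keyframes "only when necessary."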

Why it matters

By conditioning a diffusion decoder on sparse signals, this approach achieves up to 64.6% bitrate reduction at matched perceptual quality (NIQE), outperforming strong learned codecs. This matters for efficient video streaming and storage under severe bandwidth constraints.

Original Abstract

Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Our method partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints. To support this design, we introduce two mechanisms: content-adaptive keyframe selection and budget-aware sparse trajectory selection, which together enable compact yet effective conditioning for generative reconstruction. Experiments on the UVG and MCL-JCV benchmarks show that ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE, improves KID by up to 64.6% and FID by up to 37.7% at comparable bitrates against strong learned codecs, and delivers favorable perceptual rate-distortion trade-offs relative to learned and diffusion-based baselines in the ultra-low-bitrate regime.

📬 Weekly AI Paper Digest
