ArXiv TLDR

Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

arXiv:2605.02849

Amirhosein Javadi, Shirin Saeedi Bidokhti, Tara Javidi

cs.CV

TLDR

ActDiff-VC is a diffusion-based framework for ultra-low-bitrate video compression that conditions a diffusion decoder on sparse signals (keyframes and tracked point trajectories) for perceptually realistic reconstruction.

Key contributions

  • Introduces ActDiff-VC, a diffusion-based video compression framework for ultra-low bitrates.
  • Uses variable-length segments, transmitting keyframes only when necessary.
  • Summarizes temporal dynamics via compact tracked point trajectories.
  • Employs content-adaptive keyframe and budget-aware trajectory selection.
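The paper does not detail its selection criterion in this summary, so here is a minimal, hypothetical sketch of content-adaptive keyframe selection: emit a new keyframe (starting a variable-length segment) whenever a frame drifts too far from the last transmitted keyframe. The function name `select_keyframes`, the mean-absolute-difference drift measure, and the `threshold` parameter are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def select_keyframes(frames, threshold=0.1):
    """Greedy content-adaptive keyframe selection (illustrative sketch).

    Emit a keyframe whenever the mean absolute difference from the
    last keyframe exceeds `threshold`, yielding variable-length
    segments. Non-keyframes would be synthesized by a conditional
    diffusion decoder rather than transmitted.
    """
    keyframe_idx = [0]                      # first frame is always a keyframe
    last_key = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        frame = frames[i].astype(np.float32)
        drift = np.mean(np.abs(frame - last_key))
        if drift > threshold:               # content changed enough: new segment
            keyframe_idx.append(i)
            last_key = frame
    return keyframe_idx

# Toy example: a static clip with an abrupt scene change at frame 5.
video = np.zeros((10, 4, 4), dtype=np.float32)
video[5:] = 1.0
print(select_keyframes(video, threshold=0.1))  # [0, 5]
```

Static stretches produce long segments with a single keyframe, while rapid content changes trigger more frequent keyframes, which is the intuition behind transmitting keyframes "only when necessary."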

Why it matters

By conditioning a diffusion decoder on sparse signals, this approach achieves up to 64.6% bitrate reduction at matched perceptual quality (NIQE), outperforming strong learned codecs. This matters for efficient video streaming and storage under severe bandwidth constraints.

Original Abstract

Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Our method partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints. To support this design, we introduce two mechanisms: content-adaptive keyframe selection and budget-aware sparse trajectory selection, which together enable compact yet effective conditioning for generative reconstruction. Experiments on the UVG and MCL-JCV benchmarks show that ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE, improves KID by up to 64.6% and FID by up to 37.7% at comparable bitrates against strong learned codecs, and delivers favorable perceptual rate-distortion trade-offs relative to learned and diffusion-based baselines in the ultra-low-bitrate regime.

📬 Weekly AI Paper Digest
