A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
Max Defez, Filippo Quarenghi, Mathieu Vrac, Stephan Mandt, Tom Beucler
TLDR
A scale-adaptive framework uses diffusion models for joint spatiotemporal super-resolution, reusing one architecture across various spatial and temporal factors.
Key contributions
- Decomposes spatiotemporal SR into a deterministic prediction of the conditional mean (with attention) and a residual conditional diffusion model.
- Achieves scale adaptivity by retuning three factor-dependent hyperparameters before retraining: the diffusion noise schedule amplitude, the temporal context length, and (optionally) the mass-conservation function.
- Incorporates an optional mass-conservation transform to preserve aggregated totals in outputs.
- Demonstrates a single architecture spanning super-resolution factors from 1 to 25 in space and 1 to 6 in time on reanalysis precipitation over France (Comephore).
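The optional mass-conservation transform can be illustrated with a simple multiplicative rescaling: each coarse block of the super-resolved field is scaled so that its mean matches the corresponding low-resolution input value. This is only a minimal sketch under that assumption; the paper's actual transform f additionally tapers the rescaling to limit amplification of extremes at large factors.

```python
import numpy as np

def conserve_mass(hr, lr, s):
    """Rescale each s x s block of hr so its block mean matches lr.

    hr: (H*s, W*s) super-resolved precipitation field.
    lr: (H, W) low-resolution input field.
    Illustrative only: the paper's conservation function f also tapers
    extremes, which this plain multiplicative version does not.
    """
    H, W = lr.shape
    blocks = hr.reshape(H, s, W, s)
    block_mean = blocks.mean(axis=(1, 3))  # coarse-grained means of hr, shape (H, W)
    # Avoid division by zero on dry (all-zero) blocks.
    ratio = np.where(block_mean > 0, lr / np.maximum(block_mean, 1e-12), 0.0)
    return (blocks * ratio[:, None, :, None]).reshape(H * s, W * s)
```

After this step, aggregating the output back to the input grid recovers the input exactly, so total precipitation amounts are preserved.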
Why it matters
Existing video super-resolution models are often designed for a single pair of spatiotemporal SR factors, which restricts transfer across spatial resolutions and temporal cadences. This paper introduces a scale-adaptive framework that reuses a single architecture across diverse super-resolution scales, improving transferability and simplifying deployment for climate applications.
Original Abstract
Deep-learning video super-resolution has progressed rapidly, but climate applications typically super-resolve (increase resolution) either space or time, and joint spatiotemporal models are often designed for a single pair of super-resolution (SR) factors (upscaling spatial and temporal ratio between the low-resolution sequence and the high-resolution sequence), limiting transfer across spatial resolutions and temporal cadences (frame rates). We present a scale-adaptive framework that reuses the same architecture across factors by decomposing spatiotemporal SR into a deterministic prediction of the conditional mean, with attention, and a residual conditional diffusion model, with an optional mass-conservation (same precipitation amount in inputs and outputs) transform to preserve aggregated totals. Assuming that larger SR factors primarily increase underdetermination (hence required context and residual uncertainty) rather than changing the conditional-mean structure, scale adaptivity is achieved by retuning three factor-dependent hyperparameters before retraining: the diffusion noise schedule amplitude beta (larger for larger factors to increase diversity), the temporal context length L (set to maintain comparable attention horizons across cadences) and optionally a third, the mass-conservation function f (tapered to limit the amplification of extremes for large factors). Demonstrated on reanalysis precipitation over France (Comephore), the same architecture spans super-resolution factors from 1 to 25 in space and 1 to 6 in time, yielding a reusable architecture and tuning recipe for joint spatiotemporal super-resolution across scales.