Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models
Srinivas Soumitri Miriyala, Sowmya Vajrala, Sravanth Kodavanti, Vikram Nelvoy Rajendiran, Sharan Kumar Allur
TLDR
This paper introduces a hybrid framework for edge-efficient image restoration, distilling transformers into state-space models for faster inference.
Key contributions
- Proposes a hybrid framework combining transformers and state-space models for image restoration.
- Distills transformer features into lightweight SSM blocks for improved edge efficiency.
- Introduces Efficient Network Search (ENS) to discover optimal hybrid architectures.
- Achieves up to 3.4x faster inference on edge CPUs with competitive restoration quality.
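The feature-distillation idea in the second bullet — training lightweight SSM blocks to mimic transformer block outputs — can be sketched as a per-block feature-matching loss. This is a minimal illustration, not the paper's exact loss; the function name and plain-list features are assumptions.

```python
def feature_distillation_loss(teacher_feats, student_feats):
    """Mean-squared error between a frozen transformer (teacher) block's
    output features and an SSM (student) block's output features.
    A minimal sketch; the paper's actual distillation objective may
    use a different distance or additional terms."""
    assert len(teacher_feats) == len(student_feats)
    n = len(teacher_feats)
    # Average squared difference over all feature elements.
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / n
```

In practice the student SSM block would be optimized to drive this loss down, after which it can stand in for the transformer block inside the hybrid U-Net.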
Why it matters
Transformers excel at image restoration but are slow on edge devices. This paper addresses that gap with efficient hybrid models that combine transformer and state-space blocks, enabling high-quality image restoration on mobile hardware — crucial for real-time applications.
Original Abstract
We propose a modular framework for hybrid image restoration that integrates transformer and state-space model (SSM) blocks with a focus on improving runtime efficiency on edge hardware. While transformers provide strong global modeling through self-attention, their attention kernels incur substantial latency on mobile devices, especially for high-resolution inputs. In contrast, SSMs such as Mamba offer linear-time sequence modeling with lower runtime overhead but may underperform on fine-grained restoration tasks. To balance accuracy and efficiency, we train lightweight SSM blocks as feature-distilled surrogates of transformer blocks and use them to construct hybrid U-Net-style architectures. To automatically discover effective block combinations, we introduce Efficient Network Search (ENS), a multi-objective search strategy that selects task-specific hybrid configurations from pre-aligned components. ENS optimizes restoration quality while penalizing transformer usage, serving as a lightweight proxy for latency and enabling architecture discovery without repeated hardware profiling. On a Snapdragon 8 Elite CPU, the Restormer baseline requires 10119.52 ms for inference. In contrast, ENS-discovered hybrids significantly reduce runtime: ENS-Deblurring runs in 2973 ms (3.4x faster), ENS-Deraining in 5816 ms (1.74x faster), and ENS-Denoising in 8666 ms (1.17x faster), while maintaining competitive restoration quality.
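The ENS objective described above — maximize restoration quality while penalizing transformer usage as a cheap latency proxy — can be sketched as a scalarized score over candidate block sequences. The candidate encoding (strings of 'T'/'S' block choices), the weight `lam`, and the stand-in `quality_fn` are assumptions for illustration, not the paper's actual search implementation.

```python
from itertools import product

def ens_score(quality, n_transformer_blocks, lam=0.5):
    """Scalarized ENS-style objective: restoration quality (e.g. PSNR)
    minus a penalty on transformer-block count, which acts as a
    lightweight proxy for on-device latency. `lam` is an assumed weight."""
    return quality - lam * n_transformer_blocks

def search(candidates, quality_fn, lam=0.5):
    """Return the hybrid configuration with the best scalarized score.
    `quality_fn` stands in for evaluating each candidate's restoration
    quality (in the paper, from pre-aligned distilled components)."""
    return max(candidates,
               key=lambda c: ens_score(quality_fn(c), c.count("T"), lam))

# Enumerate all 4-block hybrids of transformer ('T') and SSM ('S') blocks.
candidates = ["".join(p) for p in product("TS", repeat=4)]
```

With a penalty weight large enough to dominate the per-block quality gain, the search naturally prefers SSM-heavy configurations, mirroring how ENS trades a small accuracy loss for large latency savings.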