ArXiv TLDR

EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

arXiv:2605.08073

Wei Yu, Yunhang Qian

cs.CV, cs.AI

TLDR

EmambaIR is an efficient visual State Space Model for event-guided image reconstruction that outperforms state-of-the-art methods while reducing memory consumption and computational cost.

Key contributions

  • Introduces EmambaIR, an efficient visual State Space Model for event-guided image reconstruction.
  • Proposes Cross-modal Top-k Sparse Attention Module (TSAM) for efficient pixel-level feature fusion (a minimal sketch follows this list).
  • Develops Gated State-Space Module (GSSM) to enhance temporal representation and capture global context.
  • Achieves SOTA performance across motion deblurring, deraining, and HDR enhancement tasks.
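To make TSAM concrete, below is a minimal PyTorch sketch of pixel-level top-k sparse cross-attention under simplifying assumptions: a single head, image features as queries, event features as keys and values. The class name and the `dim`/`k` parameters are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TopKSparseCrossAttention(nn.Module):
    """Illustrative single-head top-k sparse cross-attention (not the paper's code)."""

    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)        # queries from image features
        self.to_kv = nn.Linear(dim, 2 * dim)   # keys/values from event features
        self.proj = nn.Linear(dim, dim)

    def forward(self, img_feat: torch.Tensor, evt_feat: torch.Tensor) -> torch.Tensor:
        # img_feat, evt_feat: (batch, num_pixels, dim)
        q = self.to_q(img_feat)
        k, v = self.to_kv(evt_feat).chunk(2, dim=-1)

        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, N, N)

        # Keep only the top-k scores per query pixel; mask the rest to -inf
        # so the softmax assigns them zero weight (the "sparse" part).
        kth = torch.topk(scores, self.k, dim=-1).values[..., -1:]
        scores = scores.masked_fill(scores < kth, float("-inf"))

        attn = scores.softmax(dim=-1)
        return self.proj(attn @ v)             # sparse cross-modal fusion
```

Masking the discarded scores to -inf before the softmax gives them exactly zero weight, so each image pixel fuses information from only its k most relevant event-feature locations while the operation stays differentiable.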

Why it matters

In event-based image reconstruction, CNNs struggle to capture global feature correlations, while ViTs incur quadratic computational cost that hinders high-resolution use. EmambaIR sidesteps both limitations by combining top-k sparse attention with a gated, linear-complexity state-space model, delivering superior performance and efficiency across diverse reconstruction tasks.

Original Abstract

Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (e.g., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. Our framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). TSAM efficiently performs pixel-level top-k sparse attention to guide cross-modal interactions, yielding rich yet sparse fusion features. Subsequently, GSSM utilizes a nonlinear gated unit to enhance the temporal representation of vanilla linear-complexity ($O(n)$) SSMs, effectively capturing global contextual dependencies without the typical computational overhead. Extensive experiments on six datasets across three diverse image reconstruction tasks - motion deblurring, deraining, and High Dynamic Range (HDR) enhancement - demonstrate that EmambaIR significantly outperforms state-of-the-art methods while offering substantial reductions in memory consumption and computational cost. The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR
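As a companion to the abstract's description of GSSM, here is a minimal sketch of a gated, linear-complexity state-space recurrence: a diagonal SSM scanned in $O(n)$ over the sequence, with the output modulated by a nonlinear (SiLU) gate in the spirit of Mamba-style blocks. The parameterization (`state_dim`, the decay `a = exp(-exp(log_a))`, the gating branch) is an assumption for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSSM(nn.Module):
    """Illustrative gated diagonal SSM with an O(n) recurrence (not the paper's code)."""

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        # a = exp(-exp(log_a)) keeps the per-channel decay in (0, 1) for stability.
        self.log_a = nn.Parameter(torch.zeros(dim, state_dim))
        self.B = nn.Parameter(torch.randn(dim, state_dim) * 0.02)
        self.C = nn.Parameter(torch.randn(dim, state_dim) * 0.02)
        self.gate = nn.Linear(dim, dim)  # nonlinear gate over the SSM output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); the scan is linear in sequence length.
        batch, seq_len, dim = x.shape
        a = torch.exp(-torch.exp(self.log_a))               # (dim, state_dim)
        h = x.new_zeros(batch, dim, self.log_a.shape[1])    # hidden state
        ys = []
        for t in range(seq_len):
            # h_t = a * h_{t-1} + B * x_t ;  y_t = sum_s C * h_t
            h = a * h + self.B * x[:, t, :].unsqueeze(-1)
            ys.append((h * self.C).sum(-1))
        y = torch.stack(ys, dim=1)                          # (batch, seq_len, dim)
        return y * F.silu(self.gate(x))                     # gated output
```

The explicit Python loop keeps the $O(n)$ recurrence visible; practical implementations replace it with a parallel scan or fused kernel for speed.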
