Res2Net: A New Multi-scale Backbone Architecture
Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang + 1 more
TLDR
Res2Net introduces a novel CNN building block that enhances multi-scale feature representation within a single residual block, improving performance across various vision tasks.
Key contributions
- Proposes the Res2Net block, which uses hierarchical residual-like connections to capture multi-scale features at a granular level within a single residual block.
- Demonstrates consistent performance improvements when integrated into popular backbones like ResNet, ResNeXt, and DLA on datasets such as CIFAR-100 and ImageNet.
- Validates superiority through extensive experiments on tasks including object detection, class activation mapping, and salient object detection.
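The hierarchical connections described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: it assumes a simplified block that splits the input channels into `scale` subsets, passes the first subset through unchanged, and convolves each later subset after adding the previous subset's output, so successive subsets see progressively larger receptive fields (the surrounding 1x1 convolutions of the full bottleneck are omitted).

```python
import torch
import torch.nn as nn


class Res2NetSplitConv(nn.Module):
    """Hypothetical minimal sketch of the multi-scale split inside a
    Res2Net block: channel subsets connected by hierarchical
    residual-like additions (1x1 convs of the full bottleneck omitted)."""

    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must divide evenly into subsets"
        self.scale = scale
        width = channels // scale
        # One 3x3 conv per subset, except the first subset which is
        # passed through as an identity mapping.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1)
            for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        subsets = torch.chunk(x, self.scale, dim=1)
        outputs = [subsets[0]]  # identity path keeps the smallest receptive field
        prev = None
        for i, conv in enumerate(self.convs):
            s = subsets[i + 1]
            # Hierarchical connection: fold in the previous subset's output
            # before convolving, enlarging the effective receptive field.
            prev = conv(s if prev is None else s + prev)
            outputs.append(prev)
        # Concatenating the subsets restores the original channel count.
        return torch.cat(outputs, dim=1)
```

A quick shape check shows the block is a drop-in replacement for a standard 3x3 conv stage: an input of shape `(N, C, H, W)` comes out with the same shape, while internally mixing features at multiple scales.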
Why it matters
Res2Net changes how convolutional neural networks handle multi-scale feature representation by embedding multi-scale processing inside a single residual block rather than spreading it across layers. This granular approach enlarges the range of receptive fields available at each layer and enriches the extracted features, improving accuracy and robustness across a variety of computer vision tasks while remaining a drop-in replacement for standard residual blocks.
Original Abstract
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/.