Res2Net: A New Multi-scale Backbone Architecture
Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang + 1 more
TLDR
Res2Net introduces a novel CNN building block that enhances multi-scale feature representation within a single residual block, improving performance across various vision tasks.
Key contributions
- Proposes the Res2Net block, which uses hierarchical residual-like connections to capture multi-scale features at a granular level within a single residual block.
- Demonstrates consistent performance improvements when integrated into popular backbones like ResNet, ResNeXt, and DLA on datasets such as CIFAR-100 and ImageNet.
- Validates superiority through extensive experiments on tasks including object detection, class activation mapping, and salient object detection.
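The hierarchical connections described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: it assumes a simplified block that splits the input channels into `scale` subsets, passes the first subset through unchanged, and convolves each later subset after adding the previous subset's output, so successive subsets see progressively larger receptive fields (the surrounding 1x1 convolutions of the full bottleneck are omitted).

```python
import torch
import torch.nn as nn


class Res2NetSplitConv(nn.Module):
    """Hypothetical minimal sketch of the multi-scale split inside a
    Res2Net block: channel subsets connected by hierarchical
    residual-like additions (1x1 convs of the full bottleneck omitted)."""

    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must divide evenly into subsets"
        self.scale = scale
        width = channels // scale
        # One 3x3 conv per subset, except the first subset which is
        # passed through as an identity mapping.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1)
            for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        subsets = torch.chunk(x, self.scale, dim=1)
        outputs = [subsets[0]]  # identity path keeps the smallest receptive field
        prev = None
        for i, conv in enumerate(self.convs):
            s = subsets[i + 1]
            # Hierarchical connection: fold in the previous subset's output
            # before convolving, enlarging the effective receptive field.
            prev = conv(s if prev is None else s + prev)
            outputs.append(prev)
        # Concatenating the subsets restores the original channel count.
        return torch.cat(outputs, dim=1)
```

A quick shape check shows the block is a drop-in replacement for a standard 3x3 conv stage: an input of shape `(N, C, H, W)` comes out with the same shape, while internally mixing features at multiple scales.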
Why it matters
Res2Net changes how convolutional neural networks handle multi-scale feature representation by embedding multi-scale processing inside a single residual block rather than spreading it across layers. This granular approach enlarges the range of receptive fields available at each layer and enriches the extracted features, improving accuracy and robustness across a variety of computer vision tasks while remaining a drop-in replacement for standard residual blocks.
Original Abstract
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/.