CNN-ViT Fusion with Adaptive Attention Gate for Brain Tumor MRI Classification: A Hybrid Deep Learning Model

April 25, 2026 | arXiv:2604.23137

Syed Ibad Hasnain, Muhammad Faris, Hafiza Syeda Yusra Tirmizi, Rabail Khowaja, Hafsa Israr

cs.CV, cs.AI, q-bio.QM

TLDR

A hybrid CNN-ViT model with an Adaptive Attention Gate dynamically merges local and global features, reaching 97.60% test accuracy on the Brain Tumor MRI Dataset (Kaggle) and outperforming single CNN and ViT baselines.

Key contributions

Proposes a hybrid CNN-ViT architecture for brain tumor MRI classification.
Introduces an Adaptive Attention Gate to dynamically merge local (CNN) and global (ViT) features.
Achieves 97.60% test accuracy and a macro-average AUC of 0.9946 on the Brain Tumor MRI Dataset (Kaggle), outperforming single CNN and ViT baselines and competitive fusion methods.
Demonstrates improved medical image classification through context-sensitive feature weighting.
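
The abstract describes the AUC figure above as a macro average. As a rough illustration of what that usually means, the sketch below computes a one-vs-rest AUC for each class and takes the unweighted mean. This is an assumption about the averaging convention, not the authors' evaluation code. The function names and the toy probabilities are hypothetical and are not results from the paper.

```python
import numpy as np

def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both groups are non-empty.
    """
    pos = scores[is_positive]
    neg = scores[~is_positive]
    diff = pos[:, None] - neg[None, :]           # all pairwise score gaps
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

def macro_auc(probs, labels):
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    per_class = [
        binary_auc(probs[:, k], labels == k)     # class k vs. everything else
        for k in range(probs.shape[1])
    ]
    return float(np.mean(per_class))

# Hypothetical softmax outputs for six samples and three classes.
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.3, 0.4],
    [0.5, 0.1, 0.4],
])
toy_labels = np.array([0, 0, 1, 1, 2, 2])

toy_macro_auc = macro_auc(toy_probs, toy_labels)
print(f"macro AUC on toy data: {toy_macro_auc:.4f}")
```

Because every class contributes equally to the mean, a macro average does not let a large class mask weak performance on a small one.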

Why it matters

This paper introduces a hybrid deep learning model that combines the strengths of CNNs and ViTs for brain tumor classification. Its Adaptive Attention Gate dynamically weighs local and global features, which the authors report yields higher scores than single CNN and ViT baselines and competitive fusion methods. This approach offers a promising direction for more precise and reliable medical image diagnosis.

Original Abstract

Early detection and classification of brain tumors from Magnetic Resonance Imaging (MRI) images is highly important, but relevant features are difficult to extract from medical images. Convolutional Neural Networks (CNNs) are good at capturing both local texture and spatial information, whereas Vision Transformers (ViTs) are good at capturing long-range global dependencies. In this paper, we propose a new hybrid architecture that combines a SqueezeNet-style CNN branch with a MobileViT-style global transformer branch through an Adaptive Attention Gate mechanism. The gate dynamically learns per-sample, per-feature weights that set the contribution of each branch, allowing context-sensitive merging of local and global representations. Trained and evaluated on the Brain Tumor MRI Dataset (Kaggle), the proposed model achieves a test accuracy of 97.60%, a precision of 97.30%, a recall of 97.50%, an F1-score of 97.40%, and a macro-average area under the curve (AUC) of 0.9946. These scores are higher than those of single CNN and ViT baselines and of current competitive fusion methods, showing that dynamic feature weighting is an effective way to classify medical images.
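
The abstract says the gate learns per-sample, per-feature weights for the two branches but does not give its exact form. The sketch below shows one common way to build such a gate: a sigmoid over the concatenated branch features, used as a convex-combination weight. The sigmoid form, the single linear layer, the function name `adaptive_gate_fusion`, and the random stand-in features are all assumptions made for illustration. They are not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_gate_fusion(f_cnn, f_vit, W, b):
    """Fuse two feature vectors with a per-sample, per-feature gate.

    f_cnn, f_vit: (batch, d) features from the local and global branches.
    W: (2 * d, d) gate weights, b: (d,) gate bias (both would be learned).
    Returns the fused features and the gate values, each of shape (batch, d).
    """
    joint = np.concatenate([f_cnn, f_vit], axis=1)   # (batch, 2d)
    gate = sigmoid(joint @ W + b)                    # each entry in (0, 1)
    fused = gate * f_cnn + (1.0 - gate) * f_vit      # element-wise blend
    return fused, gate

# Random stand-ins for branch outputs; real features would come from the
# SqueezeNet-style and MobileViT-style branches after pooling.
rng = np.random.default_rng(0)
batch, d = 4, 8
f_cnn = rng.normal(size=(batch, d))
f_vit = rng.normal(size=(batch, d))
W = rng.normal(scale=0.1, size=(2 * d, d))
b = np.zeros(d)

fused, gate = adaptive_gate_fusion(f_cnn, f_vit, W, b)
print(fused.shape, gate.shape)
```

Because the gate depends on the input features, each sample gets its own mixing weights, and each feature dimension can lean toward a different branch. A fixed scalar weight or plain concatenation would not give that per-sample behavior.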


