VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence
Guney Tombak, Ertunc Erdil, Ender Konukoglu
TLDR
VoxCor provides training-free volumetric features from frozen 2D ViTs for robust multimodal 3D medical image voxel correspondence.
Key contributions
- Introduces VoxCor, a training-free fit-transform method for reusable volumetric features.
- Combines triplanar ViT inference (sketched after this list) with WPLS projection to select modality-stable anatomical directions.
- Enables direct voxel correspondence querying via nearest-neighbor search without fine-tuning or registration.
- Improves cross-subject/cross-modality transfer and yields competitive registration performance.
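To make the triplanar inference step from the second bullet concrete, here is a minimal sketch rather than the paper's actual code. It assumes a frozen 2D ViT exposed as a callable that maps a batch of slices to a grid of patch tokens; the names `triplanar_features`, `vit`, and `patch` are illustrative, and the per-slice channel replication is just one way to feed grayscale slices to an ImageNet-style encoder.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def triplanar_features(volume, vit, patch=16):
    """volume: (D, H, W) float tensor, each dim divisible by `patch`;
    vit: frozen 2D ViT callable mapping a (S, 3, A, B) slice batch to
    (S, A//patch * B//patch, C) patch tokens (an assumed interface).
    Returns a (3*C, D, H, W) triplanar feature volume."""
    feats = []
    for axis in range(3):                        # 0: axial, 1: coronal, 2: sagittal
        slices = volume.movedim(axis, 0)         # (S, A, B) slice stack along this axis
        s, a, b = slices.shape
        x = slices[:, None].expand(s, 3, a, b)   # replicate grayscale into fake RGB
        tokens = vit(x)                          # (S, A//patch * B//patch, C)
        c = tokens.shape[-1]
        grid = tokens.transpose(1, 2).reshape(s, c, a // patch, b // patch)
        grid = F.interpolate(grid, size=(a, b),  # upsample token grid to slice size
                             mode="bilinear", align_corners=False)
        feats.append(grid.movedim(0, 1).movedim(1, axis + 1))  # back to (C, D, H, W)
    return torch.cat(feats, dim=0)               # concatenate the three views
```

In practice one would batch and cache slice features, but the point of the sketch is that per-volume inference needs nothing beyond 2D ViT forward passes along the three axes.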
Why it matters
VoxCor offers a training-free way to produce reusable, anatomically consistent volumetric features for multimodal 3D medical images. It addresses a limitation of prior pipelines, which adapt features to one image pair at a time inside a registration solver, by enabling robust cross-modal, cross-subject transfer and providing a foundational feature layer for downstream analyses beyond pairwise registration.
Original Abstract
Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit-transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR-CT and inter-subject HCP T2w-T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at https://github.com/guneytombak/VoxCor.
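The abstract's fit/transform split can be illustrated with a short sketch. What follows is one plausible reading, not VoxCor's exact formulation: a closed-form weighted PLS fit takes paired voxel features and correspondence weights, keeps the top singular directions of the weighted cross-covariance as "modality-stable" projection axes, and transform-time correspondence reduces to a nearest-neighbor lookup in the projected space. All function names, the weighting scheme, and the normalization choices are assumptions.

```python
import torch

def wpls_fit(x, y, w, k=64):
    """x, y: (N, C) paired triplanar features from corresponding voxels,
    w: (N,) nonnegative correspondence weights, k: number of directions.
    Returns two (C, k) projection matrices from the SVD of the weighted
    cross-covariance, i.e. directions along which the two modalities'
    features co-vary most strongly."""
    w = w / w.sum()
    xc = x - (w[:, None] * x).sum(0)            # weighted centering
    yc = y - (w[:, None] * y).sum(0)
    cov = xc.T @ (w[:, None] * yc)              # (C, C) weighted cross-covariance
    u, _, vh = torch.linalg.svd(cov)
    return u[:, :k], vh[:k].T                   # one projection per modality

def query_correspondences(feats_a, feats_b, proj_a, proj_b):
    """Project per-voxel features (Na, C) and (Nb, C), then return, for
    every voxel in A, the index of its nearest neighbor in B."""
    za = torch.nn.functional.normalize(feats_a @ proj_a, dim=1)
    zb = torch.nn.functional.normalize(feats_b @ proj_b, dim=1)
    return torch.cdist(za, zb).argmin(dim=1)    # (Na,) NN indices into B
```

Note how this mirrors the abstract's claims: the fit is closed-form (one SVD, no gradient training), and applying the model to a new volume costs only a matrix multiplication on top of the triplanar features.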