UniCorrn: Unified Correspondence Transformer Across 2D and 3D
Prajnan Goswami, Tianye Ding, Feng Liu, Huaizu Jiang
TLDR
UniCorrn unifies 2D-2D, 2D-3D, and 3D-3D geometric matching in a single Transformer model with shared weights. It is competitive on 2D-2D matching and surpasses prior state of the art in registration recall on 2D-3D and 3D-3D.
Key contributions
- First unified Transformer model for 2D-2D, 2D-3D, and 3D-3D geometric correspondence.
- Proposes a dual-stream decoder to maintain separate appearance and positional feature streams.
- Employs modality-specific backbones with shared encoder and decoder components.
- Surpasses prior state of the art in registration recall by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D), while remaining competitive on 2D-2D matching.
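For readers unfamiliar with the metric, registration recall is commonly computed as the fraction of test pairs whose registration error falls below a benchmark-specific threshold. The following is a minimal sketch under that assumption; the function name, the error values, and the 0.2 threshold are hypothetical illustrations, not numbers or code from the paper. The summary above does not say whether the reported gains are absolute points or relative improvements, so the sketch simply reports the absolute difference.

```python
import numpy as np

def registration_recall(errors, threshold):
    """Fraction of pairs whose registration error is below `threshold`.

    `errors` holds one error value per test pair. How the error is
    measured and which threshold applies depend on the benchmark;
    both are assumptions here.
    """
    errors = np.asarray(errors, dtype=float)
    if errors.size == 0:
        raise ValueError("need at least one pair")
    return float(np.mean(errors < threshold))

# Hypothetical per-pair errors, NOT results from the paper.
baseline = registration_recall([0.05, 0.31, 0.12, 0.40, 0.25], threshold=0.2)
improved = registration_recall([0.05, 0.18, 0.12, 0.40, 0.25], threshold=0.2)

# Absolute gain in percentage points between the two hypothetical methods.
gain_points = 100 * (improved - baseline)
```

In this toy example one additional pair out of five drops below the threshold, so recall rises from two successful pairs to three.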
Why it matters
Current methods use separate, task-specific models for each 2D/3D modality combination. UniCorrn handles all three tasks with one set of shared weights, which could reduce the need to design and maintain a separate model per task. Its strong results across modalities matter because geometric correspondence underlies numerous 3D vision tasks.
Original Abstract
Visual correspondence across image-to-image (2D-2D), image-to-point cloud (2D-3D), and point cloud-to-point cloud (3D-3D) geometric matching forms the foundation for numerous 3D vision tasks. Despite sharing a similar problem structure, current methods use task-specific designs with separate models for each modality combination. We present UniCorrn, the first correspondence model with shared weights that unifies geometric matching across all three tasks. Our key insight is that Transformer attention naturally captures cross-modal feature similarity. We propose a dual-stream decoder that maintains separate appearance and positional feature streams. This design enables end-to-end learning through stackable layers while supporting flexible query-based correspondence estimation across heterogeneous modalities. Our architecture employs modality-specific backbones followed by shared encoder and decoder components, trained jointly on diverse data combining pseudo point clouds from depth maps with real 3D correspondence annotations. UniCorrn achieves competitive performance on 2D-2D matching and surpasses prior state-of-the-art by 8% on 7Scenes (2D-3D) and 10% on 3DLoMatch (3D-3D) in registration recall. Project website: https://neu-vi.github.io/UniCorrn
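To make the abstract's key insight more concrete, here is a minimal sketch of how attention weights can turn feature similarity into a correspondence estimate. It is not the authors' decoder: it is a single, generic scaled dot-product attention step in NumPy, and the function names, shapes, and `temperature` parameter are assumptions made for illustration. Appearance features decide the attention weights, and target positions (2D pixels or 3D points) are carried separately and averaged under those weights, which only loosely echoes the idea of keeping appearance and position apart.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_correspondence(query_feat, target_feat, target_pos, temperature=1.0):
    """Soft correspondence from scaled dot-product attention.

    query_feat:  (Q, D) appearance features of the query locations
    target_feat: (N, D) appearance features of the target modality
    target_pos:  (N, P) target coordinates, P=2 for pixels or P=3 for points
    Returns (Q, P) expected target positions and (Q, N) attention weights.
    """
    d = query_feat.shape[-1]
    scores = query_feat @ target_feat.T / (np.sqrt(d) * temperature)
    weights = softmax(scores, axis=-1)
    return weights @ target_pos, weights

# Toy data (hypothetical): 50 target points with unit-norm 32-D features.
rng = np.random.default_rng(0)
target_feat = rng.normal(size=(50, 32))
target_feat /= np.linalg.norm(target_feat, axis=1, keepdims=True)
target_pos = rng.uniform(size=(50, 3))   # 3D coordinates
query_feat = target_feat[[3, 17]]        # queries that truly match targets 3 and 17

matches, weights = attention_correspondence(
    query_feat, target_feat, target_pos, temperature=0.05
)
```

Because the features are unit-normalized, each query's highest attention weight lands on its true match, and the same function works unchanged whether `target_pos` holds 2D or 3D coordinates.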